Reward
- class genesis_forge.managers.RewardManager(env: GenesisEnv, cfg: dict[str, RewardConfig], logging_enabled: bool = True, logging_tag: str = 'Rewards')
Bases: BaseManager
Handles calculating and logging the rewards for the environment.
The manager is configured with a dictionary of reward handlers. For each dictionary item, the item's function is called to calculate a reward value for the environment.
- Parameters:
env – The environment to manage the rewards for.
cfg – A dictionary of reward conditions.
logging_enabled – Whether to log the rewards to tensorboard.
logging_tag – The section name used to log the rewards to tensorboard.
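The dictionary-driven evaluation described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes each handler is called as `fn(env, **params)` and that the weighted results are summed per environment. `FakeEnv`, `base_height_error`, `alive`, and `compute_rewards` are hypothetical names, lists stand in for `torch.Tensor`, and the real manager may apply additional scaling.

```python
from typing import Callable


class FakeEnv:
    """Hypothetical stand-in for an environment; holds only what the handlers read."""

    def __init__(self, base_heights: list[float]):
        self.base_heights = base_heights
        self.num_envs = len(base_heights)


def base_height_error(env: FakeEnv, target_height: float) -> list[float]:
    """Hypothetical handler: squared distance from the target height, per environment."""
    return [(h - target_height) ** 2 for h in env.base_heights]


def alive(env: FakeEnv) -> list[float]:
    """Hypothetical handler: a constant value of 1.0 for every environment."""
    return [1.0] * env.num_envs


def compute_rewards(env: FakeEnv, cfg: dict[str, dict]) -> list[float]:
    """Assumed behavior: sum weight * fn(env, **params) over all config items."""
    total = [0.0] * env.num_envs
    for name, item in cfg.items():
        fn: Callable = item["fn"]
        values = fn(env, **item.get("params", {}))
        weight = item.get("weight", 1.0)
        for i, value in enumerate(values):
            total[i] += weight * value
    return total


cfg = {
    "Base height": {
        "fn": base_height_error,
        "params": {"target_height": 0.5},
        "weight": -4.0,  # negative weight turns the error into a penalty
    },
    "Alive": {"fn": alive, "weight": 1.0},
}

env = FakeEnv(base_heights=[0.5, 1.0])
rewards = compute_rewards(env, cfg)  # one value per environment
```

The first environment sits at the target height and keeps the full alive bonus; the second is penalized for its height error.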
Example with ManagedEnvironment:
    class MyEnv(ManagedEnvironment):
        def config(self):
            self.reward_manager = RewardManager(
                self,
                cfg={
                    "Default pose": {
                        "fn": mdp.rewards.dof_similar_to_default,
                        "weight": -0.1,
                    },
                    "Base height": {
                        "fn": mdp.rewards.base_height,
                        "params": { "target_height": 0.135 },
                        "weight": -100.0,
                    },
                },
            )
Example using the reward manager directly:
    class MyEnv(GenesisEnv):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.reward_manager = RewardManager(
                self,
                cfg={
                    "Base height": {
                        "fn": mdp.rewards.base_height,
                        "params": { "target_height": 0.135 },
                        "weight": -100.0,
                    },
                },
            )

        def build(self):
            super().build()
            self.reward_manager.build()

        def step(self, actions: torch.Tensor):
            super().step(actions)
            rewards = self.reward_manager.step()
            # ... other step logic ...
            return obs, rewards, terminations, timeouts, info

        def reset(self, envs_idx: list[int] | None = None):
            super().reset(envs_idx)
            # ... other reset logic ...
            return obs, info
- last_episode_mean_reward(name: str, before_weight: bool = True) → float
Get the most recent episode mean for the reward with the given name. The mean reward is only calculated when episodes end/reset.
- Parameters:
name – The name of the reward to get the mean for.
before_weight – If True, the returned value is the base reward value before the weight was applied. If False, it is the value after the weight was applied.
- Returns:
The last mean reward for an episode for a given reward name.
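This reference does not show how the episode mean is tracked. The following is a minimal sketch of one way it could work. It assumes unweighted values are accumulated per environment on each step, averaged when environments reset, and multiplied by the weight only when `before_weight` is False. `EpisodeMeanTracker` is a hypothetical class, not part of the library, and lists stand in for tensors.

```python
from __future__ import annotations


class EpisodeMeanTracker:
    """Hypothetical tracker for a single reward term (not the library's code)."""

    def __init__(self, num_envs: int, weight: float):
        self.weight = weight
        self.raw_sum = [0.0] * num_envs  # per-env sum of unweighted values
        self.steps = [0] * num_envs      # per-env step count in the current episode
        self.last_mean_raw = 0.0         # only updated when episodes reset

    def step(self, raw_values: list[float]) -> None:
        """Accumulate the unweighted value of this reward for every environment."""
        for i, value in enumerate(raw_values):
            self.raw_sum[i] += value
            self.steps[i] += 1

    def reset(self, envs_idx: list[int] | None = None) -> None:
        """Compute the mean over the environments being reset, then clear them."""
        idx = list(range(len(self.raw_sum))) if envs_idx is None else envs_idx
        means = [self.raw_sum[i] / self.steps[i] for i in idx if self.steps[i] > 0]
        if means:
            self.last_mean_raw = sum(means) / len(means)
        for i in idx:
            self.raw_sum[i] = 0.0
            self.steps[i] = 0

    def last_episode_mean(self, before_weight: bool = True) -> float:
        """Return the stored mean, optionally with the weight applied."""
        if before_weight:
            return self.last_mean_raw
        return self.last_mean_raw * self.weight


tracker = EpisodeMeanTracker(num_envs=2, weight=-2.0)
tracker.step([1.0, 3.0])
tracker.step([3.0, 5.0])
tracker.reset([0])  # only environment 0 ends its episode

raw_mean = tracker.last_episode_mean(before_weight=True)
weighted_mean = tracker.last_episode_mean(before_weight=False)
```

In this sketch the stored mean does not change between resets, which matches the note that the mean is only calculated when episodes end.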
- reset(envs_idx: list[int] | None = None)
Log the reward mean values at the end of the episode.
- step() → torch.Tensor
Calculate the rewards for this step.
- Returns:
The rewards for the environments. Shape is (num_envs,).
- property episode_data: dict[str, torch.Tensor]
Get the accumulated reward data for the current episode of all environments.
- property rewards: torch.Tensor
The rewards calculated for the most recent step. Shape is (num_envs,).