Reward

class genesis_forge.managers.RewardManager(env: GenesisEnv, cfg: dict[str, RewardConfig], logging_enabled: bool = True, logging_tag: str = 'Rewards')

Bases: BaseManager

Calculates and logs the rewards for the environment.

The manager is driven by a dictionary of reward handlers. Each entry is keyed by the reward's name and supplies a function (fn), a weight, and optional keyword arguments (params); on every step the function is called to compute that reward's value for the environment.

Parameters:
  • env – The environment to manage the rewards for.

  • cfg – A dictionary of reward conditions, keyed by reward name.

  • logging_enabled – Whether to log the rewards to TensorBoard.

  • logging_tag – The section name under which the rewards are logged in TensorBoard.
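
As a rough illustration of the configuration-driven approach described above, the sketch below evaluates a dictionary of reward handlers and sums their weighted outputs. It is a plain-Python stand-in, not the library's implementation: `compute_rewards`, `base_height`, and the `state` dictionary are hypothetical names, the weighted-sum combination is an assumption, and the real manager returns a `torch.Tensor` of shape `(num_envs,)` rather than a list.

```python
from typing import Callable


def base_height(state: dict, target_height: float) -> list[float]:
    """Hypothetical reward function: squared height error per environment."""
    return [(h - target_height) ** 2 for h in state["base_height"]]


def compute_rewards(state: dict, cfg: dict[str, dict]) -> list[float]:
    """Sum weight * fn(state, **params) over every entry (assumed behaviour)."""
    num_envs = len(state["base_height"])
    total = [0.0] * num_envs
    for name, item in cfg.items():
        fn: Callable = item["fn"]
        params = item.get("params", {})  # "params" may be omitted
        weight = item["weight"]
        values = fn(state, **params)
        total = [t + weight * v for t, v in zip(total, values)]
    return total


cfg = {
    "Base height": {
        "fn": base_height,
        "params": {"target_height": 0.135},
        "weight": -100.0,
    },
}
state = {"base_height": [0.135, 0.235]}  # two environments
rewards = compute_rewards(state, cfg)
```

A negative weight turns an error measure into a penalty, which is why both documented examples use negative weights.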

Example with ManagedEnvironment:

class MyEnv(ManagedEnvironment):
    def config(self):
        self.reward_manager = RewardManager(
            self,
            cfg={
                "Default pose": {
                    "fn": mdp.rewards.dof_similar_to_default,
                    "weight": -0.1,
                },
                "Base height": {
                    "fn": mdp.rewards.base_height,
                    "params": { "target_height": 0.135 },
                    "weight": -100.0,
                },
            },
        )

Example using the reward manager directly:

class MyEnv(GenesisEnv):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.reward_manager = RewardManager(
            self,
            cfg={
                "Base height": {
                    "fn": mdp.rewards.base_height,
                    "params": { "target_height": 0.135 },
                    "weight": -100.0,
                },
            },
        )

    def build(self):
        super().build()
        self.reward_manager.build()

    def step(self, actions: torch.Tensor):
        super().step(actions)
        rewards = self.reward_manager.step()
        # ... other step logic ...
        return obs, rewards, terminations, timeouts, info

    def reset(self, envs_idx: list[int] | None = None):
        super().reset(envs_idx)
        self.reward_manager.reset(envs_idx)
        # ... other reset logic ...
        return obs, info

__contains__(name: str) -> bool

Check if a reward config item exists by name.

__delitem__(name: str)

Delete a reward config item by name.

__getitem__(name: str) -> RewardConfigItem

Get a reward config item by name.

__iter__() -> Iterator[str]

Iterate over the reward config item names.

__len__() -> int

Get the number of reward config items.

__setitem__(name: str, value: RewardConfigItem)

Set a reward config item by name.
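
Together, these methods give the manager a dict-like interface keyed by reward name. The sketch below shows that access pattern on a hypothetical stand-in, `RewardTable`, built on `collections.abc.MutableMapping`. It only mirrors the documented method signatures and is not the library's class; plain dictionaries stand in for `RewardConfigItem`.

```python
from collections.abc import MutableMapping
from typing import Iterator


class RewardTable(MutableMapping):
    """Hypothetical stand-in exposing the same dict-like methods."""

    def __init__(self, cfg: dict[str, dict]):
        self._items = dict(cfg)

    def __getitem__(self, name: str) -> dict:
        return self._items[name]

    def __setitem__(self, name: str, value: dict) -> None:
        self._items[name] = value

    def __delitem__(self, name: str) -> None:
        del self._items[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


table = RewardTable({"Base height": {"weight": -100.0}})
table["Default pose"] = {"weight": -0.1}  # __setitem__
has_pose = "Default pose" in table        # __contains__ (mixin from Mapping)
names = list(table)                       # __iter__
del table["Base height"]                  # __delitem__
remaining = len(table)                    # __len__
```

With a real `RewardManager`, the same expressions (`name in manager`, `manager[name]`, `del manager[name]`, `len(manager)`) are what the signatures above suggest.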

build()

Build any reward functions in the config that are implemented as classes.

last_episode_mean_reward(name: str, before_weight: bool = True) -> float

Get the mean value of the named reward over the most recently completed episode. The mean is only recalculated when an episode ends or resets.

Parameters:
  • name – The name of the reward to get the mean for.

  • before_weight – If True, return the base reward value before the weight is applied; otherwise return the weighted value.

Returns:

The mean value of the named reward over the last completed episode.
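
To make the timing concrete, here is a minimal sketch of one way a per-episode mean could be tracked: accumulate values on each step, then compute the mean when the episode resets. `EpisodeMeanTracker` and its methods are hypothetical, it covers a single reward in a single environment, and the library's actual bookkeeping may differ.

```python
class EpisodeMeanTracker:
    """Hypothetical tracker for one reward term in one environment."""

    def __init__(self, weight: float):
        self.weight = weight
        self._sum = 0.0        # running sum of unweighted values this episode
        self._steps = 0        # number of steps taken this episode
        self._last_mean = 0.0  # mean of the last completed episode

    def step(self, value: float) -> None:
        self._sum += value
        self._steps += 1

    def reset(self) -> None:
        # The mean is only refreshed here, when the episode ends.
        if self._steps > 0:
            self._last_mean = self._sum / self._steps
        self._sum = 0.0
        self._steps = 0

    def last_episode_mean(self, before_weight: bool = True) -> float:
        if before_weight:
            return self._last_mean
        return self._last_mean * self.weight


tracker = EpisodeMeanTracker(weight=-2.0)
for value in (1.0, 2.0, 3.0):
    tracker.step(value)
mean_mid_episode = tracker.last_episode_mean()  # not yet updated
tracker.reset()
raw_mean = tracker.last_episode_mean()
weighted_mean = tracker.last_episode_mean(before_weight=False)
```

Reading the value in the middle of an episode returns the previous episode's mean, which is the behaviour the description above implies.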

reset(envs_idx: list[int] | None = None)

Log the mean reward values at the end of the episode.

step() -> torch.Tensor

Calculate the rewards for this step.

Returns:

The rewards for the environments. Shape is (num_envs,).

property episode_data: dict[str, torch.Tensor]

Get the accumulated reward data for the current episode of all environments.

property rewards: torch.Tensor

The rewards calculated for the most recent step. Shape is (num_envs,).