Reward

class genesis_forge.managers.RewardManager(env: GenesisEnv, cfg: dict[str, RewardConfig], logging_enabled: bool = True, logging_tag: str = 'Rewards')

Bases: BaseManager

Handles calculating and logging the rewards for the environment.

This works with a dictionary configuration of reward handlers. For each dictionary item, a function will be called to calculate a reward value for the environment.

Parameters:

env – The environment to manage the rewards for.
cfg – A dictionary mapping each reward name to its configuration: the reward function ("fn"), its "weight", and optional "params".
logging_enabled – Whether to log the rewards to TensorBoard.
logging_tag – The section name under which the rewards are logged in TensorBoard.
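
The dictionary-driven calculation described above can be sketched in plain Python. This is a minimal illustration, not the RewardManager implementation: it assumes each item's value is multiplied by its weight and the results are summed per environment, and it uses lists of floats in place of torch.Tensor. The names height_error_sq and total_rewards are hypothetical.

```python
from typing import Callable

# Hypothetical stand-ins: each environment is a dict of floats, and a reward
# function returns one value per environment (a list, not a torch.Tensor).
Env = list[dict[str, float]]
RewardFn = Callable[..., list[float]]


def height_error_sq(envs: Env, target_height: float) -> list[float]:
    """Squared distance of each environment's base height from the target."""
    return [(e["height"] - target_height) ** 2 for e in envs]


def total_rewards(envs: Env, cfg: dict[str, dict]) -> list[float]:
    """Call every configured function and sum the weighted values per environment.

    Assumption: weighting is a plain multiplication and terms are summed.
    """
    totals = [0.0] * len(envs)
    for name, item in cfg.items():
        values = item["fn"](envs, **item.get("params", {}))
        for i, value in enumerate(values):
            totals[i] += item["weight"] * value
    return totals


envs = [{"height": 0.135}, {"height": 0.235}]
cfg = {
    "Base height": {
        "fn": height_error_sq,
        "params": {"target_height": 0.135},
        "weight": -100.0,
    },
}
rewards = total_rewards(envs, cfg)  # one entry per environment
```

With a negative weight, a larger height error produces a more negative reward, which is why penalty-style terms in the examples use weights such as -100.0.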

Example with ManagedEnvironment:

class MyEnv(ManagedEnvironment):
    def config(self):
        self.reward_manager = RewardManager(
            self,
            cfg={
                "Default pose": {
                    "fn": mdp.rewards.dof_similar_to_default,
                    "weight": -0.1,
                },
                "Base height": {
                    "fn": mdp.rewards.base_height,
                    "params": { "target_height": 0.135 },
                    "weight": -100.0,
                },
            },
        )

Example using the reward manager directly:

class MyEnv(GenesisEnv):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.reward_manager = RewardManager(
            self,
            cfg={
                "Base height": {
                    "fn": mdp.rewards.base_height,
                    "params": { "target_height": 0.135 },
                    "weight": -100.0,
                },
            },
        )

    def build(self):
        super().build()
        self.reward_manager.build()

    def step(self, actions: torch.Tensor):
        super().step(actions)
        rewards = self.reward_manager.step()
        # ... other step logic ...
        return obs, rewards, terminations, timeouts, info

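    def reset(self, envs_idx: list[int] | None = None):
        super().reset(envs_idx)
        # Let the manager log the episode's mean rewards for the reset environments
        self.reward_manager.reset(envs_idx)
        # ... other reset logic ...
        return obs, info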

__contains__(name: str) → bool: Check if a reward config item exists by name.

__delitem__(name: str): Delete a reward config item by name.

__getitem__(name: str) → RewardConfigItem: Get a reward config item by name.

__iter__() → Iterator[str]: Iterate over the reward config item names.

__len__() → int: Get the number of reward config items.

__setitem__(name: str, value: RewardConfigItem): Set a reward config item by name.

build(): Build any config item function classes.
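
The special methods listed above give the manager a mapping-style interface keyed by reward name. The sketch below shows what that looks like in use. RewardTable is a hypothetical stand-in written for this illustration, and it assumes config items behave like plain dictionaries with a "weight" key. It is not the RewardManager class itself.

```python
from collections.abc import Iterator, MutableMapping


class RewardTable(MutableMapping):
    """Hypothetical stand-in exposing the same mapping-style methods."""

    def __init__(self, cfg: dict[str, dict]):
        self._cfg = dict(cfg)

    def __getitem__(self, name: str) -> dict:
        return self._cfg[name]

    def __setitem__(self, name: str, value: dict) -> None:
        self._cfg[name] = value

    def __delitem__(self, name: str) -> None:
        del self._cfg[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._cfg)

    def __len__(self) -> int:
        return len(self._cfg)


table = RewardTable(
    {
        "Default pose": {"weight": -0.1},
        "Base height": {"weight": -100.0},
    }
)

has_height = "Base height" in table      # __contains__
table["Base height"]["weight"] = -50.0   # __getitem__, then edit the item
table["Alive"] = {"weight": 1.0}         # __setitem__
del table["Default pose"]                # __delitem__
names = list(table)                      # __iter__
count = len(table)                       # __len__
```

An interface like this makes it possible to adjust or remove individual reward terms by name, for example when changing weights between training phases.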

last_episode_mean_reward(name: str, before_weight: bool = True) → float

Get the most recently calculated episode mean reward for the given reward name. The mean is only calculated when episodes end or reset.

Parameters:

name – The name of the reward to get the mean for.
before_weight – If True, return the base reward value from before the weight was applied.

Returns:

The last mean reward for an episode for a given reward name.
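
The effect of before_weight can be illustrated with a small stand-in tracker. Everything here is an assumption made for illustration: the EpisodeMean class, the way per-step values are accumulated, and the averaging over the environments being reset are hypothetical and may differ from what RewardManager does internally.

```python
class EpisodeMean:
    """Hypothetical tracker for one reward term across several environments."""

    def __init__(self, weight: float, num_envs: int):
        self.weight = weight
        self.sums = [0.0] * num_envs   # accumulated unweighted reward per env
        self.steps = [0] * num_envs    # steps taken in the current episode
        self.last_mean = 0.0           # unweighted mean from the last reset

    def step(self, values: list[float]) -> None:
        """Accumulate one unweighted reward value per environment."""
        for i, value in enumerate(values):
            self.sums[i] += value
            self.steps[i] += 1

    def reset(self, envs_idx: list[int]) -> None:
        """Update the stored mean from the episodes that just ended."""
        means = [self.sums[i] / self.steps[i] for i in envs_idx if self.steps[i]]
        if means:
            self.last_mean = sum(means) / len(means)
        for i in envs_idx:
            self.sums[i] = 0.0
            self.steps[i] = 0

    def last_episode_mean_reward(self, before_weight: bool = True) -> float:
        """Return the stored mean, optionally with the weight applied."""
        return self.last_mean if before_weight else self.last_mean * self.weight


tracker = EpisodeMean(weight=-100.0, num_envs=2)
tracker.step([0.5, 2.0])
tracker.step([1.5, 4.0])
tracker.reset([0])  # only environment 0 finished its episode

raw = tracker.last_episode_mean_reward()                          # 1.0
weighted = tracker.last_episode_mean_reward(before_weight=False)  # -100.0
```

The unweighted value is useful for judging how well the underlying behavior is being learned, independent of how strongly the term is scaled in the total reward.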

reset(envs_idx: list[int] | None = None): Log the reward mean values at the end of the episode.

step() → torch.Tensor

Calculate the rewards for this step.

Returns:

The rewards for the environments. Shape is (num_envs,).

property episode_data: dict[str, torch.Tensor]: Get the accumulated reward data for the current episode of all environments.

property rewards: torch.Tensor: The rewards calculated for the most recent step. Shape is (num_envs,).