Reward Manager#

The Reward Manager handles computing, combining, and logging reward components in your RL environment. It provides a clean way to define multi-objective rewards with automatic tracking and tensorboard logging.

You can see a full example using the reward manager in examples/simple.

Overview#

The Reward Manager allows you to:

  • Define multiple reward components with individual weights

  • Automatically sum rewards and track individual contributions

  • Log rewards to tensorboard for analysis

  • Dynamically adjust rewards during training (curriculum learning)

  • Reuse common reward functions from the MDP library

Basic Usage#

from genesis_forge import ManagedEnvironment
from genesis_forge.managers import RewardManager
from genesis_forge.mdp import rewards

class MyEnv(ManagedEnvironment):
    def config(self):
        RewardManager(
            self,
            cfg={
                "height": {
                    "weight": -1.0,            # Weight/scale
                    "fn": rewards.base_height, # Reward function
                    "params": {                # Params to the reward function
                        "target_height": 0.3
                    }
                },
                "flat_orientation": {
                    "fn": rewards.flat_orientation_l2,
                    "weight": -1.0,
                },
            },
        )

Reward Configuration#

Each reward config item accepts the following keys (fn and weight are required):

  • fn: A function that computes the reward

  • weight: Multiplier for this component (can be negative for penalties)

  • params (optional): Additional parameters to pass to the function

RewardManager(
    self,
    cfg={
        "height_tracking": {
            "weight": -10.0,  # Strong penalty for wrong height
            "fn": rewards.base_height,
            "params": {
                "target_height": 0.35,  # Pass target to function
            },
        },
    },
)
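To make the roles of fn, weight, and params concrete, here is a minimal pure-Python sketch of how one step's reward could be assembled from such a config. It assumes a plain weighted sum and uses lists in place of tensors. The names compute_rewards, FakeEnv, and height_error are hypothetical, chosen for illustration only, and are not part of Genesis Forge; the real manager may apply additional scaling.

```python
from typing import Any, Callable


class FakeEnv:
    """Hypothetical stand-in for an environment with two parallel envs."""

    num_envs = 2
    base_heights = [0.25, 0.5]


def height_error(env: FakeEnv, target_height: float) -> list[float]:
    """Squared distance between each env's base height and the target."""
    return [(h - target_height) ** 2 for h in env.base_heights]


def compute_rewards(
    env: FakeEnv, cfg: dict[str, dict[str, Any]]
) -> tuple[list[float], dict[str, list[float]]]:
    """Return (total reward per env, weighted contribution per component)."""
    total = [0.0] * env.num_envs
    contributions: dict[str, list[float]] = {}
    for name, item in cfg.items():
        fn: Callable[..., list[float]] = item["fn"]
        params = item.get("params", {})  # params is optional
        raw = fn(env, **params)  # assumption: params are passed as keywords
        weighted = [item["weight"] * value for value in raw]
        contributions[name] = weighted
        total = [t + w for t, w in zip(total, weighted)]
    return total, contributions


cfg = {
    "height_tracking": {
        "weight": -10.0,
        "fn": height_error,
        "params": {"target_height": 0.5},
    },
}
total, contributions = compute_rewards(FakeEnv(), cfg)
```

Keeping the per-component contributions next to the total is what makes it possible to track and log each term separately.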

Built-in Reward Functions#

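Genesis Forge provides many common reward functions in genesis_forge.mdp.rewards, including rewards.base_height and rewards.flat_orientation_l2, which are used in the examples on this page.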

Custom Reward Functions#

A custom reward function takes the environment as its first parameter, followed by any additional parameters, whose values are supplied through the params dict in the RewardManager config. It should return a tensor of shape (num_envs,) containing one float value per environment.

Simple Custom Rewards#

import torch


def my_custom_reward(env):
    """Reward for staying near origin."""
    distance = torch.norm(env.robot.get_pos()[:, :2], dim=1)
    return torch.exp(-distance)

RewardManager(
    self,
    cfg={
        "stay_centered": {
            "fn": my_custom_reward,
            "weight": 0.5,
        },
    },
)

Rewards with Parameters#

def target_height_reward(env, target_height: float):
    """Squared height error from the target; pair with a negative weight."""
    base_pos = env.robot.get_pos()  # robot is accessed through the environment
    return torch.square(base_pos[:, 2] - target_height)

RewardManager(
    self,
    cfg={
        "height": {
            "weight": -5.0,
            "fn": target_height_reward,
            "params": {
                "target_height": 0.3
            },
        },
    },
)
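Since the keys in params have to line up with the function's parameter names, a typo in a key is an easy mistake to make. The following is a small sketch of a pre-flight check using the standard library's inspect module. It assumes that params entries are passed as keyword arguments after the environment; check_reward_params is a hypothetical helper, not a Genesis Forge function.

```python
import inspect
from typing import Any, Callable


def check_reward_params(fn: Callable[..., Any], params: dict[str, Any]) -> None:
    """Raise TypeError if `params` cannot be bound to `fn(env, **params)`."""
    signature = inspect.signature(fn)
    # A placeholder object stands in for the environment argument.
    signature.bind(object(), **params)


def target_height_reward(env, target_height: float):
    """Same signature as the example above; the body is irrelevant here."""
    raise NotImplementedError


check_reward_params(target_height_reward, {"target_height": 0.3})  # binds fine

try:
    check_reward_params(target_height_reward, {"height": 0.3})  # wrong key
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

Running a check like this when the config is built surfaces the mismatch before training starts rather than on the first reward computation.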

Lambda Functions#

For simple one-liners, use lambda functions:

RewardManager(
    self,
    cfg={
        # Penalize high angular velocity
        "spin_penalty": {
            "fn": lambda env: torch.abs(env.robot.get_ang_vel()[:, 2]),
            "weight": -0.2,
        },
    },
)

Dynamic Reward Adjustment#

Curriculum Learning#

Adjust rewards based on training progress:

class MyEnv(ManagedEnvironment):
    def config(self):
        self.reward_manager = RewardManager(self, cfg={
            "forward_vel": {
                "weight": 1.0,
                "fn": ...,
            },
            "upright": {
                "weight": -1.5,
                "fn": ...,
            },
            "energy": {
                "weight": 0.0,
                "fn": ...,
            },
        })

    def step(self, actions):
        self.update_curriculum()
        return super().step(actions)

    def update_curriculum(self):
        """Called on every step; adjusts weights at fixed step counts."""
        if self.step_count == 200:
            # Mid training: increase speed focus
            self.reward_manager.cfg["upright"].weight = -2.0
            self.reward_manager.cfg["forward_vel"].weight = 2.0
        elif self.step_count == 500:
            # Late training: add efficiency
            self.reward_manager.cfg["upright"].weight = -1.0
            self.reward_manager.cfg["forward_vel"].weight = 3.0
            self.reward_manager.cfg["energy"].weight = -0.01
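As the number of stages grows, the if/elif chain can become hard to maintain. One alternative is to keep the schedule in a table and look up the active stage by step count. The sketch below is pure Python and independent of Genesis Forge; CURRICULUM and weights_for_step are hypothetical names, and the values simply mirror the example above. It uses a >= comparison, so a stage still applies if the exact threshold step is never observed.

```python
# Hypothetical schedule: (first step at which the stage applies, weights).
CURRICULUM = [
    (0, {"forward_vel": 1.0, "upright": -1.5, "energy": 0.0}),
    (200, {"forward_vel": 2.0, "upright": -2.0, "energy": 0.0}),
    (500, {"forward_vel": 3.0, "upright": -1.0, "energy": -0.01}),
]


def weights_for_step(step_count: int) -> dict[str, float]:
    """Return the weights of the latest stage whose start step was reached."""
    current = CURRICULUM[0][1]
    for start_step, weights in CURRICULUM:
        if step_count >= start_step:
            current = weights
    return dict(current)  # copy so callers cannot mutate the schedule
```

Inside update_curriculum, the returned dict could then be looped over to assign each value to the matching reward's weight.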

Logging and Analysis#

By default, individual reward components are logged to the episode item in the extras/infos dict. Many RL frameworks, such as rsl_rl and skrl, automatically log the items found there to tensorboard or a similar system. Rewards are placed under the “Rewards” section.

Figure: Example tensorboard reward logging

To disable logging, set logging_enabled to False. To change the extras dict key that reward items are logged to, set the extras_logging_key param on the environment.
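To illustrate the idea of per-component episode logging, here is a rough pure-Python sketch for a single environment. Everything in it is an assumption made for illustration: EpisodeRewardLog is a hypothetical class, the "Rewards / name" key format is invented, and whether Genesis Forge logs means or sums is not stated above.

```python
from collections import defaultdict


class EpisodeRewardLog:
    """Hypothetical accumulator for per-component rewards of one environment."""

    def __init__(self) -> None:
        self._sums: dict[str, float] = defaultdict(float)
        self._steps = 0

    def record(self, contributions: dict[str, float]) -> None:
        """Add one step's weighted reward per component."""
        for name, value in contributions.items():
            self._sums[name] += value
        self._steps += 1

    def flush(self, extras: dict, key: str = "episode") -> None:
        """Write per-step means under extras[key], then reset the log."""
        if self._steps == 0:
            return
        section = extras.setdefault(key, {})
        for name, total in self._sums.items():
            # Assumed key format; the real naming scheme may differ.
            section[f"Rewards / {name}"] = total / self._steps
        self._sums.clear()
        self._steps = 0


log = EpisodeRewardLog()
log.record({"height": -0.5, "stay_centered": 0.25})
log.record({"height": -1.5, "stay_centered": 0.75})
extras: dict = {}
log.flush(extras)
```

The key argument plays the role that extras_logging_key plays in the text above: it selects which entry of the extras dict receives the reward items.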