# Reward Manager

The Reward Manager handles computing, combining, and logging reward components in your RL environment. It provides a clean way to define multi-objective rewards with automatic tracking and tensorboard logging.

You can see a full example using the reward manager in [examples/simple](https://github.com/jgillick/genesis-forge/tree/main/examples/simple).

## Overview

The Reward Manager allows you to:

- Define multiple reward components with individual weights
- Automatically sum rewards and track individual contributions
- Log rewards to tensorboard for analysis
- Dynamically adjust rewards during training (curriculum learning)
- Reuse common reward functions from the MDP library

## Basic Usage

```python
from genesis_forge.managers import RewardManager
from genesis_forge.mdp import rewards

class MyEnv(ManagedEnvironment):
    def config(self):
        RewardManager(
            self,
            cfg={
                "height": {
                    "weight": -1.0,            # Weight/scale
                    "fn": rewards.base_height, # Reward function
                    "params": {                # Params to the reward function
                        "target_height": 0.3
                    }
                },
                "flat_orientation": {
                    "fn": rewards.flat_orientation_l2,
                    "weight": -1.0,
                },
            },
        )
```

## Reward Configuration

Each reward config item requires:

- **fn**: A function that computes the reward
- **weight**: Multiplier for this component (can be negative for penalties)
- **params** (optional): Additional parameters to pass to the function

```python
RewardManager(
    self,
    cfg={
        "height_tracking": {
            "weight": -10.0,  # Strong penalty for wrong height
            "fn": rewards.base_height,
            "params": {
                "target_height": 0.35,  # Pass target to function
            },
        },
    },
)
```

## Built-in Reward Functions

Genesis Forge provides many common reward functions in [`genesis_forge.mdp.rewards`](../../api/mdp/rewards):

## Custom Reward Functions

A custom reward function takes in the environment as the first parameter, as well as any other parameter which will be defined in the `params` dict at the RewardManager. The returned value should be a tensor (shape: `(num_envs,)`) with a `float` value for each environment.

### Simple Custom Rewards

```python
def my_custom_reward(env):
    """Reward for staying near origin."""
    distance = torch.norm(env.robot.get_pos()[:, :2], dim=1)
    return torch.exp(-distance)

RewardManager(
    self,
    cfg={
        "stay_centered": {
            "fn": my_custom_reward,
            "weight": 0.5,
        },
    },
)
```

### Rewards with Parameters

```python
def target_height_reward(env, target_height: float):
    """Reward for reaching a target height."""
    base_pos = robot.get_pos()
    return torch.square(base_pos[:, 2] - target_height)

RewardManager(
    self,
    cfg={
        "height": {
            "weight": -5.0,
            "fn": target_height_reward,
            "params": {
                "target_height": 0.3
            },
        },
    },
)
```

### Lambda Functions

For simple one-liners, use lambda functions:

```python
RewardManager(
    self,
    cfg={
        # Penalize high angular velocity
        "spin_penalty": {
            "fn": lambda env: torch.abs(env.robot.get_ang_vel()[:, 2]),
            "weight": -0.2,
        },
    },
)
```

## Dynamic Reward Adjustment

### Curriculum Learning

Adjust rewards based on training progress:

```python
class MyEnv(ManagedEnvironment):
    def config(self):
        self.reward_manager = RewardManager(self, cfg={
            "forward_vel": {
                "weight": 1.0,
                "fn": ...,
            },
            "upright": {
                "weight": -1.5,
                "fn": ...,
            },
            "energy": {
                "weight": 0.0,
                "fn": ..,
            },
        })

    def step(self):
        self.update_curriculum()
        return super().step(actions)

    def update_curriculum(self):
        """Called periodically during training."""
        if self.step_count === 200:
            # Mid training: increase speed focus
            self.reward_manager.cfg["upright"].weight = -2.0
            self.reward_manager.cfg["forward_vel"].weight = 2.0
        elif self.step_count === 500:
            # Late training: add efficiency
            self.reward_manager.cfg["upright"].weight = -1.0
            self.reward_manager.cfg["forward_vel"].weight = 3.0
            self.reward_manager.cfg["energy"].weight = -0.01
```

## Logging and Analysis

By default, individual reward components are logged to the `episode` item in the extras/infos dict. For many RL frameworks, like rsl_rl and skrl, items there will automatically be logged to tensorboard, or simular system. Rewards will be placed under the "Rewards" section.

```{figure} _images/reward_tensorboard.png
:alt: tensor board
Example tensorboard reward logging
```

To disable logging, set `logging_enabled` to `False`. To change the extras dict key that reward items are logged to, set the `extras_logging_key` param on the [environment](../../api/environments/genesis.md).