TorchRL

This notebook showcases the integration between navground and TorchRL.

We load the two-group environment that we created in the PettingZoo notebook.

[1]:
import warnings
from navground.learning import io

warnings.filterwarnings("ignore")
penv2 = io.load_env('penv2.yaml')

Because TorchRL natively supports PettingZoo, we can convert any navground parallel environment to a TorchRL environment.

[2]:
from navground.learning.utils.benchmarl import make_env

env = make_env(penv2, seed=0)

The TorchRL environment groups agents by their navground tags, which we set to “first” and “second” in the PettingZoo notebook. The tag serves as both group name and agent-name prefix, while the suffix comes from the agent’s index in penv:

[3]:
env.group_map
[3]:
{'first': ['first_0',
  'first_1',
  'first_2',
  'first_3',
  'first_4',
  'first_5',
  'first_6',
  'first_7',
  'first_8',
  'first_9'],
 'second': ['second_10',
  'second_11',
  'second_12',
  'second_13',
  'second_14',
  'second_15',
  'second_16',
  'second_17',
  'second_18',
  'second_19']}
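The naming convention can be mimicked in a few lines of plain Python. The sketch below hard-codes tags that mirror penv2 (an assumption for illustration) and rebuilds the same mapping from tags and indices:

```python
# Rebuild a TorchRL-style group_map from navground tags and agent indices.
# Assumption: 10 agents tagged "first" followed by 10 tagged "second",
# mirroring penv2 from the PettingZoo notebook.
tags = ["first"] * 10 + ["second"] * 10

group_map: dict[str, list[str]] = {}
for index, tag in enumerate(tags):
    # group name is the tag; agent name is "<tag>_<index in penv>"
    group_map.setdefault(tag, []).append(f"{tag}_{index}")

print(group_map["first"][:3])   # ['first_0', 'first_1', 'first_2']
print(group_map["second"][0])   # second_10
```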

Rolling out a random policy

[4]:
rs = env.rollout(max_steps=100)

records the different sensors of the two groups (each composed of 10 agents):

[5]:
rs['first']['observation']
[5]:
TensorDict(
    fields={
        ego_target_direction: Tensor(shape=torch.Size([100, 10, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        neighbors/position: Tensor(shape=torch.Size([100, 10, 5, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        neighbors/radius: Tensor(shape=torch.Size([100, 10, 5]), device=cpu, dtype=torch.float32, is_shared=False),
        neighbors/valid: Tensor(shape=torch.Size([100, 10, 5]), device=cpu, dtype=torch.uint8, is_shared=False),
        neighbors/velocity: Tensor(shape=torch.Size([100, 10, 5, 2]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([100, 10]),
    device=cpu,
    is_shared=False)
[6]:
rs['second']['observation']
[6]:
TensorDict(
    fields={
        ego_target_direction: Tensor(shape=torch.Size([100, 10, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        lidar/fov: Tensor(shape=torch.Size([100, 10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        lidar/max_range: Tensor(shape=torch.Size([100, 10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        lidar/range: Tensor(shape=torch.Size([100, 10, 100]), device=cpu, dtype=torch.float32, is_shared=False),
        lidar/start_angle: Tensor(shape=torch.Size([100, 10, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([100, 10]),
    device=cpu,
    is_shared=False)
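When sizing policy networks, it helps to know the flattened per-agent observation dimension. A small sketch, with the trailing per-key shapes (after the `[steps, agents]` batch dimensions) copied from the rollouts above:

```python
from math import prod

# Trailing per-agent shapes for each observation key, as printed above.
first_shapes = {
    "ego_target_direction": (2,),
    "neighbors/position": (5, 2),
    "neighbors/radius": (5,),
    "neighbors/valid": (5,),
    "neighbors/velocity": (5, 2),
}
second_shapes = {
    "ego_target_direction": (2,),
    "lidar/fov": (1,),
    "lidar/max_range": (1,),
    "lidar/range": (100,),
    "lidar/start_angle": (1,),
}

def flat_dim(shapes: dict[str, tuple[int, ...]]) -> int:
    # total number of features per agent if the observation dict were flattened
    return sum(prod(shape) for shape in shapes.values())

print(flat_dim(first_shapes))   # 2 + 10 + 5 + 5 + 10 = 32
print(flat_dim(second_shapes))  # 2 + 1 + 1 + 100 + 1 = 105
```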

When the parallel environment’s global state is configured, the rollout also includes it:

[7]:
penv_state = io.load_env('penv_state.yaml')
env = make_env(penv_state)
rs = env.rollout(max_steps=100)
[8]:
rs['state']
[8]:
tensor([[ 1.8187, -0.1003,  1.1902,  ...,  1.5299,  0.7113, -0.1702],
        [ 1.8212, -0.0989,  1.1902,  ...,  1.5318,  0.7081, -0.1776],
        [ 1.8212, -0.0989,  1.1902,  ...,  1.5318,  0.7053, -0.1825],
        ...,
        [ 1.9602, -0.1683,  1.2041,  ...,  1.5325,  0.6386, -0.2083],
        [ 1.9602, -0.1683,  1.2053,  ...,  1.5325,  0.6379, -0.2088],
        [ 1.9612, -0.1723,  1.2060,  ...,  1.5348,  0.6368, -0.2095]])
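The state above is one flat vector per step. A toy sketch of splitting such a vector back into per-agent chunks; the agent and feature counts here are illustrative assumptions, not navground’s actual state layout:

```python
# Illustrative only: interpret a flat global state as concatenated
# per-agent features (20 agents x 4 features, chosen for the example).
n_agents, n_features = 20, 4
flat_state = [float(i) for i in range(n_agents * n_features)]  # one step

# split the flat vector back into one chunk per agent
per_agent = [flat_state[i * n_features:(i + 1) * n_features]
             for i in range(n_agents)]
print(len(per_agent), len(per_agent[0]))  # 20 4
```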
[ ]: