Navground-PettingZoo integration#
This notebook showcases the integration between navground and PettingZoo, the “multi-agent” version of Gymnasium. We focus on the differences compared to Gymnasium: have a look at the Gymnasium notebook for the common parts (e.g., rendering).
While in Gymnasium we control a single navground agent (which may move among many other agents controlled by navground), with PettingZoo we can control multiple agents, even all the agents of a navground simulation.
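As a reminder, a PettingZoo parallel environment exchanges dictionaries keyed by agent instead of single values. A minimal interaction loop, sketched here against the generic PettingZoo Parallel API (penv stands for any parallel environment, such as the ones we build below), looks like
[ ]:
observations, infos = penv.reset()
while penv.agents:
    # one action per currently active agent, here sampled at random
    actions = {agent: penv.action_space(agent).sample() for agent in penv.agents}
    observations, rewards, terminations, truncations, infos = penv.step(actions)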
To start, we load the same scenario with 20 agents and the same sensor
[1]:
import numpy as np

from navground import sim

with open('scenario.yaml') as f:
    scenario = sim.load_scenario(f.read())
with open('sensor.yaml') as f:
    sensor = sim.load_state_estimation(f.read())
A single group#
Now, instead of a single agent, we want to control a group of agents with a policy acting on the selected sensor. We define the PettingZoo environment so that it controls the first 10 agents, which all share the same configuration
[3]:
from navground.learning.parallel_env import shared_parallel_env
from navground.learning import DefaultObservationConfig, ControlActionConfig
from navground.learning.rewards import SocialReward
observation_config = DefaultObservationConfig(include_target_direction=True,
                                              include_target_distance=True)
action_config = ControlActionConfig()
env = shared_parallel_env(
    scenario=scenario,
    indices=slice(0, 10, 1),
    sensor=sensor,
    action=action_config,
    observation=observation_config,
    reward=SocialReward(),
    time_step=0.1,
    max_duration=60.0)
All agents have the same observation and action spaces, as configured
[4]:
print(f'We are controlling {len(env.possible_agents)} agents')
observation_space = env.observation_space(0)
action_space = env.action_space(0)
if all(env.action_space(i) == action_space and env.observation_space(i) == observation_space
       for i in env.possible_agents):
    print(f'They share the same observation {observation_space} and action {action_space} spaces')
We are controlling 10 agents
They share the same observation Dict('position': Box(-5.0, 5.0, (5, 2), float32), 'radius': Box(0.0, 0.1, (5,), float32), 'valid': Box(0, 1, (5,), uint8), 'velocity': Box(-0.12, 0.12, (5, 2), float32), 'ego_target_direction': Box(-1.0, 1.0, (2,), float32), 'ego_target_distance': Box(0.0, inf, (1,), float32)) and action Box(-1.0, 1.0, (2,), float32) spaces
The info map returned by reset(...) and step(...) contains the action computed by the original navground behavior, in this case HL, for each of the 10 agents.
[5]:
observations, infos = env.reset()
print(f"Observation #0: {observations[0]}")
print(f"Info #0: {infos[0]}")
Observation #0: {'ego_target_distance': array([1.3484901], dtype=float32), 'ego_target_direction': array([ 1.0000000e+00, -1.5725663e-08], dtype=float32), 'position': array([[-0.00738173, -0.30817246],
[-0.38925827, 0.01894906],
[-0.46368217, -0.4778133 ],
[ 0.15306982, -0.6674728 ],
[ 0.5088892 , -0.62434775]], dtype=float32), 'radius': array([0.1, 0.1, 0.1, 0.1, 0.1], dtype=float32), 'valid': array([1, 1, 1, 1, 1], dtype=uint8), 'velocity': array([[0., 0.],
[0., 0.],
[0., 0.],
[0., 0.],
[0., 0.]], dtype=float32)}
Info #0: {'navground_action': array([0.32967997, 0. ], dtype=float32)}
Let’s collect the reward from the original controller
[6]:
all_rewards = []
for n in range(1000):
    actions = {i: info['navground_action'] for i, info in infos.items()}
    observations, rewards, terminated, truncated, infos = env.step(actions)
    all_rewards.append(np.mean(list(rewards.values())))
    done = np.bitwise_or(list(terminated.values()), list(truncated.values()))
    if np.all(done):
        print(f'reset after {n} steps')
        observations, infos = env.reset()
print(f'mean reward {np.mean(all_rewards):.3f}')
reset after 600 steps
mean reward -0.243
and compare it with the reward from a random policy
[7]:
observations, infos = env.reset()
all_rewards = []
for n in range(1000):
    actions = {i: env.action_space(i).sample() for i in range(10)}
    observations, rewards, terminated, truncated, infos = env.step(actions)
    all_rewards.append(np.mean(list(rewards.values())))
    done = np.bitwise_or(list(terminated.values()), list(truncated.values()))
    if np.all(done):
        print(f'reset after {n} steps')
        observations, infos = env.reset()
print(f'mean reward {np.mean(all_rewards):.3f}')
reset after 600 steps
mean reward -1.117
We want to use a machine learning policy to generate the actions. For instance, a random policy, like
[8]:
from navground.learning.policies.random_predictor import RandomPredictor
policies = {i: RandomPredictor(observation_space=env.observation_space(i),
                               action_space=env.action_space(i))
            for i in env.agents}
Policies output a tuple (action, state). Therefore the new loop is
[9]:
observations, infos = env.reset()
all_rewards = []
for n in range(1000):
    actions = {i: policies[i].predict(observations[i])[0] for i in env.agents}
    observations, rewards, terminated, truncated, infos = env.step(actions)
    all_rewards.append(np.mean(list(rewards.values())))
    done = np.bitwise_or(list(terminated.values()), list(truncated.values()))
    if np.all(done):
        print(f'reset after {n} steps')
        observations, infos = env.reset()
print(f'mean reward {np.mean(all_rewards):.3f}')
reset after 600 steps
mean reward -1.088
Two groups#
Let us now consider the more complex case where we want to control agents using different sensors and/or configurations. For instance, we want to control the first 10 agents as before and the second 10 agents using a lidar scanner. Let us also say that we want to control the second group in acceleration, while the first group is controlled in speed.
[19]:
lidar = sim.load_state_estimation("""
type: Lidar
resolution: 100
range: 5.0
""")
[20]:
from navground.learning.parallel_env import parallel_env
from navground.learning import GroupConfig
first_group = GroupConfig(indices=slice(0, 10, 1), sensor=sensor,
                          observation=DefaultObservationConfig(include_target_distance=False),
                          action=ControlActionConfig(),
                          reward=SocialReward())
second_group = GroupConfig(indices=slice(10, 20, 1), sensor=lidar,
                           observation=DefaultObservationConfig(),
                           action=ControlActionConfig(use_acceleration_action=True,
                                                      max_acceleration=1.0,
                                                      max_angular_acceleration=10.0),
                           reward=SocialReward())
env = parallel_env(scenario=scenario, groups=[first_group, second_group],
                   time_step=0.1, max_duration=60.0)
The two groups now use different observation spaces
[21]:
env.observation_space(0)
[21]:
Dict('position': Box(-5.0, 5.0, (5, 2), float32), 'radius': Box(0.0, 0.1, (5,), float32), 'valid': Box(0, 1, (5,), uint8), 'velocity': Box(-0.12, 0.12, (5, 2), float32))
[22]:
env.observation_space(10)
[22]:
Dict('fov': Box(0.0, 6.2831855, (1,), float32), 'range': Box(0.0, 5.0, (100,), float32), 'start_angle': Box(-6.2831855, 6.2831855, (1,), float32))
and different maps between actions and commands
[23]:
env._possible_agents[0].gym.get_cmd_from_action(np.ones(2), time_step=0.1)
[23]:
Twist2((0.120000, 0.000000), 2.553191, frame=Frame.relative)
[24]:
env._possible_agents[10].gym.get_cmd_from_action(np.ones(2), time_step=0.1)
[24]:
Twist2((0.100000, 0.000000), 1.000000, frame=Frame.relative)
Convert to a Gymnasium Env#
In case the agents share the same configuration (and in particular action and observation spaces), we can convert the PettingZoo env into a Gymnasium vector env.
[25]:
env = shared_parallel_env(
    scenario=scenario,
    agent_indices=slice(0, 10, 1),
    sensor=sensor,
    action=action_config,
    observation=observation_config,
    reward=SocialReward(),
    time_step=0.1,
    max_duration=60.0)
[26]:
import supersuit
venv = supersuit.pettingzoo_env_to_vec_env_v1(env)
with
[27]:
venv.num_envs
[27]:
10
environments that represent the individual agents.
This vector env follows the Gymnasium API, stacking together the observations and actions of the individual agents.
If instead we want a vector env that follows the SB3 API, we can use supersuit.concat_vec_envs_v1 (even stacking multiple vectorized envs together)
[28]:
venv1 = supersuit.concat_vec_envs_v1(venv, 2, num_cpus=1, base_class="stable_baselines3")
[29]:
venv1.num_envs
[29]:
20
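Such a vector env can be used directly to train with Stable-Baselines3. A minimal sketch (assuming stable-baselines3 is installed; we pick PPO with a MultiInputPolicy because the observations are dictionaries) that learns a single policy shared by all controlled agents:
[ ]:
from stable_baselines3 import PPO

# one policy is shared by all the agents stacked in the vector env
model = PPO("MultiInputPolicy", venv1, verbose=0)
model.learn(total_timesteps=10_000)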
Convert from a Gymnasium Env#
If we have a single-agent navground environment that uses a multi-agent scenario, we can convert it to a parallel environment, where all controlled agents share the same configuration, like for shared_parallel_env.
Let us load the environment we saved in the previous notebook
[11]:
from navground.learning import io
sa_env = io.load_env('env.yaml')
and convert it to a parallel environment, controlling 10 (out of the total 20) agents.
[15]:
from navground.learning.parallel_env import make_shared_parallel_env_with_env
[16]:
env1 = make_shared_parallel_env_with_env(env=sa_env, indices=slice(0, 10))
[17]:
env1.possible_agents
[17]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Saving and loading#
The multi-agent PettingZoo environment supports the same YAML representation as the single-agent Gymnasium environment, so we can save it to and load it from a YAML file.
[18]:
io.save_env(env1, 'penv.yaml')
Let us check that the groups field is coherent with the configuration we have just provided: a single group of 10 agents (indices 0, 1, …, 9).
[22]:
import yaml
print(yaml.safe_dump(env1.asdict['groups']))
- action:
dof: null
dtype: ''
fix_orientation: false
has_wheels: null
max_acceleration: .inf
max_angular_acceleration: .inf
max_angular_speed: .inf
max_speed: .inf
type: Control
use_acceleration_action: false
use_wheels: false
indices:
start: 0
step: null
stop: 10
type: slice
observation:
dof: null
dtype: ''
flat: false
history: 1
include_angular_speed: false
include_radius: false
include_target_angular_speed: false
include_target_direction: true
include_target_direction_validity: false
include_target_distance: true
include_target_distance_validity: false
include_target_speed: false
include_velocity: false
max_angular_speed: .inf
max_radius: .inf
max_speed: .inf
max_target_distance: .inf
type: Default
reward:
alpha: 0.0
beta: 1.0
critical_safety_margin: 0.0
default_social_margin: 0.0
safety_margin: null
social_margins: {}
type: Social
sensor: {}
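As a quick check (a sketch, assuming io.load_env also handles multi-agent environments, as the shared YAML representation suggests), we can load the environment back and verify the controlled agents
[ ]:
penv = io.load_env('penv.yaml')
print(penv.possible_agents)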
[ ]: