Multi-agent PettingZoo Environment#

navground.learning.parallel_env

The environment base class

Bases: NavgroundBaseEnv, ParallelEnv[int, dict[str, ndarray[Any, dtype[Any]]] | ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

This class describes an environment that uses a navground.sim.Scenario to generate and then simulate a navground.sim.World.

Actions and observations relate to one or more selected navground.sim.Agent.

We provide the convenience functions parallel_env() and shared_parallel_env() to create the environment.

Parameters:
  • args (Any)

  • kwargs (Any)

Gets the action configuration for a (possibly configured) agent

Parameters:

index (int) – The agent index

Returns:

The action configuration, or None if undefined.

Return type:

ActionConfig | None

Load the class from the JSON representation

Parameters:

value (Mapping[str, Any]) – A JSON-able dict

Returns:

An instance of the class

Return type:

Self

Converts an action to a navground command for a (possibly configured) agent

Parameters:
  • index (int) – The agent index

  • action (Action) – The action

  • time_step (float) – The time step

Returns:

A control command or None if no agent is configured at the given index.

Return type:

Twist2 | None

A policy that returns the action computed by the navground agent.

Returns:

The policy.

Parameters:

index (int)

Return type:

InfoPolicy
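Because the navground command is exposed in the info dictionaries (at key "navground_action", per the reset()/step() descriptions below), such a policy can be used, for instance, to collect expert demonstrations for imitation learning. A minimal, library-agnostic sketch of extracting the expert actions from per-agent infos (the helper name and the toy values are illustrative, not part of the API):

```python
def expert_actions(infos, key="navground_action"):
    """Extract the navground expert action for each agent from the
    per-agent info dicts returned by reset()/step() (sketch)."""
    return {agent: info[key] for agent, info in infos.items() if key in info}

# toy per-agent infos, shaped like the environment's return values
infos = {0: {"navground_action": [0.1, 0.0]},
         1: {"navground_action": [0.0, 0.2]}}
actions = expert_actions(infos)
```

The extracted dict is keyed by agent index, so it can be passed directly as the `actions` argument of a subsequent step().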

Gets the observation configuration for a (possibly configured) agent

Parameters:

index (int) – The agent index

Returns:

The observation configuration, or None if undefined.

Return type:

ObservationConfig | None

Conforms to pettingzoo.utils.env.ParallelEnv.reset().

It samples a new world from a scenario and runs one dry simulation step using navground.sim.World.update_dry(). Then, it converts the agents' states to observations, and the commands they would actuate to actions, which it includes in the info dictionary at key "navground_action".

Return type:

tuple[dict[int, dict[str, ndarray[Any, dtype[Any]]] | ndarray[Any, dtype[Any]]], dict[int, dict[str, ndarray[Any, dtype[Any]]]]]

Conforms to pettingzoo.utils.env.ParallelEnv.step().

It converts the actions to commands that the navground agents actuate. Then, it updates the world for one step, calling navground.sim.World.update(). Finally, it converts the agents' states to observations and the commands they would actuate to actions, which it includes in the info dictionary at key "navground_action", and computes rewards.

Termination for individual agents is set when they complete the task, exit the boundary, or get stuck. Truncation for all agents is set when the maximal duration has passed.

Parameters:

actions (dict[int, Action])

Return type:

tuple[dict[int, dict[str, ndarray[Any, dtype[Any]]] | ndarray[Any, dtype[Any]]], dict[int, float], dict[int, bool], dict[int, bool], dict[int, dict[str, ndarray[Any, dtype[Any]]]]]
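The reset()/step() cycle described above follows the PettingZoo parallel API, with all return values keyed by agent index. The sketch below uses a toy stand-in class (not the real environment, which requires a navground scenario) purely to illustrate the shapes of the returned dicts and the "navground_action" info field:

```python
# Toy stand-in mimicking the dict-keyed PettingZoo parallel API used by
# the navground environment. NOT the real environment: it only
# illustrates the structure of the return values described above.

class ToyParallelEnv:
    def __init__(self, num_agents=2, horizon=3):
        self.agents = list(range(num_agents))
        self.horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        observations = {i: [0.0, 0.0] for i in self.agents}
        # like the navground env, expose the command the navground agent
        # would actuate in the info dict at key "navground_action"
        infos = {i: {"navground_action": [0.0, 0.0]} for i in self.agents}
        return observations, infos

    def step(self, actions):
        self._t += 1
        observations = {i: [float(self._t), 0.0] for i in self.agents}
        rewards = {i: 0.0 for i in self.agents}
        terminations = {i: False for i in self.agents}
        # truncation for all agents once the maximal duration has passed
        truncations = {i: self._t >= self.horizon for i in self.agents}
        infos = {i: {"navground_action": list(actions[i])}
                 for i in self.agents}
        return observations, rewards, terminations, truncations, infos

env = ToyParallelEnv()
observations, infos = env.reset()
total_reward = {i: 0.0 for i in env.agents}
done = False
while not done:
    # here a trained policy would map observations to actions; we just
    # replay the navground command taken from the info dict
    actions = {i: infos[i]["navground_action"] for i in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    for i, r in rewards.items():
        total_reward[i] += r
    done = all(terminations[i] or truncations[i] for i in env.agents)
```

The episode ends when every agent has either terminated or truncated, which matches the `wait=True` aggregation described for the environment parameters below.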

A JSON-able representation of the instance

Returns:

A JSON-able dict

Returns the arguments used to initialize the environment

Returns:

The initialization arguments.

The navground scenario, if set.

Create a multi-agent PettingZoo environment that uses a navground.sim.Scenario to generate and then simulate a navground.sim.World. All controlled agents share the same configuration.

>>> from navground.learning.parallel_env import shared_parallel_env
>>> from navground import sim
>>>
>>> scenario = sim.load_scenario(...)
>>> env = shared_parallel_env(scenario=scenario, ...)
Parameters:
  • scenario (sim.Scenario | str | dict[str, Any] | None) – The scenario to initialize all simulated worlds. If a str, it will be interpreted as the YAML representation of a scenario. If a dict, it will be dumped to YAML and then treated as a str.

  • max_number_of_agents (int | None) – The maximal number of agents that we will expose. It needs to be specified only for scenarios that generate worlds with a variable number of agents.

  • indices (IndicesLike) – The world indices of the agents to control. All other agents are controlled solely by navground.

  • sensor (SensorLike | None) – An optional sensor that will be added to sensors.

  • sensors (SensorSequenceLike) – A sequence of sensors to generate observations for the agents, or their YAML representations. Items of class str will be interpreted as the YAML representation of a sensor; items of class dict will be dumped to YAML and then treated as a str. If empty, it will use the agents’ own sensors.

  • action (ActionConfig) – The configuration of the action space to use.

  • observation (ObservationConfig) – The configuration of the observation space to use.

  • reward (Reward | None) – The reward function to use. If none, it will default to constant zeros.

  • time_step (float) – The simulation time step applied at every MultiAgentNavgroundEnv.step().

  • max_duration (float) – If positive, it will signal a truncation after this simulated time.

  • terminate_if_idle (bool) – Whether to terminate when an agent is idle

  • truncate_outside_bounds (bool) – Whether to truncate when an agent exits the bounds

  • bounds (Bounds | None) – The area to render, and a fence that truncates an agent’s episode when the agent exits it.

  • render_mode (str | None) – The render mode. If “human”, it renders a simulation in real time via websockets (see navground.sim.ui.WebUI). If “rgb_array”, it uses navground.sim.ui.render.image_for_world() to render the world on demand.

  • render_kwargs (Mapping[str, Any]) – Arguments passed to navground.sim.ui.render.image_for_world()

  • realtime_factor (float) – a realtime factor for render_mode=”human”: larger values speed up the simulation.

  • stuck_timeout (float) – The time to wait before considering an agent stuck.

  • color (str) – An optional color of the agents (only used for displaying)

  • tag (str) – An optional tag to be added to the agents (only used as metadata)

  • wait (bool) – Whether to signal termination/truncation only when all agents have terminated/truncated.

  • truncate_fast (bool) – Whether to signal truncation for all agents as soon as one agent truncates.

  • state (StateConfig | None) – An optional global state configuration

  • terminate_on_success (bool) – Whether to terminate the episode on success.

  • terminate_on_failure (bool) – Whether to terminate the episode on failure.

  • success_condition (TerminationCondition | None) – An optional success criteria

  • failure_condition (TerminationCondition | None) – An optional failure criteria

  • include_action (bool) – Whether to include field “navground_action” in the info

  • include_success (bool) – Whether to include field “is_success” in the info

  • init_success (bool | None) – The default value of success (valid until a termination condition is met)

  • intermediate_success (bool) – Whether to include “is_success” in the info at intermediate steps (vs only at termination).

Returns:

The multi agent navground environment.

Return type:

MultiAgentNavgroundEnv

Create a multi-agent PettingZoo environment that uses a navground.sim.Scenario to generate and then simulate a navground.sim.World.

>>> from navground.learning.parallel_env import parallel_env
>>> from navground.learning import GroupConfig
>>> from navground import sim
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(...), ...]
>>> env = parallel_env(scenario=scenario, groups=groups)
Parameters:
  • scenario (sim.Scenario | str | dict[str, Any] | None) – The scenario to initialize all simulated worlds. If a str, it will be interpreted as the YAML representation of a scenario. If a dict, it will be dumped to YAML and then treated as a str.

  • groups (Collection[GroupConfig]) – The configuration of the agents controlled by the environment. All other agents are controlled solely by navground.

  • max_number_of_agents (int | None) – The maximal number of agents that we will expose. It needs to be specified only for scenarios that generate worlds with a variable number of agents.

  • time_step (float) – The simulation time step applied at every MultiAgentNavgroundEnv.step().

  • max_duration (float) – If positive, it will signal a truncation after this simulated time.

  • terminate_if_idle (bool) – Whether to terminate when an agent is idle

  • truncate_outside_bounds (bool) – Whether to truncate when an agent exits the bounds

  • bounds (Bounds | None) – The area to render, and a fence that truncates an agent’s episode when the agent exits it.

  • render_mode (str | None) – The render mode. If “human”, it renders a simulation in real time via websockets (see navground.sim.ui.WebUI). If “rgb_array”, it uses navground.sim.ui.render.image_for_world() to render the world on demand.

  • render_kwargs (Mapping[str, Any]) – Arguments passed to navground.sim.ui.render.image_for_world()

  • realtime_factor (float) – a realtime factor for render_mode=”human”: larger values speed up the simulation.

  • stuck_timeout (float) – The time to wait before considering an agent stuck.

  • wait (bool) – Whether to signal termination/truncation only when all agents have terminated/truncated.

  • truncate_fast (bool) – Whether to signal truncation for all agents as soon as one agent truncates.

  • state (StateConfig | None) – An optional global state configuration

  • include_action (bool) – Whether to include field “navground_action” in the info

  • include_success (bool) – Whether to include field “is_success” in the info

  • init_success (bool | None) – The default value of success (valid until a termination condition is met)

  • intermediate_success (bool) – Whether to include “is_success” in the info at intermediate steps (vs only at termination).

Returns:

The multi agent navground environment.

Return type:

MultiAgentNavgroundEnv

Converts a PettingZoo parallel environment to a Stable-Baselines3 vectorized environment

The multiple agents of the single PettingZoo environment (whose action/observation spaces need to be shared) are stacked as multiple single-agent environments using supersuit.pettingzoo_env_to_vec_env_v1(), and then concatenated using supersuit.concat_vec_envs_v1().

Parameters:
  • env (BaseParallelEnv) – The environment

  • num_envs (int) – The number of parallel envs to concatenate

  • processes (int) – The number of processes

  • seed (int) – The seed

  • black_death (bool) – Whether to allow dynamic number of agents

  • monitor (bool) – Whether to wrap the vector env in a VecMonitor

  • monitor_keywords (tuple[str]) – The keywords passed to VecMonitor

Returns:

The vector environment with num_envs × |agents| single-agent environments.

Return type:

VecEnv
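The size arithmetic of the resulting vector environment can be sketched in a few lines. The index layout shown is an assumption for illustration (the actual ordering is determined by supersuit's concatenation internals):

```python
def vec_size(num_envs, num_agents):
    # the vectorized env exposes num_envs * num_agents
    # single-agent environments in total
    return num_envs * num_agents

def vec_index(env_idx, agent_idx, num_agents):
    # plausible layout with all agents of one parallel env contiguous
    # (assumption: the real ordering depends on supersuit internals)
    return env_idx * num_agents + agent_idx

# e.g. 4 concatenated copies of a 5-agent parallel env
total = vec_size(4, 5)
first_agent_of_second_copy = vec_index(1, 0, 5)
```

This is why rollout buffers sized for the vectorized env must account for the agent count of the underlying parallel environment, not just `num_envs`.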

Creates a shared parallel environment using the configuration of the agent exposed in a (navground) single-agent environment.

Parameters:
  • env (BaseEnv) – The environment.

  • max_number_of_agents (int | None) – The maximal number of agents that we will expose. It needs to be specified only for scenarios that generate worlds with a variable number of agents.

  • indices (Indices | slice | list[int] | tuple[int] | set[int] | Literal['ALL']) – The world indices of the agents to control. All other agents are controlled solely by navground.

Raises:

TypeError – If env.unwrapped is not a subclass of env.NavgroundEnv

Return type:

MultiAgentNavgroundEnv

Bases: Env

Wraps a multi-agent parallel environment as a single-agent environment, stacking observations and aggregating rewards, terminations, and truncations.

Requires homogeneous observation spaces (if state=False) and action spaces.

Parameters:
  • env (ParallelEnv) – The parallel environment

  • state (bool) – Whether to return the global state as observations (vs the stacked observation).
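The stacking and aggregation described above can be sketched as follows, using plain per-agent dicts (a simplification of what the wrapper does; summing rewards and requiring all agents to finish are illustrative aggregation choices, the latter mirroring the `wait` semantics described earlier):

```python
def stack_and_aggregate(observations, rewards, terminations, truncations):
    """Stack per-agent observations into one joint observation and
    aggregate per-agent rewards/terminations/truncations into scalars,
    as a single-agent wrapper over a parallel env might do (sketch)."""
    agents = sorted(observations)
    obs = [observations[i] for i in agents]           # stacked observation
    reward = sum(rewards[i] for i in agents)          # aggregated reward
    terminated = all(terminations[i] for i in agents) # all agents done
    truncated = all(truncations[i] for i in agents)
    return obs, reward, terminated, truncated

# toy per-agent values shaped like a parallel env's step() returns
obs, reward, terminated, truncated = stack_and_aggregate(
    observations={0: [0.0], 1: [1.0]},
    rewards={0: 0.5, 1: 1.0},
    terminations={0: False, 1: True},
    truncations={0: True, 1: True},
)
```

With `state=True`, the wrapper would instead return the global state configured via StateConfig in place of the stacked observation.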