Single-agent Gymnasium Environment#

navground.learning.env

The environment base class

This class describes an environment that uses a navground.sim.Scenario to generate and then simulate a navground.sim.World.

Actions and observations relate to a single selected navground.sim.Agent.

The environment is registered under the id "navground":

>>> import gymnasium as gym
>>> import navground.learning.env
>>> from navground import sim
>>>
>>> scenario = sim.load_scenario(...)
>>> env = gym.make("navground", scenario=scenario)
Parameters:
  • scenario (sim.Scenario | str | dict[str, Any] | None) – The scenario to initialize all simulated worlds. If a str, it will be interpreted as the YAML representation of a scenario. If a dict, it will be dumped to YAML and then treated as a str.

  • agent_index (int) – The world index of the selected agent, must be smaller than the number of agents.

  • sensor (sim.Sensor | str | dict[str, Any] | None) – A sensor to produce observations for the selected agent. If a str, it will be interpreted as the YAML representation of a sensor. If a dict, it will be dumped to YAML and then treated as a str. If None, it will use the agent’s own state estimation, provided it is a sensor.

  • action (ActionConfig) – The configuration of the action space to use.

  • observation (ObservationConfig) – The configuration of the observation space to use.

  • reward (Reward | None) – The reward function to use. If None, it defaults to a constant zero reward.

  • time_step (float) – The simulation time step applied at every step().

  • max_duration (float) – If positive, it will signal a truncation after this simulated time.

  • terminate_outside_bounds (bool) – Whether to terminate the episode when the agent exits the bounds.

  • bounds (Bounds | None) – The area to render, and a fence used to end the episode when the agent exits it (see terminate_outside_bounds).

  • render_mode (str | None) – The render mode. If “human”, it renders a simulation in real time via websockets (see navground.sim.ui.WebUI). If “rgb_array”, it uses navground.sim.ui.render.image_for_world() to render the world on demand.

  • render_kwargs (Mapping[str, Any]) – Arguments passed to navground.sim.ui.render.image_for_world()

  • realtime_factor (float) – A real-time factor for render_mode="human": larger values speed up the simulation.

  • stuck_timeout (float) – The time to wait before considering an agent stuck.

  • color (str) – An optional color for the agent (only used for display).

  • tag (str) – An optional tag added to the agent (only used as metadata).
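For illustration, the scenario parameter may be passed as a YAML string or the equivalent dict. The sketch below is hypothetical: the field names and values are assumptions chosen to illustrate the str/dict forms, and the actual schema and available scenario types should be checked against the navground documentation.

```yaml
# Hypothetical scenario configuration (illustrative only — verify the
# schema against the navground documentation for installed scenarios).
type: Cross
side: 4.0
groups:
  - number: 10
    radius: 0.1
    control_period: 0.1
    behavior:
      type: HL
    kinematics:
      type: Omni
      max_speed: 1.0
```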

Load the class from the JSON representation

Parameters:

value (Mapping[str, Any]) – A JSON-able dict

Returns:

An instance of the class

Return type:

Self
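The load/dump pair follows a common serialization pattern; a minimal, self-contained sketch of such a round trip, using a stand-in dataclass rather than navground's actual implementation:

```python
from dataclasses import dataclass, asdict
from typing import Any, Mapping


@dataclass
class EnvConfig:
    """Stand-in for an environment's JSON-able configuration."""
    agent_index: int = 0
    time_step: float = 0.1
    max_duration: float = -1.0

    @classmethod
    def from_dict(cls, value: Mapping[str, Any]) -> "EnvConfig":
        # Rebuild an instance from a JSON-able dict.
        return cls(**value)


config = EnvConfig(agent_index=2, time_step=0.05)
data = asdict(config)              # JSON-able dict
clone = EnvConfig.from_dict(data)  # round trip back to an instance
assert clone == config
```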

Conforms to gymnasium.Env.reset().

It samples a new world from the scenario and runs one dry simulation step using navground.sim.World.update_dry(). Then it converts the agent’s state to observations, and the command the agent would actuate to an action, which it includes in the info dictionary at key “navground_action”.

Parameters:

  • seed (int | None) – The random seed

  • options (dict[str, Any] | None) – Additional information to specify how the environment is reset

Return type:

tuple[Observation, dict[str, Any]]

Conforms to gymnasium.Env.step().

It converts the action to a command that the navground agent actuates. Then it updates the world for one step, calling navground.sim.World.update(). Finally, it converts the agent’s state to observations and the command it would actuate to an action (included in the info dictionary at key “navground_action”), and computes a reward.

Termination is set when the agent completes the task, exits the boundary, or gets stuck. Truncation is set when the maximal duration has passed.

Parameters:

action (Action)

Return type:

tuple[Observation, float, bool, bool, dict[str, Action]]
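The reset/step contract above can be exercised with the usual Gymnasium rollout loop. A minimal sketch with a stub environment: the stub only mimics the documented return shapes, including the "navground_action" info key and the termination/truncation flags, and is not the real class.

```python
import random


class StubEnv:
    """Stub mimicking the documented reset()/step() return shapes."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None, options=None):
        self.steps = 0
        random.seed(seed)
        observation = [0.0, 0.0]
        # The action the navground agent would take, as documented.
        info = {"navground_action": [0.0, 0.0]}
        return observation, info

    def step(self, action):
        self.steps += 1
        observation = [random.random(), random.random()]
        reward = 0.0                               # default reward is constant zero
        terminated = False                         # task done / out of bounds / stuck
        truncated = self.steps >= self.max_steps   # e.g. max_duration elapsed
        info = {"navground_action": action}
        return observation, reward, terminated, truncated, info


env = StubEnv()
observation, info = env.reset(seed=0)
done = False
total_reward = 0.0
while not done:
    action = info["navground_action"]  # follow the navground expert action
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
```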

A JSON-able representation of the instance

Returns:

A JSON-able dict

Returns the arguments used to initialize the environment

Returns:

The initialization arguments.

A policy that returns the action computed by the navground agent.

Returns:

The policy.
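Such an expert policy is convenient for collecting imitation-learning data. A sketch with a stand-in policy (a plain callable with a hypothetical rule, not the actual property):

```python
# Stand-in expert policy: maps an observation to an action.
# In the real environment, the policy would come from the env itself.
def expert_policy(observation):
    # Hypothetical rule: head toward the origin.
    x, y = observation
    return (-x, -y)


# Collect (observation, action) pairs, e.g. for behavior cloning.
dataset = []
for observation in [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]:
    action = expert_policy(observation)
    dataset.append((observation, action))
```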

The navground scenario, if set.