Utils#

navground.learning.utils

Plotting#

navground.learning.utils.plot

Bases: NamedTuple

Plots logged fields.

Parameters:
  • logs (DataFrame) – The logs

  • key (str) – Common x-axis key

  • fields (Sequence[LogField]) – Which fields to plot.

  • two_axis (bool) – Whether to use two axes (only if there are two fields)

  • title (str)

  • kwargs (Any)

Return type:

None
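A minimal sketch of how this plotting helper could be called. The names plot_logs and LogField, and the LogField constructor used below, are assumptions about what navground.learning.utils.plot exports; the shapes of the arguments follow the parameter list above.

    import pandas as pd
    # Hypothetical names: the exact symbols exported by
    # navground.learning.utils.plot may differ.
    from navground.learning.utils.plot import LogField, plot_logs

    logs = pd.DataFrame({
        "step": [0, 1000, 2000, 3000],
        "reward": [-12.0, -7.5, -3.2, -1.1],
        "length": [200.0, 180.0, 150.0, 120.0],
    })
    # Plot both fields against the common x-axis key, one y-axis each.
    plot_logs(logs, key="step",
              fields=[LogField("reward"), LogField("length")],
              two_axis=True, title="Training logs")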

Stable-Baselines3#

navground.learning.utils.sb3

Bases: BaseCallback

Exports the (best) model policy as “best_policy.onnx”.

Parameters:

verbose (int)
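A hedged sketch of how such a callback is typically wired into SB3 training, attaching it to an EvalCallback so the best model gets exported whenever it improves. The class name ExportOnnxCallback is an assumption; the SB3 calls and the CartPole stand-in environment are standard API.

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import EvalCallback
    # Hypothetical class name; use whichever export callback this module provides.
    from navground.learning.utils.sb3 import ExportOnnxCallback

    env = gym.make("CartPole-v1")        # stand-in for a navground environment
    eval_env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=0)
    eval_cb = EvalCallback(eval_env, best_model_save_path="logs",
                           callback_on_new_best=ExportOnnxCallback())
    model.learn(total_timesteps=10_000, callback=eval_cb)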

Bases: BaseCallback

Similar to SB3’s own stable_baselines3.common.callbacks.ProgressBarCallback, it displays a tqdm progress bar while training an SB3 agent, but also shows the episodes’ mean reward and length.

Parameters:

every (int)

Bases: BaseCallback

An SB3 callback that makes a video of one or more runs of an environment.

Parameters:

Loads the evaluation logs from CSV files.

Parameters:

path (PathLike) – The directory with csv logs

Returns:

A dataframe with the logs

Return type:

DataFrame
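For instance, assuming the loader is exposed as load_eval_logs (a hypothetical name), the CSV logs written during evaluation can be pulled into a single DataFrame:

    # Hypothetical function name for the loader documented above.
    from navground.learning.utils.sb3 import load_eval_logs

    logs = load_eval_logs("logs/eval")
    print(logs.columns)   # e.g. reward, success and length columns
    print(logs.tail())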

Plots reward, success and/or length from the eval logs

Parameters:
  • path (PathLike) – The directory with csv logs

  • reward (bool) – Whether to plot the reward

  • reward_color (str) – The reward color

  • reward_linestyle (str) – The reward linestyle

  • success (bool) – Whether to plot the success

  • length (bool) – Whether to plot the length

  • reward_low (float | None) – An optional lower bound to scale the reward

  • reward_high (float) – An optional upper bound to scale the reward

  • length_low (float) – An optional lower bound to scale the length

  • length_high (float | None) – An optional upper bound to scale the length

  • two_axis (bool) – Whether to use two axes (only if there are two fields)

  • kwargs (Any)

Return type:

None
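A sketch of a typical call, assuming the function is exposed as plot_eval_logs (hypothetical name) and that the environment rewards fall in [-100, 0]:

    # Hypothetical function name for the plotting helper documented above.
    from navground.learning.utils.sb3 import plot_eval_logs

    plot_eval_logs("logs/eval",
                   reward=True, reward_low=-100.0, reward_high=0.0,
                   length=True, length_low=0.0, length_high=200.0,
                   two_axis=True)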

BenchMARL#

navground.learning.utils.benchmarl

Bases: Experiment

A benchmarl.experiment.Experiment created from a navground.learning.parallel_env.MultiAgentNavgroundEnv.

Parameters:
  • env (MultiAgentNavgroundEnv | None) – The training environment

  • task (NavgroundTaskClass | None) – The task

  • algorithm_config (AlgorithmConfig) – The algorithm configuration

  • model_config (ModelConfig) – The model configuration

  • seed (int) – The seed

  • config (ExperimentConfig) – The experiment configuration

  • critic_model_config (ModelConfig | None) – The critic model configuration

  • callbacks (list[Callback] | None) – The callbacks

  • eval_env (MultiAgentNavgroundEnv | None) – The evaluation environment. If not set, it will use the training environment for evaluation too.
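A sketch of assembling an experiment with these parameters. The BenchMARL config classes and their get_from_yaml() constructors are standard BenchMARL API; how the MultiAgentNavgroundEnv instances are built is left out and assumed to happen elsewhere.

    from benchmarl.algorithms import MappoConfig
    from benchmarl.models.mlp import MlpConfig
    from benchmarl.experiment import ExperimentConfig
    from navground.learning.utils.benchmarl import NavgroundExperiment

    # train_env / eval_env: navground.learning.parallel_env.MultiAgentNavgroundEnv
    # instances created elsewhere.
    experiment = NavgroundExperiment(
        env=train_env,
        eval_env=eval_env,                      # optional, falls back to env
        algorithm_config=MappoConfig.get_from_yaml(),
        model_config=MlpConfig.get_from_yaml(),
        config=ExperimentConfig.get_from_yaml(),
        seed=0,
    )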

Gets the action space associated with a group

Parameters:

group (str) – The group name

Raises:

ValueError – If the group is empty

Returns:

The action space.

Return type:

gym.Space

Evaluates the current policy using navground.learning.utils.benchmarl.evaluate_policy().

Parameters:
  • n_eval_episodes (int) – The number of episodes

  • return_episode_rewards (bool) – Whether to return individual episode rewards (vs aggregate them)

Returns:

If return_episode_rewards is set, a tuple (list of cumulative episode rewards, list of episode lengths); else, a tuple (average episode reward, standard deviation of episode rewards)

Return type:

tuple[float, float] | tuple[list[float], list[int]]
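For example (the method name evaluate is an assumption; the keyword arguments follow the signature documented above):

    # Aggregate statistics over 10 evaluation episodes.
    mean_reward, std_reward = experiment.evaluate(n_eval_episodes=10)

    # Per-episode rewards and lengths instead of the aggregate.
    rewards, lengths = experiment.evaluate(n_eval_episodes=10,
                                           return_episode_rewards=True)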

Exports the policies to ONNX using navground.learning.onnx.export().

Parameters:
  • path (PathLike | None) – The directory where to save the policies. If not set, it uses the experiment log folder.

  • name (str) – The files prefix

  • dynamic_batch_size (bool) – Whether to expose a dynamic batch size

Returns:

The paths of the saved files.

Return type:

list[Path]

Exports the policy of a single group to ONNX using navground.learning.onnx.export().

Parameters:
  • path (PathLike | None) – Where to save the policy. If not set, it saves it in the experiment log folder under <name>_<group>.onnx.

  • name (str) – The file name (used as an alternative to path)

  • group (str) – The group name

  • dynamic_batch_size (bool) – Whether to expose a dynamic batch size

Returns:

The path of the saved file.

Return type:

Path
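A sketch covering both export entries above; the method names export_policies and export_policy are assumptions:

    # Export one ONNX file per group into a directory (first entry above).
    paths = experiment.export_policies(path="exported", name="policy")

    # Export the policy of a single group (second entry above).
    path = experiment.export_policy(group="agents", name="policy")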

Gets the indices of agents with name <group>_<index>

Parameters:

group (str) – The group

Returns:

The agent indices

Return type:

list[int]

Gets the policies associated with each group, as single-agent (i.e., non-tensordict) policies.

Returns:

A list of configurations with loaded policies and agent indices set

Return type:

list[GroupConfig]

Gets the policy associated with a group, as a single-agent (i.e., non-tensordict) policy.

Parameters:

group (str) – The group name

Raises:

ValueError – If the group is empty

Returns:

The policy.

Return type:

SingleAgentPolicy
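A sketch of querying the trained policy of one group. The method name get_single_agent_policy is an assumption, as is the SB3-style predict() protocol on the returned policy; obs is a placeholder for a single agent's observation obtained from the environment.

    # Hypothetical method name for the accessor documented above.
    policy = experiment.get_single_agent_policy(group="agents")

    # Assuming the returned policy follows the SB3-style predict() protocol,
    # where `obs` is one agent's observation taken from the environment:
    action, _ = policy.predict(obs, deterministic=True)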

Loads the data stored in the logs’ CSV files.

Returns:

A dataframe for all data.

Return type:

pd.DataFrame

Loads the onnx policy for each group.

Parameters:
  • path (PathLike | None) – The directory path. If not set, it uses the experiment log folder.

  • name (str) – The file prefix

Returns:

A list of configurations with loaded policies and agent indices set

Return type:

list[GroupConfig]

Loads an onnx policy.

Parameters:
  • path (PathLike | None) – The path. If not set, it loads <name>_<group>.onnx from the experiment log folder.

  • name (str) – The file name (used as an alternative to path)

  • group (str) – The name of the group using the policy. If provided it is used to associate an action space to the policy.

Returns:

The loaded policy.

Return type:

OnnxPolicy
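For example (the method names load_policies and load_policy are assumptions based on the two entries above):

    # Reload the ONNX policies exported for every group.
    groups = experiment.load_policies(name="policy")

    # Or reload the policy of a single group from the experiment log folder.
    policy = experiment.load_policy(group="agents", name="policy")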

Gets the observation space associated with a group

Parameters:

group (str) – The group name

Raises:

ValueError – If the group is empty

Returns:

The observation space.

Return type:

gym.Space

Plots reward, success and/or length from the eval logs

Parameters:
  • group (str) – The name of the group

  • reward (bool) – Whether to plot the reward

  • reward_color (str) – The reward color

  • reward_linestyle (str) – The reward linestyle

  • success (bool) – Whether to plot the success

  • length (bool) – Whether to plot the length

  • reward_low (float | None) – An optional lower bound to scale the reward

  • reward_high (float) – An optional upper bound to scale the reward

  • length_low (float) – An optional lower bound to scale the length

  • length_high (float | None) – An optional upper bound to scale the length

  • two_axis (bool) – Whether to use two axes (only if there are two fields)

  • kwargs (Any)

Return type:

None

Restores the experiment from the checkpoint file.

This method expects the same folder structure created when an experiment is run: the checkpoint file (restore_file) is inside the checkpoints directory, and a config.pkl file is present one level above that directory, at restore_file/../../config.pkl.

Parameters:

restore_file (str) – The checkpoint file (.pt) of the experiment to reload.

Returns:

The reloaded experiment.

Return type:

NavgroundExperiment
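A sketch, assuming the classmethod is named reload_from_file and that the experiment was run with the default folder layout:

    from navground.learning.utils.benchmarl import NavgroundExperiment

    # Hypothetical classmethod name; the path points at a .pt checkpoint inside
    # the experiment's checkpoints directory.
    experiment = NavgroundExperiment.reload_from_file(
        "logs/experiment/checkpoints/checkpoint_300.pt")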

Trains the policy for a given number of iterations and/or steps.

Parameters:
  • iterations (int) – The iterations

  • steps (int) – The steps
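A sketch of training in chunks while monitoring progress; run_for and evaluate are assumed method names:

    for _ in range(10):
        experiment.run_for(iterations=10)
        mean_reward, std_reward = experiment.evaluate(n_eval_episodes=5)
        print(f"mean reward: {mean_reward:.2f} ± {std_reward:.2f}")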

The logging directory.

Returns:

The directory path

Bases: Module

This class conforms to navground.learning.types.PyTorchPolicy and is constructed from a TorchRL policy.

Parameters:

Bases: Callback

Exports the policy. The best model gets exported as "best_policy.onnx", the others as "policy_<iterations>.onnx".

Parameters:

export_all (bool) – Whether to export all (vs just the best) policies.

Bases: Callback

Parameters:
Return type:

EnvBase

Similar interface to Stable-Baselines3’s stable_baselines3.common.evaluation.evaluate_policy(), but for TorchRL environments (and policies).

Parameters:
  • policy (Any) – The policy

  • env (Any) – The environment

  • n_eval_episodes (int) – The number of episodes

  • deterministic (bool) – Whether to evaluate the policy deterministically

  • return_episode_rewards (bool) – Whether to return individual episode rewards (vs aggregate them)

  • warn (bool) – Whether to emit warnings

Returns:

If return_episode_rewards is set, a tuple (list of cumulative episode rewards, list of episode lengths); else, a tuple (average episode reward, standard deviation of episode rewards)

Return type:

tuple[float, float] | tuple[list[float], list[int]]
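For example (evaluate_policy is the function referenced earlier in this section; the TorchRL environment and policy are assumed to be created elsewhere, e.g. by the experiment above):

    from navground.learning.utils.benchmarl import evaluate_policy

    # Aggregate statistics over 20 episodes.
    mean_reward, std_reward = evaluate_policy(policy, env, n_eval_episodes=20)

    # Per-episode rewards and lengths.
    rewards, lengths = evaluate_policy(policy, env, n_eval_episodes=20,
                                       return_episode_rewards=True)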