Utils#

navground.learning.utils

Plotting#

navground.learning.utils.plot

Bases: NamedTuple

Plots logged fields.

Parameters:
  • logs (DataFrame) – The logs

  • key (str) – Common x-axis key

  • fields (Sequence[LogField]) – Which fields to plot.

  • two_axis (bool) – Whether to use two axes (only if there are two fields)

  • title (str)

  • kwargs (Any)

Return type:

None
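A minimal sketch of how this plotting helper could be called. The names plot_logs and LogField, and the LogField constructor used below, are assumptions about what navground.learning.utils.plot exports; the shapes of the arguments follow the parameter list above.

    import pandas as pd
    # Hypothetical names: the exact symbols exported by
    # navground.learning.utils.plot may differ.
    from navground.learning.utils.plot import LogField, plot_logs

    logs = pd.DataFrame({
        "step": [0, 1000, 2000, 3000],
        "reward": [-12.0, -7.5, -3.2, -1.1],
        "length": [200.0, 180.0, 150.0, 120.0],
    })
    # Plot both fields against the common x-axis key, one y-axis each.
    plot_logs(logs, key="step",
              fields=[LogField("reward"), LogField("length")],
              two_axis=True, title="Training logs")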

Stable-Baselines3#

navground.learning.utils.sb3

Bases: BaseCallback

Exports the (best) model policy as “best_policy.onnx”.

Parameters:

verbose (int)
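A hedged sketch of how such a callback is typically wired into SB3 training, attaching it to an EvalCallback so the best model gets exported whenever it improves. The class name ExportOnnxCallback is an assumption; the SB3 calls and the CartPole stand-in environment are standard API.

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import EvalCallback
    # Hypothetical class name; use whichever export callback this module provides.
    from navground.learning.utils.sb3 import ExportOnnxCallback

    env = gym.make("CartPole-v1")        # stand-in for a navground environment
    eval_env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=0)
    eval_cb = EvalCallback(eval_env, best_model_save_path="logs",
                           callback_on_new_best=ExportOnnxCallback())
    model.learn(total_timesteps=10_000, callback=eval_cb)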

Bases: BaseCallback

Similar to SB3’s own stable_baselines3.common.callbacks.ProgressBarCallback, it displays a tqdm progress bar while training an SB3 agent, but also shows the episodes’ mean reward and length.

Parameters:

every (int)

Bases: BaseCallback

An SB3 callback that makes a video of one or more runs of an environment.

Parameters:

Loads the evaluation logs from CSV files.

Parameters:

path (PathLike) – The directory with csv logs

Returns:

A dataframe with the logs

Return type:

DataFrame
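For instance, assuming the loader is exposed as load_eval_logs (a hypothetical name), the CSV logs written during evaluation can be pulled into a single DataFrame:

    # Hypothetical function name for the loader documented above.
    from navground.learning.utils.sb3 import load_eval_logs

    logs = load_eval_logs("logs/eval")
    print(logs.columns)   # e.g. reward, success and length columns
    print(logs.tail())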

Plots reward, success and/or length from the eval logs

Parameters:
  • path (PathLike) – The directory with csv logs

  • reward (bool) – Whether to plot the reward

  • reward_color (str) – The reward color

  • reward_linestyle (str) – The reward linestyle

  • success (bool) – Whether to plot the success

  • length (bool) – Whether to plot the length

  • reward_low (float | None) – An optional lower bound to scale the reward

  • reward_high (float) – An optional upper bound to scale the reward

  • length_low (float) – An optional lower bound to scale the length

  • length_high (float | None) – An optional upper bound to scale the length

  • two_axis (bool) – Whether to use two axes (only if there are two fields)

  • kwargs (Any)

Return type:

None
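A sketch of a typical call, assuming the function is exposed as plot_eval_logs (hypothetical name) and that the environment rewards fall in [-100, 0]:

    # Hypothetical function name for the plotting helper documented above.
    from navground.learning.utils.sb3 import plot_eval_logs

    plot_eval_logs("logs/eval",
                   reward=True, reward_low=-100.0, reward_high=0.0,
                   length=True, length_low=0.0, length_high=200.0,
                   two_axis=True)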

BenchMARL#

navground.learning.utils.benchmarl

Bases: Experiment

A benchmarl.experiment.Experiment created from a navground.learning.parallel_env.MultiAgentNavgroundEnv.

Parameters:
  • env (MultiAgentNavgroundEnv | None) – The training environment

  • task (NavgroundTaskClass | None) – The task

  • algorithm_config (AlgorithmConfig) – The algorithm configuration

  • model_config (ModelConfig) – The model configuration

  • seed (int) – The seed

  • config (ExperimentConfig) – The experiment configuration

  • critic_model_config (ModelConfig | None) – The critic model configuration

  • callbacks (list[Callback] | None) – The callbacks

  • eval_env (MultiAgentNavgroundEnv | None) – The evaluation environment. If not set, it will use the training environment for evaluation too.
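A sketch of assembling an experiment with these parameters. The BenchMARL config classes and their get_from_yaml() constructors are standard BenchMARL API; how the MultiAgentNavgroundEnv instances are built is left out and assumed to happen elsewhere.

    from benchmarl.algorithms import MappoConfig
    from benchmarl.models.mlp import MlpConfig
    from benchmarl.experiment import ExperimentConfig
    from navground.learning.utils.benchmarl import NavgroundExperiment

    # train_env / eval_env: navground.learning.parallel_env.MultiAgentNavgroundEnv
    # instances created elsewhere.
    experiment = NavgroundExperiment(
        env=train_env,
        eval_env=eval_env,                      # optional, falls back to env
        algorithm_config=MappoConfig.get_from_yaml(),
        model_config=MlpConfig.get_from_yaml(),
        config=ExperimentConfig.get_from_yaml(),
        seed=0,
    )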

Gets the action space associated with a group

Parameters:

group (str) – The group name

Raises:

ValueError – If the group is empty

Returns:

The action space.

Return type:

gym.Space

Evaluates the current policy using navground.learning.utils.benchmarl.evaluate_policy().

Parameters:
  • n_eval_episodes (int) – The number of episodes

  • return_episode_rewards (bool) – Whether to return individual episode rewards (vs aggregate them)

Returns:

If return_episode_rewards is set, a tuple (list of cumulative episode rewards, list of episode lengths); else, a tuple (average episode reward, standard deviation of episode rewards)

Return type:

tuple[float, float] | tuple[list[float], list[int]]
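For example (the method name evaluate is an assumption; the keyword arguments follow the signature documented above):

    # Aggregate statistics over 10 evaluation episodes.
    mean_reward, std_reward = experiment.evaluate(n_eval_episodes=10)

    # Per-episode rewards and lengths instead of the aggregate.
    rewards, lengths = experiment.evaluate(n_eval_episodes=10,
                                           return_episode_rewards=True)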

Exports the policies to ONNX using navground.learning.onnx.export().

Parameters:
  • path (PathLike | None) – The directory where to save the policies. If not set, it uses the experiment log folder.

  • name (str) – The files prefix

  • dynamic_batch_size (bool) – Whether to expose a dynamic batch size

Returns:

The paths of the saved files.

Return type:

list[Path]

Exports the policy of a single group to ONNX using navground.learning.onnx.export().

Parameters:
  • path (PathLike | None) – Where to save the policy. If not set, it saves it in the experiment log folder under <name>_<group>.onnx.

  • name (str) – The file name (used as an alternative to path)

  • group (str) – The group name

  • dynamic_batch_size (bool) – Whether to expose a dynamic batch size

Returns:

The path of the saved file.

Return type:

Path
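A sketch covering both export entries above; the method names export_policies and export_policy are assumptions:

    # Export one ONNX file per group into a directory (first entry above).
    paths = experiment.export_policies(path="exported", name="policy")

    # Export the policy of a single group (second entry above).
    path = experiment.export_policy(group="agents", name="policy")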

Gets the indices of agents with name <group>_<index>

Parameters:

group (str) – The group

Returns:

The agent indices

Return type:

list[int]

Gets the policies associated with each group, as single-agent (i.e., non-tensordict) policies.

Returns:

A list of configurations with loaded policies and agent indices set

Return type:

list[GroupConfig]

Gets the policy associated with a group, as a single-agent (i.e., non-tensordict) policy.

Parameters:

group (str) – The group name

Raises:

ValueError – If the group is empty

Returns:

The policy.

Return type:

SingleAgentPolicy
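A sketch of querying the trained policy of one group. The method name get_single_agent_policy is an assumption, as is the SB3-style predict() protocol on the returned policy; obs is a placeholder for a single agent's observation obtained from the environment.

    # Hypothetical method name for the accessor documented above.
    policy = experiment.get_single_agent_policy(group="agents")

    # Assuming the returned policy follows the SB3-style predict() protocol,
    # where `obs` is one agent's observation taken from the environment:
    action, _ = policy.predict(obs, deterministic=True)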

Loads the data stored in the logs’ CSV files.

Returns:

A dataframe for all data.

Return type:

pd.DataFrame

Loads the onnx policy for each group.

Parameters:
  • path (PathLike | None) – The directory path. If not set, it uses the experiment log folder.

  • name (str) – The file prefix

Returns:

A list of configurations with loaded policies and agent indices set

Return type:

list[GroupConfig]

Loads an onnx policy.

Parameters:
  • path (PathLike | None) – The path. If not set, it loads <name>_<group>.onnx from the experiment log folder.

  • name (str) – The file name (used as an alternative to path)

  • group (str) – The name of the group using the policy. If provided it is used to associate an action space to the policy.

Returns:

The loaded policy.

Return type:

OnnxPolicy
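For example (the method names load_policies and load_policy are assumptions based on the two entries above):

    # Reload the ONNX policies exported for every group.
    groups = experiment.load_policies(name="policy")

    # Or reload the policy of a single group from the experiment log folder.
    policy = experiment.load_policy(group="agents", name="policy")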

Gets the observation space associated with a group

Parameters:

group (str) – The group name

Raises:

ValueError – If the group is empty

Returns:

The observation space.

Return type:

gym.Space

Plots reward, success and/or length from the eval logs

Parameters:
  • group (str) – The name of the group

  • reward (bool) – Whether to plot the reward

  • reward_color (str) – The reward color

  • reward_linestyle (str) – The reward linestyle

  • success (bool) – Whether to plot the success

  • length (bool) – Whether to plot the length

  • reward_low (float | None) – An optional lower bound to scale the reward

  • reward_high (float) – An optional upper bound to scale the reward

  • length_low (float) – An optional lower bound to scale the length

  • length_high (float | None) – An optional upper bound to scale the length

  • two_axis (bool) – Whether to use two axes (only if there are two fields)

  • kwargs (Any)

Return type:

None

Restores the experiment from the checkpoint file.

This method expects the same folder structure created when an experiment is run: the checkpoint file (restore_file) is inside the checkpoints directory, and a config.pkl file is present one level above that directory, at restore_file/../../config.pkl.

Parameters:

restore_file (str) – The checkpoint file (.pt) of the experiment to reload.

Returns:

The reloaded experiment.

Return type:

NavgroundExperiment
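A sketch, assuming the classmethod is named reload_from_file and that the experiment was run with the default folder layout:

    from navground.learning.utils.benchmarl import NavgroundExperiment

    # Hypothetical classmethod name; the path points at a .pt checkpoint inside
    # the experiment's checkpoints directory.
    experiment = NavgroundExperiment.reload_from_file(
        "logs/experiment/checkpoints/checkpoint_300.pt")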

Trains the policy for a given number of iterations and/or steps.

Parameters:
  • iterations (int) – The iterations

  • steps (int) – The steps
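A sketch of training in chunks while monitoring progress; run_for and evaluate are assumed method names:

    for _ in range(10):
        experiment.run_for(iterations=10)
        mean_reward, std_reward = experiment.evaluate(n_eval_episodes=5)
        print(f"mean reward: {mean_reward:.2f} ± {std_reward:.2f}")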

The logging directory.

Returns:

The directory path

Bases: Module

This class conforms to navground.learning.types.PyTorchPolicy and is constructed from a TorchRL policy.

Parameters:

Bases: Callback

Exports the policy. The best model gets exported as "best_policy.onnx", the others as "policy_<iterations>.onnx".

Parameters:

export_all (bool) – Whether to export all (vs just the best) policies.

Bases: Callback

Parameters:
Return type:

EnvBase

Similar interface to Stable-Baselines3’s stable_baselines3.common.evaluation.evaluate_policy(), but for TorchRL environments (and policies).

Parameters:
  • policy (Any) – The policy

  • env (Any) – The environment

  • n_eval_episodes (int) – The number of episodes

  • deterministic (bool) – Whether to evaluate the policy deterministically

  • return_episode_rewards (bool) – Whether to return individual episode rewards (vs aggregate them)

  • warn (bool) – Whether to emit warnings

Returns:

If return_episode_rewards is set, a tuple (list of cumulative episode rewards, list of episode lengths); else, a tuple (average episode reward, standard deviation of episode rewards)

Return type:

tuple[float, float] | tuple[list[float], list[int]]
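For example (evaluate_policy is the function referenced earlier in this section; the TorchRL environment and policy are assumed to be created elsewhere, e.g. by the experiment above):

    from navground.learning.utils.benchmarl import evaluate_policy

    # Aggregate statistics over 20 episodes.
    mean_reward, std_reward = evaluate_policy(policy, env, n_eval_episodes=20)

    # Per-episode rewards and lengths.
    rewards, lengths = evaluate_policy(policy, env, n_eval_episodes=20,
                                       return_episode_rewards=True)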