Evaluation#

navground.learning.evaluation

Policy#

Extends Stable-Baselines3's stable_baselines3.common.evaluation.evaluate_policy() to also accept navground.learning.types.PolicyPredictorWithInfo models.

Parameters:
Return type:

tuple[float, float] | tuple[list[float], list[int]]
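A minimal usage sketch (assuming the extended function is exposed as navground.learning.evaluation.evaluate_policy, and that model and env follow the Stable-Baselines3 conventions):

>>> from navground.learning.evaluation import evaluate_policy
>>>
>>> # average and standard deviation of the episode rewards
>>> mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
>>> # per-episode rewards and lengths instead
>>> rewards, lengths = evaluate_policy(model, env, n_eval_episodes=10,
...                                    return_episode_rewards=True)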

Policies#

Mimics Stable-Baselines3's stable_baselines3.common.evaluation.evaluate_policy() on a PettingZoo parallel environment.

The main differences are:

  • it accepts a list of models to be applied to different sub-groups of agents

  • it returns the rewards divided by groups.

For example, if a single group is provided

>>> evaluate_policies(models=[(Indices.all(), model)], env=env)
([100.0], [100.0])

it returns two single-element lists: the average and the standard deviation over all episodes of the mean reward over all agents.

Instead, if we pass three groups, like

>>> models=[({1, 2}, model1), ({3, 4}, model2), ({5, 6}, model3)]
>>> evaluate_policies(models=models, env=env)
([100.0, 90.0, 110.0], [10.0, 12.0, 7.0])

it returns two lists of three elements each (averages and standard deviations), one element per group.

If return_episode_rewards is set, it returns three lists:

  • the cumulative rewards for each group and episode (not averaged over the agents!), [[grp_1_ep_1, grp_1_ep_2, ...], [grp_2_ep_1, grp_2_ep_2, ...]]

  • the length of the episodes

  • the number of agents for each group and episode, [[grp_1_ep_1, grp_1_ep_2, ...], [grp_2_ep_1, grp_2_ep_2, ...]]

For example, with two groups and two episodes, it will look like

>>> models=[({1, 2}, model1), ({3, 4, 5}, model2)]
>>> evaluate_policies(models=models, env=env,
                      n_eval_episodes=2,
                      return_episode_rewards=True)
([[200.0, 202.0], [300.0, 305.0]], [10, 10], [[2, 2], [3, 3]])
Parameters:
  • models (Sequence[tuple[Indices | slice | list[int] | tuple[int] | set[int] | Literal['ALL'], AnyPolicyPredictor]]) – The models, given as tuples of the indices that select the agents in a group and the policy that this group applies

  • env (BaseParallelEnv) – The environment

  • n_eval_episodes (int) – The number of episodes

  • deterministic (bool) – Whether the policy is applied deterministically

  • return_episode_rewards (bool) – Whether to return the per-episode rewards (instead of averaging over the episodes)

  • warn (bool) – Whether to enable warnings

Returns:

If return_episode_rewards is set, a tuple (list of lists of cumulative episode rewards, list of episode lengths, list of lists of group sizes); else, a tuple (list of average episode rewards, list of standard deviations of episode rewards)

Return type:

tuple[list[float], list[float]] | tuple[list[list[float]], list[int], list[list[int]]]
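For instance, the per-episode output can be reduced to a per-agent average reward for each group (a minimal sketch, assuming env, model1 and model2 are defined as in the examples above):

>>> rewards, lengths, sizes = evaluate_policies(
...     models=[({1, 2}, model1), ({3, 4, 5}, model2)], env=env,
...     n_eval_episodes=2, return_episode_rewards=True)
>>> # divide each group's cumulative rewards by its total number of agent-episodes
>>> # to get the average cumulative reward per agent and episode, for each group
>>> [sum(rs) / sum(ns) for rs, ns in zip(rewards, sizes)]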

Scenarios#

Bases: object

A navground scenario initializer to configure groups of agents.

It is designed to be added to a scenario, like

>>> from navground import sim
>>> from navground.learning import GroupConfig
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', color='red', indices='ALL')]
>>> scenario.add_init(InitPolicyBehavior(groups=groups))
>>> world = scenario.make_world(seed=101)
>>> another_world = scenario.make_world(seed=313)
Parameters:
  • groups (Collection[GroupConfig]) – The configuration of groups of agents

  • bounds (Bounds | None) – Optional termination boundaries

  • terminate_outside_bounds (bool) – Whether to terminate when any of the agents exits the boundaries

  • grouped (bool) – Whether the policy is grouped.

  • deterministic (bool) – Whether to apply the policies deterministically

  • pre (ObservationTransform | None) – An optional transformation to apply to observations

Returns a scenario initializer using the configuration stored in an environment.

Groups are merged using navground.learning.config.merge_groups_configs().

Parameters:
  • env (BaseEnv | BaseParallelEnv | VecEnv) – The environment

  • groups (Collection[GroupConfig]) – The configuration of groups of agents

  • bounds (Bounds | None) – Optional termination boundaries

  • terminate_outside_bounds (bool | None) – Whether to terminate when any of the agents exits the boundaries

  • grouped (bool) – Whether the policy is grouped.

  • deterministic (bool) – Whether to apply the policies deterministically

  • pre (ObservationTransform | None) – An optional transformation to apply to observations

Returns:

The scenario initializer.

Return type:

InitPolicyBehavior

Experiments#

Initializes a navground experiment where groups of agents are configured with possibly different policies and the rewards are optionally recorded.

If groups is not empty, it makes a copy of the scenario, to which it adds a navground.learning.evaluation.scenario.InitPolicyBehavior to initialize the groups.

If record_reward is set, it adds a navground.learning.probes.reward.RewardProbe.

Parameters:
  • scenario (sim.Scenario) – The scenario

  • groups (Collection[GroupConfig]) – The configuration of the groups

  • reward (Reward | None) – The default reward to record (when not specified in the group config)

  • record_reward (bool) – Whether to record the rewards

  • policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)

  • bounds (Bounds | None) – Optional termination boundaries

  • terminate_outside_bounds (bool) – Whether to terminate when any of the agents exits the boundaries

  • deterministic (bool) – Whether to apply the policies deterministically

Returns:

The experiment

Return type:

sim.Experiment
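A minimal usage sketch (assuming make_experiment is importable from navground.learning.evaluation; the scenario and the policy file are placeholders):

>>> from navground import sim
>>> from navground.learning import GroupConfig
>>> from navground.learning.evaluation import make_experiment
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', indices='ALL')]
>>> # record the reward of all agents while they apply the policy
>>> experiment = make_experiment(scenario=scenario, groups=groups,
...                              record_reward=True)
>>> experiment.number_of_runs = 10
>>> experiment.run()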

Similar to make_experiment() but using the configuration stored in an environment: groups are merged using navground.learning.config.merge_groups_configs().

Parameters:
  • env (BaseEnv | BaseParallelEnv | VecEnv) – The environment

  • groups (Collection[GroupConfig]) – The configuration of the groups

  • reward (Reward | None) – The default reward to record (when not specified in the group config)

  • record_reward (bool) – Whether to record the rewards

  • record_success (bool) – Whether to record the success

  • policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)

  • grouped (bool) – Whether the policy is grouped.

  • deterministic (bool) – Whether to apply the policies deterministically

  • pre (ObservationTransform | None) – An optional transformation to apply to observations

Returns:

The experiment

Return type:

sim.Experiment

Logging#

Bases: object

The configuration to log trajectory plots

Parameters:

Bases: object

The configuration to log videos

Parameters:

Configure the model logger to log additional data:

  • trajectory plots

  • trajectory videos

  • statistics on reward, collisions, efficacy and safety violations

  • hparams

  • data as YAML

  • model policy graph

  • a YAML representation of the environment (at the beginning of the logging)

Parameters:
  • model (OffPolicyAlgorithm | BaseILAlgorithm) – The model being trained

  • env (BaseEnv | VecEnv | BaseParallelEnv | None) – The testing environment

  • video_config (VideoConfig) – The video configuration

  • plot_config (TrajectoryPlotConfig) – The plot configuration

  • episodes (int) – The number of episodes over which to compute statistics

  • hparams (dict[str, Any]) – The hyper-parameters to log

  • data (dict[str, Any]) – The data to log as YAML

  • log_graph (bool) – Whether to log the model policy graph

  • collisions (bool) – Whether to record episodes’ collisions

  • efficacy (bool) – Whether to record episodes’ efficacy

  • safety_violation (bool) – Whether to record episodes’ safety violation

  • duration (bool) – Whether to record episodes’ duration

  • processes (int) – Number of processes to use

  • use_multiprocess (bool) – Whether to use multiprocess instead of multiprocessing

  • use_onnx (bool) – Whether to use onnx for inference

  • grouped (bool) – Whether the policy is grouped.

  • every (int)

  • reward (bool)

Return type:

None

Video#

Display a video of one episode from an evaluation experiment.

Parameters:
  • experiment (Experiment) – The experiment

  • factor (int) – The real-time factor

  • seed (int) – The seed (only applies if of == 0)

  • use_world_bounds (bool) – Whether to keep the initial world bounds

  • select (int) – Which episode to select (ordered by reward, only applies if of > 0)

  • of (int) – The number of runs to choose the episode from

  • kwargs (Any) – Keyword arguments passed to the renderer

Return type:

Any

Record a video of one episode from an evaluation experiment.

Parameters:
  • experiment (Experiment) – The experiment

  • path (PathLike) – Where to save the video

  • factor (int) – The real-time factor

  • seed (int) – The seed (only applies if of == 0)

  • use_world_bounds (bool) – Whether to keep the initial world bounds

  • select (int) – Which episode to select (ordered by reward, only applies if of > 0)

  • of (int) – The number of runs to choose the episode from

  • kwargs (Any) – Keyword arguments passed to the renderer

Return type:

None

Display the video from one episode of an environment.

Parameters:
  • env (BaseEnv | BaseParallelEnv | VecEnv) – The environment

  • groups (Collection[GroupConfig]) – The configuration of the groups

  • policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)

  • factor (int) – The real-time factor

  • seed (int) – The seed (only applies if of == 0)

  • grouped (bool) – Whether the policy is grouped

  • use_world_bounds (bool) – Whether to keep the initial world bounds

  • select (int) – Which episode to select (ordered by reward, only applies if of > 0)

  • of (int) – The number of runs to choose the episode from

  • kwargs – Keyword arguments passed to the renderer

Return type:

Any

Record the video from one episode of an environment.

Parameters:
  • env (BaseEnv | BaseParallelEnv | VecEnv) – The environment

  • path (PathLike) – Where to save the video

  • groups (Collection[GroupConfig]) – The configuration of the groups

  • policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)

  • factor (int) – The real-time factor

  • seed (int) – The seed (only applies if of == 0)

  • grouped (bool) – Whether the policy is grouped

  • use_world_bounds (bool) – Whether to keep the initial world bounds

  • select (int) – Which episode to select (ordered by reward, only applies if of > 0)

  • of (int) – The number of runs to choose the episode from

  • kwargs – Keyword arguments passed to the renderer