Evaluation#

navground.learning.evaluation

Policy#

Extends Stable-Baselines3 stable_baselines3.common.evaluation.evaluate_policy() to also accept navground.learning.types.PolicyPredictorWithInfo models.

Parameters:
Return type:

tuple[float, float] | tuple[list[float], list[int]]
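
A minimal usage sketch, assuming the function is exposed as navground.learning.evaluation.evaluate_policy (the import path is an assumption) and that model and env are already defined, with model being either a Stable-Baselines3 policy or a PolicyPredictorWithInfo:

>>> from navground.learning.evaluation import evaluate_policy
>>>
>>> mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)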

Policies#

Mimics Stable-Baselines3 stable_baselines3.common.evaluation.evaluate_policy() on a PettingZoo Parallel environment.

The main differences are:

  • it accepts a list of models to be applied to different sub-groups of agents

  • it returns the rewards separately for each group.

For example, if a single group is provided

>>> evaluate_policies(models=[(Indices.all(), model)], env=env)
([100.0], [100.0])

it returns two single-element lists: the average and the standard deviation, over all episodes, of the mean reward over all agents.

Instead, if we pass three groups, like

>>> models=[({1, 2}, model1), ({3, 4}, model2), ({5, 6}, model3)]
>>> evaluate_policies(models=models, env=env)
([100.0, 90.0, 110.0], [10.0, 12.0, 7.0])

it returns two lists of three elements each, one for each group.

If return_episode_rewards is set, it returns three lists:

  • the cumulated rewards for each group and episode (not averaged over the agents!): [[grp_1_ep_1, grp_1_ep_2, ...], [grp_2_ep_1, grp_2_ep_2, ...]]

  • the lengths of the episodes

  • the number of agents for each group and episode: [[grp_1_ep_1, grp_1_ep_2, ...], [grp_2_ep_1, grp_2_ep_2, ...]]

For example, with two groups and two episodes, the result will look like

>>> models=[({1, 2}, model1), ({3, 4, 5}, model2)]
>>> evaluate_policies(models=models, env=env,
                      n_eval_episodes=2,
                      return_episode_rewards=True)
([[200.0, 202.0], [300.0, 305.0]], [10, 10], [[2, 2], [3, 3]])
Parameters:
  • models (Sequence[tuple[Indices | slice | list[int] | tuple[int] | set[int] | Literal['ALL'], AnyPolicyPredictor]]) – The models, given as tuples of the indices selecting the agents in a group and the model those agents apply

  • env (BaseParallelEnv) – The environment

  • n_eval_episodes (int) – The number of episodes

  • deterministic (bool) – Whether the policy is applied deterministically

  • return_episode_rewards (bool) – Whether to return the rewards of each episode separately (instead of averaging over episodes)

  • warn (bool) – Whether to enable warnings

Returns:

If return_episode_rewards is set, a tuple (list of lists of cumulated episode rewards, list of episode lengths, list of lists of group sizes); else, a tuple (list of average episode rewards, list of standard deviations of episode rewards)

Return type:

tuple[list[float], list[float]] | tuple[list[list[float]], list[int], list[list[int]]]
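
Continuing the example above, the per-agent average reward of each group in each episode can be recovered by dividing the cumulated group rewards by the group sizes (the numbers below are just this arithmetic applied to the example values):

>>> rewards, lengths, sizes = evaluate_policies(models=models, env=env,
        n_eval_episodes=2, return_episode_rewards=True)
>>> [[r / n for r, n in zip(rs, ns)] for rs, ns in zip(rewards, sizes)]
[[100.0, 101.0], [100.0, 101.66666666666667]]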

Scenarios#

A navground scenario initializer to configure groups of agents.

It is designed to be added to a scenario, like

>>> from navground import sim
>>> from navground.learning import GroupConfig
>>> from navground.learning.evaluation.scenario import InitPolicyBehavior
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', color='red', indices='ALL')]
>>> scenario.add_init(InitPolicyBehavior(groups=groups))
>>> world = scenario.make_world(seed=101)
>>> another_world = scenario.make_world(seed=313)
Parameters:
  • groups (Collection[GroupConfig]) – The configuration of groups of agents

  • bounds (Bounds | None) – Optional termination boundaries

  • terminate_outside_bounds (bool) – Whether to terminate if any of the agents exits the boundaries

  • deterministic (bool) – Whether to apply the policies deterministically

Returns a scenario initializer using the configuration stored in an environment.

groups are merged using navground.learning.config.merge_groups_configs().

Parameters:
  • env (BaseEnv | BaseParallelEnv | VecEnv) – The environment

  • groups (Collection[GroupConfig]) – The configuration of groups of agents

  • bounds (Bounds | None) – Optional termination boundaries

  • terminate_outside_bounds (bool | None) – Whether to terminate if any of the agents exits the boundaries

  • deterministic (bool) – Whether to apply the policies deterministically

Returns:

The scenario initializer.

Return type:

InitPolicyBehavior
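
A usage sketch of this helper: make_init below is a placeholder name standing in for the callable documented above (its actual name is not shown here), while the rest follows the previous example:

>>> from navground import sim
>>> from navground.learning import GroupConfig
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', color='red', indices='ALL')]
>>> scenario.add_init(make_init(env, groups=groups, deterministic=True))
>>> world = scenario.make_world(seed=101)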

Experiments#

Initializes a navground experiment where groups of agents are configured with possibly different policies and rewards are optionally recorded.

If groups is not empty, it makes a copy of the scenario, to which it adds a navground.learning.evaluation.scenario.InitPolicyBehavior to initialize the groups.

If record_reward is set, it adds a navground.learning.probes.reward.RewardProbe.

Parameters:
  • scenario (sim.Scenario) – The scenario

  • groups (Collection[GroupConfig]) – The configuration of the groups

  • reward (Reward | None) – The default reward to record (when not specified in the group config)

  • record_reward (bool) – Whether to record the rewards

  • policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)

  • bounds (Bounds | None) – Optional termination boundaries

  • terminate_outside_bounds (bool) – Whether to terminate if any of the agents exits the boundaries

  • deterministic (bool) – Whether to apply the policies deterministically

Returns:

The experiment

Return type:

sim.Experiment
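
A usage sketch, assuming make_experiment is importable from navground.learning.evaluation (the import path is an assumption):

>>> from navground import sim
>>> from navground.learning import GroupConfig
>>> from navground.learning.evaluation import make_experiment
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', indices='ALL')]
>>> experiment = make_experiment(scenario=scenario, groups=groups,
        record_reward=True)
>>> experiment.run()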

Similar to make_experiment() but using the configuration stored in an environment: groups are merged using navground.learning.config.merge_groups_configs().

Parameters:
  • env (BaseEnv | BaseParallelEnv | VecEnv) – The environment

  • groups (Collection[GroupConfig]) – The configuration of the groups

  • reward (Reward | None) – The default reward to record (when not specified in the group config)

  • record_reward (bool) – Whether to record the rewards

  • policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)

  • bounds – Optional termination boundaries

  • terminate_outside_bounds – Whether to terminate if any of the agents exits the boundaries

  • deterministic (bool) – Whether to apply the policies deterministically

Returns:

The experiment

Return type:

sim.Experiment
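
A corresponding sketch for this variant, with make_experiment_with_env used as an assumed name for the function documented above:

>>> from navground.learning.evaluation import make_experiment_with_env
>>>
>>> experiment = make_experiment_with_env(env=env, record_reward=True)
>>> experiment.run()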

Logging#

The configuration to log trajectory plots

Parameters:

The configuration to log videos

Parameters:

Configure the model logger to log additional data:

  • trajectory plots

  • trajectory videos

  • statistics on reward, collisions, efficacy and safety violations

  • hparams

  • data as YAML

  • model policy graph

  • a YAML representation of the environment (at the beginning of logging)

Parameters:
Return type:

None
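
A sketch of how this could fit into a Stable-Baselines3 training run; config_log, plot_config and video_config below are placeholder names standing in for the function and the two configuration objects documented above, not the actual API:

>>> from stable_baselines3 import SAC
>>>
>>> model = SAC("MlpPolicy", env, tensorboard_log="logs")
>>> # placeholder call: extends the model's logger with the data listed above
>>> config_log(model, plot_config, video_config)
>>> model.learn(total_timesteps=10_000)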