Evaluation#
navground.learning.evaluation
Policy#
Extends Stable-Baselines3
stable_baselines3.common.evaluation.evaluate_policy()
to also accept navground.learning.types.PolicyPredictorWithInfo
models.
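For instance, a minimal usage sketch, assuming the extended helper keeps the evaluate_policy name and is importable from navground.learning.evaluation; the environment id "navground" and the saved model path are placeholders:
>>> import gymnasium as gym
>>> from stable_baselines3 import SAC
>>> from navground.learning.evaluation import evaluate_policy
>>>
>>> env = gym.make("navground")    # placeholder: any navground.learning environment
>>> model = SAC.load("model.zip")  # a PolicyPredictorWithInfo model is accepted as well
>>> mean_reward, std_reward = evaluate_policy(model.policy, env,
...                                           n_eval_episodes=10,
...                                           deterministic=True)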
Policies#
Mimics Stable-Baselines3
stable_baselines3.common.evaluation.evaluate_policy()
on a PettingZoo Parallel environment.
The main differences are:
it accepts a list of models to be applied to different sub-groups of agents
it returns the rewards divided by groups.
For example, if a single group is provided
>>> evaluate_policies(models=[(Indices.all(), model)], env=env)
([100.0], [100.0])
it returns two lists, each with a single value: the average and the standard deviation over all episodes of the mean reward over all agents.
Instead, if we pass three groups, like
>>> models=[({1, 2}, model1), ({3, 4}, model2), ({5, 6}, model3)]
>>> evaluate_policies(models=models, env=env)
([100.0, 90.0, 110.0], [10.0, 12.0, 7.0])
it returns two lists of three elements each, one entry per group.
If return_episode_rewards is set, it returns three lists:
the cumulated rewards for each group and episode (not averaged over the agents!),
[[grp_1_ep_1, grp_1_ep_2, ...], [grp_2_ep_1, grp_2_ep_2, ...]]
the length of the episodes
the number of agents for each group and episode,
[[grp_1_ep_1, grp_1_ep_2, ...], [grp_2_ep_1, grp_2_ep_2, ...]]
For example, with two groups and two episodes, it will be like
>>> models=[({1, 2}, model1), ({3, 4, 5}, model2)]
>>> evaluate_policies(models=models, env=env, n_eval_episodes=2, return_episode_rewards=True)
([[200.0, 202.0], [300.0, 305.0]], [10, 10], [[2, 2], [3, 3]])
- Parameters:
models (Sequence[tuple[Indices | slice | list[int] | tuple[int] | set[int] | Literal['ALL'], AnyPolicyPredictor]]) – The models, given as tuples of the indices that select the agents in a group and the model those agents apply; see the sketch below for the accepted index forms
env (BaseParallelEnv) – The environment
n_eval_episodes (int) – The number of episodes
deterministic (bool) – Whether the policy is applied deterministically
return_episode_rewards (bool) – Whether to return all episodes (vs averaging them)
warn (bool) – Whether to enable warnings
- Returns:
If return_episode_rewards is set, a tuple (list of lists of cumulated episode rewards, list of episode lengths, list of lists of group sizes); else, a tuple (list of average episode rewards, list of standard deviations of episode rewards)
- Return type:
tuple[list[float], list[float]] | tuple[list[list[float]], list[int], list[list[int]]]
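As a rough illustration of the accepted group selectors (a sketch, not an exhaustive list; the Indices import path is an assumption, and model1, model2 and env stand for previously created predictors and a PettingZoo parallel environment):
>>> from navground.learning import Indices
>>> from navground.learning.evaluation import evaluate_policies
>>>
>>> groups = [({0, 1, 2}, model1),    # an explicit set of agent indices
...           (slice(3, 6), model2)]  # agents 3, 4 and 5
>>> # Indices.all() or the literal 'ALL' would select every agent instead.
>>> means, stds = evaluate_policies(models=groups, env=env, n_eval_episodes=10)
>>> for i, (mean, std) in enumerate(zip(means, stds)):
...     print(f"group {i}: {mean:.1f} ± {std:.1f}")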
Scenarios#
A navground scenario initializer to configure groups of agents.
It is designed to be added to a scenario, like
>>> from navground import sim
>>> from navground.learning import GroupConfig
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', color='red', indices='ALL')]
>>> scenario.add_init(InitPolicyBehavior(groups=groups))
>>> world = scenario.make_world(seed=101)
>>> another_world = scenario.make_world(seed=313)
- Parameters:
groups (Collection[GroupConfig]) – The configuration of groups of agents
bounds (Bounds | None) – Optional termination boundaries
terminate_outside_bounds (bool) – Whether to terminate if any of the agents exits the boundaries (see the sketch below)
deterministic (bool) – Whether to apply the policies deterministically
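A possible sketch of the termination boundaries; the bound values and the policy path are placeholders, and the exact Bounds representation (here a pair of corner points) is an assumption:
>>> import numpy as np
>>> from navground import sim
>>> from navground.learning import GroupConfig
>>> from navground.learning.evaluation.scenario import InitPolicyBehavior
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', indices='ALL')]
>>> scenario.add_init(InitPolicyBehavior(groups=groups,
...                                      bounds=(np.array((-5.0, -5.0)),
...                                              np.array((5.0, 5.0))),
...                                      terminate_outside_bounds=True))
>>> world = scenario.make_world(seed=0)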
Returns a scenario initializer using the configuration stored in an environment.
groups are merged using
navground.learning.config.merge_groups_configs().
- Parameters:
env (BaseEnv | BaseParallelEnv | VecEnv) – The environment
groups (Collection[GroupConfig]) – The configuration of groups of agents
bounds (Bounds | None) – Optional termination boundaries
terminate_outside_bounds (bool | None) – Whether to terminate if any of the agents exits the boundaries
deterministic (bool) – Whether to apply the policies deterministically
- Returns:
The scenario initializer.
- Return type:
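A hedged sketch, assuming the initializer is exposed as a classmethod named InitPolicyBehavior.from_env (hypothetical name) and that "navground" is a registered environment id:
>>> import gymnasium as gym
>>> from navground import sim
>>> from navground.learning.evaluation.scenario import InitPolicyBehavior
>>>
>>> env = gym.make("navground")
>>> scenario = sim.load_scenario(...)
>>> scenario.add_init(InitPolicyBehavior.from_env(env, deterministic=True))
>>> world = scenario.make_world(seed=101)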
Experiments#
Initializes a navground experiment where groups of agents are configured with possibly different policies and the rewards are optionally recorded.
If
groups
is not empty, it makes a copy of the scenario, to which it adds a
navground.learning.evaluation.scenario.InitPolicyBehavior
to initialize the groups.
If
record_reward
is set, it adds a
navground.learning.probes.reward.RewardProbe.
- Parameters:
scenario (sim.Scenario) – The scenario
groups (Collection[GroupConfig]) – The configuration of the groups
reward (Reward | None) – The default reward to record (when not specified in the group config)
record_reward (bool) – Whether to record the rewards
policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)
bounds (Bounds | None) – Optional termination boundaries
terminate_outside_bounds (bool) – Whether to terminate if any of the agents exits the boundaries
deterministic (bool) – Whether to apply the policies deterministically
- Returns:
The experiment
- Return type:
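For example, a minimal sketch, assuming the function is exposed as navground.learning.evaluation.make_experiment (scenario loading and the policy path are placeholders; number_of_runs and run() follow navground's usual Experiment API):
>>> from navground import sim
>>> from navground.learning import GroupConfig
>>> from navground.learning.evaluation import make_experiment
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', indices='ALL')]
>>> experiment = make_experiment(scenario=scenario, groups=groups, record_reward=True)
>>> experiment.number_of_runs = 10
>>> experiment.run()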
Similar to
make_experiment()
but using the configuration stored in an environment:
groups
are merged using
navground.learning.config.merge_groups_configs().
- Parameters:
env (BaseEnv | BaseParallelEnv | VecEnv) – The environment
groups (Collection[GroupConfig]) – The configuration of the groups
reward (Reward | None) – The default reward to record (when not specified in the group config)
record_reward (bool) – Whether to record the rewards
policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)
bounds – Optional termination boundaries
terminate_outside_bounds – Whether to terminate if any of the agents exits the boundaries
deterministic (bool) – Whether to apply the policies deterministically
- Returns:
The experiment
- Return type:
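A rough sketch; the name make_experiment_with_env is an assumption (the documentation above only states that this variant mirrors make_experiment() but reads the group configuration from the environment), and the environment id is a placeholder:
>>> import gymnasium as gym
>>> from navground.learning.evaluation import make_experiment_with_env
>>>
>>> env = gym.make("navground")
>>> experiment = make_experiment_with_env(env=env, record_reward=True)
>>> experiment.number_of_runs = 10
>>> experiment.run()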
Logging#
The configuration to log trajectory plots
Configure the model logger to log additional data:
trajectory plots
trajectory videos
statistics on reward, collisions, efficacy and safety violations
hparams
data as YAML
model policy graph
a YAML representation of the environment (at the beginning of logging)
- Parameters:
model (OffPolicyAlgorithm | BaseILAlgorithm) – The model being trained
video_config (VideoConfig) – The video configuration
plot_config (TrajectoryPlotConfig) – The plot configuration
episodes (int) – The number of episodes over which to compute statistics
log_graph (bool) – Whether to log the policy graph
collisions (bool) – Whether to include collision statistics
efficacy (bool) – Whether to include efficacy statistics
safety_violation (bool) – Whether to include safety violation statistics
reward (bool) – Whether to include reward statistics
- Return type:
None
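A rough usage sketch; the helper name (config_log), its import path and the config classes' paths are assumptions, and only the parameter names are taken from the documentation above (env stands for a previously created navground environment):
>>> from stable_baselines3 import SAC
>>> from navground.learning.utils import TrajectoryPlotConfig, VideoConfig, config_log
>>>
>>> model = SAC("MlpPolicy", env, tensorboard_log="logs")
>>> config_log(model,
...            plot_config=TrajectoryPlotConfig(),
...            video_config=VideoConfig(),
...            episodes=10,
...            log_graph=True,
...            collisions=True,
...            efficacy=True,
...            safety_violation=True,
...            reward=True)
>>> model.learn(total_timesteps=10_000)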