Evaluation#
navground.learning.evaluation
Policy#
Extends Stable-Baselines3
stable_baselines3.common.evaluation.evaluate_policy()
to also accept navground.learning.types.PolicyPredictorWithInfo
models.
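For instance, a minimal usage sketch, assuming the extended helper keeps the evaluate_policy name and is importable from navground.learning.evaluation; the environment id "navground" and the saved model path are placeholders:
>>> import gymnasium as gym
>>> from stable_baselines3 import SAC
>>> from navground.learning.evaluation import evaluate_policy
>>>
>>> env = gym.make("navground")    # placeholder: any navground.learning environment
>>> model = SAC.load("model.zip")  # a PolicyPredictorWithInfo model is accepted as well
>>> mean_reward, std_reward = evaluate_policy(model.policy, env,
...                                           n_eval_episodes=10,
...                                           deterministic=True)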
Policies#
Mimics Stable-Baselines3
stable_baselines3.common.evaluation.evaluate_policy()
on a PettingZoo Parallel environment.
The main differences are:
it accepts a list of models to be applied to different sub-groups of agents
it returns the rewards divided by groups.
For example, if a single group is provided
>>> evaluate_policies(models=[(Indices.all(), model)], env=env)
([100.0], [100.0])
it returns two lists, each with a single value: the average and the standard deviation over all episodes of the mean reward over all agents.
Instead, if we pass three groups, like
>>> models=[({1, 2}, model1), ({3, 4}, model2), ({5, 6}, model3)]
>>> evaluate_policies(models=models, env=env)
([100.0, 90.0, 110.0], [10.0, 12.0, 7.0])
it returns two lists of three elements each, one entry per group.
If return_episode_rewards is set, it returns three lists:
the cumulated rewards for each group and episode (not averaged over the agents!),
[[grp_1_ep_1, grp_1_ep_2, ...], [grp_2_ep_1, grp_2_ep_2, ...]]
the length of the episodes
the number of agents for each group and episode,
[[grp_1_ep_1, grp_1_ep_2, ...], [grp_2_ep_1, grp_2_ep_2, ...]]
For example, with two groups and two episodes, it will be like
>>> models=[({1, 2}, model1), ({3, 4, 5}, model2)]
>>> evaluate_policies(models=models, env=env, n_eval_episodes=2, return_episode_rewards=True)
([[200.0, 202.0], [300.0, 305.0]], [10, 10], [[2, 2], [3, 3]])
- Parameters:
models (Sequence[tuple[Indices | slice | list[int] | tuple[int] | set[int] | Literal['ALL'], AnyPolicyPredictor]]) – The models, given as tuples of the indices that select the agents in a group and the model those agents apply; see the sketch below for the accepted index forms
env (BaseParallelEnv) – The environment
n_eval_episodes (int) – The number of episodes
deterministic (bool) – Whether the policy is applied deterministically
return_episode_rewards (bool) – Whether to return all episodes (vs averaging them)
warn (bool) – Whether to enable warnings
- Returns:
If return_episode_rewards is set, a tuple (list of lists of cumulated episode rewards, list of episode lengths, list of lists of group sizes); else, a tuple (list of average episode rewards, list of standard deviations of episode rewards)
- Return type:
tuple[list[float], list[float]] | tuple[list[list[float]], list[int], list[list[int]]]
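As a rough illustration of the accepted group selectors (a sketch, not an exhaustive list; the Indices import path is an assumption, and model1, model2 and env stand for previously created predictors and a PettingZoo parallel environment):
>>> from navground.learning import Indices
>>> from navground.learning.evaluation import evaluate_policies
>>>
>>> groups = [({0, 1, 2}, model1),    # an explicit set of agent indices
...           (slice(3, 6), model2)]  # agents 3, 4 and 5
>>> # Indices.all() or the literal 'ALL' would select every agent instead.
>>> means, stds = evaluate_policies(models=groups, env=env, n_eval_episodes=10)
>>> for i, (mean, std) in enumerate(zip(means, stds)):
...     print(f"group {i}: {mean:.1f} ± {std:.1f}")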
Scenarios#
A navground scenario initializer to configure groups of agents.
It is designed to be added to a scenario, like
>>> from navground import sim
>>> from navground.learning import GroupConfig
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', color='red', indices='ALL')]
>>> scenario.add_init(InitPolicyBehavior(groups=groups))
>>> world = scenario.make_world(seed=101)
>>> another_world = scenario.make_world(seed=313)
- Parameters:
groups (Collection[GroupConfig]) – The configuration of groups of agents
bounds (Bounds | None) – Optional termination boundaries
terminate_outside_bounds (bool) – Whether to terminate if any of the agents exits the boundaries (see the sketch below)
deterministic (bool) – Whether to apply the policies deterministically
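A possible sketch of the termination boundaries; the bound values and the policy path are placeholders, and the exact Bounds representation (here a pair of corner points) is an assumption:
>>> import numpy as np
>>> from navground import sim
>>> from navground.learning import GroupConfig
>>> from navground.learning.evaluation.scenario import InitPolicyBehavior
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', indices='ALL')]
>>> scenario.add_init(InitPolicyBehavior(groups=groups,
...                                      bounds=(np.array((-5.0, -5.0)),
...                                              np.array((5.0, 5.0))),
...                                      terminate_outside_bounds=True))
>>> world = scenario.make_world(seed=0)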
Returns a scenario initializer using the configuration stored in an environment.
groups are merged using
navground.learning.config.merge_groups_configs().
- Parameters:
env (BaseEnv | BaseParallelEnv | VecEnv) – The environment
groups (Collection[GroupConfig]) – The configuration of groups of agents
bounds (Bounds | None) – Optional termination boundaries
terminate_outside_bounds (bool | None) – Whether to terminate if any of the agents exits the boundaries
deterministic (bool) – Whether to apply the policies deterministically
- Returns:
The scenario initializer.
- Return type:
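A hedged sketch, assuming the initializer is exposed as a classmethod named InitPolicyBehavior.from_env (hypothetical name) and that "navground" is a registered environment id:
>>> import gymnasium as gym
>>> from navground import sim
>>> from navground.learning.evaluation.scenario import InitPolicyBehavior
>>>
>>> env = gym.make("navground")
>>> scenario = sim.load_scenario(...)
>>> scenario.add_init(InitPolicyBehavior.from_env(env, deterministic=True))
>>> world = scenario.make_world(seed=101)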
Experiments#
Initializes a navground experiment where groups of agents are configured with possibly different policies and the rewards are optionally recorded.
If
groups
is not empty, it makes a copy of the scenario, to which it adds a
navground.learning.evaluation.scenario.InitPolicyBehavior
to initialize the groups.
If
record_reward
is set, it adds a
navground.learning.probes.reward.RewardProbe.
- Parameters:
scenario (sim.Scenario) – The scenario
groups (Collection[GroupConfig]) – The configuration of the groups
reward (Reward | None) – The default reward to record (when not specified in the group config)
record_reward (bool) – Whether to record the rewards
policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)
bounds (Bounds | None) – Optional termination boundaries
terminate_outside_bounds (bool) – Whether to terminate if any of the agents exits the boundaries
deterministic (bool) – Whether to apply the policies deterministically
- Returns:
The experiment
- Return type:
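For example, a minimal sketch, assuming the function is exposed as navground.learning.evaluation.make_experiment (scenario loading and the policy path are placeholders; number_of_runs and run() follow navground's usual Experiment API):
>>> from navground import sim
>>> from navground.learning import GroupConfig
>>> from navground.learning.evaluation import make_experiment
>>>
>>> scenario = sim.load_scenario(...)
>>> groups = [GroupConfig(policy='policy.onnx', indices='ALL')]
>>> experiment = make_experiment(scenario=scenario, groups=groups, record_reward=True)
>>> experiment.number_of_runs = 10
>>> experiment.run()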
Similar to
make_experiment()
but using the configuration stored in an environment:
groups
are merged using
navground.learning.config.merge_groups_configs().
- Parameters:
env (BaseEnv | BaseParallelEnv | VecEnv) – The environment
groups (Collection[GroupConfig]) – The configuration of the groups
reward (Reward | None) – The default reward to record (when not specified in the group config)
record_reward (bool) – Whether to record the rewards
policy (AnyPolicyPredictor | PathLike) – The default policy (when not specified in the group config)
bounds – Optional termination boundaries
terminate_outside_bounds – Whether to terminate if any of the agents exits the boundaries
deterministic (bool) – Whether to apply the policies deterministically
- Returns:
The experiment
- Return type:
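A rough sketch; the name make_experiment_with_env is an assumption (the documentation above only states that this variant mirrors make_experiment() but reads the group configuration from the environment), and the environment id is a placeholder:
>>> import gymnasium as gym
>>> from navground.learning.evaluation import make_experiment_with_env
>>>
>>> env = gym.make("navground")
>>> experiment = make_experiment_with_env(env=env, record_reward=True)
>>> experiment.number_of_runs = 10
>>> experiment.run()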
Logging#
The configuration to log trajectory plots
Configure the model logger to log additional data:
trajectory plots
trajectory videos
statistics on reward, collisions, efficacy and safety violations
hparams
data as YAML
model policy graph
a YAML representation of the environment (at the beginning of logging)
- Parameters:
model (OffPolicyAlgorithm | BaseILAlgorithm) – The model being trained
video_config (VideoConfig) – The video configuration
plot_config (TrajectoryPlotConfig) – The plot configuration
episodes (int) – The number of episodes over which to compute statistics
log_graph (bool) – Whether to log the policy graph
collisions (bool) – Whether to include collision statistics
efficacy (bool) – Whether to include efficacy statistics
safety_violation (bool) – Whether to include safety violation statistics
reward (bool) – Whether to include reward statistics
- Return type:
None
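A rough usage sketch; the helper name (config_log), its import path and the config classes' paths are assumptions, and only the parameter names are taken from the documentation above (env stands for a previously created navground environment):
>>> from stable_baselines3 import SAC
>>> from navground.learning.utils import TrajectoryPlotConfig, VideoConfig, config_log
>>>
>>> model = SAC("MlpPolicy", env, tensorboard_log="logs")
>>> config_log(model,
...            plot_config=TrajectoryPlotConfig(),
...            video_config=VideoConfig(),
...            episodes=10,
...            log_graph=True,
...            collisions=True,
...            efficacy=True,
...            safety_violation=True,
...            reward=True)
>>> model.learn(total_timesteps=10_000)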