Utils#
navground.learning.utils
Plotting#
navground.learning.utils.plot
Bases:
NamedTuple
StableBaseline3#
navground.learning.utils.sb3
Bases:
BaseCallback
Similar to SB3’s own
stable_baselines3.common.callbacks.ProgressBarCallback
, it displays a progress bar (using tqdm) when training an SB3 agent, but also shows the episodes' mean reward and length.
- Parameters:
every (int)
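As an illustrative sketch (the callback's class name and its SB3 internals are not shown in this extract, so all names below are assumptions), the episode statistics displayed next to the progress bar can be maintained with a rolling window like this:

```python
from collections import deque


class EpisodeStats:
    """Running mean reward/length over the last `window` episodes,
    as displayed beside the tqdm bar. Illustrative sketch only:
    the real callback subclasses SB3's BaseCallback."""

    def __init__(self, every: int = 1, window: int = 100):
        self.every = every  # refresh the display every `every` steps
        self.rewards = deque(maxlen=window)
        self.lengths = deque(malen := window, maxlen=window) if False else deque(maxlen=window)
        self.lengths = deque(maxlen=window)

    def add_episode(self, reward: float, length: int) -> None:
        self.rewards.append(reward)
        self.lengths.append(length)

    @property
    def mean_reward(self) -> float:
        return sum(self.rewards) / len(self.rewards) if self.rewards else float("nan")

    @property
    def mean_length(self) -> float:
        return sum(self.lengths) / len(self.lengths) if self.lengths else float("nan")


stats = EpisodeStats(every=10)
for r, n in [(1.0, 10), (3.0, 20)]:
    stats.add_episode(r, n)
# postfix string shown beside the progress bar, e.g. "R=2.00, L=15.0"
postfix = f"R={stats.mean_reward:.2f}, L={stats.mean_length:.1f}"
```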
Bases:
BaseCallback
An SB3 callback that makes a video of one or more runs of an environment.
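A minimal sketch of the idea behind such a video callback, using a hypothetical environment whose render() returns an image frame (the real callback hooks into SB3's BaseCallback and passes the frames to a video writer; all names below are illustrative):

```python
class VideoRecorder:
    """Collects rendered frames during a rollout; a real implementation
    would encode them into a video file. Sketch only."""

    def __init__(self, env, fps: int = 30):
        self.env = env
        self.fps = fps
        self.frames = []

    def record_episode(self, policy, max_steps: int = 100) -> None:
        obs = self.env.reset()
        for _ in range(max_steps):
            self.frames.append(self.env.render())
            obs, done = self.env.step(policy(obs))
            if done:
                break


class DummyEnv:
    # Hypothetical stand-in: renders a constant "frame", ends after 3 steps
    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, self.t >= 3

    def render(self):
        return [[0]]  # placeholder image


recorder = VideoRecorder(DummyEnv())
recorder.record_episode(policy=lambda obs: 0)
```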
Loads the evaluation logs from a CSV file.
Plots reward, success, and/or length from the eval logs.
- Parameters:
path (PathLike) – The directory with csv logs
reward (bool) – Whether to plot the reward
reward_color (str) – The reward color
reward_linestyle (str) – The reward linestyle
success (bool) – Whether to plot the success
length (bool) – Whether to plot the length
reward_low (float | None) – An optional lower bound to scale the reward
reward_high (float | None) – An optional upper bound to scale the reward
length_low (float) – An optional lower bound to scale the length
length_high (float | None) – An optional upper bound to scale the length
two_axis (bool) – Whether to use two axes (only if there are two fields)
kwargs (Any)
- Return type:
None
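To illustrate the scaling controlled by reward_low/reward_high (parameter names from the list above; the CSV column names below are assumptions, not the library's actual schema), here is a stdlib-only sketch of loading eval logs from a CSV and normalizing the reward to [0, 1]:

```python
import csv
import io

# Hypothetical eval log; the real logs live in the directory passed as `path`
log = io.StringIO("time,reward,length\n0,-50,120\n1,-20,90\n2,-10,60\n")

rows = list(csv.DictReader(log))
rewards = [float(r["reward"]) for r in rows]

reward_low, reward_high = -100.0, 0.0  # optional bounds used to scale the reward
scaled = [(r - reward_low) / (reward_high - reward_low) for r in rewards]
# e.g. a reward of -50 is scaled to 0.5
```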
BenchMARL#
navground.learning.utils.benchmarl
Bases:
Module
This class conforms to
navground.learning.types.PyTorchPolicy
and is constructed from a TorchRL policy.
- Parameters:
observation_space (gym.Space)
action_space (gym.spaces.Box)
policy (Any)
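A duck-typed sketch of the wrapping idea, with plain Python standing in for TorchRL and Gymnasium (the real class is a torch.nn.Module and the spaces are gym.Space/gym.spaces.Box; every name below is illustrative, not the library's API):

```python
class PolicyWrapper:
    """Wraps an arbitrary policy callable so it exposes a
    predict(observation) -> action interface over given spaces.
    Sketch only: the real class subclasses torch.nn.Module."""

    def __init__(self, observation_space, action_space, policy):
        self.observation_space = observation_space
        self.action_space = action_space  # here just (low, high) bounds
        self._policy = policy

    def predict(self, observation):
        action = self._policy(observation)
        # clip to the box-like action space bounds
        low, high = self.action_space
        return [min(max(a, low), high) for a in action]


wrapper = PolicyWrapper(observation_space=None,
                        action_space=(-1.0, 1.0),
                        policy=lambda obs: [2.0 * o for o in obs])
action = wrapper.predict([0.3, 0.9])
```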
Bases:
Callback
Export the policy. The best model is exported as "best_policy.onnx", the others as "policy_<iterations>.onnx".
- Parameters:
export_all (bool) – Whether to export all (vs just the best) policies.
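The naming convention described above can be sketched as a small helper (the ONNX export itself is omitted; the function name is hypothetical):

```python
def export_filename(iterations: int, is_best: bool) -> str:
    """File name the export callback would use: the best model goes to
    "best_policy.onnx", the others to "policy_<iterations>.onnx"."""
    return "best_policy.onnx" if is_best else f"policy_{iterations}.onnx"
```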
Bases:
Callback
- Parameters:
env (MultiAgentNavgroundEnv)
seed (int)
categorical_actions (bool)
- Return type:
Similar interface to StableBaseline3's
stable_baselines3.common.evaluation.evaluate_policy()
but for TorchRL environments (and policies).
- Parameters:
policy (Any) – The policy
env (Any) – The environment
n_eval_episodes (int) – The number of episodes
deterministic (bool) – Whether to evaluate the policy deterministically
return_episode_rewards (bool) – Whether to return individual episode rewards (vs aggregate them)
warn (bool) – Whether to emit warnings
- Returns:
If
return_episode_rewards
is set, a tuple (list of cumulated episode rewards, list of episode lengths); else, a tuple (mean episode reward, standard deviation of episode rewards).
- Return type:
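A self-contained sketch of the evaluation loop and its two return modes, with a dummy environment and policy standing in for the TorchRL ones (the function body is an assumption about the interface described above, not the library's implementation):

```python
import statistics


def evaluate_policy(policy, env, n_eval_episodes=10,
                    return_episode_rewards=False):
    """Roll out `policy` in `env` for n_eval_episodes episodes.
    Returns (episode rewards, episode lengths) if return_episode_rewards
    is set, else (mean reward, std dev of rewards). Sketch only."""
    rewards, lengths = [], []
    for _ in range(n_eval_episodes):
        obs, done, total, steps = env.reset(), False, 0.0, 0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
            steps += 1
        rewards.append(total)
        lengths.append(steps)
    if return_episode_rewards:
        return rewards, lengths
    return statistics.mean(rewards), statistics.pstdev(rewards)


class DummyEnv:
    # Hypothetical: each episode lasts 2 steps with reward 1 per step
    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2


mean, std = evaluate_policy(lambda obs: 0, DummyEnv(), n_eval_episodes=3)
```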