Imitation Learning#
navground.learning.il
exposes a similar API as
stable_baselines3.common.base_class.BaseAlgorithm
, in particular for saving and loading. Its algorithms accept as experts callables that consume an
info
dictionary, like in particular
navground.learning.policies.info_predictor.InfoPolicy
, which is the way we expose actions computed in navground through the environments, and therefore the way to learn to imitate navigation behaviors running in navground. This extends the accepted experts to any policy of type
PolicyCallableWithInfo
, which is the same as
imitation.data.rollout.PolicyCallable
but also consumes the info dictionaries.
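For instance, an expert compatible with PolicyCallableWithInfo could look like the following sketch; the exact signature (mirroring imitation.data.rollout.PolicyCallable with the info dictionaries as an extra argument) and the "action" key are assumptions:

import numpy as np

# Sketch of a PolicyCallableWithInfo-style expert. Assumption: it mirrors
# imitation.data.rollout.PolicyCallable,
# (observations, states, episode_starts) -> (actions, states),
# with the per-environment info dictionaries as an extra argument.
def expert(observations, states, episode_starts, infos):
    # Assumption: the action computed by navground is stored in the infos
    # under a hypothetical "action" key.
    actions = np.stack([info["action"] for info in infos])
    return actions, states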
This requires a modified version of rollout.py, which we implement in the following function.
Patches the original version in
imitation
to add support for PolicyCallableWithInfo.
- Parameters:
policy (AnyPolicy) – The policy
venv (VecEnv) – The vectorized environment
sample_until (rollout_without_info.GenTrajTerminationFn) – The criterion for when to stop sampling
rng (np.random.Generator) – The random number generator
deterministic_policy (bool) – Whether the policy is deterministic
- Returns:
the trajectories
- Return type:
Sequence[types.TrajectoryWithRew]
Note
The original functionality of imitation
is maintained. Experts that do not accept an info
dictionary are still accepted.
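For example, trajectories can be collected from an info-consuming expert as in the following sketch; the patched function's module path and name, here navground.learning.il.rollout.generate_trajectories, are assumptions:

import numpy as np
from imitation.data import rollout

# Assumption: the patched function is exposed as
# navground.learning.il.rollout.generate_trajectories.
from navground.learning.il.rollout import generate_trajectories

rng = np.random.default_rng(0)
trajectories = generate_trajectories(
    policy=expert,  # e.g. the info-consuming callable sketched above
    venv=venv,      # an imitation-compatible vectorized environment
    sample_until=rollout.make_sample_until(min_episodes=10),
    rng=rng,
    deterministic_policy=True,
)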
Utilities#
Creates an imitation-compatible vectorized environment. Just a thin wrapper around
imitation.util.util.make_vec_env()
that reads env_id and env_make_kwargs from the env.
- Parameters:
env (gym.Env[Any, Any]) – The environment
parallel (bool) – Whether to run the venv in parallel
rng (np.random.Generator) – The random number generator
num_envs (int) – The number of environments
- Returns:
The vectorized environment.
- Return type:
VecEnv
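A usage sketch; the utility's name, here make_imitation_venv, is an assumption:

import numpy as np

# Assumption: the utility is exposed as
# navground.learning.il.make_imitation_venv.
from navground.learning.il import make_imitation_venv

rng = np.random.default_rng(0)
# env: a navground gymnasium environment created elsewhere
venv = make_imitation_venv(env, parallel=False, num_envs=4, rng=rng)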
Creates an imitation-compatible vectorized environment from a
PettingZoo Parallel environment
, similarly to
navground.learning.parallel_env.make_vec_from_penv()
, but adding a (custom)
RolloutInfoWrapper
to collect rollouts. It first creates a
supersuit.vector.MarkovVectorEnv
, and then applies
supersuit.concat_vec_envs_v1()
to concatenate num_envs copies of it.
- Parameters:
env (ParallelEnv[int, Observation, Action]) – The environment
num_envs (int) – The number of PettingZoo environments to stack together
processes (int) – The number of (parallel) processes
- Returns:
The vectorized environment.
- Return type:
VecEnv
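A usage sketch; the utility's name, here make_imitation_venv_from_penv, is an assumption:

# Assumption: the utility is exposed as
# navground.learning.il.make_imitation_venv_from_penv.
from navground.learning.il import make_imitation_venv_from_penv

# penv: a navground PettingZoo parallel environment created elsewhere
venv = make_imitation_venv_from_penv(penv, num_envs=4, processes=1)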
Disables the progress bar while saving the DAgger datasets and configures tqdm to use
auto
.
- Return type:
None
Base class#
The base class that wraps IL algorithms to implement a
stable_baselines3.common.base_class.BaseAlgorithm
-like API.
- Parameters:
seed (int) – the random seed
policy (Any) – an optional policy
policy_kwargs (Mapping[str, Any]) – or the kwargs to create it
logger (HierarchicalLogger | None) – an optional logger
Gets the training environment.
- Returns:
The environment.
- Return type:
VecEnv | None
Learns the policy, passing the arguments to the wrapped
imitation
trainer, like trainer.train(*args, **kwargs).
Loads a model using
stable_baselines3.common.save_util.load_from_zip_file()
Saves the model using
stable_baselines3.common.save_util.save_to_zip_file()
Sets the training environment.
Rejects the environment if it is not compatible with the policy.
Sets the logger.
- Parameters:
logger (HierarchicalLogger) – The logger
- Return type:
None
The action space
The training env
The logger
The observation space
Gets the policy
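Put together, the API mirrors stable_baselines3. A minimal lifecycle sketch, using the BC wrapper documented below as a stand-in for any concrete subclass; the package-level import and the classmethod-style load are assumptions:

from navground.learning.il import BC  # assumption: exported at package level

model = BC(seed=0)
model.set_env(venv)           # rejected if not compatible with the policy
model.learn(n_epochs=1)       # forwarded to the wrapped imitation trainer
model.save("model.zip")       # stable_baselines3 zip format
model = BC.load("model.zip")  # assumption: load is a classmethod, as in SB3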
Behavior Cloning#
A simplified interface to
imitation.algorithms.bc.BC
- Parameters:
seed (int) – the random seed
policy (Any) – an optional policy
policy_kwargs (Mapping[str, Any]) – or the kwargs to create it
logger (HierarchicalLogger | None) – an optional logger
runs (int) – how many runs to collect at init
expert (rollout.AnyPolicy | None) – the expert to imitate. If not set, it defaults to the policy retrieved from the env attribute “policy” (if set), else to None.
bc_kwargs (Mapping[str, Any]) – parameters passed to the
imitation.algorithms.bc.BC
constructor
Collects training runs from an expert.
- Parameters:
runs (int) – The number of runs
expert (rollout.AnyPolicy | None) – the expert whose trajectories we want to collect. If not set, it will default to the expert configured at init.
- Return type:
None
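A training sketch for this class; the package-level import and the name of the run-collection method, here collect_runs, are assumptions:

from navground.learning.il import BC  # assumption: exported at package level

# venv: an imitation-compatible vectorized environment whose "policy"
# attribute exposes the expert, as described above.
model = BC(seed=0)
model.set_env(venv)
model.collect_runs(runs=100)  # assumption: name of the method documented above
model.learn(n_epochs=10)      # forwarded to imitation.algorithms.bc.BC.train
model.save("bc_policy.zip")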
DAgger#
A simplified interface to our version of
imitation.algorithms.dagger.SimpleDAggerTrainer
that accepts PolicyCallableWithInfo experts.
- Parameters:
seed (int) – the random seed
policy (Any) – an optional policy
policy_kwargs (Mapping[str, Any]) – or the kwargs to create it
logger (HierarchicalLogger | None) – an optional logger
expert (AnyPolicy | None) – the expert to imitate
bc_kwargs (Mapping[str, Any]) – parameters passed to the imitation.algorithms.bc.BC constructor
dagger_kwargs (dict[str, Any]) – parameters passed to the
imitation.algorithms.dagger.SimpleDAggerTrainer
constructor
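A training sketch mirroring the BC example above; the package-level import is an assumption:

from navground.learning.il import DAgger  # assumption: exported at package level

model = DAgger(seed=0, expert=expert)  # expert: a PolicyCallableWithInfo
model.set_env(venv)
# Arguments are forwarded to SimpleDAggerTrainer.train
model.learn(total_timesteps=10_000, rollout_round_min_episodes=5)
model.save("dagger_policy.zip")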