Imitation Learning#

navground.learning.il

In this sub-package, we wrap and patch parts of imitation so that its algorithms accept experts that also receive an info dictionary (PolicyCallableWithInfo).

This requires a modified version of rollout.py, which we implement in the function below.

Patches the original version in imitation to add support for PolicyCallableWithInfo.

Returns:

the trajectories

Return type:

Sequence[types.TrajectoryWithRew]

Note

The original functionality of imitation is maintained: experts that do not accept an info dictionary are still supported.
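
For illustration, a minimal sketch of an expert in the PolicyCallableWithInfo style: a callable that, besides observations, states and episode starts, also receives the per-environment info dictionaries. The function name, the exact argument order and the "action" info key are assumptions for this sketch, not part of the API.

import numpy as np

def expert_with_info(observations, states, episode_starts, infos):
    # Read the expert action that the environment exposes through the
    # info dictionaries ("action" is a placeholder key, an assumption).
    actions = np.stack([info["action"] for info in infos])
    # Return (actions, states), mirroring imitation's PolicyCallable.
    return actions, states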

Utilities#

Creates an imitation-compatible vectorized environment. Just a thin wrapper around imitation.util.util.make_vec_env() that reads env_id and env_make_kwargs from the env.

Parameters:
  • env (gym.Env[Any, Any]) – The environment

  • parallel (bool) – Whether to run the venv in parallel

  • num_envs (int) – The number of environments

  • rng (np.random.Generator) – The random number generator

Returns:

The vectorized environment.

Return type:

VecEnv
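
A hedged usage sketch; the export name make_vec_env is an assumption based on the description above:

import numpy as np
from navground.learning.il import make_vec_env  # assumed export name

# env is a navground gymnasium environment created elsewhere
venv = make_vec_env(env, parallel=False, num_envs=4,
                    rng=np.random.default_rng(0))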

Creates an imitation-compatible vectorized environment from a PettingZoo Parallel environment, similar to navground.learning.parallel_env.make_vec_from_penv() but adding a (custom) RolloutInfoWrapper to collect rollouts.

It first creates a supersuit.vector.MarkovVectorEnv, and then applies supersuit.concat_vec_envs_v1() to concatenate num_envs copies of it.

Parameters:
  • env (ParallelEnv[int, Observation, Action]) – The environment

  • num_envs (int) – The number of pettingzoo environments to stack together

  • processes (int) – The number of (parallel) processes

Returns:

The vectorized environment.

Return type:

VecEnv
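
Again a hedged sketch; the export name make_vec_from_penv is an assumption modeled on navground.learning.parallel_env.make_vec_from_penv():

from navground.learning.il import make_vec_from_penv  # assumed export name

# penv is a PettingZoo Parallel navground environment created elsewhere
venv = make_vec_from_penv(penv, num_envs=2, processes=1)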

Disables the progress bar while saving the DAgger datasets and configures tqdm to use auto.


Return type:

None

Base class#

The base class that wraps IL algorithms to implement a stable_baselines3.common.base_class.BaseAlgorithm-like API.

Parameters:
  • env (gym.Env[Any, Any] | VecEnv | None) – the environment.

  • seed (int) – the random seed

  • policy (Any) – an optional policy

  • policy_kwargs (Mapping[str, Any]) – or the kwargs to create it

  • logger (HierarchicalLogger | None) – an optional logger

Gets the training environment.

Returns:

The environment.

Return type:

VecEnv | None

Learns the policy, passing the arguments to the wrapped imitation trainer, like

trainer.train(*args, **kwargs)

Returns:

self

Return type:

Self

Loads a model using stable_baselines3.common.save_util.load_from_zip_file()

Parameters:
  • path (pl.Path | str) – The path to the saved model

  • env (gym.Env[Any, Any] | VecEnv | None) – An optional training environment

Returns:

the loaded algorithm

Return type:

Self

Saves the model using stable_baselines3.common.save_util.save_to_zip_file()

Parameters:

path (Path | str) – The path to the directory where to save the model

Return type:

None

Sets the training environment.

Rejects the environment if it is not compatible with the policy.

Parameters:

env (gym.Env[Any, Any] | VecEnv) – The new environment

Return type:

None

Sets the logger.

Parameters:

logger (HierarchicalLogger) – The logger

Return type:

None

The action space

The training environment

The logger

The observation space

Gets the policy
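
To summarize the API, a minimal sketch of the workflow shared by all wrapped algorithms; SomeILAlgorithm is a stand-in for a concrete subclass such as the BC or DAgger wrappers below, and n_epochs is a keyword of imitation's BC.train():

# env is the training environment created elsewhere
algo = SomeILAlgorithm(env=env, seed=0)
algo.learn(n_epochs=1)      # forwarded to the wrapped trainer's train()
algo.save("model_dir")      # writes a stable_baselines3-style zip archive
algo = SomeILAlgorithm.load("model_dir", env=env)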

Behavior Cloning#

A simplified interface to imitation.algorithms.bc.BC

Parameters:
  • env (gym.Env[Any, Any] | VecEnv | None) – the environment.

  • seed (int) – the random seed

  • policy (Any) – an optional policy

  • policy_kwargs (Mapping[str, Any]) – or the kwargs to create it

  • logger (HierarchicalLogger | None) – an optional logger

  • runs (int) – how many runs to collect at init

  • expert (rollout.AnyPolicy | None) – the expert to imitate. If not set, it defaults to the policy retrieved from the env attribute “policy” (if present), else to None.

  • bc_kwargs (Mapping[str, Any]) – parameters passed to the imitation.algorithms.bc.BC constructor

Collects training runs from an expert.

Parameters:
  • runs (int) – The number of runs

  • expert (rollout.AnyPolicy | None) – the expert whose trajectories we want to collect. If not set, it will default to the expert configured at init.

Return type:

None
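
A hedged usage sketch, assuming the class is exported as navground.learning.il.BC and that the environment exposes the expert through its “policy” attribute:

from navground.learning.il import BC  # assumed export path

bc = BC(env=env, seed=0, runs=100)  # collects 100 expert runs at init
bc.learn(n_epochs=5)                # forwarded to imitation's BC.train()
bc.save("bc_model")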

DAgger#

A simplified interface to our version of imitation.algorithms.dagger.SimpleDAggerTrainer that accepts PolicyCallableWithInfo experts.

Parameters:
  • env (gym.Env[Any, Any] | VecEnv | None) – the environment.

  • seed (int) – the random seed

  • policy (Any) – an optional policy

  • policy_kwargs (Mapping[str, Any]) – or the kwargs to create it

  • logger (HierarchicalLogger | None) – an optional logger

  • expert (AnyPolicy | None) – the expert to imitate

  • bc_kwargs (Mapping[str, Any]) – parameters passed to the imitation.algorithms.bc.BC constructor

  • dagger_kwargs (dict[str, Any]) – parameters passed to the imitation.algorithms.dagger.SimpleDAggerTrainer constructor
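
A hedged usage sketch, assuming the class is exported as navground.learning.il.DAgger and that expert is a policy or PolicyCallableWithInfo created elsewhere:

from navground.learning.il import DAgger  # assumed export path

dagger = DAgger(env=env, seed=0, expert=expert)
dagger.learn(total_timesteps=10_000)  # forwarded to SimpleDAggerTrainer.train()
dagger.save("dagger_model")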