Imitation Learning#
navground.learning.il
exposes a similar API as
stable_baselines3.common.base_class.BaseAlgorithm
, in particular for saving and loading. Its algorithms accept as experts callables that consume an
info
dictionary, like in particular
navground.learning.policies.info_predictor.InfoPolicy
, which is the way we expose actions computed in navground through the environments, and therefore the way to learn to imitate navigation behaviors running in navground. This extends the accepted experts to any policy of type
PolicyCallableWithInfo
, which is the same as
imitation.data.rollout.PolicyCallable
but also consumes the info dictionaries.
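For instance, an expert compatible with PolicyCallableWithInfo could look like the following sketch; the exact signature (mirroring imitation.data.rollout.PolicyCallable with the info dictionaries as an extra argument) and the "action" key are assumptions:

import numpy as np

# Sketch of a PolicyCallableWithInfo-style expert. Assumption: it mirrors
# imitation.data.rollout.PolicyCallable,
# (observations, states, episode_starts) -> (actions, states),
# with the per-environment info dictionaries as an extra argument.
def expert(observations, states, episode_starts, infos):
    # Assumption: the action computed by navground is stored in the infos
    # under a hypothetical "action" key.
    actions = np.stack([info["action"] for info in infos])
    return actions, states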
This requires a modified version of rollout.py, which we implement in the following function.
Patches the original version in
imitation
to add support for PolicyCallableWithInfo.
- Parameters:
policy (AnyPolicy) – The policy
venv (VecEnv) – The vectorized environment
sample_until (rollout_without_info.GenTrajTerminationFn) – The criterion for when to stop sampling
rng (np.random.Generator) – The random number generator
deterministic_policy (bool) – Whether the policy is deterministic
- Returns:
the trajectories
- Return type:
Sequence[types.TrajectoryWithRew]
Note
The original functionality of imitation
is maintained. Experts that do not accept an info
dictionary are still accepted.
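For example, trajectories can be collected from an info-consuming expert as in the following sketch; the patched function's module path and name, here navground.learning.il.rollout.generate_trajectories, are assumptions:

import numpy as np
from imitation.data import rollout

# Assumption: the patched function is exposed as
# navground.learning.il.rollout.generate_trajectories.
from navground.learning.il.rollout import generate_trajectories

rng = np.random.default_rng(0)
trajectories = generate_trajectories(
    policy=expert,  # e.g. the info-consuming callable sketched above
    venv=venv,      # an imitation-compatible vectorized environment
    sample_until=rollout.make_sample_until(min_episodes=10),
    rng=rng,
    deterministic_policy=True,
)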
Utilities#
Creates an imitation-compatible vectorized environment. Just a thin wrapper around
imitation.util.util.make_vec_env()
that reads env_id and env_make_kwargs from the env.
- Parameters:
env (gym.Env[Any, Any]) – The environment
parallel (bool) – Whether to run the venv in parallel
rng (np.random.Generator) – The random number generator
num_envs (int) – The number of environments
- Returns:
The vectorized environment.
- Return type:
VecEnv
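A usage sketch; the utility's name, here make_imitation_venv, is an assumption:

import numpy as np

# Assumption: the utility is exposed as
# navground.learning.il.make_imitation_venv.
from navground.learning.il import make_imitation_venv

rng = np.random.default_rng(0)
# env: a navground gymnasium environment created elsewhere
venv = make_imitation_venv(env, parallel=False, num_envs=4, rng=rng)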
Creates an imitation-compatible vectorized environment from a
PettingZoo Parallel environment
, similarly to
navground.learning.parallel_env.make_vec_from_penv()
, but adding a (custom)
RolloutInfoWrapper
to collect rollouts. It first creates a
supersuit.vector.MarkovVectorEnv
, and then applies
supersuit.concat_vec_envs_v1()
to concatenate num_envs copies of it.
- Parameters:
env (ParallelEnv[int, Observation, Action]) – The environment
num_envs (int) – The number of PettingZoo environments to stack together
processes (int) – The number of (parallel) processes
- Returns:
The vectorized environment.
- Return type:
VecEnv
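A usage sketch; the utility's name, here make_imitation_venv_from_penv, is an assumption:

# Assumption: the utility is exposed as
# navground.learning.il.make_imitation_venv_from_penv.
from navground.learning.il import make_imitation_venv_from_penv

# penv: a navground PettingZoo parallel environment created elsewhere
venv = make_imitation_venv_from_penv(penv, num_envs=4, processes=1)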
Disables the progress bar while saving the DAgger datasets and configures tqdm to use
auto
.
- Return type:
None
Base class#
The base class that wraps IL algorithms to implement a
stable_baselines3.common.base_class.BaseAlgorithm
-like API.
- Parameters:
seed (int) – the random seed
policy (Any) – an optional policy
policy_kwargs (Mapping[str, Any]) – or the kwargs to create it
logger (HierarchicalLogger | None) – an optional logger
Gets the training environment.
- Returns:
The environment.
- Return type:
VecEnv | None
Learns the policy, passing the arguments to the wrapped
imitation
trainer, like trainer.train(*args, **kwargs).
Loads a model using
stable_baselines3.common.save_util.load_from_zip_file()
Saves the model using
stable_baselines3.common.save_util.save_to_zip_file()
Sets the training environment.
Rejects the environment if it is not compatible with the policy.
Sets the logger.
- Parameters:
logger (HierarchicalLogger) – The logger
- Return type:
None
The action space
The training env
The logger
The observation space
Gets the policy
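Put together, the API mirrors stable_baselines3. A minimal lifecycle sketch, using the BC wrapper documented below as a stand-in for any concrete subclass; the package-level import and the classmethod-style load are assumptions:

from navground.learning.il import BC  # assumption: exported at package level

model = BC(seed=0)
model.set_env(venv)           # rejected if not compatible with the policy
model.learn(n_epochs=1)       # forwarded to the wrapped imitation trainer
model.save("model.zip")       # stable_baselines3 zip format
model = BC.load("model.zip")  # assumption: load is a classmethod, as in SB3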
Behavior Cloning#
A simplified interface to
imitation.algorithms.bc.BC
- Parameters:
seed (int) – the random seed
policy (Any) – an optional policy
policy_kwargs (Mapping[str, Any]) – or the kwargs to create it
logger (HierarchicalLogger | None) – an optional logger
runs (int) – how many runs to collect at init
expert (rollout.AnyPolicy | None) – the expert to imitate. If not set, it defaults to the policy retrieved from the env attribute “policy” (if set), else to None.
bc_kwargs (Mapping[str, Any]) – parameters passed to the
imitation.algorithms.bc.BC
constructor
Collects training runs from an expert.
- Parameters:
runs (int) – The number of runs
expert (rollout.AnyPolicy | None) – the expert whose trajectories we want to collect. If not set, it will default to the expert configured at init.
- Return type:
None
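A training sketch for this class; the package-level import and the name of the run-collection method, here collect_runs, are assumptions:

from navground.learning.il import BC  # assumption: exported at package level

# venv: an imitation-compatible vectorized environment whose "policy"
# attribute exposes the expert, as described above.
model = BC(seed=0)
model.set_env(venv)
model.collect_runs(runs=100)  # assumption: name of the method documented above
model.learn(n_epochs=10)      # forwarded to imitation.algorithms.bc.BC.train
model.save("bc_policy.zip")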
DAgger#
A simplified interface to our version of
imitation.algorithms.dagger.SimpleDAggerTrainer
that accepts PolicyCallableWithInfo experts.
- Parameters:
seed (int) – the random seed
policy (Any) – an optional policy
policy_kwargs (Mapping[str, Any]) – or the kwargs to create it
logger (HierarchicalLogger | None) – an optional logger
expert (AnyPolicy | None) – the expert to imitate
bc_kwargs (Mapping[str, Any]) – parameters passed to the imitation.algorithms.bc.BC constructor
dagger_kwargs (dict[str, Any]) – parameters passed to the
imitation.algorithms.dagger.SimpleDAggerTrainer
constructor
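A training sketch mirroring the BC example above; the package-level import is an assumption:

from navground.learning.il import DAgger  # assumption: exported at package level

model = DAgger(seed=0, expert=expert)  # expert: a PolicyCallableWithInfo
model.set_env(venv)
# Arguments are forwarded to SimpleDAggerTrainer.train
model.learn(total_timesteps=10_000, rollout_round_min_episodes=5)
model.save("dagger_policy.zip")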