Policies#
navground.learning.policies
Null#
Bases: BasePolicy
This class describes a dummy policy that always returns zeros.
- Parameters:
squash_output (bool)
Bases: object

This class describes a dummy predictor that always returns zeros and conforms to navground.learning.types.PolicyPredictor.
- Parameters:
action_space (gym.spaces.Box) – The action space
observation_space (gym.Space[Any]) – An optional observation space
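As an illustration of what conforming to the predictor protocol means, here is a minimal sketch of a zero-returning predictor, assuming PolicyPredictor follows the SB3-style predict(observation, state, episode_start, deterministic) signature; the class below is illustrative, not the library implementation:

```python
import numpy as np
import gymnasium as gym


class ZeroPredictor:
    """Illustrative stand-in, not the library class: always returns zero actions."""

    def __init__(self, action_space: gym.spaces.Box, observation_space=None):
        self.action_space = action_space
        self.observation_space = observation_space

    def predict(self, observation, state=None, episode_start=None, deterministic=True):
        # No recurrent state: return a zero action and None.
        action = np.zeros(self.action_space.shape, dtype=self.action_space.dtype)
        return action, None


predictor = ZeroPredictor(gym.spaces.Box(-1.0, 1.0, (2,), dtype=np.float32))
action, _ = predictor.predict(observation=None)  # array([0., 0.], dtype=float32)
```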
Random#
Bases: BasePolicy

This class describes an onnx-able policy that returns random actions.
- Parameters:
squash_output (bool)
Bases: object

This class describes a predictor that returns random actions and conforms to navground.learning.types.PolicyPredictor.
- Parameters:
action_space (gym.spaces.Box) – The action space
observation_space (gym.Space[Any]) – An optional observation space
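Zero and random predictors are mainly useful as baselines. The sketch below rolls out such a predictor in a plain evaluation loop, using an illustrative gymnasium environment rather than a navground one; any object exposing this predict() signature can be dropped in the same way:

```python
import gymnasium as gym


class UniformPredictor:
    """Illustrative stand-in, not the library class: samples actions uniformly."""

    def __init__(self, action_space: gym.spaces.Box, observation_space=None):
        self.action_space = action_space
        self.observation_space = observation_space

    def predict(self, observation, state=None, episode_start=None, deterministic=False):
        # Ignore the observation and sample a random action.
        return self.action_space.sample(), None


env = gym.make("Pendulum-v1")
predictor = UniformPredictor(env.action_space)
obs, _ = env.reset()
for _ in range(200):
    action, _ = predictor.predict(obs)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```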
Info#
Bases: object

A predictor that extracts navground actions from the info dictionary and conforms to navground.learning.types.PolicyPredictorWithInfo.
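A rough sketch of what an info-based predictor does, assuming that PolicyPredictorWithInfo passes the per-step info dictionary to predict and that the navground action is stored under some key; both the signature and the "navground_action" key below are assumptions, not the library's actual interface:

```python
import numpy as np


class InfoActionPredictor:
    """Illustrative stand-in, not the library class: reads the action that
    navground computed from the step info dictionary instead of evaluating
    a neural network."""

    def __init__(self, action_space, key="navground_action"):
        self.action_space = action_space
        self.key = key  # placeholder key, not the library's actual key

    def predict(self, observation, info, state=None, episode_start=None, deterministic=True):
        # Fall back to a zero action if the info dictionary carries none.
        default = np.zeros(self.action_space.shape, dtype=self.action_space.dtype)
        return info.get(self.key, default), None
```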
Ordering-invariant extractor#
Bases: BaseFeaturesExtractor

A variation of the SB3 stable_baselines3.common.torch_layers.CombinedExtractor that applies an ordering-invariant MLP feature extractor to a group of keys, after optionally masking it.

Same as the original CombinedExtractor: a combined features extractor for Dict observation spaces. Builds a features extractor for each key of the space. Input from each space is fed through a separate submodule (CNN or MLP, depending on input shape); the output features are concatenated and fed through an additional MLP network (“combined”).

- Parameters:
observation_space (gym.spaces.Dict) – the observation space
cnn_output_dim (int) – Number of features to output from each CNN submodule(s). Defaults to 256 to avoid exploding network sizes.
normalized_image (bool) – Whether to assume that the image is already normalized or not (this disables dtype and bounds checks): when True, it only checks that the space is a Box and has 3 dimensions. Otherwise, it checks that it has expected dtype (uint8) and bounds (values in [0, 255]).
order_invariant_keys (Collection[str]) – the keys to group together and process by an ordering invariant feature extractor
replicated_keys (Collection[str]) – additional keys to add to the ordering invariant groups, replicating the values for each group
filter_key (str) – the key to use for masking to select items with positive values of this key
removed_keys (Collection[str]) – keys removed from the observations before concatenating with ordering invariant features
net_arch (list[int]) – the ordering invariant MLP layers sizes
activation_fn (type[nn.Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.
reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].
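As a sketch of how the extractor is typically wired into an SB3 algorithm through policy_kwargs (the key names and layer sizes are illustrative, and the import from navground.learning.policies is assumed from this page):

```python
from stable_baselines3 import SAC
from navground.learning.policies import OrderInvariantCombinedExtractor


def make_model(env):
    # 'env' is assumed to expose a Dict observation space whose per-neighbor
    # entries live under the (hypothetical) keys used below.
    policy_kwargs = dict(
        features_extractor_class=OrderInvariantCombinedExtractor,
        features_extractor_kwargs=dict(
            order_invariant_keys=["neighbors/position", "neighbors/velocity"],
            filter_key="neighbors/valid",  # keep only items with positive values
            net_arch=[64, 64],             # ordering-invariant MLP layer sizes
        ),
    )
    return SAC("MultiInputPolicy", env, policy_kwargs=policy_kwargs)
```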
Bases: BaseFeaturesExtractor

Similar to OrderInvariantCombinedExtractor but for flat observation spaces.

- Parameters:
observation_space (gym.spaces.Box) – the observation space
order_invariant_slices (Collection[slice]) – the slices to group together and process by an ordering invariant feature extractor
replicated_slices (Collection[slice]) – additional slices to add to the ordering invariant groups, replicating the values for each group
filter_slice (slice | None) – the slice to use for masking to select items with positive values for indices in this slice
removed_slices (Collection[slice]) – slices removed from the observations before concatenating with ordering invariant features
net_arch (list[int]) – the ordering invariant MLP layers sizes
activation_fn (type[nn.Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.
reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].
use_masked_tensors (bool) – Whether to use masked tensors
number (int)
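A minimal sketch for the flat case, assuming the extractor is importable from navground.learning.policies; the observation layout and slices are illustrative, and the exact keyword arguments (e.g. how a slice is split into items) should be checked against the signature above:

```python
from stable_baselines3 import SAC
from navground.learning.policies import OrderInvariantFlattenExtractor


def make_model(env):
    # Assume a flat Box observation laid out as:
    #   [0:4)   ego features
    #   [4:12)  per-neighbor features, to be processed order-invariantly
    policy_kwargs = dict(
        features_extractor_class=OrderInvariantFlattenExtractor,
        features_extractor_kwargs=dict(
            order_invariant_slices=[slice(4, 12)],
            net_arch=[32, 32],
        ),
    )
    return SAC("MlpPolicy", env, policy_kwargs=policy_kwargs)
```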
Helper function that creates an OrderInvariantFlattenExtractor, using information from a dictionary space to infer the layout of the observation space.

- Parameters:
observation_space (Box) – the observation space
dict_space (Dict) – The dictionary space
cnn_output_dim (int) – Number of features to output from each CNN submodule(s). Defaults to 256 to avoid exploding network sizes.
normalized_image (bool) – Whether to assume that the image is already normalized or not (this disables dtype and bounds checks): when True, it only checks that the space is a Box and has 3 dimensions. Otherwise, it checks that it has expected dtype (uint8) and bounds (values in [0, 255]).
order_invariant_keys (Collection[str]) – the keys to group together and process by an ordering invariant feature extractor
replicated_keys (Collection[str]) – additional keys to add to the ordering invariant groups, replicating the values for each group
filter_key (str) – the key to use for masking to select items with positive values of this key
removed_keys (Collection[str]) – keys removed from the observations before concatenating with ordering invariant features
net_arch (list[int]) – the ordering invariant MLP layers sizes
activation_fn (type[Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.
reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].
use_masked_tensors (bool) – Whether to use masked tensors
- Returns:
The order invariant flatten extractor.
- Raises:
AssertionError – if filter_key is associated with a space that is non-flat.
- Return type:
OrderInvariantFlattenExtractor
Centralized training with communication (SAC)#
Bases: SACPolicy

This class describes an SB3 SAC policy that works on stacked multi-agent environments (navground.learning.parallel_env.JointEnv).

The actor is composed of two modules: CommNet and ActionNet. CommNet takes stacked (i.e., batched) single-agent observations and computes a message (for each agent). ActionNet takes stacked (i.e., batched) single-agent observations and all other agents' messages and computes actions.

Communication never exits the policy during training. During inference, we can evaluate the two sub-networks separately and explicitly share the messages, using DistributedCommPolicy.

The class supports composed and simple observations, selecting the corresponding features extractor at initialization.
Users can configure the CommNet by including the fields

comm_space: gym.spaces.Box
comm_net_arch: list[int]

in policy_kwargs.

- Parameters:
observation_space (gym.Space)
action_space (gym.spaces.Box)
lr_schedule (Schedule)
use_sde (bool)
log_std_init (float)
use_expln (bool)
clip_mean (float)
features_extractor_class (type[BaseFeaturesExtractor] | None)
normalize_images (bool)
optimizer_class (type[th.optim.Optimizer])
n_critics (int)
share_features_extractor (bool)
comm_space (gym.spaces.Box)
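A minimal sketch of configuring the CommNet through policy_kwargs as described above; the message size and architecture are illustrative, the import path is assumed from this page, and joint_env stands for a stacked multi-agent environment (navground.learning.parallel_env.JointEnv):

```python
import gymnasium as gym
from stable_baselines3 import SAC
from navground.learning.policies import SACPolicyWithComm


def make_model(joint_env):
    policy_kwargs = dict(
        comm_space=gym.spaces.Box(-1.0, 1.0, (2,)),  # a 2-float message per agent
        comm_net_arch=[32, 32],                      # CommNet MLP layer sizes
    )
    return SAC(SACPolicyWithComm, joint_env, policy_kwargs=policy_kwargs)
```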
Bases: BasePolicy

This class converts a SACPolicyWithComm centralized policy into a distributed policy, evaluating the two sub-networks to compute the action and the outgoing message, which are returned concatenated.

- Parameters:
observation_space (gym.spaces.Dict | gym.spaces.Box)
action_space (gym.spaces.Box)
policy (SACPolicyWithComm)
kwargs (Any)
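A sketch of wrapping a trained centralized policy for per-agent inference, using only the constructor arguments listed above (the import path and the single-agent spaces are assumptions; further keyword arguments may be required):

```python
from navground.learning.policies import DistributedCommPolicy


def make_distributed(model, single_agent_obs_space, single_agent_action_space):
    # model.policy is assumed to be a trained SACPolicyWithComm; the returned
    # policy computes, per agent, the action concatenated with the outgoing message.
    return DistributedCommPolicy(
        observation_space=single_agent_obs_space,
        action_space=single_agent_action_space,
        policy=model.policy,
    )
```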
Split MLP Policy (SAC)#
Which inputs to use: a slice of a box observation space, a collection of keys of a dict observation space, or all inputs (for None).
An (optional) network architecture
The specifics of a sub-module: output size, input specs and network architecture.
Bases: SACPolicy

A SAC policy whose actor contains independent MLPs, each of which outputs some of the actions.

That is, instead of a monolithic actor that takes all observations and computes all actions, the actor of this policy can use subsets of the observations to compute subsets of the actions.

The sub-modules are configured from the argument actor_specs in policy_kwargs, of type list[ActorSpec]. The different tuples (action_size, input_spec, net_arch) each configure one of the MLPs, which computes action_size actions using the observations specified by input_spec, with architecture net_arch. The list should be ordered, and the action sizes should sum up to the total size of the action space.
For example:
>>> env.observation_space
Dict('a': Box(0.0, 1.0, (1,), float32), 'b': Box(0.0, 1.0, (1,), float32))
>>> env.action_space
Box(-1.0, 1.0, (2,), float32)
>>> actor_specs = [(1, None, [64, 64]), (1, ['a'], [16, 16])]
>>> model = SAC(SplitSACPolicy, env, policy_kwargs={'actor_specs': actor_specs})
creates a model with a policy that computes two actions:

- the first using a 64 + 64 MLP from a and b
- the second using a 16 + 16 MLP solely from a
- Parameters:
observation_space (gym.Space)
action_space (gym.spaces.Box)
lr_schedule (Schedule)
use_sde (bool)
log_std_init (float)
use_expln (bool)
clip_mean (float)
features_extractor_class (type[BaseFeaturesExtractor] | None)
normalize_images (bool)
optimizer_class (type[th.optim.Optimizer])
n_critics (int)
share_features_extractor (bool)
actor_specs (list[tuple[int, slice | Collection[str] | None, list[int] | dict[str, list[int]] | None]])
Bases: BaseCallback

A callback that alternates the training of the modules that compose the actor of SplitSACPolicy.