Policies#

navground.learning.policies

Null#

Bases: BasePolicy

This class describes a dummy policy that always returns zeros.

Parameters:

squash_output (bool)

Bases: object

This class describes a dummy predictor that always returns zeros and conforms to navground.learning.types.PolicyPredictor.

Parameters:
  • action_space (gym.spaces.Box) – The action space

  • observation_space (gym.Space[Any]) – An optional observation space
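For reference, a zero-returning predictor of this kind can be sketched in a few lines. The sketch assumes the PolicyPredictor protocol follows SB3's predict convention (observation in, (action, state) out); the name ZeroPredictor is illustrative, not the class exported by this module.

import numpy as np
import gymnasium as gym

class ZeroPredictor:
    """Illustrative predictor that always returns zero actions."""

    def __init__(self, action_space: gym.spaces.Box,
                 observation_space: gym.Space | None = None) -> None:
        self.action_space = action_space
        self.observation_space = observation_space

    def predict(self, observation, state=None, episode_start=None,
                deterministic=False):
        # Ignore the observation: return a zero-filled action and pass the
        # state through unchanged.
        return np.zeros(self.action_space.shape,
                        dtype=self.action_space.dtype), state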

Random#

Bases: BasePolicy

This class describes an onnx-able policy that returns random actions.

Parameters:

squash_output (bool)

Bases: object

This class describes a predictor that returns random actions and conforms to navground.learning.types.PolicyPredictor.

Parameters:
  • action_space (gym.spaces.Box) – The action space

  • observation_space (gym.Space[Any]) – An optional observation space

Info#

Bases: object

A predictor that extracts navground actions from the info dictionary and conforms to navground.learning.types.PolicyPredictorWithInfo.

Parameters:
  • action_space (gym.Space[Any]) – The action space

  • key (str) – The key of the action in the info dictionary

  • observation_space (gym.Space[Any]) – An optional observation space
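Conceptually, prediction here reduces to a dictionary lookup: the environment writes the action computed by the underlying navground behavior into the info dictionary, and the predictor returns it. A minimal sketch, assuming the with-info protocol passes the info dictionary to predict (the exact PolicyPredictorWithInfo signature may differ); InfoLookupPredictor is an illustrative name:

class InfoLookupPredictor:
    """Illustrative: return the action stored under `key` in the info dict."""

    def __init__(self, action_space, key: str, observation_space=None) -> None:
        self.action_space = action_space
        self.key = key
        self.observation_space = observation_space

    def predict(self, observation, info, state=None, deterministic=False):
        # The observation is ignored: the action was already computed by the
        # navground behavior and stored in the info dictionary under `key`.
        return info[self.key], state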

Ordering-invariant extractor#

Bases: BaseFeaturesExtractor

A variation of SB3's stable_baselines3.common.torch_layers.CombinedExtractor that applies an ordering invariant MLP feature extractor to a group of keys, after optionally masking them.

Otherwise it behaves like the original CombinedExtractor: a combined features extractor for Dict observation spaces that builds a features extractor for each key of the space. Input from each sub-space is fed through a separate submodule (CNN or MLP, depending on input shape), and the output features are concatenated and fed through an additional MLP network (“combined”).

Parameters:
  • observation_space (gym.spaces.Dict) – the observation space

  • cnn_output_dim (int) – Number of features to output from each CNN submodule(s). Defaults to 256 to avoid exploding network sizes.

  • normalized_image (bool) – Whether to assume that the image is already normalized or not (this disables dtype and bounds checks): when True, it only checks that the space is a Box and has 3 dimensions. Otherwise, it checks that it has expected dtype (uint8) and bounds (values in [0, 255]).

  • order_invariant_keys (Collection[str]) – the keys to group together and process by an ordering invariant feature extractor

  • replicated_keys (Collection[str]) – additional keys to add to the ordering invariant groups, replicating the values for each group

  • filter_key (str) – the key used for masking: only items with a positive value for this key are selected

  • removed_keys (Collection[str]) – keys removed from the observations before concatenating with ordering invariant features

  • net_arch (list[int]) – the ordering invariant MLP layers sizes

  • activation_fn (type[nn.Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.

  • reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].
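As with any SB3 features extractor, it is typically plugged in through policy_kwargs. A sketch, assuming the class is importable from navground.learning.policies as this page suggests; the observation keys ('neighbors/...') and layer sizes are illustrative values, not names defined by the module:

>>> from navground.learning.policies import OrderInvariantCombinedExtractor
>>> policy_kwargs = {
...     'features_extractor_class': OrderInvariantCombinedExtractor,
...     'features_extractor_kwargs': {
...         'order_invariant_keys': ['neighbors/position', 'neighbors/velocity'],
...         'filter_key': 'neighbors/valid',
...         'net_arch': [32, 32],
...     },
... }
>>> model = SAC("MultiInputPolicy", env, policy_kwargs=policy_kwargs)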

Bases: BaseFeaturesExtractor

Similar to OrderInvariantCombinedExtractor but for flat observation spaces.

Parameters:
  • observation_space (gym.spaces.Box) – the observation space

  • order_invariant_slices (Collection[slice]) – the slices to group together and process by an ordering invariant feature extractor

  • replicated_slices (Collection[slice]) – additional slices to add to the ordering invariant groups, replicating the values for each group

  • filter_slice (slice | None) – the slice used for masking: only items with positive values at the indices in this slice are selected

  • removed_slices (Collection[slice]) – slices removed from the observations before concatenating with ordering invariant features

  • net_arch (list[int]) – the ordering invariant MLP layers sizes

  • activation_fn (type[nn.Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.

  • reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].

  • use_masked_tensors (bool) – Whether to use masked tensors

  • number (int)
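The ordering invariant part follows the usual shared-encoder-plus-reduction pattern: every item in a group passes through the same MLP, and the per-item features are collapsed with a permutation-invariant reduction (torch.sum by default). A conceptual sketch of that computation, not the extractor's actual code:

from __future__ import annotations
import torch
import torch.nn as nn

class OrderInvariantBlock(nn.Module):
    """Illustrative: shared per-item MLP followed by a sum reduction."""

    def __init__(self, item_dim: int, net_arch: list[int]) -> None:
        super().__init__()
        layers: list[nn.Module] = []
        last = item_dim
        for size in net_arch:
            layers += [nn.Linear(last, size), nn.ReLU()]
            last = size
        self.mlp = nn.Sequential(*layers)

    def forward(self, items: torch.Tensor,
                mask: torch.Tensor | None = None) -> torch.Tensor:
        # items: (batch, n_items, item_dim); mask: (batch, n_items), True for
        # items that should contribute (e.g. positive filter values).
        features = self.mlp(items)
        if mask is not None:
            # Zero-out the features of masked items before reducing.
            features = features * mask.unsqueeze(-1)
        # Summing over the item axis makes the result independent of ordering.
        return features.sum(dim=1)

Because the sum commutes with any permutation of the items, reordering the group leaves the extracted features unchanged.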

Helper function that creates an OrderInvariantFlattenExtractor, using information from a dictionary space to infer the layout of the observation space.

Parameters:
  • observation_space (Box) – the observation space

  • dict_space (Dict) – The dictionary space

  • cnn_output_dim (int) – Number of features to output from each CNN submodule(s). Defaults to 256 to avoid exploding network sizes.

  • normalized_image (bool) – Whether to assume that the image is already normalized or not (this disables dtype and bounds checks): when True, it only checks that the space is a Box and has 3 dimensions. Otherwise, it checks that it has expected dtype (uint8) and bounds (values in [0, 255]).

  • order_invariant_keys (Collection[str]) – the keys to group together and process by an ordering invariant feature extractor

  • replicated_keys (Collection[str]) – additional keys to add to the ordering invariant groups, replicating the values for each group

  • filter_key (str) – the key used for masking: only items with a positive value for this key are selected

  • removed_keys (Collection[str]) – keys removed from the observations before concatenating with ordering invariant features

  • net_arch (list[int]) – the ordering invariant MLP layers sizes

  • activation_fn (type[Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.

  • reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].

  • use_masked_tensors (bool) – Whether to use masked tensors

Returns:

The order invariant flatten extractor.

Raises:

AssertionError – if filter_key is associated with a non-flat space.

Return type:

OrderInvariantFlattenExtractor

Centralized training with communication (SAC)#

Bases: SACPolicy

This class describes an SB3 SAC policy that works on stacked multi-agent environments (navground.learning.parallel_env.JointEnv).

The actor is composed of two modules: CommNet and ActionNet. CommNet takes stacked (i.e. batched) single agent observations and computes a message for each agent. ActionNet takes stacked (i.e. batched) single agent observations together with all other agents' messages and computes actions.

Communication never exits the policy during training. During inference, the two sub-networks can be evaluated separately, explicitly sharing the messages through DistributedCommPolicy.

The class supports both composed (dict) and simple (flat) observations, selecting the corresponding features extractor at initialization.

Users can configure the CommNet by including the fields

  • comm_space: gym.spaces.Box

  • comm_net_arch: list[int]

in policy_kwargs.
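For example, a model could be configured along these lines (a sketch following the standard SB3 constructor; the message size and layer sizes are arbitrary illustrative values):

>>> import gymnasium as gym
>>> policy_kwargs = {
...     'comm_space': gym.spaces.Box(-1, 1, (2,)),
...     'comm_net_arch': [32, 32],
... }
>>> model = SAC(SACPolicyWithComm, env, policy_kwargs=policy_kwargs)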

Parameters:

Bases: BasePolicy

This class converts a SACPolicyWithComm centralized policy into a distributed policy, evaluating the two sub-networks to compute the action and the outgoing message, which are returned concatenated.

Parameters:

Split MLP Policy (SAC)#

Which inputs to use: a slice of a box observation space, a collection of keys of a dict observation, or all of them (for None).

An (optional) network architecture

The specifics of a sub-module: output size, input specs and network architecture.

Bases: SACPolicy

A SAC policy whose actor is composed of independent MLPs, each of which outputs some of the actions.

That is, instead of a monolithic actor that takes all observations and computes all actions, the actor of this policy can use a subset of the observations to compute each part of the actions.

The sub-modules are configured through the actor_specs argument in policy_kwargs, of type list[ActorSpec].

Each tuple (action_size, input_spec, net_arch) configures one of the MLPs, which computes action_size actions from the observations specified by input_spec using an architecture net_arch.

The list should be ordered, and the action sizes should sum to the total size of the action space.

For example:

>>> env.observation_space
Dict('a': Box(0.0, 1.0, (1,), float32), 'b': Box(0.0, 1.0, (1,), float32))
>>> env.action_space
Box(-1.0, 1.0, (2,), float32)
>>> actor_specs = [(1, None, [64, 64]), (1, ['a'], [16, 16])]
>>> model = SAC(SplitSACPolicy, env, policy_kwargs={'actor_specs': actor_specs})

creates a model with a policy that computes two actions:

  • the first using a 64 + 64 MLP from a and b

  • the second using a 16 + 16 MLP solely from a

Parameters:

Bases: BaseCallback

A callback that alternates training among the modules that compose the actor of SplitSACPolicy.

Parameters: