Policies#
navground.learning.policies
Null#
Bases: BasePolicy
This class describes a dummy policy that always returns zeros.
- Parameters:
squash_output (bool)
Bases: object

This class describes a dummy predictor that always returns zeros and conforms to navground.learning.types.PolicyPredictor.
- Parameters:
action_space (gym.spaces.Box) – The action space
observation_space (gym.Space[Any]) – An optional observation space
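As an illustration of what conforming to the predictor protocol means, here is a minimal sketch of a zero-returning predictor, assuming PolicyPredictor follows the SB3-style predict(observation, state, episode_start, deterministic) signature; the class below is illustrative, not the library implementation:

```python
import numpy as np
import gymnasium as gym


class ZeroPredictor:
    """Illustrative stand-in, not the library class: always returns zero actions."""

    def __init__(self, action_space: gym.spaces.Box, observation_space=None):
        self.action_space = action_space
        self.observation_space = observation_space

    def predict(self, observation, state=None, episode_start=None, deterministic=True):
        # No recurrent state: return a zero action and None.
        action = np.zeros(self.action_space.shape, dtype=self.action_space.dtype)
        return action, None


predictor = ZeroPredictor(gym.spaces.Box(-1.0, 1.0, (2,), dtype=np.float32))
action, _ = predictor.predict(observation=None)  # array([0., 0.], dtype=float32)
```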
Random#
Bases: BasePolicy

This class describes an onnx-able policy that returns random actions.
- Parameters:
squash_output (bool)
Bases: object

This class describes a predictor that returns random actions and conforms to navground.learning.types.PolicyPredictor.
- Parameters:
action_space (gym.spaces.Box) – The action space
observation_space (gym.Space[Any]) – An optional observation space
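Zero and random predictors are mainly useful as baselines. The sketch below rolls out such a predictor in a plain evaluation loop, using an illustrative gymnasium environment rather than a navground one; any object exposing this predict() signature can be dropped in the same way:

```python
import gymnasium as gym


class UniformPredictor:
    """Illustrative stand-in, not the library class: samples actions uniformly."""

    def __init__(self, action_space: gym.spaces.Box, observation_space=None):
        self.action_space = action_space
        self.observation_space = observation_space

    def predict(self, observation, state=None, episode_start=None, deterministic=False):
        # Ignore the observation and sample a random action.
        return self.action_space.sample(), None


env = gym.make("Pendulum-v1")
predictor = UniformPredictor(env.action_space)
obs, _ = env.reset()
for _ in range(200):
    action, _ = predictor.predict(obs)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```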
Info#
Bases: object

A predictor that extracts navground actions from the info dictionary and conforms to navground.learning.types.PolicyPredictorWithInfo.
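A rough sketch of what an info-based predictor does, assuming that PolicyPredictorWithInfo passes the per-step info dictionary to predict and that the navground action is stored under some key; both the signature and the "navground_action" key below are assumptions, not the library's actual interface:

```python
import numpy as np


class InfoActionPredictor:
    """Illustrative stand-in, not the library class: reads the action that
    navground computed from the step info dictionary instead of evaluating
    a neural network."""

    def __init__(self, action_space, key="navground_action"):
        self.action_space = action_space
        self.key = key  # placeholder key, not the library's actual key

    def predict(self, observation, info, state=None, episode_start=None, deterministic=True):
        # Fall back to a zero action if the info dictionary carries none.
        default = np.zeros(self.action_space.shape, dtype=self.action_space.dtype)
        return info.get(self.key, default), None
```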
Ordering-invariant extractor#
Bases: BaseFeaturesExtractor

A variation of the SB3 stable_baselines3.common.torch_layers.CombinedExtractor that applies an ordering-invariant MLP feature extractor to a group of keys, after optionally masking it.

Same as the original CombinedExtractor: a combined features extractor for Dict observation spaces. Builds a features extractor for each key of the space. Input from each space is fed through a separate submodule (CNN or MLP, depending on input shape); the output features are concatenated and fed through an additional MLP network (“combined”).

- Parameters:
observation_space (gym.spaces.Dict) – the observation space
cnn_output_dim (int) – Number of features to output from each CNN submodule(s). Defaults to 256 to avoid exploding network sizes.
normalized_image (bool) – Whether to assume that the image is already normalized or not (this disables dtype and bounds checks): when True, it only checks that the space is a Box and has 3 dimensions. Otherwise, it checks that it has expected dtype (uint8) and bounds (values in [0, 255]).
order_invariant_keys (Collection[str]) – the keys to group together and process by an ordering invariant feature extractor
replicated_keys (Collection[str]) – additional keys to add to the ordering invariant groups, replicating the values for each group
filter_key (str) – the key to use for masking to select items with positive values of this key
removed_keys (Collection[str]) – keys removed from the observations before concatenating with ordering invariant features
net_arch (list[int]) – the ordering invariant MLP layers sizes
activation_fn (type[nn.Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.
reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].
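As a sketch of how the extractor is typically wired into an SB3 algorithm through policy_kwargs (the key names and layer sizes are illustrative, and the import from navground.learning.policies is assumed from this page):

```python
from stable_baselines3 import SAC
from navground.learning.policies import OrderInvariantCombinedExtractor


def make_model(env):
    # 'env' is assumed to expose a Dict observation space whose per-neighbor
    # entries live under the (hypothetical) keys used below.
    policy_kwargs = dict(
        features_extractor_class=OrderInvariantCombinedExtractor,
        features_extractor_kwargs=dict(
            order_invariant_keys=["neighbors/position", "neighbors/velocity"],
            filter_key="neighbors/valid",  # keep only items with positive values
            net_arch=[64, 64],             # ordering-invariant MLP layer sizes
        ),
    )
    return SAC("MultiInputPolicy", env, policy_kwargs=policy_kwargs)
```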
Bases: BaseFeaturesExtractor

Similar to OrderInvariantCombinedExtractor but for flat observation spaces.

- Parameters:
observation_space (gym.spaces.Box) – the observation space
order_invariant_slices (Collection[slice]) – the slices to group together and process by an ordering invariant feature extractor
replicated_slices (Collection[slice]) – additional slices to add to the ordering invariant groups, replicating the values for each group
filter_slice (slice | None) – the slice to use for masking to select items with positive values for indices in this slice
removed_slices (Collection[slice]) – slices removed from the observations before concatenating with ordering invariant features
net_arch (list[int]) – the ordering invariant MLP layers sizes
activation_fn (type[nn.Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.
reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].
use_masked_tensors (bool) – Whether to use masked tensors
number (int)
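A minimal sketch for the flat case, assuming the extractor is importable from navground.learning.policies; the observation layout and slices are illustrative, and the exact keyword arguments (e.g. how a slice is split into items) should be checked against the signature above:

```python
from stable_baselines3 import SAC
from navground.learning.policies import OrderInvariantFlattenExtractor


def make_model(env):
    # Assume a flat Box observation laid out as:
    #   [0:4)   ego features
    #   [4:12)  per-neighbor features, to be processed order-invariantly
    policy_kwargs = dict(
        features_extractor_class=OrderInvariantFlattenExtractor,
        features_extractor_kwargs=dict(
            order_invariant_slices=[slice(4, 12)],
            net_arch=[32, 32],
        ),
    )
    return SAC("MlpPolicy", env, policy_kwargs=policy_kwargs)
```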
Helper function that creates an OrderInvariantFlattenExtractor, using information from a dictionary space to infer the layout of the observation space.

- Parameters:
observation_space (Box) – the observation space
dict_space (Dict) – The dictionary space
cnn_output_dim (int) – Number of features to output from each CNN submodule(s). Defaults to 256 to avoid exploding network sizes.
normalized_image (bool) – Whether to assume that the image is already normalized or not (this disables dtype and bounds checks): when True, it only checks that the space is a Box and has 3 dimensions. Otherwise, it checks that it has expected dtype (uint8) and bounds (values in [0, 255]).
order_invariant_keys (Collection[str]) – the keys to group together and process by an ordering invariant feature extractor
replicated_keys (Collection[str]) – additional keys to add to the ordering invariant groups, replicating the values for each group
filter_key (str) – the key to use for masking to select items with positive values of this key
removed_keys (Collection[str]) – keys removed from the observations before concatenating with ordering invariant features
net_arch (list[int]) – the ordering invariant MLP layers sizes
activation_fn (type[Module] | None) – the ordering invariant MLP activation function. If not set, it defaults to torch.nn.ReLU.
reductions (Sequence[Reduction] | None) – A sequence of (order-invariant) modules. If not set, it defaults to [torch.sum].
use_masked_tensors (bool) – Whether to use masked tensors
- Returns:
The order invariant flatten extractor.
- Raises:
AssertionError – if filter_key is associated with a space that is non-flat.
- Return type:
OrderInvariantFlattenExtractor
Centralized training with communication (SAC)#
Bases: SACPolicy

This class describes an SB3 SAC policy that works on stacked multi-agent environments (navground.learning.parallel_env.JointEnv).

The actor is composed of two modules: CommNet and ActionNet. CommNet takes stacked (i.e., batched) single-agent observations and computes a message (for each agent). ActionNet takes stacked (i.e., batched) single-agent observations and all other agents' messages and computes actions.

Communication never exits the policy during training. During inference, we can evaluate the two sub-networks separately and explicitly share the messages, using DistributedCommPolicy.

The class supports composed and simple observations, selecting the corresponding features extractor at initialization.
Users can configure the CommNet by including the fields

comm_space: gym.spaces.Box
comm_net_arch: list[int]

in policy_kwargs.

- Parameters:
observation_space (gym.Space)
action_space (gym.spaces.Box)
lr_schedule (Schedule)
use_sde (bool)
log_std_init (float)
use_expln (bool)
clip_mean (float)
features_extractor_class (type[BaseFeaturesExtractor] | None)
normalize_images (bool)
optimizer_class (type[th.optim.Optimizer])
n_critics (int)
share_features_extractor (bool)
comm_space (gym.spaces.Box)
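A minimal sketch of configuring the CommNet through policy_kwargs as described above; the message size and architecture are illustrative, the import path is assumed from this page, and joint_env stands for a stacked multi-agent environment (navground.learning.parallel_env.JointEnv):

```python
import gymnasium as gym
from stable_baselines3 import SAC
from navground.learning.policies import SACPolicyWithComm


def make_model(joint_env):
    policy_kwargs = dict(
        comm_space=gym.spaces.Box(-1.0, 1.0, (2,)),  # a 2-float message per agent
        comm_net_arch=[32, 32],                      # CommNet MLP layer sizes
    )
    return SAC(SACPolicyWithComm, joint_env, policy_kwargs=policy_kwargs)
```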
Bases: BasePolicy

This class converts a SACPolicyWithComm centralized policy into a distributed policy, evaluating the two sub-networks to compute the action and the outgoing message, which are returned concatenated.

- Parameters:
observation_space (gym.spaces.Dict | gym.spaces.Box)
action_space (gym.spaces.Box)
policy (SACPolicyWithComm)
kwargs (Any)
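A sketch of wrapping a trained centralized policy for per-agent inference, using only the constructor arguments listed above (the import path and the single-agent spaces are assumptions; further keyword arguments may be required):

```python
from navground.learning.policies import DistributedCommPolicy


def make_distributed(model, single_agent_obs_space, single_agent_action_space):
    # model.policy is assumed to be a trained SACPolicyWithComm; the returned
    # policy computes, per agent, the action concatenated with the outgoing message.
    return DistributedCommPolicy(
        observation_space=single_agent_obs_space,
        action_space=single_agent_action_space,
        policy=model.policy,
    )
```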
Split MLP Policy (SAC)#
Which inputs to use: a slice of a box observation space, a collection of keys of a dict observation space, or all inputs (for None).
An (optional) network architecture
The specifics of a sub-module: output size, input specs and network architecture.
Bases: SACPolicy

A SAC policy whose actor contains independent MLPs, each of which outputs some of the actions.

That is, instead of a monolithic actor that takes all observations and computes all actions, the actor of this policy can use subsets of the observations to compute subsets of the actions.

The sub-modules are configured from the argument actor_specs in policy_kwargs, of type list[ActorSpec]. The different tuples (action_size, input_spec, net_arch) each configure one of the MLPs, which computes action_size actions using the observations specified by input_spec, with architecture net_arch. The list should be ordered, and the action sizes should sum up to the total size of the action space.
For example:
>>> env.observation_space
Dict('a': Box(0.0, 1.0, (1,), float32), 'b': Box(0.0, 1.0, (1,), float32))
>>> env.action_space
Box(-1.0, 1.0, (2,), float32)
>>> actor_specs = [(1, None, [64, 64]), (1, ['a'], [16, 16])]
>>> model = SAC(SplitSACPolicy, env, policy_kwargs={'actor_specs': actor_specs})
creates a model with a policy that computes two actions:

- the first using a 64 + 64 MLP from a and b
- the second using a 16 + 16 MLP solely from a
- Parameters:
observation_space (gym.Space)
action_space (gym.spaces.Box)
lr_schedule (Schedule)
use_sde (bool)
log_std_init (float)
use_expln (bool)
clip_mean (float)
features_extractor_class (type[BaseFeaturesExtractor] | None)
normalize_images (bool)
optimizer_class (type[th.optim.Optimizer])
n_critics (int)
share_features_extractor (bool)
actor_specs (list[tuple[int, slice | Collection[str] | None, list[int] | dict[str, list[int]] | None]])
Bases: BaseCallback

A callback that alternates the training of the modules that compose the actor of SplitSACPolicy.