Types
navground.learning.types
The type of observations generated by an environment.
The type of actions accepted by an environment.
The type of states.
The type of the episode_start argument accepted by policies.
The type of info generated by an environment.
Anything that can be accepted as a filesystem path.
A rectangular region defined by bottom-left and top-right vertices.
Anything that can be converted to Indices.
The reward protocol is a callable to compute scalar rewards for individual agents.
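As an illustration of such a callable, the sketch below defines a reward that penalizes an agent's distance from a target. The argument names (`agent`, `world`, `time_step`), the `RewardLike` protocol, and the stand-in `FakeAgent` class are assumptions made for this example and are not taken from the library; check the actual protocol signature before relying on it.

```python
import math
from dataclasses import dataclass
from typing import Any, Protocol


class RewardLike(Protocol):
    """Hypothetical stand-in for the reward protocol (assumed signature)."""

    def __call__(self, agent: Any, world: Any, time_step: float) -> float: ...


@dataclass
class FakeAgent:
    """Illustrative agent holding a 2D position; not a navground class."""

    position: tuple[float, float]


@dataclass
class DistanceReward:
    """Scalar reward: negative distance to a target, scaled by the time step."""

    target: tuple[float, float]

    def __call__(self, agent: Any, world: Any, time_step: float) -> float:
        dx = agent.position[0] - self.target[0]
        dy = agent.position[1] - self.target[1]
        return -math.hypot(dx, dy) * time_step


# Any object with a matching __call__ satisfies the protocol structurally.
reward: RewardLike = DistanceReward(target=(0.0, 0.0))
value = reward(FakeAgent(position=(3.0, 4.0)), world=None, time_step=0.1)
```

Because the protocol is structural, a plain function with the same parameters would serve equally well; a dataclass is used here only so that the target can be configured.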
This class describes the predictor protocol.
Same as stable_baselines3.common.type_aliases.PolicyPredictor, included here to be self-contained.
Get the policy action from an observation (and optional hidden state). Includes sugar-coating to handle different observations (e.g. normalizing images).
- Parameters:
observation (Observation) – the input observation
state (State | None) – The last hidden states (can be None, used in recurrent policies)
episode_start (EpisodeStart | None) – The last masks (can be None, used in recurrent policies). These correspond to the beginning of episodes, where the hidden states of the RNN must be reset.
deterministic (bool) – Whether or not to return deterministic actions.
- Returns:
the model’s action and the next hidden state (used in recurrent policies)
- Return type:
Similar to PolicyPredictor, but predict() accepts info dictionaries.
Get the policy action from an observation (and optional hidden state). Includes sugar-coating to handle different observations (e.g. normalizing images).
- Parameters:
observation (Observation) – the input observation
state (State | None) – The last hidden states (can be None, used in recurrent policies)
episode_start (EpisodeStart | None) – The last masks (can be None, used in recurrent policies). These correspond to the beginning of episodes, where the hidden states of the RNN must be reset.
deterministic (bool) – Whether or not to return deterministic actions.
info (Info | None) – Dictionaries with generic information that is not part of observation and state.
- Returns:
the model’s action and the next hidden state (used in recurrent policies)
- Return type:
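The difference between the two predictor protocols can be sketched with `typing.Protocol`. This is a minimal illustration, not the library's definitions: the names `PredictorLike`, `PredictorWithInfoLike`, `ConstantPolicy`, and `InfoScaledPolicy` are hypothetical, observations are plain lists rather than arrays, and the `"scale"` key in `info` is invented for the example.

```python
from typing import Any, Optional, Protocol


class PredictorLike(Protocol):
    """Hypothetical mirror of the predictor protocol described above."""

    def predict(self, observation: Any, state: Optional[Any] = None,
                episode_start: Optional[Any] = None,
                deterministic: bool = False) -> tuple[Any, Optional[Any]]: ...


class PredictorWithInfoLike(Protocol):
    """Same as PredictorLike, with an additional `info` argument."""

    def predict(self, observation: Any, state: Optional[Any] = None,
                episode_start: Optional[Any] = None,
                deterministic: bool = False,
                info: Optional[Any] = None) -> tuple[Any, Optional[Any]]: ...


class ConstantPolicy:
    """Non-recurrent policy: ignores the observation, passes state through."""

    def __init__(self, action: list[float]) -> None:
        self.action = action

    def predict(self, observation, state=None, episode_start=None,
                deterministic=False):
        return self.action, state


class InfoScaledPolicy:
    """Policy whose action depends on a value carried in `info`."""

    def predict(self, observation, state=None, episode_start=None,
                deterministic=False, info=None):
        # "scale" is an invented key, used only for this illustration.
        scale = 1.0 if info is None else info.get("scale", 1.0)
        return [scale * x for x in observation], state


plain: PredictorLike = ConstantPolicy([0.0, 1.0])
with_info: PredictorWithInfoLike = InfoScaledPolicy()

action_a, state_a = plain.predict([1.0, 2.0])
action_b, state_b = with_info.predict([1.0, 2.0], info={"scale": 2.0})
```

Both policies return an `(action, next_state)` pair; for non-recurrent policies the state is simply passed through, here as `None`.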
Check whether the callable accepts an info argument.
Can be used to distinguish PolicyPredictorWithInfo from PolicyPredictor:
>>> policy = MyPolicy(...)
>>> accept_info(policy.predict)
False
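One plausible way to perform such a check is to inspect the callable's signature. The sketch below is an assumption about how this could be done, not the library's implementation; `accepts_info_sketch`, `PlainPolicy`, and `InfoPolicy` are hypothetical names introduced for the example.

```python
import inspect
from typing import Any, Callable


def accepts_info_sketch(fn: Callable[..., Any]) -> bool:
    """Return True if `fn` declares a parameter named `info`.

    Illustrative only: the real check may use different rules
    (for example, how it treats **kwargs).
    """
    return "info" in inspect.signature(fn).parameters


class PlainPolicy:
    def predict(self, observation, state=None, episode_start=None,
                deterministic=False):
        return observation, state


class InfoPolicy:
    def predict(self, observation, state=None, episode_start=None,
                deterministic=False, info=None):
        return observation, state


plain_result = accepts_info_sketch(PlainPolicy().predict)
info_result = accepts_info_sketch(InfoPolicy().predict)
```

A caller could use the result to decide whether to forward `info` to `predict`, so that a single evaluation loop can handle both kinds of policy.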