Types

navground.learning.types

A generic numpy array.

The type of observations generated by an environment.

The type of actions accepted by an environment.

The type of (hidden) states used by policies.

The type of the episode_start argument accepted by policies.

The type of info generated by an environment.

Anything that can be accepted as a filesystem path.

A rectangular region defined by bottom-left and top-right vertices.

Anything that can be converted to Indices.
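
For illustration, these aliases could be spelled as below. This is a sketch consistent with the descriptions above, not the module's exact definitions: the names Observation, Action, State, EpisodeStart and Info appear in the signatures documented below, while Array, PathLike and Bounds are assumptions.

import os
import numpy as np

# Sketch only: consistent with the descriptions above; the exact
# definitions in navground.learning.types may differ.
Array = np.ndarray                      # a generic numpy array
Observation = dict[str, Array] | Array  # observations generated by an environment
Action = Array                          # actions accepted by an environment
State = tuple[Array, ...]               # (hidden) states used by policies
EpisodeStart = Array                    # the episode_start argument of policies
Info = dict[str, Array]                 # info generated by an environment
PathLike = os.PathLike[str] | str       # anything accepted as a filesystem path
Bounds = tuple[Array, Array]            # bottom-left and top-right vertices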

The reward protocol is a callable that computes a scalar reward for an individual agent.

Compute the reward for an agent.

Parameters:
  • agent (Agent) – The agent

  • world (World) – The simulated world the agent belongs to

  • time_step (float) – The time step of the simulation

Returns:

A scalar reward.

Return type:

float
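
Since the protocol is structural, any callable with this signature can serve as a reward. A minimal sketch (the constant time penalty is a hypothetical choice for illustration):

from navground import sim

def step_penalty(agent: sim.Agent, world: sim.World, time_step: float) -> float:
    # Hypothetical reward: a constant penalty per simulated second,
    # encouraging policies that terminate episodes quickly.
    return -time_step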

This class describes the predictor protocol.

Same as stable_baselines3.common.type_aliases.PolicyPredictor; included here to be self-contained.

Get the policy action from an observation (and optional hidden state). Includes sugar-coating to handle different observations (e.g. normalizing images).

Parameters:
  • observation (Observation) – the input observation

  • state (State | None) – The last hidden states (can be None, used in recurrent policies)

  • episode_start (EpisodeStart | None) – The last masks (can be None, used in recurrent policies); this corresponds to the beginning of episodes, where the hidden states of the RNN must be reset.

  • deterministic (bool) – Whether or not to return deterministic actions.

Returns:

The model’s action and the next hidden state (used in recurrent policies)

Return type:

tuple[Action, State | None]
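
Because the protocol only requires a compatible predict(), a non-recurrent predictor can ignore the state arguments and return None as the next hidden state. A minimal sketch (the zero action and its size are hypothetical):

import numpy as np

class ZeroPredictor:
    # Sketch of a PolicyPredictor: a non-recurrent policy that
    # always returns a zero action and no hidden state.

    def predict(self, observation, state=None, episode_start=None,
                deterministic=False):
        # The action size (2) is a hypothetical value for illustration.
        return np.zeros(2, dtype=np.float32), None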

Similar to PolicyPredictor, but predict() also accepts an info dictionary.

Get the policy action from an observation (and optional hidden state). Includes sugar-coating to handle different observations (e.g. normalizing images).

Parameters:
  • observation (Observation) – the input observation

  • state (State | None) – The last hidden states (can be None, used in recurrent policies)

  • episode_start (EpisodeStart | None) – The last masks (can be None, used in recurrent policies); this corresponds to the beginning of episodes, where the hidden states of the RNN must be reset.

  • deterministic (bool) – Whether or not to return deterministic actions.

  • info (Info | None) – A dictionary with generic information that is not part of the observation or the state.

Returns:

The model’s action and the next hidden state (used in recurrent policies)

Return type:

tuple[Action, State | None]
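
A conforming predictor simply adds the extra keyword argument. A minimal sketch, extending ZeroPredictor above:

import numpy as np

class InfoZeroPredictor:
    # Sketch of a PolicyPredictorWithInfo: like ZeroPredictor,
    # but predict() also accepts an info dictionary.

    def predict(self, observation, state=None, episode_start=None,
                deterministic=False, info=None):
        # info is available here (e.g. for logging or privileged inputs);
        # this sketch ignores it and returns a hypothetical zero action.
        return np.zeros(2, dtype=np.float32), None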

Check whether the callable accepts an info argument.

Can be used to distinguish PolicyPredictorWithInfo from PolicyPredictor:

>>> policy = MyPolicy(...)
>>> accept_info(policy.predict)
False

Parameters:
  • func (Callable[..., Any]) – The function to test

Returns:

True if the callable has an argument named info

Return type:

bool
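
For example, a caller can use it to pass info only to predictors that accept it (obs and info are hypothetical variables):

>>> if accept_info(policy.predict):
...     action, state = policy.predict(obs, info=info)
... else:
...     action, state = policy.predict(obs)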