Types

navground.learning.types

A generic numpy array.

The type of observations generated by an environment.

The type of actions accepted by an environment.

The type of (hidden) states used by policies.

The type of the episode_start argument accepted by policies.

The type of info generated by an environment.

Anything that can be accepted as a filesystem path.

A rectangular region defined by bottom-left and top-right vertices.

Anything that can be converted to Indices.
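
For illustration, these aliases could be spelled as below. This is a sketch consistent with the descriptions above, not the module's exact definitions: the names Observation, Action, State, EpisodeStart and Info appear in the signatures documented below, while Array, PathLike and Bounds are assumptions.

import os
import numpy as np

# Sketch only: consistent with the descriptions above; the exact
# definitions in navground.learning.types may differ.
Array = np.ndarray                      # a generic numpy array
Observation = dict[str, Array] | Array  # observations generated by an environment
Action = Array                          # actions accepted by an environment
State = tuple[Array, ...]               # (hidden) states used by policies
EpisodeStart = Array                    # the episode_start argument of policies
Info = dict[str, Array]                 # info generated by an environment
PathLike = os.PathLike[str] | str       # anything accepted as a filesystem path
Bounds = tuple[Array, Array]            # bottom-left and top-right vertices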

The reward protocol is a callable that computes a scalar reward for an individual agent.

Compute the reward for an agent.

Parameters:
  • agent (Agent) – The agent

  • world (World) – The simulated world the agent belongs to

  • time_step (float) – The time step of the simulation

Returns:

A scalar reward.

Return type:

float
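
Since the protocol is structural, any callable with this signature can serve as a reward. A minimal sketch (the constant time penalty is a hypothetical choice for illustration):

from navground import sim

def step_penalty(agent: sim.Agent, world: sim.World, time_step: float) -> float:
    # Hypothetical reward: a constant penalty per simulated second,
    # encouraging policies that terminate episodes quickly.
    return -time_step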

This class describes the predictor protocol.

Same as stable_baselines3.common.type_aliases.PolicyPredictor; included here to be self-contained.

Get the policy action from an observation (and optional hidden state). Includes sugar-coating to handle different observations (e.g. normalizing images).

Parameters:
  • observation (Observation) – the input observation

  • state (State | None) – The last hidden states (can be None, used in recurrent policies)

  • episode_start (EpisodeStart | None) – The last masks (can be None, used in recurrent policies); this corresponds to the beginning of episodes, where the hidden states of the RNN must be reset.

  • deterministic (bool) – Whether or not to return deterministic actions.

Returns:

The model’s action and the next hidden state (used in recurrent policies)

Return type:

tuple[Action, State | None]
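
Because the protocol only requires a compatible predict(), a non-recurrent predictor can ignore the state arguments and return None as the next hidden state. A minimal sketch (the zero action and its size are hypothetical):

import numpy as np

class ZeroPredictor:
    # Sketch of a PolicyPredictor: a non-recurrent policy that
    # always returns a zero action and no hidden state.

    def predict(self, observation, state=None, episode_start=None,
                deterministic=False):
        # The action size (2) is a hypothetical value for illustration.
        return np.zeros(2, dtype=np.float32), None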

Similar to PolicyPredictor, but predict() also accepts an info dictionary.

Get the policy action from an observation (and optional hidden state). Includes sugar-coating to handle different observations (e.g. normalizing images).

Parameters:
  • observation (Observation) – the input observation

  • state (State | None) – The last hidden states (can be None, used in recurrent policies)

  • episode_start (EpisodeStart | None) – The last masks (can be None, used in recurrent policies); this corresponds to the beginning of episodes, where the hidden states of the RNN must be reset.

  • deterministic (bool) – Whether or not to return deterministic actions.

  • info (Info | None) – A dictionary with generic information that is not part of the observation or the state.

Returns:

The model’s action and the next hidden state (used in recurrent policies)

Return type:

tuple[Action, State | None]
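
A conforming predictor simply adds the extra keyword argument. A minimal sketch, extending ZeroPredictor above:

import numpy as np

class InfoZeroPredictor:
    # Sketch of a PolicyPredictorWithInfo: like ZeroPredictor,
    # but predict() also accepts an info dictionary.

    def predict(self, observation, state=None, episode_start=None,
                deterministic=False, info=None):
        # info is available here (e.g. for logging or privileged inputs);
        # this sketch ignores it and returns a hypothetical zero action.
        return np.zeros(2, dtype=np.float32), None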

Check whether the callable accepts an info argument.

Can be used to distinguish PolicyPredictorWithInfo from PolicyPredictor:

>>> policy = MyPolicy(...)
>>> accept_info(policy.predict)
False

Parameters:
  • func (Callable[..., Any]) – The function to test

Returns:

True if the callable has an argument named info

Return type:

bool
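
For example, a caller can use it to pass info only to predictors that accept it (obs and info are hypothetical variables):

>>> if accept_info(policy.predict):
...     action, state = policy.predict(obs, info=info)
... else:
...     action, state = policy.predict(obs)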