Introduction#

This package contains tools to train and use Machine Learning models inside a navground simulation.

Train ML policies in navground#

Note

Have a look at the tutorials to see the interaction between gymnasium and navground in action, and how to use it to train a navigation policy with IL or RL.

Imitation Learning#

Using the navground environments, we can train a policy that imitates one of the navigation behaviors implemented in navground, using any of the available sensors.

We include helper classes that wrap the Python package imitation by the Center for Human-Compatible AI to offer a simplified interface, yet nothing prevents using the original API directly.

To learn to imitate a behavior, we can run

import gymnasium as gym
import navground.learning.env
from navground.learning.il import BC, DAgger

env = gym.make("navground", scenario=..., sensor=...,
               observation_config=..., action_config=...,
               max_episode_steps=100)

# Behavior cloning
bc = BC(env=env, runs=100)
bc.learn(n_epochs=1)
bc.save("BC")

# DAgger
dagger = DAgger(env=env)
dagger.learn(total_timesteps=10_000,
             rollout_round_min_timesteps=100)
dagger.save("DAgger")

Reinforcement Learning#

Using the navground-gymnasium environment, we can train a policy to navigate among other agents controlled by navground, for instance using one of the RL algorithms implemented in Stable-Baselines3 by DLR-RM.

import gymnasium as gym
import navground.learning.env
from stable_baselines3 import SAC

env = gym.make("navground", scenario=..., sensor=...,
               observation_config=..., action_config=...,
               max_episode_steps=100)
sac = SAC("MlpPolicy", env)
sac.learn(total_timesteps=10_000)
sac.save("SAC")

Parallel Multi-agent Learning#

Using the multi-agent navground-gymnasium environment, we can train a policy in parallel for all agents in the environment, that is, the agents learn to navigate among peers that are learning the same policy. We instantiate the parallel environment using parallel_env.shared_parallel_env() and transform it into a Stable-Baselines3-compatible (single-agent) vectorized environment using parallel_env.make_vec_from_penv(). While learning, from the viewpoint of the SAC algorithm, rollouts are generated by a single agent in the n environments that compose venv, while in reality they are generated in a single penv by n agents.

from navground.learning.parallel_env import make_vec_from_penv, shared_parallel_env
from stable_baselines3 import SAC

penv = shared_parallel_env(scenario=..., sensor=...,
                           observation_config=..., action_config=...,
                           max_episode_steps=100)
venv = make_vec_from_penv(penv)
psac = SAC("MlpPolicy", venv)
psac.learn(total_timesteps=10_000)
psac.save("PSAC")

Evaluation#

Once trained, we can evaluate the policies with common tools, such as stable_baselines3.common.evaluation.evaluate_policy() and its extensions in evaluation, which support parallel environments with groups using different policies.
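
For example, the single-agent policy trained and saved above can be evaluated directly with Stable-Baselines3; this is a minimal sketch in which the environment mirrors the training configuration and the number of evaluation episodes is an arbitrary choice:

import gymnasium as gym
import navground.learning.env
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

# recreate an environment equivalent to the one used for training
env = gym.make("navground", scenario=..., sensor=...,
               observation_config=..., action_config=...,
               max_episode_steps=100)
# reload the policy saved in the previous section
sac = SAC.load("SAC")
# mean and standard deviation of the episodic reward over 30 episodes
mean_reward, std_reward = evaluate_policy(sac, env, n_eval_episodes=30)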

Use ML policies in navground#

Evaluation can also be performed using the tools available in navground, which are specifically designed to support large experiments with many runs and agents, distributing the work over multiple processes if desired.

Once we have trained a policy (and possibly exported it to onnx using onnx.export()), behaviors.PolicyBehavior executes it as a navigation behavior in navground. As a basic example, we can load it and assign it to some of the agents in the simulation:

from navground import sim
from navground.learning.behaviors import PolicyBehavior

# we load the same scenario and sensor used to train the policy
scenario = sim.Scenario.load(...)
sensor = sim.Sensor.load(...)
world = scenario.make_world(seed=1)

# and configure the first five agents to use the policy
# instead of the original behavior
for agent in world.agents[:5]:
   agent.behavior = PolicyBehavior.clone_behavior(
      agent.behavior, policy='policy.onnx',
      action_config=..., observation_config=...)
   agent.state_estimation = sensor

world.run(time_step=0.1, steps=1000)

In practice, we do not need to perform this configuration manually. Instead, we can load it from a YAML file (exported, e.g., using io.export_policy_as_behavior()), as is common in navground:

scenario.yaml#
groups:
  - number: 5
    behavior:
      type: PolicyBehavior
      policy_path: policy.onnx
      # action and observation config
      ...
    state_estimation:
      # sensor config
      ...
    # remaining agent config
    ...

When loaded, the 5 agents in this group will use the policy to navigate:

from navground import sim

# loads the navground.learning components such as PolicyBehavior
sim.load_plugins()

with open('scenario.yaml') as f:
   scenario = sim.Scenario.load(f.read())

world = scenario.make_world(seed=1)
world.run(time_step=0.1, steps=1000)

or we could embed it in an experiment to record trajectories and performance metrics:

experiment.yaml#
runs: 1000
time_step: 0.1
steps: 10000
record_pose: true
record_efficacy: true
scenario:
  groups:
    - number: 5
      behavior:
        type: PolicyBehavior
        policy_path: policy.onnx
      ...
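
We can then run the experiment from Python. The following is a minimal sketch: it assumes that the elided parts of the configuration are filled in and that the experiment can be parsed from its YAML representation with sim.load_experiment():

from navground import sim

# make the navground.learning components such as PolicyBehavior available
sim.load_plugins()

with open('experiment.yaml') as f:
    experiment = sim.load_experiment(f.read())

# execute all runs, recording poses and efficacy as configured above
experiment.run()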

Acknowledgement and disclaimer#

The work was supported in part by REXASI-PRO H-EU project, call HORIZON-CL4-2021-HUMAN-01-01, Grant agreement no. 101070028.

REXASI-PRO logo

The work has been partially funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them.