Crossing

In these notebooks, we look at a more challenging scenario, in which a policy must learn to navigate among many agents. For the same scenario

type: Cross
agent_margin: 0.1
side: 4
target_margin: 0.1
tolerance: 0.5
groups:
  -
    type: thymio
    number: 20
    radius: 0.1
    control_period: 0.1
    speed_tolerance: 0.02
    color: gray
    kinematics:
      type: 2WDiff
      wheel_axis: 0.094
      max_speed: 0.12
    behavior:
      type: HL
      optimal_speed: 0.12
      horizon: 5.0
      tau: 0.25
      eta: 0.5
      safety_margin: 0.05
    state_estimation:
      type: Bounded
      range: 5.0
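The robots in this group use `2WDiff` kinematics with `max_speed: 0.12` and `wheel_axis: 0.094`. Assuming that `max_speed` bounds the speed of each wheel (the standard two-wheeled differential-drive model), the sketch below converts wheel speeds into linear and angular velocity and derives the largest angular speed the robots can reach. The function names are hypothetical and not part of navground's API.

```python
def wheel_to_body(left: float, right: float, wheel_axis: float) -> tuple[float, float]:
    """Convert wheel speeds [m/s] into (linear [m/s], angular [rad/s]) velocity."""
    linear = 0.5 * (left + right)
    angular = (right - left) / wheel_axis
    return linear, angular


def max_angular_speed(max_speed: float, wheel_axis: float) -> float:
    """Largest angular speed, reached when the wheels spin in opposite directions at max_speed."""
    _, angular = wheel_to_body(-max_speed, max_speed, wheel_axis)
    return angular


if __name__ == "__main__":
    # Values taken from the group configuration above
    omega = max_angular_speed(max_speed=0.12, wheel_axis=0.094)
    print(f"max angular speed: {omega:.2f} rad/s")  # about 2.55 rad/s
```

Under this model, a robot moving at its full linear speed (both wheels at 0.12 m/s) has no wheel-speed margin left for turning, so any change of heading requires slowing down.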

and sensor

type: Discs
number: 5
range: 5.0
max_speed: 0.12
max_radius: 0
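The sensor parameters suggest a fixed-size observation built from at most `number: 5` neighbors within `range: 5.0`, with `max_speed` and `max_radius` bounding the neighbors' velocities and radii. The sketch below shows one plausible way to build such an observation under these assumptions: keep the closest neighbors within range, express their positions relative to the agent, and zero-pad the empty slots. It is an illustration only, not navground's `Discs` sensor, whose fields and ordering may differ; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def nearest_discs_observation(
    position: Vec,
    neighbors: Sequence[Vec],
    number: int = 5,
    sensing_range: float = 5.0,
) -> List[float]:
    """Flattened relative positions of the `number` closest neighbors within range.

    Empty slots are zero-padded so the observation always has 2 * number entries.
    """
    px, py = position
    relative = [(nx - px, ny - py) for nx, ny in neighbors]
    # Discard neighbors beyond the sensing range, then sort by distance
    in_range = [r for r in relative if math.hypot(*r) <= sensing_range]
    in_range.sort(key=lambda r: math.hypot(*r))
    obs: List[float] = []
    for dx, dy in in_range[:number]:
        obs.extend((dx, dy))
    obs.extend([0.0] * (2 * number - len(obs)))
    return obs


if __name__ == "__main__":
    # The neighbor at (0, 6) lies outside the 5 m range and is ignored
    print(nearest_discs_observation((0.0, 0.0), [(1.0, 0.0), (0.0, 6.0), (0.0, -2.0)], number=2))
```

A fixed size matters because the policy network expects an input of constant shape, regardless of how many agents are nearby.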

we try different algorithms to learn a navigation policy. In particular, we use the parallel multi-agent environment so that all agents in the group learn a policy in parallel.
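In a parallel multi-agent environment, all agents observe and act at the same time, and each step yields one transition per active agent. The sketch below uses a toy stand-in that mimics the dictionary-based interface of PettingZoo's parallel API (`reset` returning observations and infos, `step` returning observations, rewards, terminations, truncations and infos keyed by agent name) to show how a single shared policy can gather experience from every agent at once. `ToyParallelEnv`, `shared_policy` and `collect_rollout` are hypothetical and belong to neither navground nor PettingZoo.

```python
import random
from typing import Dict, List, Tuple


class ToyParallelEnv:
    """Hypothetical parallel environment: 1D agents that should reach the origin."""

    def __init__(self, num_agents: int = 3, max_steps: int = 10, seed: int = 0):
        self.possible_agents = [f"agent_{i}" for i in range(num_agents)]
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.agents = list(self.possible_agents)
        self.steps = 0
        self.pos = {a: self.rng.uniform(-1.0, 1.0) for a in self.agents}
        return {a: self.pos[a] for a in self.agents}, {a: {} for a in self.agents}

    def step(self, actions: Dict[str, float]):
        self.steps += 1
        for a, action in actions.items():
            self.pos[a] += max(-0.1, min(0.1, action))  # clip action to [-0.1, 0.1]
        obs = {a: self.pos[a] for a in self.agents}
        rewards = {a: -abs(self.pos[a]) for a in self.agents}
        terminations = {a: abs(self.pos[a]) < 0.05 for a in self.agents}
        truncations = {a: self.steps >= self.max_steps for a in self.agents}
        infos = {a: {} for a in self.agents}
        # Agents that are done leave the episode
        self.agents = [a for a in self.agents if not (terminations[a] or truncations[a])]
        return obs, rewards, terminations, truncations, infos


def shared_policy(observation: float, gain: float = 0.5) -> float:
    """A single policy used by every agent: move toward the origin."""
    return -gain * observation


def collect_rollout(env: ToyParallelEnv) -> List[Tuple[str, float, float, float]]:
    """Run one episode and return (agent, obs, action, reward) for every agent and step."""
    transitions = []
    obs, _ = env.reset()
    while env.agents:
        actions = {a: shared_policy(obs[a]) for a in env.agents}
        next_obs, rewards, _, _, _ = env.step(actions)
        for a, action in actions.items():
            transitions.append((a, obs[a], action, rewards[a]))
        obs = next_obs
    return transitions
```

Because the policy is shared, each environment step contributes as many training samples as there are active agents, which is what makes learning among many peers efficient in this setting.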

- Training one agent among many agents
- Performance of policies trained in single-agent environment
  - One agent following the policy
  - More agents following the policy
- Training agents among peers
- Performance of policies trained in multi-agent environment
  - Video
  - Reward
  - Final video