Model-based behaviors#

In this notebook we measure the performance of 4 model-based navigation behaviors:

Dummy, which ignores the pad and the other agent.

[1]:
from navground import sim, core
from navground.sim.ui.video import display_video
from navground.learning.behaviors.pad import DistributedPadBehavior, CentralizedPadBehavior, StopAtPadBehavior
from navground.learning.scenarios.pad import PadScenario, render_kwargs

scenario = sim.load_scenario("""
type: Pad
start_in_opposite_sides: true
groups:
  - number: 2
    behavior:
      type: Dummy
""")

world = scenario.make_world(seed=2)
display_video(world, time_step=0.1, duration=10, factor=4, **render_kwargs())
[1]:

StopAtPad, which stops near the pad without crossing it.

[2]:
scenario = sim.load_scenario("""
type: Pad
start_in_opposite_sides: true
groups:
  - number: 2
    behavior:
      type: StopAtPad
""")

world = scenario.make_world(seed=2)
display_video(world, time_step=0.1, duration=10, factor=4, **render_kwargs())
[2]:

Pad, a distributed behavior that tries to solve the problem optimally by tuning its speed to avoid conflicts, giving precedence to the other agent whenever that agent would exit the pad first. The agent does not know the other agent's desired speed, so it infers it from the other's current speed.
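
As an illustration of the idea (a hypothetical sketch with made-up names, not the actual DistributedPadBehavior code), the decision rule can be thought of as comparing the time each agent would need to clear the pad and yielding only when the other agent would exit first:

# Illustrative sketch only: not the actual DistributedPadBehavior logic, and
# all names (distances, speeds) are hypothetical.
def pad_target_speed(dist_to_pad_entry: float, dist_to_pad_exit: float,
                     desired_speed: float,
                     other_dist_to_pad_exit: float, other_speed: float) -> float:
    """Choose a speed so that the two agents are never on the pad together."""
    if dist_to_pad_exit <= 0 or other_dist_to_pad_exit <= 0:
        # One of the two has already cleared the pad: no conflict possible.
        return desired_speed
    # Time each agent needs to clear the pad; the other agent's desired speed
    # is unknown, so its current speed is used as an estimate.
    my_exit_time = dist_to_pad_exit / max(desired_speed, 1e-6)
    other_exit_time = other_dist_to_pad_exit / max(other_speed, 1e-6)
    if my_exit_time <= other_exit_time:
        # We would exit first: keep the desired speed.
        return desired_speed
    # Otherwise give precedence: slow down so that we enter the pad only after
    # the other agent has left it.
    return min(desired_speed, max(dist_to_pad_entry, 0.0) / other_exit_time)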

[3]:
scenario = sim.load_scenario("""
type: Pad
start_in_opposite_sides: true
groups:
  - number: 2
    behavior:
      type: Pad
    state_estimation:
      type: Bounded
      range: 10
""")

world = scenario.make_world(seed=2)
display_video(world, time_step=0.1, duration=10, factor=4, **render_kwargs())
[3]:

CentralizedPad, a centralized behavior similar to Pad, except that it knows everything about both agents and controls them at the same time.

[4]:
from navground import sim
from navground.sim.ui.video import display_video

scenario = sim.load_scenario("""
type: Pad
start_in_opposite_sides: true
groups:
  - number: 2
    behavior:
      type: CentralizedPad
""")

world = scenario.make_world(seed=2)
world._prepare()
video = display_video(world, time_step=0.1, duration=10, factor=4, **render_kwargs())
world._close()
video
[4]:

Reward starting from opposing sides#

We compute the reward distribution when the agents start from opposite sides of the map. The reward function is composed of two terms:

  • 1 - efficacy (i.e., 0 when moving forward at the full desired speed)

  • pad_penalty when both agents are on the pad at the same time.

Here, we compute the reward for pad_penalty=1, which we will later use to pick a penalty that balances the tendency to stop against the tendency to cross the pad.
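
The next cell imports the actual reward; for reference, here is a minimal sketch of a per-step reward with this structure. The sign convention and the both-on-pad test are assumptions (consistent with the negative totals reported below), not the actual PadReward implementation:

def pad_step_reward(efficacy: float, both_on_pad: bool,
                    pad_penalty: float = 1.0) -> float:
    # 0 when moving forward at the full desired speed, increasingly negative
    # otherwise, minus a fixed penalty whenever both agents occupy the pad
    # at the same time (assumed sign convention).
    return -(1.0 - efficacy) - (pad_penalty if both_on_pad else 0.0)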

[5]:
from navground.learning.examples.pad import PadReward

reward = PadReward(pad_penalty=1)
[6]:
from navground.learning.probes import RewardProbe
from navground.learning.probes.success import SuccessProbe
from navground.learning.examples.pad import is_success, is_failure

scenario = sim.load_scenario("""
type: Pad
groups:
  - number:
      sampler: sequence
      values: [2, 0, 0]
      once: true
    behavior:
      type: Pad
    state_estimation:
      type: Bounded
      range: 10
  - number:
      sampler: sequence
      values: [0, 2, 0]
      once: true
    behavior:
      type: StopAtPad
  - number:
      sampler: sequence
      values: [0, 0, 2]
      once: true
    behavior:
      type: Dummy
start_in_opposite_sides: true
""")

exp = sim.Experiment(time_step=0.1, steps=200)
exp.scenario = scenario
exp.add_record_probe("reward", RewardProbe.with_reward(reward=reward))
exp.add_record_probe("success", SuccessProbe.with_criteria(is_success=is_success, is_failure=is_failure))
exp.terminate_when_all_idle_or_stuck = False

We perform 2000 runs for each behavior; in each run, both agents share the same behavior.

[7]:
exp.run(number_of_runs=6 * 1000)
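
Because the group sizes are drawn from sequence samplers with once: true, consecutive runs cycle through the three behaviors (Pad, StopAtPad, Dummy). The toy snippet below (illustrative only, using a made-up array rather than experiment data) shows why reshaping the flat list of per-run metrics to (-1, 3) in the next cell yields one column per behavior:

import numpy as np

# Toy illustration: run i uses the behavior at position i % 3 of the sequence,
# so folding the 6000 runs into shape (-1, 3) gives one column per behavior.
run_behavior = np.array(['Pad', 'StopAtPad', 'Dummy'] * 2000)
print(run_behavior.reshape(-1, 3)[:2])
# [['Pad' 'StopAtPad' 'Dummy']
#  ['Pad' 'StopAtPad' 'Dummy']]
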
[11]:
import numpy as np
import pandas as pd

# One row per run, one column per behavior; per-run rewards are summed over
# the steps and averaged over the two agents.
rewards = np.asarray([np.sum(run.records['reward']) / 2 for run in exp.runs.values()]).reshape(-1, 3)
success = np.asarray([np.mean(np.asarray(run.records['success']) > 0) for run in exp.runs.values()]).reshape(-1, 3)
steps = np.asarray([run.recorded_steps for run in exp.runs.values()]).reshape(-1, 3)

df = pd.DataFrame()
for i, name in enumerate(('Pad', 'Stop', 'Dummy')):
    df[name] = {'reward mean': np.mean(rewards[:, i]),
                'reward std': np.std(rewards[:, i]),
                'success rate': np.mean(success[:, i]),
                'length mean': np.mean(steps[:, i]),
                'length std': np.std(steps[:, i])}

df.to_csv('behaviors.csv')
[12]:
pd.set_option("display.precision", 2)
df.T
[12]:
       reward mean  reward std  success rate  length mean  length std
Pad         -10.47        5.80          1.00       145.65       11.81
Stop       -172.76       10.71          0.00       200.00        0.00
Dummy       -19.77       11.46          0.14       126.53       12.63
[13]:
from matplotlib import pyplot as plt

for rs, name in zip(rewards.T, ('Pad', 'StopAtPad', 'Dummy')):
    plt.hist(rs, density=True, label=name, histtype='step')
plt.title("Reward distribution starting from opposing sides")
plt.legend();
[Figure: histograms of per-run rewards for Pad, StopAtPad and Dummy, starting from opposite sides]
[14]:
# Runs where even the Dummy behavior loses (almost) nothing, i.e., the two
# agents never conflict on the pad.
easy = rewards[:, -1] >= -1.1
easy_prob = np.sum(easy, axis=0) / len(easy)
print(f'about {easy_prob:.0%} of the cases require no coordination')
about 11% of the cases require no coordination

A pad penalty of about the following value should make Dummy and StopAtPad roughly even. Since Dummy moves at full speed, its reward comes (almost) entirely from the penalty term, while StopAtPad's comes entirely from lost efficacy, so the ratio of their mean rewards estimates the penalty at which the two balance:

[15]:
ms = np.mean(rewards, axis=0)
print(f'Penalty: {round(ms[1] / ms[2], 2)}')
Penalty: 8.74

Reward starting uniformly along the corridor#

We also compute the reward distribution when the agents start uniformly along the corridor.

[16]:
scenario = sim.load_scenario("""
type: Pad
groups:
  - number:
      sampler: sequence
      values: [2, 0, 0]
      once: true
    behavior:
      type: Pad
    state_estimation:
      type: Bounded
      range: 10
  - number:
      sampler: sequence
      values: [0, 2, 0]
      once: true
    behavior:
      type: StopAtPad
  - number:
      sampler: sequence
      values: [0, 0, 2]
      once: true
    behavior:
      type: Dummy
start_in_opposite_sides: false
""")

exp = sim.Experiment(time_step=0.1, steps=200)
exp.scenario = scenario
exp.add_record_probe("reward", RewardProbe.with_reward(reward=reward))
exp.add_record_probe("success", SuccessProbe.with_criteria(is_success=is_success, is_failure=is_failure))
exp.terminate_when_all_idle_or_stuck = False
[17]:
exp.run(number_of_runs=6 * 1000)
[18]:
rewards = np.asarray([np.sum(run.records['reward']) / 2 for run in exp.runs.values()]).reshape(-1, 3)
success = np.asarray([np.mean(np.asarray(run.records['success']) > 0) for run in exp.runs.values()]).reshape(-1, 3)
steps = np.asarray([run.recorded_steps for run in exp.runs.values()]).reshape(-1, 3)

df = pd.DataFrame()
for i, name in enumerate(('Pad', 'Stop', 'Dummy')):
    df[name] = {'reward mean': np.mean(rewards[:, i]),
                'reward std': np.std(rewards[:, i]),
                'success rate': np.mean(success[:, i]),
                'length mean': np.mean(steps[:, i]),
                'length std': np.std(steps[:, i])}

df.to_csv('behaviors_same_side.csv')
[19]:
pd.set_option("display.precision", 2)
df.T
[19]:
       reward mean  reward std  success rate  length mean  length std
Pad       -3.81776     5.53239       0.94150      101.324    37.04168
Stop     -66.44618    58.58046       0.56125      145.944    68.60380
Dummy     -5.12508     8.65664       0.76000       96.188    34.11926
[20]:
for rs, name in zip(rewards.T, ('Pad', 'StopAtPad', 'Dummy')):
    plt.hist(rs, density=True, label=name, histtype='step')
plt.title("Reward distribution starting uniformly along the corridor")
plt.legend();
[Figure: histograms of per-run rewards for Pad, StopAtPad and Dummy, starting uniformly along the corridor]
[21]:
easy = rewards[:, -1] >= -1.1
easy_prob = np.sum(easy, axis=0) / len(easy)
print(f'about {easy_prob:.0%} of the cases require no coordination')
about 75% of the cases require no coordination

The same estimate of the pad penalty that would make Dummy and StopAtPad roughly even:

[22]:
ms = np.mean(rewards, axis=0)
print(f'Penalty: {round(ms[1] / ms[2], 2)}')
Penalty: 12.96