Scenario#
In this scenario, one agent wants to travel along a corridor at constant speed, avoiding a single obstacle. The obstacle is visible during the whole run; the agent starts at full speed heading into the corridor and may start changing its trajectory after the first control update.
The initial lateral positions of the agent and the obstacle are picked randomly, with the constraint that passing is always possible, i.e., that there is a gap of at least the agent size enlarged by its safety margin.
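As a rough back-of-the-envelope check (illustrative only: the exact way the scenario measures the gap is an assumption here), the values used in scenario.yaml below leave ample room in the best case.
[ ]:
# Illustrative arithmetic with assumed definitions
corridor_width = 1.0
obstacle_radius = 0.2
agent_radius = 0.08
safety_margin = 0.05
required_gap = 2 * (agent_radius + safety_margin)   # assumed meaning of "agent size enlarged by its safety margin"
widest_gap = corridor_width - 2 * obstacle_radius   # when the obstacle touches the opposite wall
print(f"required: {required_gap:.2f} m, widest possible: {widest_gap:.2f} m")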
[1]:
%%writefile scenario.yaml
type: CorridorWithObstacle
length: 1.0
width: 1.0
min_radius: 0.2
max_radius: 0.2
groups:
  -
    type: thymio
    number: 1
    radius: 0.08
    control_period: 0.05
    color: firebrick
    kinematics:
      type: 2WDiff
      wheel_axis: 0.094
      max_speed: 0.12
    behavior:
      type: HL
      optimal_speed: 0.12
      horizon: 10
      tau: 0.25
      eta: 0.5
      safety_margin: 0.05
      barrier_angle: 1.0
    state_estimation:
      type: Bounded
      range: 1.0
      update_static_obstacles: true
Overwriting scenario.yaml
[2]:
import warnings
warnings.filterwarnings("ignore")
%config InlineBackend.figure_formats = ['svg']
[3]:
import numpy as np
from navground import core, sim
import navground.learning.scenarios
with open('scenario.yaml') as f:
    scenario = sim.load_scenario(f.read())
[4]:
from navground.sim.notebook import display_in_notebook
world = sim.World()
scenario.init_world(world)
display_in_notebook(world, width=360)
[4]:
[5]:
from navground.sim.ui.video import display_video
display_video(world=world, time_step=0.1, duration=20.0, factor=5.0, display_width=360)
[5]:
A typical run takes very little time to compute
[6]:
%%timeit
world = sim.World()
scenario.init_world(world)
world.run(200, 0.1)
378 μs ± 2.26 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
[7]:
world.time
[7]:
19.83333396911621
The task is simple but not trivial, even for a rule-based behavior like HL. In fact, sometimes the gap on one side is almost wide enough for the agent to pass safely, but not quite: the agent may be too close to change side in time. In our scenario, HL is set up so that it will try to pass anyway (barrier_angle \(<\pi/2\)), incurring a violation of the safety margin. This event is relatively rare: in the majority of runs, the agent passes safely.
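To get a feeling for how rare such violations are, we can record safety violations over many runs. The following is a sketch that reuses the scenario loaded above; the record_config field safety_violation and the record name safety_violations are assumptions about the navground API.
[ ]:
import numpy as np
from navground import sim

# Sketch (assumed record names): count runs in which HL intrudes into its
# safety margin by more than 1 cm.
exp = sim.Experiment(time_step=0.1, steps=300)
exp.scenario = scenario
exp.record_config.safety_violation = True
exp.run(number_of_runs=100)
violating = sum(
    1 for run in exp.runs.values()
    if np.asarray(run.get_record('safety_violations')).max() > 0.01)
print(f"runs violating the margin by more than 1 cm: {violating} / 100")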
Let us plot a few trajectories when using HL
[8]:
from matplotlib import pyplot as plt
from navground.sim.pyplot_helpers import plot_runs
exp = sim.Experiment(time_step=0.1, steps=300)
exp.scenario = scenario
exp.record_config.pose = True
exp.run(number_of_runs=12)
plot_runs(runs=exp.runs.values(), columns=4, with_agent=True,
          world_kwargs={'in_box': True, 'no_ticks': True}, step=30,
          agent_kwargs={'dot_radius': 0, 'with_safety_margin': True})
[8]:
Sensor#
We want to learn to perform the same task using a sensor
[9]:
%%writefile sensor.yaml
type: Combination
sensors:
  - type: Boundary
    min_y: 0
    max_y: 1
    range: 1
  - type: Discs
    number: 1
    range: 1
    max_speed: 0.0
    max_radius: 0.0
    include_valid: false
Overwriting sensor.yaml
that perceives

- the nearest point of the obstacle
- the distance to the left and right walls
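Before wiring the sensor into an environment, we can load it and list the buffers it exposes. This is a sketch: it assumes that Sensor exposes a description mapping and that the buffer names match the observation keys appearing later in this notebook.
[ ]:
from navground import sim

# Sketch (assumed API): print the output buffers of the combined sensor.
with open('sensor.yaml') as f:
    sensor = sim.load_state_estimation(f.read())
for name, desc in sensor.description.items():
    print(name, desc)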
The agent input will be

- the sensing readings (see above)
- its own speed in the body frame
- its own angular speed
- its own target direction in the body frame

and it should output an acceleration command (linear and angular).
We measure the quality of the trajectory by a reward that penalizes

- safety margin violations (violating by at most 1 cm is excused)
- not moving at full speed along the corridor

A trajectory cannot collect a reward higher than 0, which it achieves only when the obstacle is not in the way.
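One way to read the 1 cm tolerance (an interpretation based on the values used below, not something stated by the library): the behavior keeps a 5 cm safety margin, while the reward is computed with a 4 cm margin, so intrusions of up to 1 cm into the behavior's margin are not penalized.
[ ]:
# Illustrative arithmetic (assumed reading of the two margins)
behavior_safety_margin = 0.05   # safety_margin in scenario.yaml
reward_safety_margin = 0.04     # safety_margin passed to SocialReward below
print(f"excused violation: {behavior_safety_margin - reward_safety_margin:.2f} m")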
[10]:
import gymnasium as gym
from navground import sim
from navground.learning import ControlActionConfig, DefaultObservationConfig, io
from navground.learning.rewards import SocialReward
import navground.learning.scenarios
duration = 40.0
time_step = 0.1
action_config = ControlActionConfig(max_acceleration=1.0, max_angular_acceleration=10.0,
                                    use_acceleration_action=True)
observation_config = DefaultObservationConfig(include_target_direction=True, include_velocity=True,
                                              include_angular_speed=True, flat=False)
reward = SocialReward(safety_margin=0.04)
with open('sensor.yaml') as f:
    sensor = sim.load_state_estimation(f.read())
with open('scenario.yaml') as f:
    scenario = sim.load_scenario(f.read())
env = gym.make('navground',
               scenario=scenario,
               sensor=sensor,
               action=action_config,
               observation=observation_config,
               time_step=time_step,
               max_duration=duration,
               reward=reward)
io.save_env(env, 'env.yaml')
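Before evaluating, it is worth a quick look at the spaces the environment exposes. Since flat=False, the observation should be a dictionary keyed like the recorded observations shown later in this notebook.
[ ]:
# Quick sanity check: print the action and observation spaces of the environment.
print(env.action_space)
print(env.observation_space)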
Let us look at the rewards using HL
[11]:
from navground.learning.evaluation import evaluate_with_experiment_and_env
rewards, _ = evaluate_with_experiment_and_env(env, n_eval_episodes=1000,
                                              return_episode_rewards=True)
print(f"Rewards: min={np.min(rewards):.2f}, mean={np.mean(rewards):.2f}, "
      f"median={np.median(rewards):.2f}, max={np.max(rewards):.2f}")
Rewards: min=-71.02, mean=-5.36, median=-3.95, max=0.00
and at the run with the lowest reward
[12]:
seed = int(np.argmin(rewards))
print(f"Minimal rewards {rewards[seed]:.2f} for seed={seed}")
Minimal rewards -71.02 for seed=158
[13]:
world = sim.World()
scenario.init_world(world, seed=seed)
display_video(world=world, time_step=0.1, duration=20.0, factor=5.0, display_width=360)
[13]:
which is one of the cases we discussed at the beginning, where the agent is passing on the “wrong” side of the obstacle.
In about 20% of the runs, the agent just goes straight
[14]:
p_straight = sum(np.asarray(rewards) == 0) / len(rewards)
print(f"p_straight={p_straight:.1%}")
p_straight=22.2%
and in all the other runs it has to actually avoid the obstacle, incurring a penalty
[15]:
plt.figure(figsize=(5, 2))
plt.hist(rewards, bins=20, density=True, color='firebrick');
plt.xlabel('reward')
plt.ylabel('probability');
Finally, we have a look at the input/output distributions
[16]:
from navground.learning import GroupConfig
from navground.learning.probes import GymProbe
[17]:
exp = sim.Experiment(time_step=0.1, steps=300)
exp.scenario = scenario
exp.record_config.pose = True
exp.add_probe(GymProbe.with_env(env))
exp.run(number_of_runs=1000)
[18]:
exp.runs[0].records
[18]:
{'actions/0': <Dataset: shape (169, 2), dtype float32>,
'observations/0/boundary_distance': <Dataset: shape (170, 2), dtype float32>,
'observations/0/ego_angular_speed': <Dataset: shape (170, 1), dtype float32>,
'observations/0/ego_target_direction': <Dataset: shape (170, 2), dtype float32>,
'observations/0/ego_velocity': <Dataset: shape (170, 1), dtype float32>,
'observations/0/position': <Dataset: shape (170, 1, 2), dtype float32>,
'poses': <Dataset: shape (169, 1, 3), dtype float32>,
'rewards/0': <Dataset: shape (169,), dtype float64>}
[19]:
actions = np.concatenate([np.asarray(run.get_record('actions/0')) for run in exp.runs.values()])
ego_target_direction = np.concatenate([np.asarray(run.get_record('observations/0/ego_target_direction')) for run in exp.runs.values()])
position = np.concatenate([np.asarray(run.get_record('observations/0/position')) for run in exp.runs.values()])
boundary_distance = np.concatenate([np.asarray(run.get_record('observations/0/boundary_distance')) for run in exp.runs.values()])
ego_velocity = np.concatenate([np.asarray(run.get_record('observations/0/ego_velocity')) for run in exp.runs.values()])
ego_angular_speed = np.concatenate([np.asarray(run.get_record('observations/0/ego_angular_speed')) for run in exp.runs.values()])
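As an optional check (a sketch: it assumes the dict observation space uses the same keys as the recorded observations), we can verify that the recorded boundary distances stay within the declared bounds.
[ ]:
# Sketch: compare declared observation bounds with the recorded data.
space = env.observation_space['boundary_distance']
print("declared bounds:", space.low, space.high)
print("observed range:", boundary_distance.min(axis=0), boundary_distance.max(axis=0))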
[20]:
plt.figure(figsize=(6, 4))
plt.hexbin(*actions.T, bins='log');
plt.xlim(-0.5, 0.5)
plt.ylim(-0.5, 0.5)
plt.xlabel('acc x [relative to max]')
plt.ylabel('acc y [relative to max]')
plt.colorbar()
plt.title('actions distribution');
[21]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
i1 = ax1.hexbin(*ego_target_direction.T, bins='log');
ax1.axis('equal')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_title('relative target direction distribution');
i2 = ax2.hexbin(*position.T, bins='log');
ax2.axis('equal')
ax2.set_xlabel('x [m]')
ax2.set_ylabel('y [m]')
ax2.set_title('relative obstacle position distribution');
fig.colorbar(i1, ax=ax1)
fig.colorbar(i2, ax=ax2);
[22]:
bottom, top = boundary_distance.T
fig, axs = plt.subplots(1, 3, figsize=(12, 2))
fig.suptitle('boundary distance distribution');
i = axs[0].hexbin(*boundary_distance.T, bins='log');
axs[0].set_xlabel('from bottom [m]')
axs[0].set_ylabel('from top [m]')
axs[0].axis('equal')
fig.colorbar(i, ax=axs[0])
axs[1].hist(bottom, bins=30, density=True, label="bottom", alpha=0.5);
axs[2].hist(top, bins=30, density=True, label="top", alpha=0.5, color='g');
axs[1].set_xlabel('bottom distance [m]')
axs[2].set_xlabel('top distance [m]')
axs[1].set_ylabel('probability');
[23]:
_, (ax1, ax2) = plt.subplots(figsize=(10, 2), ncols=2)
ax1.hist(ego_velocity.flatten(), bins=30, density=True);
ax1.set_xlabel('speed [m/s]')
ax1.set_ylabel('probability')
ax1.set_title('ego speed distribution');
ax2.hist(ego_angular_speed.flatten(), bins=30, density=True);
ax2.set_xlabel('angular speed [rad/s]')
ax2.set_ylabel('probability')
ax2.set_title('ego angular speed distribution');