Parallel SAC#
We train a distributed policy, shared between the two agents, using SAC from Stable-Baselines3 in "parallel" mode: we treat the multi-agent environment as a vectorized single-agent environment that performs rollouts in parallel, as we did in the Crossing tutorial.
As in the previous notebooks, the policy computes (linear) accelerations. We test different observation spaces, action spaces, and training algorithms.