Imitation learning is a promising approach for learning complex robot behaviors from expert demonstrations. However, learned policies can make errors that lead to safety violations, limiting their deployment in safety-critical applications. We propose MPC-SafeGIL, a design-time approach that enhances the safety of imitation learning by injecting adversarial disturbances during expert demonstrations. This exposes the expert to a broader range of safety-critical scenarios and allows the imitation policy to learn robust recovery behaviors. Our method uses sampling-based Model Predictive Control (MPC) to approximate worst-case disturbances, making it scalable to high-dimensional and black-box dynamical systems. In contrast to prior work that relies on analytical models or interactive experts, MPC-SafeGIL integrates safety considerations directly into data collection. We validate our approach through extensive simulations -- including quadruped locomotion and visuomotor navigation -- and real-world experiments on a quadrotor, demonstrating improvements in both safety and task performance.
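To make the sampling-based disturbance approximation concrete, here is a minimal sketch of how a worst-case disturbance can be approximated by sampling: candidate disturbance sequences are rolled out through a (possibly black-box) dynamics function, and the sequence that most reduces a safety margin is selected. Function names, the toy dynamics, and the safety margin are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def worst_case_disturbance(x0, step_fn, safety_margin_fn,
                           horizon=20, n_samples=256, d_max=0.5, rng=None):
    """Sampling-based approximation of the worst-case disturbance.

    Samples bounded disturbance sequences, rolls each out through the
    black-box dynamics `step_fn`, and returns the first disturbance of
    the sequence that minimizes the safety margin along the predicted
    trajectory (illustrative sketch, not the paper's API).
    """
    rng = np.random.default_rng() if rng is None else rng
    d_dim = np.atleast_1d(x0).shape[0]
    best_margin, best_d0 = np.inf, np.zeros(d_dim)
    for _ in range(n_samples):
        d_seq = rng.uniform(-d_max, d_max, size=(horizon, d_dim))
        x = np.array(x0, dtype=float)
        margin = np.inf
        for d in d_seq:
            x = step_fn(x, d)                  # black-box rollout
            margin = min(margin, safety_margin_fn(x))
        if margin < best_margin:               # most safety-critical sequence
            best_margin, best_d0 = margin, d_seq[0]
    return best_d0, best_margin

# Toy example: 2-D point pushed around near an obstacle at the origin.
def step_fn(x, d):
    return x + 0.1 * d                         # disturbance-driven dynamics

def safety_margin_fn(x):
    return np.linalg.norm(x) - 0.2             # distance to obstacle surface

d0, m = worst_case_disturbance(np.array([1.0, 0.0]), step_fn, safety_margin_fn)
```

The returned disturbance can then be injected at the current demonstration step, steering the expert toward safety-critical states it must actively recover from.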
A quadruped must reach a goal position while avoiding obstacles. The policy takes LiDAR and proprioception as input and predicts high-level linear and yaw velocity commands.
Demonstrations are collected in different obstacle environments. Compared to BC, MPC-SafeGIL consistently guides the expert closer to obstacles, enabling the policy to learn more recovery behaviors in these safety-critical states.
When the learned policy is tested on an unseen obstacle map, MPC-SafeGIL achieves a significantly lower collision rate and a higher success rate.
BC
MPC-SafeGIL
GAIL
MPC-SafeGIL
GAIL
MPC-SafeGIL
A visuomotor F1Tenth racing car is tasked with navigating the track while avoiding the curbs. The policy takes LiDAR as input and outputs the desired steering angle and velocity.
MPC-SafeGIL achieves a much longer distance traveled and a lower collision rate.
To further evaluate compatibility with online safety mechanisms, we combine the imitation policies with a predictive safety filter at test time. The filter overrides the nominal control with an MPC controller whenever the predicted 50-step future trajectory contains a collision. Both BC and MPC-SafeGIL benefit from the safety filter; however, MPC-SafeGIL consistently achieves better performance.
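The filter logic described above can be sketched as follows: roll the nominal action forward under a dynamics model and hand control to the MPC fallback only if a predicted collision appears within the horizon. The interfaces (`step_fn`, `collides_fn`, `mpc_fallback`) and the simplification of holding the nominal action constant over the horizon are assumptions for illustration.

```python
import numpy as np

def predictive_safety_filter(x, u_nominal, step_fn, collides_fn,
                             mpc_fallback, horizon=50):
    """Test-time predictive safety filter (illustrative sketch).

    Propagates the nominal action for `horizon` steps under the dynamics
    model; if any predicted state collides, the nominal control is
    overridden by the MPC fallback controller.
    """
    x_pred = np.array(x, dtype=float)
    for _ in range(horizon):
        x_pred = step_fn(x_pred, u_nominal)   # hold nominal action over horizon
        if collides_fn(x_pred):
            return mpc_fallback(x)            # predicted collision: override
    return u_nominal                          # predicted safe: pass through

# Toy 1-D example: wall at x = 1.0, nominal action drives toward it.
step = lambda x, u: x + u
collides = lambda x: x >= 1.0
fallback = lambda x: 0.0                      # hypothetical MPC brake action

u_safe = predictive_safety_filter(0.0, 0.05, step, collides, fallback)   # overridden
u_pass = predictive_safety_filter(0.0, -0.05, step, collides, fallback)  # unchanged
```

Because the filter only intervenes when the rollout predicts a collision, a policy that already stays safe (as MPC-SafeGIL tends to) is overridden less often, which is consistent with the performance gap persisting under the filter.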
To evaluate generalization, we test the learned policies on a novel, unseen track. MPC-SafeGIL achieves the longest distance traveled.
A Crazyflie quadrotor must reach a goal location without collisions in the real world. Training demonstrations are collected in simulation with randomly generated obstacle environments, and the learned policy is then deployed across diverse real-world obstacle settings.
BC
MPC-SafeGIL
BC
MPC-SafeGIL
BC
MPC-SafeGIL
BibTeX Code Here