Safety-Aware Imitation Learning via MPC-Guided Disturbance Injection

1 Tsinghua University, 2 University of Southern California, 3 Stanford University
Teaser Image

Abstract

Imitation learning is a promising approach to learning complex robot behaviors from expert demonstrations. However, learned policies can make errors that lead to safety violations, limiting their deployment in safety-critical applications. We propose MPC-SafeGIL, a design-time approach that enhances the safety of imitation learning by injecting adversarial disturbances during expert demonstrations. This exposes the expert to a broader range of safety-critical scenarios and allows the imitation policy to learn robust recovery behaviors. Our method uses sampling-based Model Predictive Control (MPC) to approximate worst-case disturbances, making it scalable to high-dimensional and black-box dynamical systems. In contrast to prior work that relies on analytical models or interactive experts, MPC-SafeGIL integrates safety considerations directly into data collection. We validate our approach through extensive simulations -- including quadruped locomotion and visuomotor navigation -- and real-world experiments on a quadrotor, demonstrating improvements in both safety and task performance.

Approach
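As summarized in the abstract, the core of the approach is a sampling-based MPC adversary that approximates the worst-case disturbance at each state of an expert demonstration. Below is a minimal sketch of this idea under stated assumptions: the hooks `step` and `safety_margin`, and all hyperparameters, are illustrative and not the paper's exact implementation. The adversary samples bounded disturbance sequences, rolls each out under the expert policy through the (possibly black-box) dynamics, and injects the first disturbance of the most safety-critical sequence, in receding-horizon fashion.

```python
import numpy as np

def worst_case_disturbance(state, expert_policy, step, safety_margin,
                           d_dim, horizon=20, n_samples=256, d_max=0.5,
                           rng=None):
    """Sampling-based MPC approximation of the worst-case disturbance.

    step(state, action, disturbance) -> next_state  (black-box dynamics)
    safety_margin(state) -> float, negative once a constraint is violated
    Samples bounded disturbance sequences, rolls each out under the expert
    policy, and keeps the sequence that minimizes the safety margin along
    the rollout (the adversary's objective).
    """
    rng = rng or np.random.default_rng()
    # Candidate disturbance sequences within the adversary's budget d_max.
    candidates = rng.uniform(-d_max, d_max, size=(n_samples, horizon, d_dim))
    worst_score, worst_seq = np.inf, candidates[0]
    for seq in candidates:
        s, score = state, np.inf
        for d in seq:
            a = expert_policy(s)          # the expert keeps acting optimally
            s = step(s, a, d)             # ...while the adversary perturbs
            score = min(score, safety_margin(s))
        if score < worst_score:           # most safety-critical rollout so far
            worst_score, worst_seq = score, seq
    # Receding horizon: inject only the first disturbance, then re-plan.
    return worst_seq[0]
```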

Quadruped Navigation

A quadruped must reach a goal position while avoiding obstacles. The policy takes LiDAR and proprioception as input and predicts high-level linear and yaw velocity commands.
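As a concrete illustration of this interface, here is a minimal policy sketch; the layer sizes and observation dimensions are assumptions, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class QuadrupedCommandPolicy(nn.Module):
    """LiDAR + proprioception -> high-level velocity commands.

    All sizes are illustrative assumptions. The 3-D output
    (v_x, v_y, yaw rate) is assumed to be tracked by a separate
    low-level locomotion controller.
    """

    def __init__(self, n_lidar=128, n_proprio=33, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_lidar + n_proprio, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, 3),  # (v_x, v_y, yaw_rate)
        )

    def forward(self, lidar, proprio):
        return self.net(torch.cat([lidar, proprio], dim=-1))
```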

Demonstrations

Demonstrations are collected in a variety of obstacle environments. Compared to BC, MPC-SafeGIL consistently guides the expert closer to obstacles, enabling the policy to learn more recovery behaviors in these safety-critical states.
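A minimal sketch of this disturbance-injected data collection, assuming a hypothetical environment hook `env.step_with_disturbance` and a `disturbance_fn` such as the sampling-based adversary sketched above:

```python
import numpy as np

def collect_demonstration(env, expert_policy, disturbance_fn, max_steps=500):
    """Roll out the expert while an adversary injects disturbances.

    `env.step_with_disturbance` is a hypothetical hook returning
    (next_state, done). Only (state, expert_action) pairs are recorded,
    so behavior cloning on this data sees the recovery behavior the
    expert exhibits in safety-critical states.
    """
    states, actions = [], []
    s = env.reset()
    for _ in range(max_steps):
        a = expert_policy(s)       # expert command (the imitation target)
        d = disturbance_fn(s)      # adversary pushes toward danger
        states.append(s)
        actions.append(a)
        s, done = env.step_with_disturbance(a, d)
        if done:
            break
    return np.asarray(states), np.asarray(actions)
```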


Safety

When tested on an unseen obstacle map, the policy learned with MPC-SafeGIL achieves a significantly lower collision rate and a higher success rate.

Rollout videos: BC vs. MPC-SafeGIL

Performance

Learned from 60 demonstrations: GAIL vs. MPC-SafeGIL

Learned from 120 demonstrations: GAIL vs. MPC-SafeGIL

F1Tenth

A visuomotor F1Tenth racing car is tasked with navigating the track while avoiding the curbs. The policy takes LiDAR as input and outputs the desired steering angle and velocity.

Comparison

MPC-SafeGIL achieves a much longer distance traveled and a lower collision rate.

Safety Filter

To further evaluate compatibility with online safety mechanisms, we combine the imitation policies with a predictive safety filter at test time. The filter overrides the nominal control with an MPC controller whenever the predicted 50-step trajectory contains a collision. Both BC and MPC-SafeGIL benefit from the safety filter; however, MPC-SafeGIL consistently achieves better performance.
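A minimal sketch of such a predictive safety filter, with hypothetical hooks for the dynamics model, collision check, and MPC safe controller:

```python
def safety_filtered_action(state, policy, rollout_model, collision_check,
                           mpc_safe_controller, horizon=50):
    """Predictive safety filter applied at test time.

    Rolls the nominal policy forward `horizon` steps through a dynamics
    model; if any predicted state is in collision, the MPC controller
    overrides the nominal action. The hook names are assumptions.
    """
    a_nominal = policy(state)
    s = state
    for _ in range(horizon):
        s = rollout_model(s, policy(s))   # predicted future under the policy
        if collision_check(s):            # predicted violation -> override
            return mpc_safe_controller(state)
    return a_nominal                      # trajectory looks safe; keep nominal
```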

Generalization

To evaluate generalization, we test the learned policies on a novel, unseen track. MPC-SafeGIL achieves the longest distance traveled.

Hardware Experiment

A Crazyflie quadrotor must reach a goal location without collisions in the real world. Training demonstrations are collected in simulation with randomly generated obstacle environments, and the learned policy is then deployed across diverse real-world obstacle settings.
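For illustration, a simple way to randomize obstacle layouts in simulation might look like the following; the circular-obstacle parameterization and rejection rule are assumptions, as the exact randomization scheme is not specified here.

```python
import numpy as np

def sample_obstacle_map(n_obstacles=8, arena=2.0, radius_range=(0.05, 0.2),
                        clearance=0.3, rng=None):
    """Sample a random obstacle layout for simulated demo collection.

    Illustrative only: circular obstacles (x, y, r) in a square arena,
    rejecting any obstacle that blocks the fixed start or goal location.
    """
    rng = rng or np.random.default_rng()
    start = np.array([-0.8 * arena, -0.8 * arena])
    goal = np.array([0.8 * arena, 0.8 * arena])
    obstacles = []
    while len(obstacles) < n_obstacles:
        center = rng.uniform(-arena, arena, size=2)
        r = rng.uniform(*radius_range)
        # Keep the start and goal reachable.
        if min(np.linalg.norm(center - start),
               np.linalg.norm(center - goal)) > r + clearance:
            obstacles.append((center[0], center[1], r))
    return obstacles
```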

A structured obstacle environment: BC vs. MPC-SafeGIL

A densely cluttered environment: BC vs. MPC-SafeGIL

A dynamic obstacle environment: BC vs. MPC-SafeGIL

BibTeX

BibTex Code Here