Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-tuning

1Nanyang Technological University 2NVIDIA Research 3Stanford University
Gen-Drive Introduction

Gen-Drive shifts from the conventional prediction-then-deterministic-planning pipeline to a generation-then-evaluation framework.

Abstract

Autonomous driving necessitates the ability to reason about future interactions between traffic agents and to make informed evaluations for planning. This paper introduces the Gen-Drive framework, which shifts from the traditional prediction and deterministic planning framework to a generation-then-evaluation planning paradigm.

The framework employs a behavior diffusion model as a scene generator to produce diverse possible future scenarios, thereby enhancing the capability for joint interaction reasoning. To facilitate decision-making, we propose a scene evaluator (reward) model, trained with pairwise preference data collected through VLM assistance, thereby reducing human workload and enhancing scalability. Furthermore, we utilize an RL fine-tuning framework to improve the generation quality of the diffusion model, rendering it more effective for planning tasks.

We conduct training and closed-loop planning tests on the nuPlan dataset, and the results demonstrate that employing such a generation-then-evaluation strategy outperforms other learning-based approaches. Additionally, the fine-tuned generative driving policy shows significant enhancements in planning performance. We further demonstrate that utilizing our learned reward model for evaluation or RL fine-tuning leads to better planning performance compared to relying on human-designed rewards.

Model Structure

Neural Network Structure of the Gen-Drive Model

The query-centric encoding Transformer encodes all scene elements in local coordinates while preserving relative information in the attention calculations. The diffusion denoising Transformer comprises multiple attention blocks that iteratively model interactions among the noised object futures, between those futures and the encoded scene, and between the ego future and its route.
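To make the denoising Transformer concrete, here is a minimal sketch of one attention block in PyTorch. The module layout, layer sizes, and omission of diffusion-timestep conditioning are our own simplifying assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    # One block of the denoising Transformer: noised agent futures attend to
    # each other, to the encoded scene context, and to the ego route.
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.agent_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scene_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.route_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])

    def forward(self, futures, scene_ctx, route_ctx):
        # Interaction among the noised object futures (self-attention)
        q = self.norms[0](futures)
        futures = futures + self.agent_attn(q, q, q)[0]
        # Future-to-scene cross-attention (map and agent-history encodings)
        futures = futures + self.scene_attn(self.norms[1](futures), scene_ctx, scene_ctx)[0]
        # Ego-route cross-attention conditions the ego plan on its route
        futures = futures + self.route_attn(self.norms[2](futures), route_ctx, route_ctx)[0]
        return futures + self.ffn(self.norms[3](futures))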

During diffusion generation, different scenes can be produced in parallel and then fed into the scene evaluation Transformer. The evaluation model uses a Transformer encoder-decoder to fuse information from the future scene and the map, and two MLP heads reconstruct the ego plan and output a score for the scene/plan.
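A rough sketch of such an evaluator is given below. The use of torch.nn.Transformer for fusion, the layer sizes, and the assumption that the first decoder token corresponds to the ego vehicle are illustrative guesses, not the paper's exact architecture.

import torch
import torch.nn as nn

class SceneEvaluator(nn.Module):
    # Fuses future-scene tokens with map tokens, then reconstructs the ego plan
    # and scores the scene with two separate MLP heads.
    def __init__(self, dim: int = 256, plan_steps: int = 80):
        super().__init__()
        self.fusion = nn.Transformer(d_model=dim, nhead=8, batch_first=True)
        self.plan_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, plan_steps * 2))
        self.score_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, future_scene_tokens, map_tokens):
        # Encoder consumes map tokens; decoder attends from the future-scene tokens
        fused = self.fusion(src=map_tokens, tgt=future_scene_tokens)
        ego_token = fused[:, 0]                         # assume the ego token comes first
        plan = self.plan_head(ego_token)                # flattened (x, y) ego trajectory
        score = self.score_head(ego_token).squeeze(-1)  # scalar reward for the scene/plan
        return plan, score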

Pairwise Preference from AI Feedback

Prompt to GPT-4o and its feedback

We conduct pairwise sampling of the generated scenarios from the base diffusion generator. We first compute the discrepancy between the two planned trajectories, and then check for collisions and off-road violations to filter out obvious failure cases.

If these measures cannot distinguish the pair, we query GPT-4o for a conclusive judgment. GPT-4o can provide reasonable evaluations of the two generated scenarios based on the current scene context.
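Putting the two filtering rules and the VLM fallback together, the labeling logic can be sketched as follows. The scene dictionaries, the discrepancy threshold, and the ask_gpt4o callable are hypothetical stand-ins for the actual data structures and GPT-4o query used in the paper.

import numpy as np

def label_preference(scene_a, scene_b, ask_gpt4o, min_discrepancy=2.0):
    # Assign a pairwise preference label between two generated scenes.
    # Returns 0 if scene_a is preferred, 1 if scene_b is preferred, None if the pair is skipped.
    # Rule 1: skip pairs whose planned ego trajectories are nearly identical.
    disc = np.linalg.norm(scene_a["plan"] - scene_b["plan"], axis=-1).mean()
    if disc < min_discrepancy:
        return None
    # Rule 2: obvious failures (collision or off-road) decide the label directly.
    fail_a = scene_a["collision"] or scene_a["off_road"]
    fail_b = scene_b["collision"] or scene_b["off_road"]
    if fail_a != fail_b:
        return 1 if fail_a else 0
    # Rule 3: otherwise, ask GPT-4o for a conclusive judgment given the scene context.
    return ask_gpt4o(scene_a, scene_b)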

Closed-loop Planning Results

Single-Sample Planning

Single-sample planning often results in suboptimal performance due to a lack of diversity in the plans.

Multi-Sample and Scoring

Multi-sample planning increases the diversity of candidate plans, and the learned reward model helps select the right one.
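A minimal sketch of this generation-then-evaluation loop is shown below; the diffusion_policy and evaluator interfaces, the dictionary keys, and the sample count are assumed for illustration rather than taken from the released code.

import torch

@torch.no_grad()
def plan_with_scoring(diffusion_policy, evaluator, scene_context, num_samples=32):
    # Sample multiple future scenes in parallel and execute the highest-scoring ego plan.
    scenes = diffusion_policy.sample(scene_context, num_samples)          # parallel denoising
    _, scores = evaluator(scenes["future_tokens"], scenes["map_tokens"])  # reward per scene
    best = torch.argmax(scores)
    return scenes["ego_plans"][best]  # ego trajectory of the top-ranked scene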

Before Fine-tuning

The base diffusion driving policy can exhibit undesirable behaviors in certain scenarios due to the limitations inherent in imitation learning.

After Fine-tuning

Fine-tuning the diffusion policy leads to substantial improvements in closed-loop planning in some challenging situations.

PDM-Closed Planner

Rule-based planners often exhibit unnatural behaviors and are unable to handle more complex driving tasks in real-world scenarios.

Gen-Drive Planner

Our Gen-Drive planner shows nuanced driving behaviors across various situations, exhibiting human-like planning and adaptability.

BibTeX

@article{huang2024gendrive,
  title={Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-tuning},
  author={Huang, Zhiyu and Weng, Xinshuo and Igl, Maximilian and Chen, Yuxiao and Cao, Yulong and Ivanovic, Boris and Pavone, Marco and Lv, Chen},
  journal={arXiv preprint arXiv:2410.05582},
  year={2024}
}