Autonomous driving necessitates the ability to reason about future interactions between traffic agents and to make informed evaluations for planning. This paper introduces the Gen-Drive framework, which shifts from the traditional prediction-and-deterministic-planning pipeline to a generation-then-evaluation planning paradigm.
The framework employs a behavior diffusion model as a scene generator to produce diverse possible future scenarios, thereby enhancing the capability for joint interaction reasoning. To facilitate decision-making, we propose a scene evaluator (reward) model trained with pairwise preference data collected through VLM assistance, which reduces human workload and enhances scalability. Furthermore, we employ an RL fine-tuning framework to improve the generation quality of the diffusion model, rendering it more effective for planning tasks.
We conduct training and closed-loop planning tests on the nuPlan dataset, and the results demonstrate that the proposed generation-then-evaluation strategy outperforms other learning-based approaches. The fine-tuned generative driving policy also shows significant improvements in planning performance. We further demonstrate that using our learned reward model for evaluation or RL fine-tuning leads to better planning performance than relying on human-designed rewards.
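To make the generation-then-evaluation paradigm concrete, the sketch below samples a batch of candidate future scenes from a generative driving policy, scores each with a learned reward model, and executes the highest-scoring ego plan. It is a minimal illustration: the `scene_generator.sample` and `scene_evaluator` interfaces, the tensor shapes, and the assumption that agent index 0 is the ego vehicle are hypothetical placeholders, not the paper's actual API.

```python
import torch

@torch.no_grad()
def plan_with_generation_then_evaluation(scene_generator, scene_evaluator,
                                          scene_context, num_samples=16):
    """Hypothetical generation-then-evaluation planning step."""
    # 1) Generate a diverse batch of candidate future scenes in parallel.
    #    futures: [num_samples, num_agents, horizon, state_dim]
    futures = scene_generator.sample(scene_context, num_samples=num_samples)

    # 2) Score every candidate scene with the learned reward model.
    #    scores: [num_samples]
    scores = scene_evaluator(scene_context, futures)

    # 3) Select the scene with the highest reward and return its ego plan
    #    (agent index 0 is assumed to be the ego vehicle here).
    best = torch.argmax(scores)
    ego_plan = futures[best, 0]          # [horizon, state_dim]
    return ego_plan, scores[best]
```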
The query-centric encoding Transformer encodes all scene elements in local coordinates while preserving their relative information in the attention calculations. The diffusion denoising Transformer comprises multiple attention blocks that iteratively perform self-attention over the noised object futures and cross-attention to capture future-scene and ego-route interactions.
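As a rough illustration of one such denoising block, the PyTorch sketch below applies self-attention over the noised object-future tokens, followed by cross-attention to the encoded scene and to the ego-route tokens. The layer widths, the pre-norm ordering, and the omission of the query-centric relative encoding and diffusion-timestep conditioning are simplifying assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    """Illustrative attention block of a diffusion denoising Transformer."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scene_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.route_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])

    def forward(self, future_tokens, scene_tokens, route_tokens):
        # future_tokens: [B, N_agents * T, D] noised object-future embeddings
        # scene_tokens:  [B, N_scene, D] query-centric scene encodings
        # route_tokens:  [B, N_route, D] ego-route encodings
        x = future_tokens
        h = self.norms[0](x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]      # future-future
        h = self.norms[1](x)
        x = x + self.scene_attn(h, scene_tokens, scene_tokens,
                                need_weights=False)[0]               # future-scene
        h = self.norms[2](x)
        x = x + self.route_attn(h, route_tokens, route_tokens,
                                need_weights=False)[0]               # ego-route
        x = x + self.ffn(self.norms[3](x))
        return x
```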
During diffusion generation, different scenes can be produced in parallel and then fed into the scene evaluation Transformer. The evaluation model uses a Transformer encoder-decoder to fuse information from the future scene and the map, and two MLP heads reconstruct the ego plan and output a score for the scene/plan.
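A minimal sketch of such an evaluator is given below, assuming mean pooling over the fused tokens and a Bradley-Terry-style pairwise objective for the preference data; these choices, along with the layer sizes, are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneEvaluator(nn.Module):
    """Illustrative scene evaluator (reward) model."""

    def __init__(self, dim=256, heads=8, layers=3, horizon=80, state_dim=3):
        super().__init__()
        # Encoder-decoder fusion of map context (encoder) and future scene (decoder).
        self.fusion = nn.Transformer(d_model=dim, nhead=heads,
                                     num_encoder_layers=layers,
                                     num_decoder_layers=layers,
                                     batch_first=True)
        # Two MLP heads: ego-plan reconstruction and a scalar scene/plan score.
        self.plan_head = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(),
                                       nn.Linear(512, horizon * state_dim))
        self.score_head = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(),
                                        nn.Linear(512, 1))

    def forward(self, map_tokens, future_scene_tokens):
        # map_tokens:          [B, N_map, D] encoded map elements
        # future_scene_tokens: [B, N_fut, D] encoded future-scene tokens
        fused = self.fusion(map_tokens, future_scene_tokens)  # [B, N_fut, D]
        pooled = fused.mean(dim=1)                            # [B, D]
        ego_plan = self.plan_head(pooled)                     # auxiliary reconstruction
        score = self.score_head(pooled).squeeze(-1)           # scalar reward per scene
        return ego_plan, score

def preference_loss(score_preferred, score_rejected):
    """Bradley-Terry-style loss on pairwise preference data (an assumed objective)."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()
```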
We conduct pairwise sampling of the scenarios generated by the base diffusion generator. We first compute the discrepancy between the planned trajectories, and then check for collisions and off-road violations to filter out obvious failure cases.
If these measures are insufficient to distinguish the pair, we query GPT-4o for a conclusive evaluation; GPT-4o provides reasonable assessments of the two generated scenarios based on the current scene context.
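The sketch below shows what such a labeling routine could look like for one sampled pair; the discrepancy threshold, the rule-based check callbacks, and the `query_vlm` wrapper around GPT-4o are hypothetical stand-ins for the actual pipeline.

```python
import numpy as np

def label_preference_pair(scene_a, scene_b, check_collision, check_off_road,
                          query_vlm, min_discrepancy=2.0):
    """Hypothetical labeling of one scenario pair; returns 'a', 'b', or None (skip).

    scene_a / scene_b:  candidate scenes whose .ego_plan is a [T, 2] array of
                        planned ego positions.
    check_collision / check_off_road:  rule-based failure checks on a scene.
    query_vlm:  callback that asks a VLM (e.g., GPT-4o) which scene is preferable
                given the current scene context.
    """
    # Skip pairs whose plans are nearly identical: they carry little preference signal.
    discrepancy = np.linalg.norm(scene_a.ego_plan - scene_b.ego_plan, axis=-1).mean()
    if discrepancy < min_discrepancy:
        return None

    # Rule-based filtering of obvious failures (collision / off-road).
    a_fail = check_collision(scene_a) or check_off_road(scene_a)
    b_fail = check_collision(scene_b) or check_off_road(scene_b)
    if a_fail and not b_fail:
        return "b"
    if b_fail and not a_fail:
        return "a"
    if a_fail and b_fail:
        return None

    # Otherwise defer to the VLM for a conclusive judgment.
    return query_vlm(scene_a, scene_b)
```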
Single-sample planning often results in suboptimal performance due to a lack of diversity in the plans.
Multiple-sample planning increases diversity, and the learned reward model helps select the right plan.
The base diffusion driving policy can exhibit undesirable behaviors in certain scenarios due to the limitations inherent in imitation learning.
Fine-tuning the diffusion policy leads to substantial improvements in closed-loop planning in some challenging situations.
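As a rough illustration of how reward-driven fine-tuning of the diffusion policy can be implemented, the sketch below performs a REINFORCE-style update over sampled scenes, in the spirit of DDPO-like diffusion RL; the `sample_with_logprob` interface and the advantage normalization are assumptions, not the paper's exact fine-tuning procedure.

```python
import torch

def rl_finetune_step(diffusion_policy, scene_evaluator, scene_context,
                     optimizer, num_samples=8):
    """One illustrative policy-gradient fine-tuning step for the diffusion policy."""
    # Sample candidate future scenes and keep the summed denoising log-probabilities
    # (sample_with_logprob is an assumed interface of the diffusion policy).
    scenes, log_probs = diffusion_policy.sample_with_logprob(
        scene_context, num_samples=num_samples)               # [K, ...], [K]

    # Score the samples with the learned reward model (no gradient through it).
    with torch.no_grad():
        rewards = scene_evaluator(scene_context, scenes)      # [K]
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)

    # REINFORCE-style objective: increase the likelihood of high-reward scenes.
    loss = -(advantages * log_probs).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), rewards.mean().item()
```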
Rule-based planners often exhibit unnatural behaviors and are unable to handle more complex driving tasks in real-world scenarios.
Our Gen-Drive planner shows nuanced driving behaviors across various situations, exhibiting human-like planning and adaptability.
@article{huang2024gendrive,
title={Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-tuning},
author={Huang, Zhiyu and Weng, Xinshuo and Igl, Maximilian and Chen, Yuxiao and Cao, Yulong and Ivanovic, Boris and Pavone, Marco and Lv, Chen},
journal={arXiv preprint arXiv:2410.05582},
year={2024}
}