GameFormer

Zhiyu Huang, Haochen Liu, Chen Lv

AutoMan Research Lab, Nanyang Technological University

Abstract

Autonomous vehicles operating in complex real-world environments require accurate predictions of interactive behaviors between traffic participants. This paper tackles the interaction prediction problem by formulating it with hierarchical game theory and proposing the GameFormer model for its implementation. The model incorporates a Transformer encoder, which effectively models the relationships between scene elements, alongside a novel hierarchical Transformer decoder structure. At each decoding level, the decoder utilizes the prediction outcomes from the previous level, in addition to the shared environmental context, to iteratively refine the interaction process. Moreover, we propose a learning process that regulates an agent’s behavior at the current level to respond to other agents’ behaviors from the preceding level. Through comprehensive experiments on large-scale real-world driving datasets, we demonstrate the state-of-the-art accuracy of our model on the Waymo interaction prediction task. Additionally, we validate the model’s capacity to jointly reason about the motion plan of the ego agent and the behaviors of multiple agents in both open-loop and closed-loop planning tests, outperforming various baseline methods. Furthermore, we evaluate the efficacy of our model on the nuPlan planning benchmark, where it achieves leading performance.

Method Overview

The proposed framework draws inspiration from hierarchical game-theoretic modeling of agent interactions. A Transformer-based encoder encodes the historical states of the agents and the map as the shared scene context. The level-0 decoder takes the modality embedding and the agent history encodings as query inputs and decodes the future trajectories and scores of level-0 agents independently of one another. At level k, an agent responds to all other agents at level k-1: the level-k decoder uses a self-attention module to model the future interactions predicted at level k-1 and appends this information to the scene context encoding.
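The level-k structure described above can be illustrated with a small toy, which is not the GameFormer decoder itself. In this sketch, a constant-velocity extrapolation stands in for the level-0 decoder and a hand-written yielding rule stands in for the learned level-k response; the names `decode_level0`, `respond`, `hierarchical_decode`, and `safe_gap` are hypothetical and chosen only for illustration.

```python
from typing import List

Trajectory = List[float]  # toy stand-in: 1-D positions over future steps


def decode_level0(history: List[float], horizon: int) -> Trajectory:
    """Level-0: extrapolate one agent at constant velocity, ignoring others."""
    velocity = history[-1] - history[-2]
    return [history[-1] + velocity * (t + 1) for t in range(horizon)]


def respond(own_prev: Trajectory, others_prev: List[Trajectory],
            safe_gap: float) -> Trajectory:
    """Level-k: adjust an agent's previous-level trajectory in response to
    the other agents' level-(k-1) trajectories (assumed yielding rule)."""
    adjusted = []
    for t, pos in enumerate(own_prev):
        for other in others_prev:
            gap = other[t] - pos
            if 0 <= gap < safe_gap:      # another agent is too close ahead
                pos = other[t] - safe_gap
        adjusted.append(pos)
    return adjusted


def hierarchical_decode(histories: List[List[float]], horizon: int,
                        levels: int, safe_gap: float = 2.0):
    """Return the predictions of every level, from level 0 up to `levels`."""
    outputs = [[decode_level0(h, horizon) for h in histories]]
    for _ in range(levels):
        prev = outputs[-1]
        current = []
        for i, own in enumerate(prev):
            # Each agent only sees the others' previous-level predictions.
            others = [traj for j, traj in enumerate(prev) if j != i]
            current.append(respond(own, others, safe_gap))
        outputs.append(current)
    return outputs


# A fast follower (agent 0) behind a slower leader (agent 1).
outputs = hierarchical_decode([[0, 2], [5, 6]], horizon=3, levels=2)
for level, trajectories in enumerate(outputs):
    print(f"level {level}: {trajectories}")
```

In this toy, the follower's level-0 trajectory closes in on the leader, and its level-1 trajectory holds back in response to the leader's level-0 prediction. The key structural point is that every level reads only the previous level's outputs, mirroring the iterative refinement described in the text.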

Interaction Prediction (Waymo)

Given the tracks of agents for the past 1 second on a corresponding map, the objective is to predict the joint future positions of 2 interacting agents for 8 seconds into the future.
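To make the word "joint" concrete, the sketch below scores a set of joint prediction modes, where each mode holds one trajectory per interacting agent and is evaluated as a whole. The average displacement error used here is a common choice for this kind of task but is an assumption of this example, since this page does not state the evaluation metric; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]      # future positions of one agent
JointMode = List[Trajectory]  # one trajectory per interacting agent


def joint_ade(mode: JointMode, truth: JointMode) -> float:
    """Mean displacement over all agents and time steps of one joint mode."""
    total, count = 0.0, 0
    for pred_traj, true_traj in zip(mode, truth):
        for (px, py), (tx, ty) in zip(pred_traj, true_traj):
            total += math.hypot(px - tx, py - ty)
            count += 1
    return total / count


def min_joint_ade(modes: List[JointMode], truth: JointMode) -> float:
    """Error of the best joint mode; agents cannot pick modes separately."""
    return min(joint_ade(mode, truth) for mode in modes)


truth = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
mode_a = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 3.0), (1.0, 3.0)]]  # agent 1 off by 2
mode_b = [[(0.0, 0.5), (1.0, 0.5)], [(0.0, 1.5), (1.0, 1.5)]]  # both off by 0.5
print(min_joint_ade([mode_a, mode_b], truth))
```

Because each mode is scored across both agents at once, a mode that is perfect for one agent but poor for the other (`mode_a`) can lose to a mode that is moderately accurate for both (`mode_b`).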

Closed-loop Planning (Waymo)

The planner outputs a planned trajectory at each time step, which is used to simulate the vehicle’s state at the next time step. The other agents are replayed from a log according to their observed states in the dataset.
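The closed-loop procedure described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the simulator used in the experiments: the scene is reduced to 1-D positions, the first waypoint of each plan is taken as the ego vehicle's next state, and `toy_planner` is a hypothetical rule-based stand-in for the learned planner.

```python
from typing import Callable, List

Plan = List[float]  # planned ego positions over a short horizon (1-D toy)


def toy_planner(ego_pos: float, others: List[float], horizon: int = 3) -> Plan:
    """Hypothetical planner: advance 1.0 per step unless an agent is
    within 2.0 ahead, in which case plan to stay in place."""
    blocked = any(0 < other - ego_pos <= 2.0 for other in others)
    step = 0.0 if blocked else 1.0
    return [ego_pos + step * (i + 1) for i in range(horizon)]


def closed_loop_rollout(ego_start: float, agent_logs: List[List[float]],
                        planner: Callable[[float, List[float]], Plan],
                        steps: int) -> List[float]:
    """Replan at every step; other agents follow their logged states."""
    ego = ego_start
    visited = [ego]
    for t in range(steps):
        # Log replay: the other agents do not react to the ego vehicle.
        others = [log[t] for log in agent_logs]
        plan = planner(ego, others)
        # Only the first waypoint is executed before replanning.
        ego = plan[0]
        visited.append(ego)
    return visited


# One logged agent that waits for two steps and then drives away.
print(closed_loop_rollout(0.0, [[3.0, 3.0, 5.0, 7.0]], toy_planner, steps=4))
```

The ego vehicle here stops once when the logged agent is within range and resumes after it moves on. Since the replayed agent never reacts to the ego vehicle, any conflict has to be resolved by the planner alone.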

Closed-loop Planning (nuPlan)

Please refer to the GameFormer Planner report for the details of the planning framework. The following scenarios demonstrate the performance of closed-loop planning with non-reactive agents in selected interactive situations from the nuPlan dataset.

Citation

@article{huang2023gameformer,
  title={GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving},
  author={Huang, Zhiyu and Liu, Haochen and Lv, Chen},
  journal={arXiv preprint arXiv:2303.05760},
  year={2023}
}

Contact

If you have any questions, feel free to contact us (zhiyu001@e.ntu.edu.sg).