Abstract
End-to-end imitation and reinforcement learning for self-driving vehicles have advanced rapidly in recent years. Despite this progress, there is a severe lack of standardized metrics for evaluating the performance of autonomous driving agents. Existing metrics generally fail to capture a wide range of driving behaviors or to compare the severity of different failure cases. In this work, we introduce the Quantitative Evaluation for Driving metric, or QED, which assigns a quantitative score from 0 to 100 capturing the quality of driving for any driving agent. QED assesses multiple aspects of driving behavior, including the ability to stay centered in the lane, avoid weaving and erratic behavior, follow the speed limit, and avoid collisions, and it can be applied under a wide range of driving scenarios. To demonstrate the effectiveness of QED, we compare the scores it generates against scores assigned by human evaluators on a total of 30 different drivers across 6 different towns in the CARLA driving simulator. In ``easy'' evaluation scenarios, where it is relatively straightforward to distinguish better drivers from worse ones, QED attains an average Pearson correlation of 0.96 and an average Spearman correlation of 0.97 against human evaluators. In ``hard'' evaluation scenarios, where ranking and scoring different types of bad driving behavior is far more ambiguous, QED attains an average Pearson correlation of 0.82 and an average Spearman correlation of 0.75 against human evaluators, both slightly higher than the agreement among human evaluators themselves. While QED may not capture every characteristic of good driving, we consider it an important foundation for reproducibility and standardization in the community.