Publication

Quantitative Evaluation of Autonomous Driving in CARLA...

by Shang Gao, Spencer R Paulissen, Mark A Coletti, Robert M Patton
Publication Type
Conference Paper
Book Title
From Benchmarking Behavior Prediction to Socially Compatible Behavior Generation in Autonomous Driving
Publication Date
Conference Name
Machine Learning for Autonomous Driving at NeurIPS 2020
Conference Location
Virtual, Tennessee, United States of America
Conference Sponsor
https://nips.cc/Conferences/2020/Sponsors
Conference Date

There have been many recent advances in end-to-end imitation and reinforcement learning for self-driving vehicles. Despite this, there is a severe lack of standardized metrics for evaluating the performance of autonomous driving agents. Existing metrics are generally limited in their ability to capture a wide range of driving behaviors and to compare the severity of different failure cases. In this work, we introduce the Quantitative Evaluation for Driving metric, or QED, which assigns a quantitative score from 0 to 100 that captures the quality of driving for any driving agent. QED assesses several aspects of driving behavior, including the ability to stay in the center of the lane, avoid weaving and erratic behavior, follow the speed limit, and avoid collisions, and it can be used under a wide range of driving scenarios. To show the effectiveness of QED, we compare the scores it generates against scores assigned by human evaluators across a total of 30 different drivers and 6 different towns in the CARLA driving simulator. In "easy" evaluation scenarios, where it is relatively straightforward to distinguish better drivers from worse drivers, QED attains an average Pearson correlation of 0.96 and an average Spearman correlation of 0.97 when compared against human evaluators. In "hard" evaluation scenarios, where it is far more ambiguous how to rank or score different types of bad driving behavior, QED attains an average Pearson correlation of 0.82 and an average Spearman correlation of 0.75 when compared against human evaluators, both slightly higher than the correlations obtained when human evaluators are compared against each other. While QED may not capture every characteristic that defines good driving, we consider it an important foundation for reproducibility and standardization in the community.
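The comparison against human evaluators rests on two standard statistics: Pearson correlation measures linear agreement between two sets of scores, and Spearman correlation measures agreement between their rank orderings. The sketch below shows how such a comparison could be computed. It is not the authors' evaluation code; the function names are illustrative, and the score lists are made-up placeholders rather than data from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five drivers (NOT values from the paper).
metric_scores = [92.0, 75.5, 60.0, 41.0, 18.5]  # automated metric, 0 to 100
human_scores = [95.0, 70.0, 65.0, 30.0, 20.0]   # human evaluator, 0 to 100

print("Pearson: ", round(pearson(metric_scores, human_scores), 3))
print("Spearman:", round(spearman(metric_scores, human_scores), 3))
```

Because the two placeholder lists order the five drivers identically, the Spearman value here is exactly 1.0 even though the raw scores differ. That difference is why reporting both statistics is informative: one reflects how closely the score values track each other, the other only whether the drivers are ranked the same way.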