Publication

Benchmarking for AI for Science...

by Jeyan Thiyagalingam, Mallikarjun Shankar, Geoffrey Fox, Tony Hey
Publication Type
Book Chapter
Publication Date
Page Numbers
163 to 178
Publisher Name
World Scientific
Publisher Location
Singapore, Singapore

AI has been instrumental in recent developments across a number of scientific domains. With several hundred machine learning (ML) algorithms and models, and numerous AI-specific hardware platforms, a common challenge for scientists working on AI for Science is selecting the machine learning algorithm(s) best suited to their domain-specific scientific problems. A number of AI Benchmarking initiatives have been set up, and these have been useful in understanding the benefits of different ML algorithms for different tasks.
However, most of these AI Benchmarking initiatives follow the conventional notion of benchmarking, where the focus is purely on runtime performance (such as training time or inference time). As a result, the suitability of different ML algorithms for solving scientific problems has been treated as a performance problem, even though the two are hardly the same. To make reasonable, explainable, and justifiable advancements in science using AI, it is critical to focus on the merits of these algorithms in handling different domain science problems. In other words, more emphasis must be placed on Benchmarking for AI for Science than on AI Benchmarking. The vision of the former is not only to assess the performance of ML algorithms, but also to assess and understand the benefits and merits of different ML algorithms in handling scientific problems. Benchmarking for AI for Science, as opposed to purely performance-focused AI Benchmarking, has several benefits: (i) it has the potential to advance the sciences much more rapidly than purely performance-based AI methods, (ii) it will encourage the community to develop better domain-specific AI techniques, particularly given the ability to benchmark different techniques against one another, and (iii) it will encourage hardware manufacturers to focus on developing science-specific hardware subsystems.