Abstract
In recent years, materials informatics, the application of data science to problems in materials science and engineering, has emerged as a powerful tool for materials discovery and design. This relatively new field is already having a significant impact on the interpretation of data for a variety of materials systems, including thermoelectrics, ferroelectrics, battery anodes and cathodes, hydrogen storage materials, and polymer dielectrics. Its practitioners combine the methods of multivariate statistics and machine learning with standard computational tools (e.g., density-functional theory) to visualize large data sets and reduce their dimensionality, identify patterns in hyperspectral data, parse microstructural images of polycrystals, characterize vortex structures in ferroelectrics, and design batteries. More generally, they establish correlations that help extract important physics and infer structure-property-processing relationships. In this Overview, we critically examine the role of informatics in several important materials subfields, highlighting significant contributions to date and identifying known shortcomings. We focus specifically on the difference between the correlative approach of classical data science and the causative approach of the physical sciences. From this perspective, we also outline some potential opportunities and challenges for informatics in the materials realm in this era of big data.