The operation and control of power grids will increasingly rely on data. A high-speed, reliable, flexible, and secure data architecture is a prerequisite for the next-generation power grid. This paper summarizes the challenges in collecting and utilizing power grid data and then provides a reference data architecture for future power grids. Based on the deployment of this architecture, related research is reviewed and summarized in several categories: data measurement/actuation, data transmission, the data service layer, and data utilization, as well as two cross-cutting issues, interoperability and cyber security. Research gaps and future work are also presented.