Publication

Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources...

by Edmon Begoli, Jesús Camacho-Rodríguez, Julian Hyde, Michael Mior, Daniel Lemire
Publication Type
Conference Paper
Journal Name
ACM International Conference on Management of Data
Publication Date
Page Numbers
221 to 230
Volume
0
Issue
0
Conference Name
2018 ACM SIGMOD/PODS Conference
Conference Location
Houston, Texas, United States of America
Conference Sponsor
ACM SIGMOD
Conference Date
-

Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. The goal of this paper is to formally introduce Calcite to the broader research community, briefly present its history, and describe its architecture, features, functionality, and patterns for adoption. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for the new types of data sources, query languages, and approaches to query processing and optimization.