Apache Flink features two APIs based on relational algebra: a SQL interface and the Table API, a LINQ-style API available for Scala and Java. Relational APIs are attractive because they are easy to use and because queries can be automatically optimized and translated into efficient runtime code. Flink offers both APIs for streaming and batch data sources. This talk takes a look under the hood of Flink’s relational APIs. We will show the unified architecture that handles streaming and batch queries and explain how Flink translates queries from both APIs into the same representation, leverages Apache Calcite to optimize them, and generates runtime code for efficient execution. Finally, we will discuss potential improvements and give an outlook on future extensions and features.

Slides: Fabian Hueske – Taking a look under the hood of Apache Flink’s relational APIs

Video on YouTube


Fabian Hueske
Software Engineer, data Artisans