In this talk I present the results of a set of experiments comparing the performance of several implementations of time-series data aggregation. There are three implementations: a baseline that uses no streaming framework, one using Apache Flink, and one using Apache Spark Streaming*. All three ran against the same Kafka cluster and consumed the same data stream, with the goal of understanding the limitations of each implementation. These limitations were measured at three input data rates: 100%, 6000%, and breaking-point load.

Slides: Ron Crocker – Evaluating Streaming Framework Performance for a Large-Scale Aggregation Pipeline

Video on YouTube


Ron Crocker
Principal Engineer / Architect, New Relic