The Apache Beam (incubating) programming model is designed to support several advanced data processing features such as autoscaling and dynamic work rebalancing. In this talk, we will first explain how dynamic work rebalancing not only provides a general and robust solution to the problem of stragglers in traditional data processing pipelines, but also how it allows autoscaling to be truly effective. We will then present how dynamic work rebalancing works as implemented in Google Cloud Dataflow and which path other Apache Beam runners link Apache Flink can follow to benefit from it.

Slides: Malo Denielou – No shard left behind: Dynamic work rebalancing in Apache Beam.pdf

Video on YouTube


Malo Denielou
Software Engineer, Google