A large portion of the transactions on Taobao, Alibaba's e-commerce platform, is initiated through the Alibaba Search engine. Real-time stream processing is one of the cornerstones of Alibaba's search infrastructure. Of all the streaming solutions available, Flink comes closest to meeting our requirements. However, we do not think Flink is quite up to our scale and reliability challenges. For example, its current support for Yarn can lead to inefficient resource allocation, and job isolation and debugging can also be challenging. In this paper, we present the design and implementation of Blink, an improved runtime engine for Flink that is better integrated with Yarn. It addresses the problems above and various others we encountered in production. Since the changes are at the runtime layer, Blink is fully compatible with the Flink API and its machine learning libraries. We will also share our experience of running it in production on a Hadoop cluster of more than one thousand servers in Alibaba Search.

Video on YouTube