In this session we present the AMIDST toolbox for analyzing large-scale data sets with probabilistic machine learning models. AMIDST runs distributed algorithms for learning and
inference in a wide spectrum of latent variable models, such as Gaussian mixtures, (probabilistic) principal component analysis, hidden Markov models, Kalman filters, and latent Dirichlet allocation. The toolbox
can perform Bayesian parameter learning on any user-defined probabilistic (graphical) model with billions of nodes using novel distributed message passing algorithms.

We plan to give an overview of the AMIDST toolbox (open-source Java), some details about its API and its integration with Flink, and an analysis of the scalability of our learning algorithms, all in the context of a real use case in the financial domain (BCC group), where the profiles of millions of customers are analyzed using Flink and Amazon Web Services.

Slides: Ana M Martinez – AMIDST Toolbox- Scalable probabilistic machine learning with Flink.pdf

Video on YouTube