Frans van Dunné
Frans is Chief Data Officer at ixpantia. He combines a diverse skill set, spanning business analysis, data analysis and enterprise architecture, with more than 15 years of experience to help organizations respond to their data-driven innovation needs quickly and effectively. Frans has a PhD in biology from the University of Amsterdam and has taught at universities in Europe and Latin America. As a consultant he has facilitated in-company training on diverse topics, including data-driven innovation, applied statistics, statistical programming and machine learning.
ML conf EU 2020
8 min
Processing Robot Data at Scale with R and Kubernetes
Most people would agree that R is a popular language for data analysis. Perhaps less well known is that R has good support for parallel execution on a single CPU through packages like future. In this presentation we will talk about our experience scaling up R processes even further by running R in parallel in Docker containers on Kubernetes. Robots generate massive amounts of sensor and other data; extracting the right information and insights from it requires significantly more processing than a single execution environment can handle. Faced with a preprocessing job of several hundred GB of compressed JSON Lines files, we used Pachyderm to write data pipelines that run the data preparation in parallel, using multicore containers on a Kubernetes cluster.
By the end of the talk we will have dispelled the myth that R cannot be used in production at scale. Even if you do not use R, you will have seen a use case for scaling up analysis regardless of your language of choice.