Andreas Müller
Andreas Müller is a Principal Research SDE at Microsoft, where he works at the interface of the data science ecosystem and cloud infrastructure. He previously held positions as Associate Research Scientist at the Columbia Data Science Institute and as a Research Engineer at the NYU Center for Data Science. He is one of the core developers of the scikit-learn machine learning library, a member of the scikit-learn technical committee, and the author of the book "Introduction to Machine Learning with Python". His work focuses on practical aspects of machine learning and the development of user-centric machine learning software.
ML conf EU 2020
35 min
Dabl: Automatic Machine Learning with a Human in the Loop
In many real-world applications, data quality, curation, and domain knowledge play a much larger role in building successful models than devising complex processing techniques or tweaking hyperparameters. A machine learning toolbox should therefore help users understand both data and model, without burdening the practitioner with picking preprocessing steps and hyperparameters. The dabl library is a first step in this direction. It provides automatic visualization routines and model inspection capabilities while automating away model selection.
dabl contains plot types not yet available in standard Python libraries, as well as novel algorithms for picking interesting visualizations. Heuristics select appropriate preprocessing for machine learning, while state-of-the-art portfolio selection algorithms provide efficient model and hyperparameter search.
dabl also provides easy access to the model evaluation and model inspection tools provided by scikit-learn.
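To make the idea of searching a fixed portfolio of candidate models more concrete, here is a minimal sketch of successive halving in plain Python. It is an illustration under stated assumptions, not dabl's implementation: the function name `successive_halving`, the portfolio entries, and the `toy_evaluate` scorer are all hypothetical, and the abstract does not say which search procedure dabl uses internally. The idea is to score every candidate on a small budget (for example, a small data sample), keep the better half, and repeat with a larger budget.

```python
def successive_halving(portfolio, evaluate, min_budget, max_budget, factor=2):
    """Search a fixed list of candidate configs with successive halving.

    portfolio: list of candidate configurations (any objects).
    evaluate:  callable (config, budget) -> score, higher is better.
    Returns the winning config and a history of (budget, n_candidates).
    """
    survivors = list(portfolio)
    budget = min_budget
    history = []
    while True:
        # Score every remaining candidate at the current budget.
        scored = [(evaluate(cfg, budget), cfg) for cfg in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        history.append((budget, len(survivors)))
        # Stop once one candidate is left or the budget is exhausted.
        if len(survivors) == 1 or budget >= max_budget:
            return scored[0][1], history
        # Keep the top 1/factor of candidates and grow the budget.
        keep = max(1, len(survivors) // factor)
        survivors = [cfg for _, cfg in scored[:keep]]
        budget = min(budget * factor, max_budget)


# Hypothetical portfolio: names and "quality" values are made up for the demo.
PORTFOLIO = [
    {"name": "dummy", "quality": 0.50},
    {"name": "naive_bayes", "quality": 0.70},
    {"name": "tree_shallow", "quality": 0.72},
    {"name": "tree_deep", "quality": 0.75},
    {"name": "logreg_weak_reg", "quality": 0.80},
    {"name": "logreg_strong_reg", "quality": 0.78},
    {"name": "gradient_boosting_a", "quality": 0.90},
    {"name": "gradient_boosting_b", "quality": 0.88},
]


def toy_evaluate(cfg, budget):
    # Stand-in for "train on `budget` samples and cross-validate":
    # the score approaches the hidden quality as the budget grows.
    return cfg["quality"] * (1 - 1 / (budget + 1))


best, history = successive_halving(PORTFOLIO, toy_evaluate, min_budget=10, max_budget=80)
print(best["name"], history)
```

In a real setting, `evaluate` would fit a model on a subsample of the data and return a validation score, so most of the compute goes to the few candidates that survive the early, cheap rounds.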