Marco Gorelli's talks, articles, workshops, certificates

Never Have an Unmaintainable Jupyter Notebook Again!

ML conf EU 2020

26 min

Never Have an Unmaintainable Jupyter Notebook Again!

Data visualisation is a fundamental part of Data Science. The talk will start with a practical demonstration (using pandas, scikit-learn, and matplotlib) of how relying on summary statistics and predictions alone can leave you blind to the true nature of your datasets. I will make the point that visualisations are crucial in every step of the Data Science process and therefore that Jupyter Notebooks definitely do belong in Data Science. We will then look at how maintainability is a real challenge for Jupyter Notebooks, especially when trying to keep them under version control with git. Although there exists a plethora of code quality tools for Python scripts (flake8, black, mypy, etc.), most of them don't work on Jupyter Notebooks. To this end I will present nbQA, which allows any standard Python code quality tool to be run on a Jupyter Notebook. Finally, I will demonstrate how to use it within a workflow which lets practitioners keep the interactivity of their Jupyter Notebooks without having to sacrifice their maintainability.

dataviz machine learning