MLOps: Building Machine Learning Systems

Author: Ani Madurkar

The importance of thinking larger when designing effective and ethical machine learning systems

MLOps is taking the data science and machine learning landscape by storm as organizations struggle to realize the value promised by their data. This is partly due to the difficulty of productionizing machine learning models at scale and partly due to the difficulty of navigating the MLOps software landscape.

I wrote a four-part series that discusses the key challenges in today’s machine learning practices and the concepts needed to build machine learning systems at scale. This story summarizes each part and highlights how it can help you level up your machine learning projects.

In the introductory story, I discuss how the majority of machine learning projects are currently at level 0 and the challenges of being at that stage. The key challenge is that minimal value is being generated for the business and the system needs to evolve to level 1 to start realizing value in a consistent and repeatable fashion.

Going from MLOps Level 0 to Level 1

Many data scientists don’t stray far from the comfort of their Jupyter notebook while performing extensive modeling proof-of-concept work. This is fine, except that a large amount of work is still needed to take the best of those models and make them work on real-world data. The other 50+% of the project still needs to be done.

This story discusses what level 0 is, what challenges are commonly experienced at that stage, and what is needed to evolve to level 1 from a conceptual perspective.

Machine Learning Systems Pt. 1: Overview and Challenges

At the heart of practical data science and machine learning is a curious mind focused on building. In that spirit, I show how you can use TensorFlow Extended to start thinking in larger terms: reproducibility, repeatability, and replicability.

This story introduces code examples of how to take the first step after doing the modeling proof-of-concept work on a relatively static dataset. This step involves curating a schema that has all the specs of the data that is needed to keep the modeling pipeline afloat, built on top of data specifications/thresholds and business logic.
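To make the idea of a curated schema concrete, here is a minimal sketch in plain Python rather than TensorFlow Extended. It is not the code from the series: the FeatureSpec class, the validate_record function, and the column names and thresholds are all hypothetical, chosen only to show how data specifications and business logic can live in one place and be checked against incoming records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureSpec:
    """Expected properties of one column (hypothetical structure)."""
    dtype: type
    required: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed: Optional[frozenset] = None


# Hypothetical schema for an illustrative customer dataset.
SCHEMA = {
    "age": FeatureSpec(dtype=int, min_value=18, max_value=120),
    "plan": FeatureSpec(dtype=str, allowed=frozenset({"free", "pro"})),
    "monthly_spend": FeatureSpec(dtype=float, required=False, min_value=0.0),
}


def validate_record(record: dict, schema: dict) -> list:
    """Return a list of anomaly messages; an empty list means the record conforms."""
    anomalies = []
    for name, spec in schema.items():
        value = record.get(name)
        if value is None:
            if spec.required:
                anomalies.append(f"{name}: missing required value")
            continue
        if not isinstance(value, spec.dtype):
            anomalies.append(
                f"{name}: expected {spec.dtype.__name__}, got {type(value).__name__}"
            )
            continue
        if spec.min_value is not None and value < spec.min_value:
            anomalies.append(f"{name}: {value} is below minimum {spec.min_value}")
        if spec.max_value is not None and value > spec.max_value:
            anomalies.append(f"{name}: {value} is above maximum {spec.max_value}")
        if spec.allowed is not None and value not in spec.allowed:
            anomalies.append(f"{name}: {value!r} is not an allowed value")
    # Columns the schema does not know about are reported as anomalies too.
    for name in record:
        if name not in schema:
            anomalies.append(f"{name}: unexpected column")
    return anomalies
```

Calling validate_record({"age": 12, "plan": "gold"}, SCHEMA) reports two anomalies, one for the age threshold and one for the plan value, while the optional monthly_spend column can be absent without complaint.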

Machine Learning Systems Pt. 2: Data Pipelines with TensorFlow Extended

The first step involves creating a schema that is curated to your contextual needs, and this story takes it a step further by building out a data pipeline. This story takes into account that the second part of a machine learning project at scale involves holding the model relatively constant and changing the data.

In this part, I show the importance of creating a pipeline when you build out your machine learning system because, ultimately, you’re not just deploying a model; you’re deploying a data system. This is done with data and schema validation, feature engineering, checks, triggers, and more.
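As a rough illustration of how validation, feature engineering, checks, and triggers fit together, here is a small plain-Python sketch. It does not use TensorFlow Extended, and everything in it is an assumption made for the example: the amount column, the is_large feature, the thresholds, and the function names are all hypothetical. It also assumes the baseline mean is nonzero.

```python
from statistics import mean


def is_valid(row: dict) -> bool:
    """Validation step: 'amount' must be a non-negative number (hypothetical rule)."""
    amount = row.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


def engineer(row: dict) -> dict:
    """Feature engineering step: add a derived flag (hypothetical feature)."""
    return {**row, "is_large": row["amount"] > 100}


def run_data_pipeline(batch, baseline_mean, max_invalid_fraction=0.1, drift_tolerance=0.25):
    """Validate a batch, engineer features, and decide whether to trigger retraining."""
    valid = [row for row in batch if is_valid(row)]
    invalid_fraction = 1 - len(valid) / len(batch) if batch else 1.0

    # Check: halt the run when too much of the batch fails validation.
    if not valid or invalid_fraction > max_invalid_fraction:
        return {"status": "halted", "features": [], "retrain": False}

    features = [engineer(row) for row in valid]

    # Trigger: flag retraining when the batch mean drifts too far from the baseline.
    batch_mean = mean(row["amount"] for row in valid)
    drift = abs(batch_mean - baseline_mean) / baseline_mean
    return {"status": "ok", "features": features, "retrain": drift > drift_tolerance}
```

The point of the sketch is the shape of the decision: bad data stops the run before it reaches the model, and a shift in the data, not a change in the model, is what raises the retraining flag.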

Machine Learning Systems Pt. 3: Modeling Pipelines with TensorFlow Extended

Introducing a model that you’re looking to deploy into your pipeline also creates the need for extensive version control, experiment management, a model registry, a feature store, and more. These components round out the MLOps infrastructure and allow for data science to be done as intended: with reproducibility, repeatability, and replicability.

With the modeling pipeline, it’s usually not enough to stop your analysis with the model itself. You want to run robust checks of how the model performs on a variety of slices of real-world data to identify gaps and antipatterns, even when the overall performance metrics look good. Highlighting patterns of failure through persistent evaluation gives significant hints into how your model may be failing in certain realms.
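Slice-based evaluation can be sketched in a few lines of plain Python. This is an illustration only, not the TensorFlow Extended tooling the series uses: the record layout with label and prediction keys, the function names, and the max_gap threshold are hypothetical, and accuracy stands in for whatever metric fits your problem.

```python
from collections import defaultdict


def accuracy_by_slice(examples, slice_key):
    """Compute accuracy separately for each value of slice_key."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for ex in examples:
        slice_value = ex[slice_key]
        totals[slice_value] += 1
        correct[slice_value] += int(ex["label"] == ex["prediction"])
    return {s: correct[s] / totals[s] for s in totals}


def underperforming_slices(examples, slice_key, max_gap=0.1):
    """Return slices whose accuracy trails overall accuracy by more than max_gap."""
    overall = sum(ex["label"] == ex["prediction"] for ex in examples) / len(examples)
    per_slice = accuracy_by_slice(examples, slice_key)
    return {s: acc for s, acc in per_slice.items() if overall - acc > max_gap}
```

A model can look acceptable on the overall metric while failing badly on one slice, which is why the comparison is made against overall accuracy rather than against a fixed pass mark.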

This concludes a walkthrough of MLOps and how to take your first step in creating effective machine learning systems.