Model drift – Mitigating Algorithmic Bias and Tackling Model and Data Drift

Model drift

Now, there are several types of model drift we should be aware of. Each tells a different tale of how our models can falter:

  • The first type is concept drift, which occurs when the relationship between a model’s inputs and the target it predicts changes over time. Think of a sentiment analysis (SA) algorithm. Over time, the way people use certain words or phrases can change. Slang evolves, cultural contexts shift, and the algorithm may start to misinterpret sentiments because it’s not keeping up with these changes. This is a classic case of concept drift.
  • Next, we have prediction drift, a shift in the distribution of the model’s outputs. Imagine you have a chatbot that’s handling customer queries. Due to an unforeseen event, such as a temporary outage, your chatbot suddenly receives an influx of similar queries. Your model’s predictions start to skew toward this particular issue, causing prediction drift.

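To make prediction drift concrete, here’s a minimal sketch that compares the distribution of a chatbot’s predicted intents across two time windows using total variation distance. The intents, counts, and the 0.2 alerting threshold are all illustrative, not a prescription:

```python
from collections import Counter

def prediction_distribution(predictions):
    """Normalise a list of predicted labels into a probability distribution."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: count / total for label, count in counts.items()}

def total_variation_distance(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means identical distributions; 1.0 means completely disjoint.
    """
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)

# Baseline week: queries spread across several intents.
baseline = ["billing"] * 40 + ["returns"] * 30 + ["outage"] * 5 + ["other"] * 25
# After a temporary outage: predictions skew heavily toward one intent.
current = ["billing"] * 10 + ["returns"] * 10 + ["outage"] * 70 + ["other"] * 10

drift_score = total_variation_distance(
    prediction_distribution(baseline), prediction_distribution(current)
)
if drift_score > 0.2:  # threshold chosen purely for illustration
    print(f"Prediction drift detected (TVD={drift_score:.2f})")
```

In a real pipeline, you’d compute the baseline distribution once from a trusted reference window and compare each new window of live predictions against it on a schedule.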
To summarize, model drift is a challenge that reminds us of the ever-changing nature of data and user behavior. As we navigate these currents, understanding the types of drift can act as our compass, guiding us in maintaining the performance and accuracy of our models. Now, let’s delve into another type of drift – data drift.

Data drift

Data drift, sometimes referred to as feature drift, is another phenomenon we need to keep our eye on. Let’s imagine a scene. Your ML model is a ship, and the data it’s trained on is the ocean. Now, as we know, the ocean is not a static entity – currents shift, tides rise and fall, and new islands may even emerge. Just as a skilled captain navigates these changes, our models need to adapt to the changing currents of data.

But what exactly does data drift entail? In essence, it’s a change in the model’s input data distribution. For example, consider an e-commerce recommendation system. Suppose a new product is introduced, and it quickly becomes a hit among consumers. People start using a new term to refer to this product in their reviews and feedback. If your model doesn’t adapt to include this new term in its understanding, it’s going to miss a significant aspect of current customer sentiment and preferences – classic data drift.
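One common way to catch a shift like this on a numeric input feature is a two-sample Kolmogorov–Smirnov test, comparing a reference window against live data. The sketch below is purely illustrative: the feature, the Gaussian data, and the 0.01 cut-off are stand-ins, not part of this chapter’s example:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# Reference window: the feature distribution the model was trained on
# (say, average order value), simulated as Gaussian for illustration.
reference = rng.normal(loc=50.0, scale=10.0, size=1000)
# Live window: the same feature after a shift in customer behaviour.
live = rng.normal(loc=65.0, scale=10.0, size=1000)

statistic, p_value = ks_2samp(reference, live)
if p_value < 0.01:  # illustrative significance cut-off
    print(f"Data drift detected on this feature (KS={statistic:.2f})")
```

For text inputs, where there is no single numeric feature, you would instead monitor proxies such as vocabulary or embedding statistics.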

Data drift is a reminder that the world our models operate in is not static. Trends emerge, customer behaviors evolve, and new information becomes relevant. As such, it’s vital for our models to stay agile and responsive to these changes.

Another kind of drift is label drift, which occurs when the distribution of the actual labels shifts. Let’s consider a customer service bot again. If customer behavior changes – say, from asking to make returns to enquiring about the status of their returns – the distribution of labels in your data shifts, leading to label drift.
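A simple way to watch for label drift is to compare label shares between two periods and alert on large swings. The intents, counts, and the 15-point threshold below are invented for illustration:

```python
from collections import Counter

def label_shares(labels):
    """Fraction of examples carrying each label."""
    counts = Counter(labels)
    n = len(labels)
    return {label: counts[label] / n for label in counts}

# Last month: most tickets asked to start a return.
last_month = ["request_return"] * 70 + ["return_status"] * 20 + ["other"] * 10
# This month: customers now mostly ask where their return is.
this_month = ["request_return"] * 25 + ["return_status"] * 65 + ["other"] * 10

before, after = label_shares(last_month), label_shares(this_month)
for label in sorted(set(before) | set(after)):
    change = after.get(label, 0.0) - before.get(label, 0.0)
    if abs(change) > 0.15:  # illustrative alerting threshold
        print(f"Label drift on '{label}': {change:+.0%}")
```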

Now that we’ve demystified model and data drift, let’s delve into their various sources to better understand and mitigate them. When we talk about drift in the context of ML, we typically distinguish between two main sources: model drift and data drift. It’s a bit like considering the source of changes in the taste of a dish – is it the ingredients that have changed or the chef’s technique?

Sources of model drift

Model drift occurs when the underlying assumptions of our model change. This is akin to a chef changing their technique. Maybe the oven’s temperature has been altered, or the baking time has been modified. In the ML world, this could be due to changes in the environment where the model is deployed. A good example is a traffic prediction model. Let’s say the model was trained on data before a major roadway was constructed. After the construction, traffic patterns change, leading to model drift as the underlying assumptions no longer hold.
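In practice, model drift like this usually shows up as a decay in live performance. A minimal sketch – with toy data standing in for real traffic predictions – is to track accuracy over consecutive windows and look for a drop:

```python
def rolling_accuracy(y_true, y_pred, window=100):
    """Accuracy over consecutive windows of predictions, oldest first."""
    scores = []
    for start in range(0, len(y_true), window):
        chunk_true = y_true[start:start + window]
        chunk_pred = y_pred[start:start + window]
        correct = sum(t == p for t, p in zip(chunk_true, chunk_pred))
        scores.append(correct / len(chunk_true))
    return scores

# Toy example: the model is right 90% of the time before the new
# roadway opens, and only 60% of the time afterwards.
y_true = [1] * 200
y_pred = [1] * 90 + [0] * 10 + [1] * 60 + [0] * 40

print(rolling_accuracy(y_true, y_pred))  # → [0.9, 0.6]
```

The catch, of course, is that this requires ground-truth labels, which often arrive with a delay – which is exactly why the input-side checks from the data drift discussion are a useful complement.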

Sources of data drift

Data drift, on the other hand, is driven by changes in the statistical properties of the model inputs over time. This is like the ingredients in our dish changing. For instance, if a seasonal fruit that’s typically part of our recipe is no longer available and we have to replace it, the taste of the dish might drift from its original flavor. In the realm of ML, an example could be an SA model that fails to account for the emergence of new slang terms or emojis, leading to a drift in the data the model is analyzing.
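A cheap early-warning signal for this kind of vocabulary shift is the out-of-vocabulary (OOV) rate – the fraction of incoming tokens the training vocabulary has never seen. The vocabulary and reviews below are toy examples, and the whitespace tokeniser is a deliberate simplification:

```python
def oov_rate(texts, vocabulary):
    """Fraction of tokens not present in the training vocabulary."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    if not tokens:
        return 0.0
    unseen = sum(1 for tok in tokens if tok not in vocabulary)
    return unseen / len(tokens)

training_vocab = {"great", "phone", "battery", "love", "bad", "screen"}
old_reviews = ["great phone love the battery", "bad screen"]
new_reviews = ["phone slaps no cap", "battery is bussin fr"]

print(f"OOV rate before: {oov_rate(old_reviews, training_vocab):.0%}")
print(f"OOV rate after:  {oov_rate(new_reviews, training_vocab):.0%}")
```

A sustained jump in the OOV rate doesn’t tell you *what* changed, but it tells you *that* something changed – a prompt to inspect the new terms and consider retraining.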

Understanding the sources of drift is essential because it allows us to develop strategies for monitoring and mitigating these changes, ensuring that our models stay fresh and relevant in the ever-evolving real world. In the next sections, we’ll explore some strategies for managing these sources of drift.