Mitigation during model in-processing
So, we’ve done our best to clean up our data, but what about when we’re training our model? Can we do something there too? Absolutely!
The model in-processing stage allows us to bake fairness right into the training process. It’s a bit like adding spices to a dish as it cooks, so the flavors permeate the whole meal instead of just sitting on top.
We can use algorithmic fairness constraints during training to make sure our model plays fair. Take a loan approval algorithm, for instance. We could introduce a fairness constraint to ensure that approval rates are similar across different demographic groups, much like making sure everyone at the table gets an equal slice of pizza.
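To make that concrete, here is a minimal sketch using fairlearn’s reductions API, one common way to train under a fairness constraint in Python. The loan-style setup and the choice of base classifier are illustrative assumptions, not a fixed recipe:

```python
# Sketch: constrained training with fairlearn (assumed setup: a tabular loan
# dataset X, binary approvals y, and a sensitive-attribute column `sensitive`).
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

def train_with_parity_constraint(X, y, sensitive):
    """Fit a classifier under a demographic-parity constraint, so that
    approval rates come out (approximately) equal across groups."""
    mitigator = ExponentiatedGradient(
        estimator=LogisticRegression(max_iter=1000),  # any sklearn-style model works
        constraints=DemographicParity(),
    )
    mitigator.fit(X, y, sensitive_features=sensitive)
    return mitigator  # predict with mitigator.predict(X_new)
```

Swapping `DemographicParity` for `EqualizedOdds` (also in `fairlearn.reductions`) changes which notion of “an equal slice of pizza” you enforce.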
Or, we could use fairness regularization, where we add a fairness penalty, a sort of extra seasoning, to the loss function. This can help us strike a balance between accuracy and fairness, preventing our model from favoring the majority group in a dataset, much like avoiding that one spicy dish that only a few guests at the party enjoy.
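In code, fairness regularization just means adding a penalty term to whatever loss you already minimize. Here is a small, self-contained sketch in plain NumPy, where the penalty is the squared gap between the average predicted scores of two groups; `lambda_fair` and the two-group encoding of `group` are assumed names for illustration:

```python
# Sketch: a fairness-regularized logistic loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fair_log_loss(w, X, y, group, lambda_fair=1.0):
    """Standard log loss plus lambda * (score gap between groups)^2."""
    p = sigmoid(X @ w)
    log_loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    gap = p[group == 0].mean() - p[group == 1].mean()
    return log_loss + lambda_fair * gap ** 2

# Minimize with any generic optimizer, for example:
# from scipy.optimize import minimize
# result = minimize(fair_log_loss, np.zeros(X.shape[1]), args=(X, y, group))
```

Turn `lambda_fair` up and the model cares more about closing the gap; turn it down and it chases raw accuracy.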
Finally, we can use adversarial debiasing, where we train our model alongside an adversarial network that tries to guess the sensitive attribute from the model’s outputs; the model is rewarded for making that guess hard, which nudges it toward fair representations. It’s like having a little kitchen helper who’s making sure we don’t overuse a particular ingredient (such as our sensitive attribute) while cooking our model.
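A compact PyTorch sketch of the idea looks like this: a predictor learns the main task while an adversary tries to recover the sensitive attribute from the predictor’s output, and the predictor is penalized whenever the adversary succeeds. The layer sizes and the `lambda_adv` weight are illustrative assumptions:

```python
# Sketch: adversarial debiasing in PyTorch (sizes and weights are assumptions).
import torch
import torch.nn as nn

n_features = 20                       # assumed input width
predictor = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

opt_pred = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lambda_adv = 1.0                      # how hard we push back against the adversary

def training_step(X, y, s):
    """One update; X, y, s are float tensors, with y and s shaped (batch, 1)."""
    # 1) Update the adversary: try to guess the sensitive attribute s
    #    from the predictor's (detached) output.
    logits = predictor(X).detach()
    adv_loss = bce(adversary(logits), s)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2) Update the predictor: do well on the task while making the
    #    adversary's job as hard as possible.
    logits = predictor(X)
    pred_loss = bce(logits, y) - lambda_adv * bce(adversary(logits), s)
    opt_pred.zero_grad(); pred_loss.backward(); opt_pred.step()
    return pred_loss.item(), adv_loss.item()
```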
Mitigation during model postprocessing
Alright – so we’ve prepped our data and been careful during cooking, but what about after the meal is cooked? Can we do anything then? Of course we can!
Just like we might adjust the seasoning of a dish after tasting, we can calibrate our models after they’re trained. This ensures that a predicted probability of, say, 0.8 means roughly the same thing for every demographic group.
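One simple way to do this is to fit a separate calibrator per group on held-out data, for example with scikit-learn’s isotonic regression. The sketch below is one possible recipe, not the only one; the argument names are placeholders for your own validation split:

```python
# Sketch: per-group score calibration with isotonic regression.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_group_calibrators(scores, y_true, groups):
    """Fit one calibrator per group on held-out scores and outcomes."""
    calibrators = {}
    for g in np.unique(groups):
        mask = groups == g
        cal = IsotonicRegression(out_of_bounds="clip")
        cal.fit(scores[mask], y_true[mask])   # map raw scores to observed outcomes
        calibrators[g] = cal
    return calibrators

def apply_group_calibrators(calibrators, scores, groups):
    """Rescale each example's score with its own group's calibrator."""
    out = np.empty_like(scores, dtype=float)
    for g, cal in calibrators.items():
        mask = groups == g
        out[mask] = cal.predict(scores[mask])
    return out
```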
And if we find our model consistently scoring a minority group lower, we can adjust the decision threshold for that group. It’s like lowering the bar for a high jump when you realize it’s unfairly high for some participants.
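Here’s a minimal sketch of that idea: pick each group’s cutoff so that the same fraction of each group clears the bar (the target rate is an assumption you would choose yourself). Libraries such as fairlearn offer a ready-made ThresholdOptimizer if you’d rather not roll your own:

```python
# Sketch: group-specific decision thresholds chosen to equalize approval rates.
import numpy as np

def fit_group_thresholds(scores, groups, target_rate=0.30):
    """Cutoff per group so that roughly target_rate of that group is approved."""
    return {g: np.quantile(scores[groups == g], 1 - target_rate)
            for g in np.unique(groups)}

def decide(scores, groups, thresholds):
    """Approve an example when its score clears its own group's threshold."""
    cutoffs = np.array([thresholds[g] for g in groups])
    return (scores >= cutoffs).astype(int)
```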
Also, we can use fairness-aware ensemble methods. These are like a group of chefs, each focusing on a different part of a meal, thus ensuring the entire dining experience is well balanced and fair.
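There are many ways to build such an ensemble; one simple sketch is to train several candidate models and keep (or weight) those that score best on a combined accuracy-and-fairness criterion measured on validation data. The scoring rule below (accuracy minus selection-rate gap) is just one illustrative choice, not a standard API:

```python
# Sketch: a simple fairness-aware model selection / ensemble step.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def selection_rate_gap(y_pred, groups):
    """Largest difference in positive-prediction rates between groups."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

def pick_fairest_model(X_train, y_train, X_val, y_val, group_val):
    """Keep the candidate with the best accuracy-minus-disparity score."""
    candidates = [LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=200),
                  GradientBoostingClassifier()]
    best_model, best_score = None, -np.inf
    for model in candidates:
        model.fit(X_train, y_train)
        preds = model.predict(X_val)
        score = accuracy_score(y_val, preds) - selection_rate_gap(preds, group_val)
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```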
Bias in LLMs
In the world of AI, we’ve seen a boom in the deployment of LLMs, and hey – why not? These behemoths, such as GPT-3 or BERT, are capable of some jaw-dropping tasks, from writing emails that make sense to creating near-human-like text. Impressive, isn’t it? But let’s take a step back and think. Just like every coin has two sides, there’s a not-so-glamorous side to these models – bias.
Yes – you heard it right. These models are not immune to biases. The ugly truth is that these models learn everything from the data they’re trained on. And if that data has biases (which, unfortunately, is often the case), the model’s output can also be biased. Think of it this way: if the model were trained on texts that are predominantly sexist or racist, it might end up generating content that reflects these biases. Not a pleasant thought, is it?
And that’s not just a hypothetical scenario. There have been instances where AI applications based on these models ended up causing serious trouble. Remember when AI systems were caught making unfair decisions or when chatbots ended up spewing hate speech? That’s the direct result of biases in the training data trickling down to the AI application.
Take a look at a few recent studies, and you’ll find them dotted with examples of biases in LLMs. It’s like searching for a needle in a haystack, except the needle is magnetized and practically leaps out at you. Bias, dear friends, is more prevalent in these models than we’d like to admit.
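You can run a quick probe yourself. The snippet below uses Hugging Face’s fill-mask pipeline with BERT (one of the models mentioned earlier) to see which words the model fills in for different occupation templates. The exact outputs depend on the model version, and this is only an informal illustration, not a rigorous bias audit:

```python
# Sketch: an informal probe of associations learned by a masked language model.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "The doctor said that [MASK] would be late.",
    "The nurse said that [MASK] would be late.",
]
for sentence in templates:
    print(sentence)
    for result in unmasker(sentence, top_k=3):        # top three guesses
        print(f"  {result['token_str']}  (score={result['score']:.3f})")
```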
In a nutshell, bias in LLMs is real, and it’s high time we started addressing it seriously. Stay tuned as we unpack more on this. Let’s take a look at an example.