Mitigating algorithmic bias
Even after understanding and measuring bias in ML, the job is only half done. The next logical step is to implement strategies for mitigating bias. Various techniques exist, each with its strengths and weaknesses, and a combination of these strategies can often yield the best results. Here are some of the most effective methods:
- Preprocessing techniques: These modify the data before it is fed to the ML model, for example by resampling to correct imbalances in the data or by reweighing instances to reduce bias (a small reweighing sketch follows this list).
- In-processing techniques: These modify the model itself during training to reduce bias. They could involve regularization techniques, cost-sensitive learning, or other algorithmic tweaks to the training objective.
- Postprocessing techniques: These are applied after the model has been trained. They can include adjusting predicted labels or decision thresholds for different groups so that outcomes satisfy a chosen fairness criterion (a threshold-adjustment sketch appears a little further below).
- Fairness through unawareness: This method proposes that removing sensitive attributes (such as race or gender) from the dataset can lead to a fair model. However, it is often overly simplistic: other features can act as proxies for the removed attributes (ZIP code correlating with race, for example), so deeper, systemic biases in the data remain untouched.
- Fairness through awareness: In contrast to the previous method, this one suggests incorporating sensitive attributes directly into the model in a controlled way to counteract bias.
- Adversarial debiasing: This approach trains the main model alongside an adversary that tries to predict the sensitive attribute from the model’s predictions or learned representations. The main model is penalized whenever the adversary succeeds, pushing it toward outputs that carry as little information about the sensitive attribute as possible.
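To make the preprocessing idea concrete, here is a minimal sketch of reweighing, assuming a pandas DataFrame with a categorical sensitive attribute and a binary label; the column names and the final model call are hypothetical placeholders:

```python
import pandas as pd

def reweighing_weights(df: pd.DataFrame, sensitive_col: str, label_col: str) -> pd.Series:
    """Compute instance weights so that the sensitive attribute and the label
    are statistically independent in the weighted data (Kamiran & Calders-style
    reweighing): w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    weights = pd.Series(1.0, index=df.index)
    for a in df[sensitive_col].unique():
        for y in df[label_col].unique():
            mask = (df[sensitive_col] == a) & (df[label_col] == y)
            p_expected = (df[sensitive_col] == a).mean() * (df[label_col] == y).mean()
            p_observed = mask.mean()
            if p_observed > 0:
                weights[mask] = p_expected / p_observed
    return weights

# Hypothetical usage: any estimator that accepts sample_weight can consume these.
# df = pd.read_csv("applicants.csv")            # hypothetical dataset
# w = reweighing_weights(df, "gender", "hired") # hypothetical column names
# model.fit(X, y, sample_weight=w.values)
```

Under-represented attribute/label combinations receive weights above 1 and over-represented ones receive weights below 1, so the model effectively sees a distribution in which the sensitive attribute no longer predicts the label.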
How these methods are implemented depends on the nature of the data, the model, and the context in which they are applied. Bias mitigation is not a one-size-fits-all solution, and careful consideration must be given to each specific case. Nevertheless, the aforementioned techniques can go a long way toward promoting fairness and reducing harmful bias in ML models.
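As a second illustration, the sketch below shows one simple postprocessing adjustment: picking a separate decision threshold per group so that positive-prediction rates line up with a chosen target rate (a demographic-parity-style correction). The score array, the sensitive-attribute array, and the target rate are all assumed inputs, not part of any particular library:

```python
import numpy as np

def group_thresholds(scores: np.ndarray, sensitive: np.ndarray, target_rate: float) -> dict:
    """For each group, choose the threshold whose positive-prediction rate
    is closest to target_rate."""
    thresholds = {}
    for g in np.unique(sensitive):
        group_scores = scores[sensitive == g]
        candidates = np.unique(group_scores)
        rates = np.array([(group_scores >= t).mean() for t in candidates])
        thresholds[g] = candidates[np.argmin(np.abs(rates - target_rate))]
    return thresholds

def predict_with_thresholds(scores: np.ndarray, sensitive: np.ndarray, thresholds: dict) -> np.ndarray:
    """Apply the group-specific thresholds to raw scores."""
    return np.array([int(s >= thresholds[g]) for s, g in zip(scores, sensitive)])

# Hypothetical usage with a trained classifier's scores and a group label array A:
# probs = model.predict_proba(X)[:, 1]
# thr = group_thresholds(probs, A, target_rate=0.3)
# y_hat = predict_with_thresholds(probs, A, thr)
```

Because this kind of adjustment only touches the decision rule, it is easy to retrofit onto an existing model, but it does require access to the sensitive attribute at prediction time.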
Mitigation during data preprocessing
We’ve all heard the saying, “Garbage in, garbage out,” right? Well, it’s no different with ML. What we feed our model matters, and if we feed it a biased diet, well… you can guess what comes out.
Our first line of defense against bias is during data preprocessing. Here, we have to put on our detective hats and start investigating potential biases that might lurk in our data. Say, for example, we’re dealing with a healthcare algorithm. If our data sample over-represents a particular demographic, we risk skewing the algorithm toward that group, like a toddler who only wants to eat fries!
Once we’ve identified these biases, it’s time for some spring cleaning. Techniques such as oversampling, undersampling, or using the synthetic minority oversampling technique (SMOTE) can help us achieve a more balanced training set for our model. We go through a fuller example of preprocessing with bias mitigation in mind in our case studies chapter.
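To show what that rebalancing can look like in code, here is a minimal sketch using a synthetic dataset from scikit-learn as a stand-in for real data and the imbalanced-learn (imblearn) package for the resampling:

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE

# Synthetic stand-in for an imbalanced dataset: roughly 90% of samples in one class.
X, y = make_classification(n_samples=1_000, n_features=10, weights=[0.9, 0.1], random_state=42)
print("Before:", Counter(y))

# Random oversampling duplicates existing minority examples ...
X_ros, y_ros = RandomOverSampler(random_state=42).fit_resample(X, y)
print("Random oversampling:", Counter(y_ros))

# ... while SMOTE synthesizes new minority examples by interpolating
# between existing ones and their nearest neighbors.
X_sm, y_sm = SMOTE(random_state=42).fit_resample(X, y)
print("SMOTE:", Counter(y_sm))
```

When the imbalance is demographic rather than label-based, the same idea applies: we resample with respect to the group variable instead of the class label, but the mechanics are identical.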