Uncovering bias in GPT-2
LLMs such as GPT-2 have revolutionized natural language processing (NLP) by producing high-quality, human-like text. While these models can be valuable tools, they are not without their challenges. One significant concern is the potential for bias in their outputs, which can result from biases in the training data. To see this in practice, let’s consider an example. We’ll use the GPT-2 model from Hugging Face’s transformers library, which was trained on WebText, a dataset of web pages shared and upvoted on Reddit. Reddit is known for its diverse array of opinions and discussions, but also for misinformation and potential biases. This experiment aims to showcase the biases that LLMs can exhibit when generating text from specific prompts.
Let’s set up some code for an experiment. I want to ask GPT-2 a few questions where I expect a list as an answer, and I want to see what it says and how often it says it. The idea is to create a function that asks a question and counts the elements of a comma-separated list in the response. First, we set up the model:
from tqdm import tqdm
import pandas as pd
from transformers import pipeline, set_seed

# Set up a text-generation pipeline using the GPT-2 large model and tokenizer
generator = pipeline('text-generation', model='gpt2-large', tokenizer='gpt2-large')
# Fix the random seed so that the generated text is reproducible
set_seed(0)
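The counting function itself could look something like the following minimal sketch. The ask_and_count name, its parameters, and the batching scheme are illustrative assumptions rather than a definitive implementation; it generates a number of short continuations for a prompt, splits each continuation on commas, and tallies how often each item appears:
def ask_and_count(prompt, num_batches=10, per_batch=10, max_length=10, temperature=1.0):
    answers = []
    # Generate continuations in batches, with a progress bar over the batches
    for _ in tqdm(range(num_batches)):
        outputs = generator(
            prompt,
            num_return_sequences=per_batch,
            max_length=max_length,
            temperature=temperature,
            do_sample=True,  # sampling is needed to obtain distinct sequences
        )
        for output in outputs:
            # Drop the prompt, then split the continuation on commas
            completion = output['generated_text'][len(prompt):]
            answers.extend(item.strip().lower() for item in completion.split(',') if item.strip())
    # Count how often each item appears across all generations
    return pd.Series(answers).value_counts()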
Let’s look at this in more detail:
- The pipeline is a high-level, easy-to-use API for doing inference with transformer models. The set_seed function sets the seed for generating random numbers.
- Next, we create an instance of the text-generation pipeline, specifying the GPT-2 model and tokenizer. GPT-2 is used because it has been trained on a large corpus of text and is able to generate human-like text.
- We then set the seed for the random number generator to ensure the reproducibility of the results: when the seed is set to a specific number, the generated sequences will be the same every time this script is run.
- Finally, we use the generator to create text. The generator receives a prompt and returns a response (see the usage sketch after this list):
- It generates multiple different continuations of the prompt per call (controlled by the num_return_sequences parameter).
- The max_length parameter restricts the total length of the generated text to 10 tokens.
- The temperature is set to 1.0 and affects the randomness of the output (higher values make the output more random, and lower values make it more deterministic).
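Putting it together, we can call the hypothetical ask_and_count function from the earlier sketch with paired prompts and compare the most frequent completions. The prompts below are just examples of the kind used in this experiment:
for prompt in ['The man worked as a', 'The woman worked as a']:
    print(prompt)
    # Show the five most common completions for this prompt
    print(ask_and_count(prompt).head(5))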
Figure 13.1 shows some top results for a few prompts:

Figure 13.1 – Differences in asking GPT-2 what kinds of jobs men and women hold
These outputs show that there may be biases in the language model’s output. Some of the generated sentences could be perceived as negative or stereotyping, demonstrating the potential for bias in LLMs. Therefore, it is crucial to be aware of and manage these biases, especially when using the model’s output for any sensitive tasks.
Language models such as GPT-2 are trained on large amounts of text data. They learn to generate text by predicting the next word in a sentence, given the context of the preceding words. This learning process doesn’t include explicit information about the facts or morality of the statements; it just learns patterns from the data it’s trained on.
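To make that prediction step concrete, the following small sketch (separate from the pipeline code above, and purely illustrative) inspects which next words GPT-2 assigns the highest probability to for a given prompt; the top_next_words name and the example prompt are assumptions for illustration:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
model = AutoModelForCausalLM.from_pretrained('gpt2-large')

def top_next_words(prompt, k=5):
    # Score every candidate next token for the prompt and return the k most likely
    inputs = tokenizer(prompt, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next-token position
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode(int(idx)).strip(), round(float(p), 4))
            for idx, p in zip(top.indices, top.values)]

# Illustrative prompt; the returned words reflect patterns in the training data
print(top_next_words('The woman worked as a'))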
Biases in these models arise from the nature of the data they are trained on. In the case of GPT-2, the training data consists of web pages that were shared and upvoted on Reddit. While Reddit can surface a rich diversity of views and discussions, the content it links to covers a wide range, including misinformation, stereotypes, and discriminatory language.
So, when the model is trained on such data, it can learn and replicate these biases. For example, in the code we provided, some generated sentences could be seen as promoting stereotypes or misinformation, suggesting that the model has picked up these patterns from biases present in its training data.
This has serious implications. For example, if LLMs are used in applications that involve making decisions that impact people’s lives, such as job recruitment or loan approval, any bias in the model’s predictions could lead to unfair outcomes.
Therefore, addressing these biases is a significant challenge in the deployment of LLMs. Possible solutions include careful curation of training data, bias mitigation techniques during training, and postprocessing of model outputs. Additionally, understanding and communicating a model’s potential biases to its users is a crucial part of responsible AI deployment.
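As a very simple illustration of the postprocessing idea, generated text could be screened against a review list before it is surfaced to users. The flag_output name and the placeholder terms below are made up for illustration; this is a sketch of the general approach, not a complete bias mitigation technique:
# Placeholder review list; a real deployment would use a carefully curated,
# context-specific set of terms or a trained classifier instead
REVIEW_TERMS = {'all women', 'all men', 'always', 'never'}

def flag_output(text, review_terms=REVIEW_TERMS):
    # Flag generations containing review-list terms so a human can inspect them
    lowered = text.lower()
    hits = [term for term in review_terms if term in lowered]
    return {'text': text, 'flagged': bool(hits), 'matched_terms': hits}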