Debiasing Word Embeddings
Wong Yen Hong
Just a few days ago, I was reading about word embeddings in Natural Language Processing (NLP), and I found the linear algebra behind debiasing word embeddings rather interesting. I usually make my own notes while learning, so I thought: if I'm going to make notes anyway, why not write a blog about it and (hopefully) help people understand the idea by visualizing the linear algebra behind the algorithms? This also gave me an opportunity to play with Manim, the very library 3Blue1Brown uses to animate his videos.
Without further ado, let's begin!
What Are Word Embeddings?
In the field of NLP, a word embedding is a way to represent text so that a machine can work with it. More specifically, each word is represented as an $n$-dimensional vector of numbers.
Word vectors for the words man, woman, king, and queen could look something like the following:
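Here is a toy illustration with made-up values; the three features shown below (gender, royal, and food, listed from the first row to the last) are chosen purely for intuition and are not from any real embedding:

$$e_{\text{man}} = \begin{bmatrix} -1.00 \\ 0.01 \\ 0.03 \end{bmatrix}, \quad e_{\text{woman}} = \begin{bmatrix} 1.00 \\ 0.02 \\ 0.01 \end{bmatrix}, \quad e_{\text{king}} = \begin{bmatrix} -0.95 \\ 0.93 \\ 0.02 \end{bmatrix}, \quad e_{\text{queen}} = \begin{bmatrix} 0.97 \\ 0.95 \\ 0.01 \end{bmatrix}$$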
Let's break down what we just saw above. Each of the word vectors, denoted by $e_{\text{word}}$, has three dimensions in this toy example, and each dimension represents the relationship between the word and a feature.
The first dimension represents the gender feature. We can see that the words man and king have negative values, indicating a masculine feature, while woman and queen have positive values, indicating a feminine feature.
The same logic applies to the second dimension, which represents the royal feature. The words king and queen have this feature, while the others do not.
Finally, we'll look at the last dimension, food. None of the words above have this feature, so their values there are close to $0$.
Well, obviously, this is too good to be true. What I just showed above is one way to interpret a word vector. In the real world, words don't have such clear-cut features, and word vectors are learned by machines using learning algorithms and a large training set from the Internet. (You didn't think people were just going to manually label the features, right?) Another thing to note is that each dimension of a real word vector is not interpretable by humans in the way I just showed; it is usually a mix of many different features. For example, the first dimension could be a mix of gender, royalty, age, and so on.
Analogy
A good set of word embeddings trained on a large training set has a really awesome property, analogy, which we'll explore below.
If I were to ask you: man is to woman as king is to what?
You would very easily answer queen. But can a machine do it as easily as you can?
The answer is yes, through the use of word embeddings. But how?
Let's dive into it.
First, we know that the meanings of the words man and woman are the same in every feature except one, gender. Hence, assuming we have a good set of word embeddings, we can take the word vector of man, subtract the word vector of woman, and as a result get another vector $d$ that represents the difference in gender:

$$d = e_{\text{man}} - e_{\text{woman}}$$

With this vector $d$, we know that we should find a word $w$ whose difference from king is also this gender difference. Hence, we'll find the word $w$ such that:

$$w = \underset{w}{\arg\max}\; \text{sim}(e_{\text{king}} - e_{w},\; d)$$

where $\text{sim}$ is a similarity function between two vectors. In this case, the cosine similarity is most commonly used:

$$\text{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$

where $u \cdot v$ is the dot product of the vectors $u$ and $v$, and $\lVert u \rVert$ is the L2 norm (the magnitude) of $u$.

This essentially computes the cosine of the angle $\theta$ between the two vectors $u$ and $v$:

$$\text{sim}(u, v) = \cos\theta$$
When two vectors point in a similar direction, their cosine similarity is high.
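As a quick sanity check, here is a minimal NumPy sketch of the cosine similarity (the toy vectors are arbitrary):

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u|| * ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Vectors pointing in roughly the same direction -> similarity close to 1
print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.1])))  # ~0.9999
# Orthogonal vectors -> similarity of 0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
```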
In the case of finding the word $w$ such that king is to it as man is to woman, we would find queen as the word $w$. This is because the only difference in features between king and queen is the same as the difference between man and woman, which is the gender difference.
Generally, to find the word $w$ such that $a$ is to it as $b$ is to $c$, we'll use the following algorithm:

$$w = \underset{w \in \text{vocabulary}}{\arg\max}\; \text{sim}(e_{a} - e_{w},\; e_{b} - e_{c})$$
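Here is a minimal sketch of that search, assuming a dictionary word_to_vec that maps every word in the vocabulary to its vector (the names word_to_vec and complete_analogy are my own, not from any particular library):

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(a, b, c, word_to_vec):
    """Find the word w such that a is to w as b is to c."""
    e_a, e_b, e_c = word_to_vec[a], word_to_vec[b], word_to_vec[c]
    best_word, best_sim = None, -np.inf
    for w, e_w in word_to_vec.items():
        if w in (a, b, c):  # skip the input words themselves
            continue
        sim = cosine_similarity(e_a - e_w, e_b - e_c)
        if sim > best_sim:
            best_sim, best_word = sim, w
    return best_word

# With a good set of embeddings, this should (ideally) return "queen":
# complete_analogy("king", "man", "woman", word_to_vec)
```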
The Gender Bias in Analogy
Now that we understand how to use an algorithm to find an analogy between words, let's see how the algorithm completes an analogy like man is to computer programmer as woman is to ... on a somewhat good set of word embeddings.
On a set of word embeddings that hasn't been neutralized or debiased, the answer you get is most likely a gender-stereotyped one, homemaker being the infamous example that gives the reference paper its title.
Do you see the bias now? The word embeddings have learned stereotypical gender features for occupation words. Words like these are gender-neutral; they shouldn't be biased toward any gender. However, there are some words that are meant to have a gender distinction, like man and woman or king and queen.
Debiasing Word Embeddings
The Gender Subspace
Before we dive into the algorithm, let's take a moment to visualize the gender subspace and see how gender biases show up in it.
Imagine you have a number line (a vector) representing the gender subspace, with points representing words. Any point with a positive value indicates a feminine feature, while a point with a negative value indicates a masculine feature. A point at $0$, the origin, has no gender feature and is gender neutral.
Now, let's take a look at where some of the gender-biased words could be located.
To get rid of the biases, we just need to set them to the origin, like this:
And that's exactly what we will be doing.
Neutralizing
When we think of the gender subspace as a number line, a single feature, getting rid of the gender bias on a gender-neutral word is very easy; we can simply set its value on that line to $0$.
However, in reality, the gender subspace is very unlikely to be a single feature. As mentioned previously, the features in word embeddings trained on large training sets are not interpretable by humans, and a feature like gender could be spread across more than a single dimension. So, the question is: how are we supposed to identify the gender subspace?
To identify the gender subspace in the multi-dimensional space, we can actually do something very similar to what we did with analogies. The idea is to use a pair of words with a gender distinction, for instance, woman and man. The only difference between this pair of words is the gender feature. So, if we subtract one from the other, we get a vector

$$g = e_{\text{woman}} - e_{\text{man}}$$

that points in the direction of a subspace very similar to the number line we saw in the last section. Below is a visualization of this process, simplified to a 2D space where one of the axes is the gender subspace.
And our goal is to make sure every gender-neutral word has a value of $0$ with respect to this gender subspace.
Making sure every gender-neutral word has a value of $0$ with respect to the gender subspace is equivalent to making sure its word vector is orthogonal (at $90^{\circ}$) to the subspace. Let's see how we can achieve this.
Generally, to obtain a version of a vector $e$ that is orthogonal to a vector $g$, we do the following:

$$e^{\perp} = e - e_{\text{proj}}$$

where

$$e_{\text{proj}} = \frac{e \cdot g}{\lVert g \rVert^{2}}\, g$$

is the projection of $e$ onto $g$.
Here is a visualization of the process of finding $e^{\perp}$.
In the case of debiasing a gender-neutral word $w$ with word vector $e_{w}$, we would use the following formulas:

$$e_{w}^{\text{bias}} = \frac{e_{w} \cdot g}{\lVert g \rVert^{2}}\, g$$

$$e_{w}^{\text{debiased}} = e_{w} - e_{w}^{\text{bias}}$$
Conceptually, one could think of the first step as finding the bias component of $e_{w}$ along the gender subspace, and the second step as subtracting that bias component from the word vector.
And this entire process is known as Neutralizing.
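Here is a minimal NumPy sketch of neutralizing; word_to_vec and the example words are assumptions on my part:

```python
import numpy as np

def neutralize(word, g, word_to_vec):
    """Remove the component of a gender-neutral word's vector that lies
    along the gender direction g (e.g. g = e_woman - e_man)."""
    e = word_to_vec[word]
    # Bias component: projection of e onto g
    e_bias = (np.dot(e, g) / np.dot(g, g)) * g
    # Debiased vector: orthogonal to g
    return e - e_bias

# Example usage (assuming these words exist in word_to_vec):
# g = word_to_vec["woman"] - word_to_vec["man"]
# e_debiased = neutralize("doctor", g, word_to_vec)
# np.dot(e_debiased, g) is now ~0
```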
Equalizing
Okay, we have neutralized a gender-neutral word $w$; its vector $e_{w}^{\text{debiased}}$ is orthogonal to the gender subspace now. But is the embedding free from all gender biases?
To answer that, let's consider how the word vectors for man and woman could be positioned relative to the subspace after neutralizing $e_{w}$.
I mean, it looks fine, right? $e_{w}^{\text{debiased}}$ is orthogonal to the gender subspace. What could still go wrong?
Here is what could go wrong:
Based on the word embedding shown in the diagram above, between the words man and woman, which one is closer to $e_{w}^{\text{debiased}}$?
One of them is clearly still closer to $e_{w}^{\text{debiased}}$ than the other. Hence, the gender bias, although reduced, still exists.
To debias it further, we need to make sure that, for every pair of gender-distinct words like man and woman, their distances from the subspace orthogonal to the gender subspace are equalized.
It's okay if you don't understand what the sentence above means; this is what it's trying to say:
In the diagram above, we can see that the word vectors $e_{\text{man}}$ and $e_{\text{woman}}$ are equidistant from the subspace orthogonal to the gender subspace.
The purpose is to make sure any pair of gender-distinct words is equidistant from any gender-neutralized word.
Again, this raises the question: how?
First, we need to find the mean vector $\mu$ of the two gender-distinct word vectors $e_{w_1}$ and $e_{w_2}$. This essentially finds the vector that is equidistant from the two of them:

$$\mu = \frac{e_{w_1} + e_{w_2}}{2}$$
Then, we find the version of $\mu$ that is orthogonal to the gender subspace $g$, denoted by $\mu^{\perp}$. This process is exactly the same as neutralization:

$$\mu_{B} = \frac{\mu \cdot g}{\lVert g \rVert^{2}}\, g, \qquad \mu^{\perp} = \mu - \mu_{B}$$
Semantically, $\mu^{\perp}$ is a word vector that contains all the features of the words $w_1$ and $w_2$ except the gender feature. The reason this makes sense is that the two gender-distinct words should mean exactly the same thing, only in different genders. Hence, taking their mean and getting rid of the gender feature retains their shared meaning without gender. This word vector is gender-neutral and will be useful for finding the equalized word vectors later.
Now we have $\mu^{\perp}$, the word vector that contains all the information of the two words except for the gender feature. Moving on, we need to find the equalized gender difference for each of the two words, $e_{w_1}$ and $e_{w_2}$, so that we know how much to move away from $\mu^{\perp}$.
To do that, we'll first find the gender component for both of the words:

$$e_{w_1 B} = \frac{e_{w_1} \cdot g}{\lVert g \rVert^{2}}\, g, \qquad e_{w_2 B} = \frac{e_{w_2} \cdot g}{\lVert g \rVert^{2}}\, g$$
Then, for each of the words, we'll subtract the midpoint's gender component $\mu_{B}$ from the word's gender component to get the equalized components $e_{w_1 B}^{\text{corrected}}$ and $e_{w_2 B}^{\text{corrected}}$:

$$e_{w B}^{\text{corrected}} = \sqrt{\left|\,1 - \lVert \mu^{\perp} \rVert^{2}\right|}\;\frac{e_{w B} - \mu_{B}}{\lVert e_{w B} - \mu_{B} \rVert}$$

The formula is shown once and is substituted with $e_{w_1 B}$ and $e_{w_2 B}$ in turn (I can't fit two formulas on the screen) to obtain the two vectors. You may have noticed that the formula seems to be doing something different from what we were just describing, but it is actually the same thing with a bit of normalizing and scaling. Let's put that aside for now; just know that it is a scaled gender component with an equalized distance.
Finally, we'll add these vectors to $\mu^{\perp}$ to get the equalized word vectors:

$$e_{w_1}^{\text{equalized}} = \mu^{\perp} + e_{w_1 B}^{\text{corrected}}, \qquad e_{w_2}^{\text{equalized}} = \mu^{\perp} + e_{w_2 B}^{\text{corrected}}$$
And this is how we can equalize the distances of a pair of gender-distinct word vectors from the subspace orthogonal to the gender subspace, further reducing the gender bias in the word embeddings.
Let's get back to the formula for finding the equalized gender component.
One thing I didn't really mention about the algorithm is that it actually assumes every word vector $e_{w}$ to have $\lVert e_{w} \rVert = 1$. The way I see it, the purpose of this is to make sure that, for each word, only the direction of the vector matters, not its magnitude.
So, what that formula does is first find the vector with the equalized distance by subtracting the midpoint's gender component $\mu_{B}$ from the word's gender component, and then normalize it, giving it a magnitude of $1$.
Then, to make sure the assumption that every word vector has a magnitude of $1$ still holds after equalizing, it multiplies that unit vector by the scale $\sqrt{\left|\,1 - \lVert \mu^{\perp} \rVert^{2}\right|}$, chosen such that after adding the result to $\mu^{\perp}$, the magnitude is $1$. Formally, it makes sure that $\lVert \mu^{\perp} + e_{w B}^{\text{corrected}} \rVert = 1$.
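Putting the equalizing steps together, here is a minimal NumPy sketch following the formulas above; word_to_vec and the example pair are my own assumptions:

```python
import numpy as np

def equalize(pair, g, word_to_vec):
    """Make the two gender-distinct words in `pair` equidistant from the
    subspace orthogonal to the gender direction g (unit-norm embeddings assumed)."""
    w1, w2 = pair
    e_w1, e_w2 = word_to_vec[w1], word_to_vec[w2]

    # Mean of the pair, and its gender / orthogonal components
    mu = (e_w1 + e_w2) / 2
    mu_B = (np.dot(mu, g) / np.dot(g, g)) * g
    mu_orth = mu - mu_B

    # Gender components of each word
    e_w1_B = (np.dot(e_w1, g) / np.dot(g, g)) * g
    e_w2_B = (np.dot(e_w2, g) / np.dot(g, g)) * g

    # Equalized gender components, scaled so the final vectors have unit norm
    scale = np.sqrt(np.abs(1 - np.sum(mu_orth ** 2)))
    e_w1_B_corrected = scale * (e_w1_B - mu_B) / np.linalg.norm(e_w1_B - mu_B)
    e_w2_B_corrected = scale * (e_w2_B - mu_B) / np.linalg.norm(e_w2_B - mu_B)

    # Recombine the equalized gender components with the shared, gender-neutral part
    return mu_orth + e_w1_B_corrected, mu_orth + e_w2_B_corrected

# Example usage (assuming these words exist in word_to_vec):
# g = word_to_vec["woman"] - word_to_vec["man"]
# e_man_eq, e_woman_eq = equalize(("man", "woman"), g, word_to_vec)
```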
And that's all I have for debiasing word embeddings. I had a lot of fun making those animations using Manim, and it actually helped me understand the linear algebra behind the algorithms. Thank you for reading!
Reference
Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Advances in Neural Information Processing Systems (NIPS), 2016.