Debiasing Word Embeddings

Wong Yen Hong

Just a few days ago, I was reading about word embeddings in Natural Language Processing (NLP), and I found the linear algebra behind debiasing word embeddings rather interesting. I usually make my own notes for my learning, so I thought, if I am going to make my notes anyway, why not just write a blog about it and (hopefully) help people understand the idea by visualizing the linear algebra behind the algorithms? This also gave me an opportunity to play with the very Manim that 3Blue1Brown uses to animate his videos.

Without further ado, let's begin!

What are Word Embeddings?

In the field of NLP, a word embedding is a way to represent text in a form that a machine can work with. More specifically, each word $w$ is represented by a $d$-dimensional vector $\overrightarrow{w} \in \mathbb{R}^d$.

A word vector $\overrightarrow{w}$ for each of the words $\text{man}$, $\text{woman}$, $\text{king}$, and $\text{queen}$ could look something like the following:

\begin{align*}
\overrightarrow{w}_{\text{man}} &= \begin{bmatrix} x_{\text{gender}} = -1 \\ x_{\text{royale}} = 0 \\ \vdots \\ x_{\text{number}} = 0 \end{bmatrix} \\ \\
\overrightarrow{w}_{\text{woman}} &= \begin{bmatrix} x_{\text{gender}} = 1 \\ x_{\text{royale}} = 0 \\ \vdots \\ x_{\text{number}} = 0 \end{bmatrix} \\ \\
\overrightarrow{w}_{\text{king}} &= \begin{bmatrix} x_{\text{gender}} = -1 \\ x_{\text{royale}} = 1 \\ \vdots \\ x_{\text{number}} = 0 \end{bmatrix} \\ \\
\overrightarrow{w}_{\text{queen}} &= \begin{bmatrix} x_{\text{gender}} = 1 \\ x_{\text{royale}} = 1 \\ \vdots \\ x_{\text{number}} = 0 \end{bmatrix}
\end{align*}

Let's break down what we just saw above. Each word vector, denoted by $\overrightarrow{w}$, has $d$ dimensions, and each dimension represents the relationship between the word $w$ and a feature.

The first dimension represents the $\text{gender}$ feature. We can see that the words $\text{man}$ and $\text{king}$ have $x_{\text{gender}} = -1$, indicating a masculine feature, while $\text{woman}$ and $\text{queen}$ have $x_{\text{gender}} = 1$, indicating a feminine feature.

The same logic applies to the second dimension, which represents the $\text{royale}$ feature. The words $\text{king}$ and $\text{queen}$ have the $\text{royale}$ feature, while the others do not.

Finally, we'll look at the last feature, $\text{number}$. None of the words above have this feature, which is why it is $0$ in every vector.

Well, obviously, this is too good to be true. What I just showed above is a way to interpret a word vector. In the real world, words don't have such clear-cut features, and word vectors are learned by machines using learning algorithms and a large training set from the Internet. (You didn't think people were just going to manually label the features, right?) Another thing to note is that the features of a word vector are not really interpretable to humans in the way I just showed; each dimension is usually a mix of many different features. For example, the first feature $x_1$ could be a mix of $\text{gender}$, $\text{royale}$, $\text{number}$, etc.
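To make the interpretation above concrete, here is a minimal sketch of such hand-crafted word vectors in NumPy. The three dimensions ($\text{gender}$, $\text{royale}$, $\text{number}$) and their values are the illustrative ones from above; real embeddings like word2vec or GloVe have 50–300 opaque dimensions:

```python
import numpy as np

# Toy 3-dimensional word vectors with hand-labeled dimensions
# (gender, royale, number) -- purely illustrative, not learned.
embeddings = {
    "man":   np.array([-1.0, 0.0, 0.0]),
    "woman": np.array([ 1.0, 0.0, 0.0]),
    "king":  np.array([-1.0, 1.0, 0.0]),
    "queen": np.array([ 1.0, 1.0, 0.0]),
}
```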


A good set of word embeddings trained on a large training set has a really awesome property called analogy, which we'll explore below.

If I were to ask you,

\begin{align*} \text{man is to woman as king is to what?} \end{align*}

You would very easily answer $\text{queen}$. But can a machine do it as easily as you can?

The answer is yes, through the use of word embeddings. But how?

Let's dive into it.

First, we know that the words $\text{man}$ and $\text{woman}$ mean the same thing in every feature except one: $\text{gender}$. Hence, assuming we have a good set of word embeddings, we can subtract the word vector of $\text{woman}$, $\overrightarrow{w}_{\text{woman}}$, from the word vector of $\text{man}$, $\overrightarrow{w}_{\text{man}}$, and as a result get another vector $\overrightarrow{d}$ that represents the difference in $\text{gender}$:

\begin{align*} \overrightarrow{d} = \overrightarrow{w}_{\text{man}} - \overrightarrow{w}_{\text{woman}} \end{align*}

With this vector $\overrightarrow{d}$, we know we should find a word $w$ whose vector has this same $\text{gender}$ difference $\overrightarrow{d}$ from $\overrightarrow{w}_{\text{king}}$. Hence, we'll find a word $w$ such that:

\begin{align*} w = \arg\max_{w} \text{sim}(\overrightarrow{d}, \overrightarrow{w}_{\text{king}} - \overrightarrow{w}_{w}) \end{align*}

where $\text{sim}$ is a similarity function between two vectors. In this case, the cosine similarity $\cos(\overrightarrow{u}, \overrightarrow{v})$ is most commonly used:

\begin{align*} \cos(\overrightarrow{u}, \overrightarrow{v}) = \frac{\overrightarrow{u} \cdot \overrightarrow{v}}{\| \overrightarrow{u} \| \| \overrightarrow{v} \|} \end{align*}

where $\overrightarrow{u} \cdot \overrightarrow{v}$ is the dot product of vectors $\overrightarrow{u}$ and $\overrightarrow{v}$, and $\| \overrightarrow{u} \|$ is the L2 norm (the magnitude) of $\overrightarrow{u}$.

This essentially computes $\cos \theta$, where $\theta$ is the angle between the two vectors $\overrightarrow{u}$ and $\overrightarrow{v}$:

When two vectors are pointing in a similar direction, the cosine similarity of them is high.
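The cosine similarity formula above is a one-liner in NumPy; here is a minimal sketch (the function name is my own choice):

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| ||v||)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Parallel vectors -> similarity 1; orthogonal vectors -> similarity 0.
print(cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0])))  # → 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0])))  # → 0.0
```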

In the case of finding the word $w$ such that $\text{king}$ is to it as $\text{man}$ is to $\text{woman}$, we would find $\text{queen}$ as the word $w$. This is because the only difference in features between $\text{king}$ and $\text{queen}$ is the same as that between $\text{man}$ and $\text{woman}$: the $\text{gender}$ difference.

Generally, to find the word $w$ such that $w_3$ is to it as $w_1$ is to $w_2$, we'll use the following algorithm:

\begin{align*} w = \arg\max_{w} \text{sim}(\overrightarrow{w}_{1} - \overrightarrow{w}_{2}, \overrightarrow{w}_{3} - \overrightarrow{w}_{w}) \end{align*}
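The analogy search above can be sketched as a brute-force loop over the vocabulary. This is a toy implementation under my own naming, using the small hand-crafted vectors from earlier (a real system would search tens of thousands of words and typically exclude the three input words, as done here):

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def find_analogy(w1, w2, w3, embeddings):
    """Return the word w maximizing sim(w1 - w2, w3 - w)."""
    diff = embeddings[w1] - embeddings[w2]
    best_word, best_sim = None, -2.0  # cosine similarity lies in [-1, 1]
    for word, vec in embeddings.items():
        if word in (w1, w2, w3):
            continue  # exclude the input words themselves
        sim = cosine_similarity(diff, embeddings[w3] - vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

embeddings = {
    "man":   np.array([-1.0, 0.0]),
    "woman": np.array([ 1.0, 0.0]),
    "king":  np.array([-1.0, 1.0]),
    "queen": np.array([ 1.0, 1.0]),
}
# "man is to woman as king is to ...?"
print(find_analogy("man", "woman", "king", embeddings))  # → queen
```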

The Gender Bias in Analogy

Now that we understand how to use an algorithm to find an analogy between words, let's see how the algorithm answers the following question with a somewhat good set of word embeddings:

\begin{align*} \text{man is to woman as computer programmer is to what?} \end{align*}

On a set of word embeddings that hasn't been neutralized or debiased, the result you will get is most likely one of the following:

\begin{align*}
&\text{1. homemaker}\\
&\text{2. nurse}\\
&\text{3. hairdresser}\\
&\text{4. nanny}\\
&\text{5. stylist}
\end{align*}

Do you see the bias now? The word embeddings have learned stereotypical gender features for these jobs. The words listed above are gender-neutral; they shouldn't be biased toward any gender. However, some words are meant to have a gender distinction, like $\text{man}$ and $\text{woman}$.

Debiasing Word Embeddings

The Gender Subspace

Before we dive into the algorithm, let's take a moment to visualize the gender subspace and see how gender biases were created as a result.

Imagine you have a number line (a 1D vector) representing the gender subspace, with points representing words. Any point with a positive value indicates a feminine feature, while a point with a negative value indicates a masculine feature. A point at $0$ has no gender feature and is gender-neutral.

Now, let's take a look at where some of the gender-biased words could be located.

To get rid of the biases, we just need to set them to the origin, like this:

And that's exactly what we will be doing.


When we think of the $\text{gender}$ subspace as a number line or a 1D feature, getting rid of the gender bias on gender-neutral words is very easy; we can simply set everything to $0$.

However, in reality, the $\text{gender}$ subspace is very unlikely to be a 1D feature. As mentioned previously, the features in word embeddings trained on large training sets are not interpretable by humans, and a feature like $\text{gender}$ could be spread across more than a single dimension. So the question is: how are we supposed to identify the $\text{gender}$ subspace?

To identify the $\text{gender}$ subspace in the multi-dimensional space, we can do something very similar to what we did with analogies. The idea is to use a pair of words with a gender distinction, for instance $(\text{man}, \text{woman})$. The only difference between this pair of words is the gender feature. So, if we subtract one from the other, we get a vector that points in the direction of a $\text{gender}$ subspace very similar to the 1D number line we saw in the last section. Below is a visualization of this process, simplified in a 2D space, where $g$ is the $\text{gender}$ subspace.
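In code, estimating the gender direction from a single pair is just one subtraction. A minimal sketch with made-up 2D vectors (in practice $d$ is 50–300, and the direction is often estimated more robustly from several gender-distinct pairs, e.g. via PCA):

```python
import numpy as np

# Hypothetical 2-D word vectors; only the first dimension carries gender here.
w_man   = np.array([-1.0, 0.3])
w_woman = np.array([ 1.0, 0.3])

# Direction of the gender subspace from a single gender-distinct pair
g = w_man - w_woman
print(g)
```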

And our goal is to make sure every gender-neutral word has a value of $0$ with respect to the $\text{gender}$ subspace.

Making sure every gender-neutral word has a value of $0$ with respect to the $\text{gender}$ subspace is equivalent to making those word vectors orthogonal ($90^\circ$) to the $\text{gender}$ subspace. Let's see how we can achieve this.

Generally, to obtain a vector $\overrightarrow{u}_\perp$ that is the version of $\overrightarrow{u}$ orthogonal to a vector $\overrightarrow{v}$, we do the following:

\begin{align*}
\overrightarrow{p}_{u, v} &= \frac{\overrightarrow{u} \cdot \overrightarrow{v}}{\|\overrightarrow{v}\|^2} \overrightarrow{v} \\
\overrightarrow{u}_\perp &= \overrightarrow{u} - \overrightarrow{p}_{u, v}
\end{align*}

where $\overrightarrow{p}_{u, v}$ is the projection of $\overrightarrow{u}$ onto $\overrightarrow{v}$.

Here is a visualization of the process of finding $\overrightarrow{u}_\perp$.

In the case of debiasing a gender-neutral word $w$, we would use the following formulas:

\begin{align*}
\overrightarrow{p}_{w, g} &= \frac{\overrightarrow{w} \cdot \overrightarrow{g}}{\|\overrightarrow{g}\|^2} \overrightarrow{g} \\
\overrightarrow{w}_\perp &= \overrightarrow{w} - \overrightarrow{p}_{w, g}
\end{align*}

Conceptually, one could think of the first step as finding the bias value on the $\text{gender}$ subspace and the second step as subtracting the bias value from the word vector.

And this entire process is known as Neutralizing.
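The two neutralizing steps can be sketched directly from the formulas above; the vectors here are hypothetical toy values of my own choosing:

```python
import numpy as np

def neutralize(w, g):
    """Remove the component of w that lies along the gender direction g."""
    p = (np.dot(w, g) / np.dot(g, g)) * g  # projection of w onto g (the bias)
    return w - p                           # subtract the bias from the vector

g = np.array([-2.0, 0.0])             # gender direction (toy)
w_programmer = np.array([-0.5, 0.8])  # hypothetical biased vector
w_debiased = neutralize(w_programmer, g)

# The debiased vector is orthogonal to g: its dot product with g is 0.
print(np.dot(w_debiased, g))  # → 0.0
```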


Okay, we have neutralized the word $\text{programmer}$; it is gender-neutral now. But is it free from all gender biases?

To answer that, let's consider how the word vectors for $\text{programmer}$, $\text{man}$, and $\text{woman}$ could be positioned relative to the $\text{gender}$ subspace after neutralizing $\overrightarrow{w}_{\text{programmer}}$.

I mean, it looks fine, right? $\overrightarrow{w}_{\text{programmer}}$ is orthogonal to the $\text{gender}$ subspace. What could still go wrong?

Here is what could go wrong:

Based on the word embedding shown in the diagram above, which of the words $\text{man}$ and $\text{woman}$ is closer to $\text{programmer}$?

The word $\text{man}$ is obviously still closer to $\text{programmer}$. Hence, the gender bias, although reduced, still exists.

To debias it, we need to make sure that every pair of gender-distinct words, like $(\text{man}, \text{woman})$ and $(\text{guy}, \text{gal})$, is equidistant from the subspace orthogonal to the $\text{gender}$ subspace.

It's okay if you don't understand what the sentence above meant; this is what it's trying to say:

In the diagram above, we can see that the word vectors $\overrightarrow{w}_{\text{man}}$ and $\overrightarrow{w}_{\text{woman}}$ are equidistant from the orthogonal subspace (to the $\text{gender}$ subspace).

The purpose is to make sure any pair of gender-distinct words is equidistant from any gender-neutralized word.

Again, this begs the question: how?

First, we need to find the mean vector $\overrightarrow{\mu}$ of the two gender-distinct word vectors. This essentially finds the vector that is equidistant from the two vectors.

Then, we find the version of $\overrightarrow{\mu}$ orthogonal to the $\text{gender}$ subspace $g$, denoted by $\overrightarrow{\mu}_\perp$. This process is exactly the same as neutralization.

Semantically, $\overrightarrow{\mu}_\perp$ is a word vector that contains all the features of the words $\text{man}$ and $\text{woman}$ except the $\text{gender}$ feature. The reason this makes sense is that the two gender-distinct words should mean exactly the same thing, only in different genders. Hence, finding the mean and getting rid of the $\text{gender}$ feature should retain their original meaning without $\text{gender}$. This word vector is gender-neutral and will be useful for finding the equalized word vectors later.
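These two steps (mean, then neutralize the mean) can be sketched as follows, again with hypothetical toy vectors:

```python
import numpy as np

# Toy gender-distinct pair and gender direction (hypothetical values)
w_man, w_woman = np.array([-0.6, 0.8]), np.array([0.6, 0.8])
g = np.array([1.0, 0.0])

mu = (w_man + w_woman) / 2                 # midpoint of the pair
p_mu = (np.dot(mu, g) / np.dot(g, g)) * g  # gender component of the midpoint
mu_perp = mu - p_mu                        # midpoint with gender removed
print(mu_perp)
```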

Now we have the word vector that contains all the information of the two words except for the $\text{gender}$ feature. Moving on, we need to find the equalized $\text{gender}$ difference for these two words, $\overrightarrow{d}_{\text{man}}$ and $\overrightarrow{d}_{\text{woman}}$, so that we know how much to move from $\overrightarrow{\mu}_\perp$.

To do that, we'll first find the $\text{gender}$ feature for both words.

Then, we'll subtract their midpoint's $\text{gender}$ feature, $\overrightarrow{p}_{\mu, g}$, from each word's $\text{gender}$ feature, $\overrightarrow{p}_{w, g}$, to get $\overrightarrow{d}_{\text{man}}$ and $\overrightarrow{d}_{\text{woman}}$.

The formula shown is instantiated with $\overrightarrow{p}_{\text{man}, g}$ and $\overrightarrow{p}_{\text{woman}, g}$ (I can't fit two formulas on the screen) to obtain the two vectors. You may have noticed that the formula seems to do something different from what we described earlier, but it is actually the same thing with a bit of normalizing and scaling. Let's put that aside for now; just know that it is the scaled vector with the equalized $\text{gender}$ feature.

Finally, we'll add these vectors to μ\overrightarrow{\mu}_\perp to get the equalized word vectors.

And this is how we can equalize the distance of a pair of gender-distinct word vectors from the orthogonal subspace (to the $\text{gender}$ subspace), further reducing the gender bias in the word embeddings.

Let's get back to the formula for finding the equalized $\text{gender}$ feature.

\begin{align*} \overrightarrow{d}_{w} = \sqrt{1 - \| \overrightarrow{\mu}_\perp \|^2} \, \frac{\overrightarrow{p}_{w, g} - \overrightarrow{p}_{\mu, g}}{\|\overrightarrow{p}_{w, g} - \overrightarrow{p}_{\mu, g}\|} \end{align*}

One thing I didn't really mention about the algorithm is that it actually assumes every word vector $\overrightarrow{w}$ has $\|\overrightarrow{w}\| = 1$. The way I see it, the purpose of this is to make sure that for each word $w$, only the direction of the vector matters, not the magnitude.

\begin{align*} \frac{\overrightarrow{p}_{w, g} - \overrightarrow{p}_{\mu, g}}{\|\overrightarrow{p}_{w, g} - \overrightarrow{p}_{\mu, g}\|} \end{align*}

So, what this part of the formula does is first find the vector with equalized distance by subtracting the midpoint's $\text{gender}$ feature from the word's, and then normalize it to magnitude $1$.

\begin{align*} \sqrt{1 - \| \overrightarrow{\mu}_\perp \|^2} \, \frac{\overrightarrow{p}_{w, g} - \overrightarrow{p}_{\mu, g}}{\|\overrightarrow{p}_{w, g} - \overrightarrow{p}_{\mu, g}\|} \end{align*}

Then, to make sure the assumption that every word $w$ has $\|\overrightarrow{w}\| = 1$ still holds after equalizing, it multiplies by exactly the scale needed so that after adding $\overrightarrow{d}_{w}$ to $\overrightarrow{\mu}_\perp$, the magnitude is $1$. Formally, it ensures $\| \overrightarrow{\mu}_\perp + \overrightarrow{d}_{w} \| = 1$.
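Putting all the equalization steps together, here is a minimal sketch under my own naming, assuming unit-normalized embeddings and a pair that actually differs along $g$ (otherwise the normalization would divide by zero):

```python
import numpy as np

def equalize(pair, g, embeddings):
    """Equalize a gender-distinct pair about the subspace orthogonal to g."""
    w1, w2 = embeddings[pair[0]], embeddings[pair[1]]

    mu = (w1 + w2) / 2                         # midpoint of the pair
    p_mu = (np.dot(mu, g) / np.dot(g, g)) * g  # gender component of mu
    mu_perp = mu - p_mu                        # mu with gender removed

    # Scale so that mu_perp + d_w has unit norm (assumes ||mu_perp|| <= 1)
    scale = np.sqrt(max(0.0, 1 - np.dot(mu_perp, mu_perp)))
    for word, w in ((pair[0], w1), (pair[1], w2)):
        p_w = (np.dot(w, g) / np.dot(g, g)) * g  # gender component of w
        d_w = scale * (p_w - p_mu) / np.linalg.norm(p_w - p_mu)
        embeddings[word] = mu_perp + d_w
    return embeddings

# Hypothetical unit vectors that are NOT equidistant from the subspace
embeddings = {"man": np.array([-0.6, 0.8]), "woman": np.array([0.8, 0.6])}
g = np.array([1.0, 0.0])
equalize(("man", "woman"), g, embeddings)

# After equalizing, both vectors are unit length again and symmetric about g.
print(np.linalg.norm(embeddings["man"]), np.linalg.norm(embeddings["woman"]))
```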

And that's all I have for debiasing word embeddings. I had a lot of fun making those animations using Manim, and it actually helped me understand the linear algebra behind the algorithms. Thank you for reading!


Reference: Bolukbasi et al., "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" (2016)