Quick review — How to deal with stereotypical bias in pre-trained models

Sam Shamsan
1 min read · May 8, 2021

Since LMs are pre-trained on large amounts of real-world data, they capture some stereotypical biases.

For generation tasks we should be very careful about producing biased outcomes.

There is a lot of work that shows and evaluates this bias, and another line of work that tries to prevent it.

Some of the evaluation work includes the design of Context Association Tests (CATs), where a word in a sentence is masked and the model is asked to fill it with one of three options (stereotype, anti-stereotype, and unrelated). This is known as the intrasentence context association test.
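
As a rough illustration (not the exact setup from the original work), here is how one might score the three candidate fills of an intrasentence CAT example with an off-the-shelf masked LM via Hugging Face's fill-mask pipeline; the sentence and candidate words below are invented for illustration.

```python
# Minimal sketch: scoring one intrasentence CAT example with a masked LM.
# The sentence and the three candidate words are illustrative, not taken
# from the actual benchmark data.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "Girls tend to be more [MASK] than boys."
candidates = {
    "stereotype": "soft",
    "anti-stereotype": "determined",
    "unrelated": "fish",
}

# Restrict predictions to our three candidates and read off their probabilities.
results = fill(sentence, targets=list(candidates.values()))
scores = {r["token_str"].strip(): r["score"] for r in results}

for label, word in candidates.items():
    print(f"{label:>15}: {word:12} p = {scores.get(word, 0.0):.6f}")
```

A well-behaved model should put more probability mass on the two meaningful fills than on the unrelated one; how it splits that mass between the stereotype and the anti-stereotype is what the scores below try to capture.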

The approach here is to collect a dataset covering target stereotype domains such as gender, profession, race, and religion. For each target term, some formulation and validation is done.

There are three ways to assess the language model's bias using this method:

  • Language model score (lms), where we prefer the meaningful options (either the stereotype or the anti-stereotype) and want their probabilities to be higher than the unrelated option.
  • Stereotype score (ss), where we measure the rate at which the LM prefers the stereotype over the anti-stereotype; the ideal split is 50/50.
  • Idealized CAT score (icat), where we have three requirements to satisfy:

An ideal model (lms = 100, ss = 50) must have an icat score of 100.

A fully biased model (ss = 0 or 100) must have an icat score of 0.

A random model (lms = 50, ss = 50) must have an icat score of 50.

It can be computed using this formula: icat = lms * (min(ss, 100 - ss) / 50)
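
As a quick sanity check of how these scores fit together, here is a small Python sketch; the helper names and the numbers are made up purely to show the arithmetic and are not from the paper's code.

```python
# Sketch: computing lms, ss, and icat from aggregate model preferences.
# All function names and numbers are illustrative.

def lms(meaningful_preferred: int, total: int) -> float:
    """Percent of examples where the model prefers a meaningful option
    (stereotype or anti-stereotype) over the unrelated one."""
    return 100.0 * meaningful_preferred / total

def ss(stereotype_preferred: int, total: int) -> float:
    """Percent of examples where the model prefers the stereotype
    over the anti-stereotype (ideal value: 50)."""
    return 100.0 * stereotype_preferred / total

def icat(lms_score: float, ss_score: float) -> float:
    """icat = lms * min(ss, 100 - ss) / 50."""
    return lms_score * min(ss_score, 100.0 - ss_score) / 50.0

# The three requirements above hold:
print(icat(100, 50))  # ideal model        -> 100.0
print(icat(80, 100))  # fully biased model -> 0.0
print(icat(50, 50))   # random model       -> 50.0
```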

After defining all three scores, they also created a sentiment language model that allows the most negative sentiment to be measured.
