Quick Review — GLUE and SuperGLUE

Sam Shamsan
May 8, 2021

Understanding SuperGLUE requires understanding its history and its predecessor, GLUE. GLUE, a multi-task benchmark and analysis platform for natural language understanding (NLU), was proposed in 2018. Its authors had two goals in mind when they designed it:

1- A diverse set of existing NLU tasks, to show what models actually learn from these tasks.

2- Supporting research on transfer learning, representation learning, and multi-task learning.

The platform groups its tasks into a few types: single-sentence tasks and multi-sentence (sentence-pair) tasks.

The sentence-pair tasks are further divided into similarity and paraphrase tasks, and inference tasks, such as determining the relationship between two sentences (entailment, etc.).

This is a snapshot of the GLUE leaderboard, which includes a human baseline alongside automated models, as of March this year. The leaderboard combines each system's results into a single unified score by averaging its scores across all tasks. It is worth noting that the human baseline has dropped to 15th place. However, this does not mean that automated models understand language better than humans. GLUE has enabled a whole new line of work in NLP, and within a very short time of its release, models had already beaten the human baseline.
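The unified score is essentially a macro-average: each task's metrics are first averaged into one task score, and the task scores are then averaged. The sketch below assumes that scheme. The helper names `task_score` and `benchmark_score` are hypothetical, and the numbers are placeholders chosen for illustration, not real leaderboard results.

```python
from statistics import mean


def task_score(metrics: dict[str, float]) -> float:
    """Collapse a task's metrics (e.g. F1 and accuracy) into one task score."""
    return mean(metrics.values())


def benchmark_score(results: dict[str, dict[str, float]]) -> float:
    """Unweighted average of the per-task scores (GLUE-style aggregation)."""
    return mean(task_score(metrics) for metrics in results.values())


# Placeholder numbers for illustration only, NOT real leaderboard results.
toy_results = {
    "CoLA": {"matthews_corr": 60.0},
    "MRPC": {"f1": 90.0, "accuracy": 86.0},
    "STS-B": {"pearson": 88.0, "spearman": 87.0},
}

print(benchmark_score(toy_results))  # (60.0 + 88.0 + 87.5) / 3 = 78.5
```

Averaging a task's metrics first keeps a task that reports two metrics (such as MRPC with F1 and accuracy) from counting twice in the overall score.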

With a similar motivation to GLUE, SuperGLUE came into the picture with a higher bar to beat. It provides simple, yet hard-to-game, measures of language understanding. It is limited to English, but similar benchmarks exist for multilingual models as well. SuperGLUE is also more diverse than GLUE thanks to its newly added tasks. Its metrics are mainly accuracy and F1 (a couple of tasks also report exact match). Some of the SuperGLUE tasks include BoolQ, a boolean (yes/no) question answering task; CB (CommitmentBank), a three-class entailment task over short texts; COPA, a causal reasoning task that asks questions like “what is the cause of this?”; and MultiRC, a multi-sentence reading comprehension task.
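Accuracy and F1 are simple enough to compute by hand. The following is a minimal sketch, not the official SuperGLUE evaluation code. For a three-class task such as CB, it macro-averages F1 over the labels, which is assumed here to be what the benchmark's "average F1" means. The function names and the toy labels are hypothetical.

```python
def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_for_label(gold: list[str], pred: list[str], label: str) -> float:
    """F1 for one class, treating `label` as positive and all others as negative."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Toy CB-style labels, made up for illustration.
gold = ["entailment", "contradiction", "neutral", "entailment"]
pred = ["entailment", "neutral", "neutral", "entailment"]

print(accuracy(gold, pred))            # 0.75
print(round(macro_f1(gold, pred), 3))  # 0.556
```

Counting labels from both `gold` and `pred` means a class the model predicts but that never appears in the gold set still lowers the macro average; official scripts may handle such edge cases differently.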

Others include WiC (Word-in-Context) and WSC (Winograd Schema Challenge), which frames coreference resolution as binary classification. As with GLUE, the human baseline on SuperGLUE has already been surpassed, by two models including T5.
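To see what framing coreference resolution as binary classification means for WSC, a Winograd-style question with two candidate referents can be split into one yes/no example per candidate. This is a simplified sketch: the real SuperGLUE data marks the pronoun and candidate as spans in the text, whereas the hypothetical `CorefExample` class and `to_binary_examples` helper here just store them as strings. The sentence is the classic trophy/suitcase Winograd example.

```python
from dataclasses import dataclass


@dataclass
class CorefExample:
    """One binary example: does `pronoun` refer to `candidate` in `text`?"""
    text: str
    pronoun: str
    candidate: str
    label: bool  # True if the pronoun refers to this candidate


def to_binary_examples(
    text: str, pronoun: str, candidates: list[str], answer: str
) -> list[CorefExample]:
    """Turn one multiple-choice coreference question into yes/no examples."""
    return [CorefExample(text, pronoun, c, c == answer) for c in candidates]


examples = to_binary_examples(
    "The trophy doesn't fit into the brown suitcase because it is too large.",
    pronoun="it",
    candidates=["the trophy", "the suitcase"],
    answer="the trophy",
)

for ex in examples:
    print(ex.candidate, ex.label)
# the trophy True
# the suitcase False
```

A model then only has to answer "yes" or "no" for each candidate, which is why WSC can be scored with plain accuracy.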
