Quick review — what is T5?

Sam Shamsan
2 min read · May 8, 2021

T5 stands for Text-to-Text Transfer Transformer

The idea behind T5 is to transfer learning across different tasks. The main pursuit of T5 is to create a unified way to apply a pre-trained model by converting every NLP task into a uniform text-to-text format, where the input is a sequence of text and the model generates the output as text.
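To make the text-to-text idea concrete, here is a minimal sketch (my addition, not from the original post) using the Hugging Face transformers library; the t5-small checkpoint and the "translate English to German" prefix are ones the T5 release actually provides:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load a small public T5 checkpoint.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task is selected purely by the text prefix; the output is also text.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Swapping the prefix (for example to "summarize:") switches the task without changing the model or the decoding code.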

One of the main contributions of T5 is the C4 dataset (Colossal Clean Crawled Corpus), built from Common Crawl, which scrapes roughly 20 TB of text from the web each month; after cleaning, C4 comes to about 750 GB. For each input sequence, T5 prepends a task prefix that tells the model which task to perform, such as "translate English to Arabic:" for translation or "summarize:" for summarization. It is worth noting that an input can contain more than one field marker, depending on the task; for instance, the premise and hypothesis in an NLI task. T5 uses an encoder-decoder architecture with some modifications to dropout and a simplified layer normalization. The authors also applied strict cleaning heuristics to the data, such as:

  • Keep only lines that end with a terminal punctuation mark
  • Remove pages with toxic text and pages containing curly braces (a sign of code)
  • Remove duplicate sentences and lines with fewer than 3 words
  • Remove pages with fewer than 5 sentences
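As a rough illustration of these heuristics, here is my own sketch in plain Python (not the actual C4 pipeline; bad_words stands in for a real blocklist):

```python
def clean_page(page: str, bad_words: set[str]) -> str | None:
    """Apply C4-style cleaning heuristics to one scraped page."""
    # Drop the whole page if it looks like code or contains blocked words.
    if "{" in page or "}" in page:
        return None
    if set(page.lower().split()) & bad_words:
        return None

    kept, seen = [], set()
    for line in page.splitlines():
        line = line.strip()
        # Keep only lines ending in a terminal punctuation mark.
        if not line.endswith((".", "!", "?", '"')):
            continue
        # Drop very short lines and exact duplicates.
        if len(line.split()) < 3 or line in seen:
            continue
        seen.add(line)
        kept.append(line)

    # Drop pages with fewer than 5 sentences (approximated here by kept lines).
    return "\n".join(kept) if len(kept) >= 5 else None
```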

One of the major pre-training tasks for T5 is span corruption, a fill-in-the-blank objective: spans of tokens are dropped at random and the model learns to generate the missing text. T5 also looked into different ways to compute attention, comparing three masking patterns: fully visible in the encoder of the encoder-decoder model, causal for the language model, where each token can only attend to the tokens before it, and causal with a prefix for the prefix language model. They also experimented with GLUE, SuperGLUE, SQuAD, and machine-translation (WMT) benchmarks, with model sizes scaled up to 11 billion parameters.
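Here is a toy sketch of that fill-in-the-blank objective, again my own simplification: real T5 corrupts subword tokens and samples span lengths, but the `<extra_id_N>` sentinel tokens are the ones T5 actually uses:

```python
import random

def span_corrupt(text: str, mask_prob: float = 0.15, seed: int = 0):
    """Toy span corruption: drop random words, mark the gaps with sentinels.

    Returns the corrupted input and the target the model must generate.
    """
    rng = random.Random(seed)
    inp, tgt, sentinel = [], [], 0
    in_span = False  # are we inside a masked span?
    for word in text.split():
        if rng.random() < mask_prob:
            if not in_span:  # open a new span with a fresh sentinel token
                inp.append(f"<extra_id_{sentinel}>")
                tgt.append(f"<extra_id_{sentinel}>")
                sentinel += 1
                in_span = True
            tgt.append(word)  # the dropped word goes into the target
        else:
            inp.append(word)
            in_span = False
    tgt.append(f"<extra_id_{sentinel}>")  # closing sentinel ends the target
    return " ".join(inp), " ".join(tgt)

# Example of the format (actual spans depend on the random draw):
# input:  "Thank you <extra_id_0> to your party <extra_id_1> week ."
# target: "<extra_id_0> for inviting me <extra_id_1> last <extra_id_2>"
corrupted, target = span_corrupt("Thank you for inviting me to your party last week .")
```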
