GLUE benchmark
General Language Understanding Evaluation. A suite of nine NLP tasks (sentiment analysis, natural language inference, question answering recast as inference, etc.) used to measure general language understanding. BERT-large scored 80.5 on GLUE at publication, a 7.7-point absolute improvement over the previous state of the art.
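A single GLUE number such as the one quoted above is a macro-average: metrics are averaged within each task first, then across tasks. The sketch below illustrates that arithmetic only. The function name `glue_style_score`, the task names, and all the numbers are hypothetical placeholders, not real results or an official scoring script.

```python
from statistics import mean


def glue_style_score(task_metrics):
    """Macro-average: mean of the metrics within each task, then mean across tasks."""
    per_task = [mean(metrics.values()) for metrics in task_metrics.values()]
    return mean(per_task)


# Placeholder values for illustration only; these are NOT real benchmark results.
example = {
    "task_a": {"accuracy": 90.0},
    "task_b": {"f1": 80.0, "accuracy": 84.0},       # two metrics, averaged to 82.0
    "task_c": {"pearson": 72.0, "spearman": 76.0},  # two metrics, averaged to 74.0
}

print(glue_style_score(example))
```

Because every task carries equal weight regardless of dataset size, a small task moves the overall score as much as a large one.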