Learning AI with Mu Li: BERT Paper Deep Dive [with Paper and Video]

Created by qxiao; last updated by qxiao on 2021-11-30 03:07; viewed by 128 users

Original paper title: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Published: 2018

Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova


Abstract

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7% (5.6% absolute improvement), and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5 absolute improvement), outperforming human performance by 2.0.

Introduction

Language model pre-training has been shown to be effective for improving many natural language processing tasks (Dai and Le, 2015; Peters et al., 2017, 2018; Radford et al., 2018; Howard and Ruder, 2018). These tasks include sentence-level tasks such as natural language inference (Bowman et al., 2015; Williams et al., 2018) and paraphrasing (Dolan and Brockett, 2005), which aim to predict the relationships between sentences by analyzing them holistically, as well as token-level tasks such as named entity recognition (Tjong Kim Sang and De Meulder, 2003) and SQuAD question answering (Rajpurkar et al., 2016), where models are required to produce fine-grained output at the token level.

There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based and fine-tuning. The feature-based approach, such as ELMo (Peters et al., 2018), uses task-specific architectures that include the pre-trained representations as additional features. The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters and is trained on the downstream tasks by simply fine-tuning the pre-trained parameters. In previous work, both approaches share the same objective function during pre-training, where they use unidirectional language models to learn general language representations.

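We argue that current techniques severely restrict the power of the pre-trained representations, especially for the fine-tuning approaches. The major limitation is that standard language models are unidirectional, and this limits the choice of architectures that can be used during pre-training. For example, in OpenAI GPT, the authors use a left-to-right architecture, where every token can only attend to previous tokens in the self-attention layers of the Transformer (Vaswani et al., 2017). Such restrictions are sub-optimal for sentence-level tasks, and could be devastating when applying fine-tuning based approaches to token-level tasks such as SQuAD question answering (Rajpurkar et al., 2016), where it is crucial to incorporate context from both directions.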
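To make the left-to-right restriction concrete, here is a minimal sketch of the two attention patterns being contrasted. It is an illustration only, not code from the paper or from any library: the function name `attention_mask` and its `bidirectional` flag are hypothetical, and the mask is represented as a plain nested list where entry [i][j] equal to 1 means token i may attend to token j.

```python
def attention_mask(seq_len, bidirectional):
    """Build a seq_len x seq_len 0/1 mask (hypothetical helper).

    Row i lists which positions token i may attend to.
    """
    return [
        # In the left-to-right case, token i sees only positions j <= i.
        [1 if (bidirectional or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Left-to-right pattern: a lower-triangular mask.
left_to_right = attention_mask(4, bidirectional=False)
# Bidirectional pattern: every token sees every position.
both_directions = attention_mask(4, bidirectional=True)

for row in left_to_right:
    print(row)
for row in both_directions:
    print(row)
```

In the left-to-right mask the first token sees only itself, so a prediction at that position cannot use anything to its right. This is the limitation the paragraph above describes for token-level tasks.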

In this paper, we improve the fine-tuning based approaches by proposing BERT: Bidirectional Encoder Representations from Transformers. BERT addresses the previously mentioned unidirectional constraint by proposing a new pre-training objective: the "masked language model" (MLM), inspired by the Cloze task (Taylor, 1953). The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context. Unlike left-to-right language model pre-training, the MLM objective allows the representation to fuse the left and the right context, which allows us to pre-train a deep bidirectional Transformer. In addition to the masked language model, we also introduce a "next sentence prediction" task that jointly pre-trains text-pair representations. The contributions of our paper are as follows:

We demonstrate the importance of bidirectional pre-training for language representations. Unlike Radford et al. (2018), which uses unidirectional language models for pre-training, BERT uses masked language models to enable pre-trained deep bidirectional representations. This is also in contrast to Peters et al. (2018), which uses a shallow concatenation of independently trained left-to-right and right-to-left LMs.

We show that pre-trained representations eliminate the need for many heavily engineered task-specific architectures. BERT is the first fine-tuning based representation model that achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks, outperforming many systems with task-specific architectures.

BERT advances the state of the art for eleven NLP tasks. We also report extensive ablations of BERT, demonstrating that the bidirectional nature of our model is the single most important new contribution. The code and pre-trained models will be available at goo.gl/language/bert.
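The masked language model objective described in the introduction (randomly mask some input tokens, then predict the original vocabulary id of each masked token from its context) can be sketched as a data-preparation step. This is a simplified illustration under stated assumptions, not the authors' implementation: `mask_tokens`, the toy `vocab`, and the example sentence are all hypothetical, the masking rate is left as a parameter because this section does not state one, and every selected token is simply replaced by `[MASK]`.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob, seed=None):
    """Randomly mask tokens and record what the model should predict.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the vocabulary id of the original token.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = vocab[tok]   # label: original vocabulary id
            masked[i] = MASK_TOKEN    # input: the token is hidden
    return masked, targets


# Toy example; the sentence and the vocabulary are made up.
tokens = "my dog is hairy and likes to play".split()
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

masked, targets = mask_tokens(tokens, vocab, mask_prob=0.3, seed=0)
print(masked)
print(targets)
```

Because the label at a masked position depends on tokens to both its left and its right, a model trained on such examples is free to use context from both directions, which is what the contributions above refer to as deep bidirectional representations.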

Original PDF

/wiki/static/upload/f4/f46f936f-f37f-43c6-8099-22f0c9231882.pdf

Video walkthrough

https://www.bilibili.com/video/BV13g411N7xy?from=search&seid=427779767100293994&spm_id_from=333.337.0.0
