
RoBERTa Source Code Reading

The RoBERTa code we start from:
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('microsoft/codebert-base-mlm')
model = RobertaForMaskedLM.from_pretrained('microsoft/codebert-base-mlm')
input_ids = tokenizer(['Language model is what I need.', 'I love China'], padding=True, return_tensors='pt')
out = model(**input_ids)

Start: RobertaForMaskedLM in modeling_roberta.py

Entering RobertaModel, the first module is RobertaEmbeddings, which embeds the input tokens. A: inside RobertaEmbeddings, embedding_out = inputs_embeds + position_embeddings + token_type_embeddings, and embedding_out.shape is (batch_size, sequence_length, hidden_size). For an explanation of the parameters see 【NLP】Transformers 源码阅读和实践_fengdu78的博客-CSDN博客, a nice post.
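To make step A concrete, here is a minimal, self-contained sketch of that summation. The class name SimpleRobertaEmbeddings and the default sizes are illustrative only, and the real RobertaEmbeddings additionally offsets the position ids by the padding index, which is simplified away here.

import torch
import torch.nn as nn

# Sketch of the RobertaEmbeddings forward pass (illustrative, not the library code):
# word + position + token_type embeddings are summed, then LayerNorm and dropout.
class SimpleRobertaEmbeddings(nn.Module):
    def __init__(self, vocab_size=50265, hidden_size=768, max_position=514,
                 type_vocab_size=1, pad_token_id=1):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size, padding_idx=pad_token_id)
        self.position_embeddings = nn.Embedding(max_position, hidden_size)
        self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
        self.LayerNorm = nn.LayerNorm(hidden_size)
        self.dropout = nn.Dropout(0.1)

    def forward(self, input_ids):
        batch_size, seq_len = input_ids.shape
        position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch_size, -1)
        token_type_ids = torch.zeros_like(input_ids)
        inputs_embeds = self.word_embeddings(input_ids)
        embedding_out = (inputs_embeds
                         + self.position_embeddings(position_ids)
                         + self.token_type_embeddings(token_type_ids))
        embedding_out = self.dropout(self.LayerNorm(embedding_out))
        return embedding_out  # batch_size * sequence_length * hidden_size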

B: Next comes the RobertaEncoder:

# RobertaEncoder is a stack of config.num_hidden_layers (12) RobertaLayer modules
self.layer = nn.ModuleList([RobertaLayer(config) for _ in range(config.num_hidden_layers)])
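Conceptually, the encoder forward pass just threads the hidden states through these stacked layers. The simplified sketch below is illustrative only (the class name TinyEncoder is made up, and masks, head masks, and the collection of hidden states/attentions are omitted); it shows the shape of that loop.

import torch.nn as nn

# Simplified view of what RobertaEncoder.forward does with its layer stack.
class TinyEncoder(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layer = nn.ModuleList(layers)

    def forward(self, hidden_states, attention_mask=None):
        for layer_module in self.layer:
            # each RobertaLayer returns a tuple whose first element is the
            # updated (batch_size, seq_len, hidden_size) tensor
            hidden_states = layer_module(hidden_states, attention_mask)[0]
        return (hidden_states,)  # encoder_outputs, a length-1 tuple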

B.1: We then enter RobertaLayer, and from there RobertaAttention.

B.2: self.self() inside RobertaAttention takes us into RobertaSelfAttention. RobertaAttention is the class behind the attention member of RobertaLayer and is the core class where the transformer performs self-attention; it holds the RobertaSelfAttention and RobertaSelfOutput members:

class RobertaAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.self = RobertaSelfAttention(config)
        self.output = RobertaSelfOutput(config)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.self.num_attention_heads, self.self.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.self.query = prune_linear_layer(self.self.query, index)
        self.self.key = prune_linear_layer(self.self.key, index)
        self.self.value = prune_linear_layer(self.self.value, index)
        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)

        # Update hyper params and store pruned heads
        self.self.num_attention_heads = self.self.num_attention_heads - len(heads)
        self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        output_attentions=False,
    ):
        # this call enters RobertaSelfAttention
        self_outputs = self.self(
            hidden_states,
            attention_mask,
            head_mask,
            encoder_hidden_states,
            encoder_attention_mask,
            output_attentions,
        )
        attention_output = self.output(self_outputs[0], hidden_states)
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs

B.3: RobertaSelfAttention computes Q, K and V and returns the attention output.
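What happens inside RobertaSelfAttention is standard multi-head scaled dot-product attention. The sketch below is an illustration rather than the library code: the function name self_attention_sketch and its argument list are made up, and it assumes the layer's query/key/value nn.Linear projections are passed in.

import math
import torch

def self_attention_sketch(hidden_states, query, key, value,
                          num_attention_heads, attention_head_size,
                          attention_mask=None):
    """Simplified multi-head self-attention in the spirit of RobertaSelfAttention."""
    batch_size, seq_len, _ = hidden_states.shape

    def split_heads(x):
        # (batch, seq, hidden) -> (batch, heads, seq, head_size)
        return x.view(batch_size, seq_len, num_attention_heads,
                      attention_head_size).permute(0, 2, 1, 3)

    q = split_heads(query(hidden_states))
    k = split_heads(key(hidden_states))
    v = split_heads(value(hidden_states))

    # scaled dot-product attention scores: (batch, heads, seq, seq)
    scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(attention_head_size)
    if attention_mask is not None:
        scores = scores + attention_mask  # additive mask, large negative on padding
    probs = torch.softmax(scores, dim=-1)

    # weighted sum of values, then merge heads back to (batch, seq, hidden)
    context = torch.matmul(probs, v).permute(0, 2, 1, 3).contiguous()
    return context.view(batch_size, seq_len, num_attention_heads * attention_head_size)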

C. The encoder output is a tuple of length 1, so sequence_output = encoder_outputs[0], with shape batch_size * sequence_length * hidden_size (768).

D. RobertaPooler: here pooled_output = None and RobertaPooler is not used, because RobertaForMaskedLM builds its RobertaModel with add_pooling_layer=False (the MLM head works on the whole sequence output rather than on a pooled <s>/[CLS] vector). If you load a bare RobertaModel instead, you do get a pooled_output of shape batch_size * hidden_size; a sketch of what the pooler computes follows the snippet below.

from transformers import RobertaTokenizer, RobertaModel

model = RobertaModel.from_pretrained('roberta-base')
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
# note: RoBERTa's mask token is <mask>, not [MASK]; it does not matter for this shape check
input_ids = tokenizer("The [MASK] on top allows for less material higher up the pyramid.", return_tensors='pt')['input_ids']
outputs = model(input_ids)
vector1, pooler1 = outputs[0], outputs[1]
print('pooler1:', pooler1)                        # shape: 1 * hidden_size (only one sentence here)
print('vector1[:, 0:1, :]:', vector1[:, 0:1, :])  # shape: batch_size * sequence_length * hidden_size
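For completeness, when the pooler is used it simply applies a dense layer plus tanh to the hidden state of the first token (<s>). A rough sketch (PoolerSketch is an illustrative name, not the library class):

import torch.nn as nn

# Sketch of what the pooler does: take the first token's hidden state,
# pass it through a dense layer and tanh to get (batch_size, hidden_size).
class PoolerSketch(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):      # (batch, seq_len, hidden)
        first_token = hidden_states[:, 0]  # (batch, hidden)
        return self.activation(self.dense(first_token))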

E. Back in the RobertaForMaskedLM we started from: sequence_output still has shape batch_size * sequence_length * hidden_size, and it is fed into self.lm_head().

F. The result goes through RobertaLMHead(); at the final step, x = self.decoder(x) has shape batch_size * sequence_length * vocab_size.
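To my understanding, RobertaLMHead is roughly dense -> GELU -> LayerNorm -> decoder projecting hidden_size to vocab_size. The sketch below shows that pipeline (LMHeadSketch is an illustrative name, and the bias handling of the real class is omitted):

import torch.nn as nn

# Rough sketch of the LM head: dense + GELU + LayerNorm, then a decoder
# projecting hidden_size -> vocab_size for every position.
class LMHeadSketch(nn.Module):
    def __init__(self, hidden_size=768, vocab_size=50265):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.gelu = nn.GELU()
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, features):           # (batch, seq_len, hidden)
        x = self.gelu(self.dense(features))
        x = self.layer_norm(x)
        return self.decoder(x)              # (batch, seq_len, vocab_size)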

G. The output of the last step is already the prediction. Is that because my input contains no <mask>? In fact, RobertaForMaskedLM returns prediction logits for every position whether or not the input contains a <mask>; a masked-LM loss is only computed when labels are passed in (see the usage sketch after the snippet below).

from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('microsoft/codebert-base-mlm')
model = RobertaForMaskedLM.from_pretrained('microsoft/codebert-base-mlm')
input_ids = tokenizer(['Language model is what I need.', 'I love China'], padding=True, return_tensors='pt')
out = model(**input_ids)  # out contains the prediction scores (logits) described above
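A small usage sketch showing what the logits are good for when the input does contain a mask token. This assumes a transformers version whose output exposes .logits; on older versions use out[0] instead.

import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('microsoft/codebert-base-mlm')
model = RobertaForMaskedLM.from_pretrained('microsoft/codebert-base-mlm')

# insert the model's own mask token (<mask> for RoBERTa/CodeBERT)
text = f"Language {tokenizer.mask_token} is what I need."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits   # batch_size * sequence_length * vocab_size

# find the position of <mask> and read off the top-5 candidate tokens
mask_pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print([tokenizer.decode([int(i)]) for i in top_ids])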

A screenshot of the output appeared here (image omitted).

Finally, a diagram of the BERT network architecture appeared here (image omitted).
