Some Learning Resources for the Transformer Model

1. First, of course, is the paper Attention Is All You Need. It is a classic and a must-read.

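2. Dr. Mu Li's video walkthrough of the paper Attention Is All You Need: Transformer论文逐段精读 (a paragraph-by-paragraph close reading of the Transformer paper).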

3. Dive into Deep Learning (《动手学深度学习》). Dr. Mu Li is also one of its authors.

4. Andrej Karpathy, a leading figure in AI, walks through the code of a nanoGPT step by step: Let's build GPT: from scratch, in code, spelled out.
Google Colab notebook with the code: https://colab.research.google.com/drive/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing
Andrej Karpathy's explanation of attention is a classic: "Attention is a communication mechanism. Can be seen as nodes in a directed graph looking at each other and aggregating information with a weighted sum from all nodes that point to them, with data-dependent weights."

5. The Transformer tutorial on the official TensorFlow website.

6. Drawing the Transformer Network from Scratch, by Thomas Kurbiel. An animated, visual presentation.

7. Transformers Explained Visually, by Ketan Doshi. This four-part series explains the Transformer very clearly. Rating: five stars.
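
To make Karpathy's description in item 4 concrete, here is a minimal pure-Python sketch of causal self-attention. It is not taken from his notebook. It assumes toy 2-dimensional token vectors and, for brevity, reuses the same vectors as queries, keys, and values, whereas a real Transformer applies learned linear projections first. The function names `softmax`, `dot`, and `causal_attention` are chosen here for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def causal_attention(queries, keys, values):
    """Node t aggregates values from nodes 0..t (the nodes that point to it).

    The weights are data-dependent: they come from query-key dot products,
    scaled by sqrt(d_k) as in scaled dot-product attention.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, all_weights = [], []
    for t, q in enumerate(queries):
        # Directed edges: only positions s <= t are visible to node t.
        scores = [dot(q, keys[s]) / math.sqrt(d_k) for s in range(t + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out = [sum(w * values[s][j] for s, w in enumerate(weights))
               for j in range(d_v)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input (assumed for illustration): three tokens, two features each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(x, x, x)
for t, w in enumerate(weights):
    print(f"token {t} attends to tokens 0..{t} with weights {w}")
```

The first token can only look at itself, so its weight list has a single entry and its output equals its own value vector. Later tokens mix information from every earlier position, which is the "communication" Karpathy describes.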
