{"id":2899,"date":"2023-05-03T14:56:09","date_gmt":"2023-05-03T06:56:09","guid":{"rendered":"https:\/\/nullthought.net\/?p=2899"},"modified":"2025-02-23T11:08:04","modified_gmt":"2025-02-23T03:08:04","slug":"transformer%e6%a8%a1%e5%9e%8b%e7%9a%84%e4%b8%80%e4%ba%9b%e5%ad%a6%e4%b9%a0%e5%8f%82%e8%80%83%e8%b5%84%e6%96%99","status":"publish","type":"post","link":"https:\/\/nullthought.net\/?p=2899","title":{"rendered":"Some Learning Resources for the Transformer Model"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"500\" height=\"733\" src=\"https:\/\/nullthought.net\/wp-content\/uploads\/2023\/05\/The-Transformer-model-architecture.png\" alt=\"\" class=\"wp-image-2900\" srcset=\"https:\/\/nullthought.net\/wp-content\/uploads\/2023\/05\/The-Transformer-model-architecture.png 500w, https:\/\/nullthought.net\/wp-content\/uploads\/2023\/05\/The-Transformer-model-architecture-205x300.png 205w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/figure>\n\n\n\n<p>1. 
First, of course, is the paper <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1706.03762\" target=\"_blank\"><em><strong>Attention Is All You Need<\/strong><\/em><\/a>. A classic and a must-read.<\/p>\n\n\n\n<p>2. Dr. <a rel=\"noreferrer noopener\" href=\"https:\/\/www.youtube.com\/@mu_li\" target=\"_blank\">Mu Li<\/a> (\u674e\u6c90) gives a video walkthrough of the paper <em>Attention Is All You Need<\/em>: <a rel=\"noreferrer noopener\" href=\"https:\/\/www.youtube.com\/watch?v=nzqlFIcCSWQ\" target=\"_blank\">Transformer\u8bba\u6587\u9010\u6bb5\u7cbe\u8bfb<\/a> (a paragraph-by-paragraph close reading of the Transformer paper).<\/p>\n\n\n\n<p>3. <a href=\"https:\/\/zh.d2l.ai\/chapter_attention-mechanisms\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">Dive into Deep Learning<\/a> (\u300a\u52a8\u624b\u5b66\u6df1\u5ea6\u5b66\u4e60\u300b), which Dr. Mu Li also co-authored.<\/p>\n\n\n\n<p>4. AI expert <a rel=\"noreferrer noopener\" href=\"https:\/\/www.youtube.com\/@AndrejKarpathy\" target=\"_blank\">Andrej Karpathy<\/a> walks through the code of a nanoGPT step by step: <a rel=\"noreferrer noopener\" href=\"https:\/\/www.youtube.com\/watch?v=kCc8FmEb1nY\" target=\"_blank\">Let&#8217;s build GPT: from scratch, in code, spelled out<\/a>.<br>The Google Colab notebook containing the code: <a rel=\"noreferrer noopener\" href=\"https:\/\/colab.research.google.com\/drive\/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing\" target=\"_blank\">https:\/\/colab.research.google.com\/drive\/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing<\/a><br>Andrej Karpathy&#8217;s explanation of attention is a classic: \u2018Attention is a communication mechanism. 
<strong>Can be seen as nodes in a directed graph looking at each other<\/strong> and aggregating information with a weighted sum from all nodes that point to them, with data-dependent weights.\u2019<\/p>\n\n\n\n<p>5. The <a href=\"https:\/\/www.tensorflow.org\/text\/tutorials\/transformer\" target=\"_blank\" rel=\"noreferrer noopener\">Transformer tutorial<\/a> on the official TensorFlow website.<\/p>\n\n\n\n<p>6. <a href=\"https:\/\/towardsdatascience.com\/drawing-the-transformer-network-from-scratch-part-1-9269ed9a2c5e\" target=\"_blank\" rel=\"noreferrer noopener\">Drawing the Transformer Network from Scratch<\/a>, by <a href=\"https:\/\/tkurbiel.medium.com\" target=\"_blank\" rel=\"noreferrer noopener\">Thomas Kurbiel<\/a>. An animated, visual presentation.<\/p>\n\n\n\n<p>7. <a href=\"https:\/\/towardsdatascience.com\/transformers-explained-visually-part-1-overview-of-functionality-95a6dd460452\" target=\"_blank\" rel=\"noreferrer noopener\">Transformers Explained Visually<\/a>, by <a href=\"https:\/\/ketanhdoshi.medium.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Ketan Doshi<\/a>. This four-part series explains the Transformer very clearly; <strong>rating: 5 stars<\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. 
First, of course, is the paper Attention Is All You Need. A classic and a must-read. 2. Dr. Mu Li on the paper [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[3,8],"tags":[39,95,66],"class_list":["post-2899","post","type-post","status-publish","format-standard","hentry","category-it","category-tech","tag-ai","tag-transformer","tag-google"],"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"NullThought","author_link":"https:\/\/nullthought.net\/?author=1"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/nullthought.net\/?cat=3\" rel=\"category\">IT<\/a> <a href=\"https:\/\/nullthought.net\/?cat=8\" rel=\"category\">Tech<\/a>","rttpg_excerpt":"1. 
First, of course, is the paper Attention Is All You Need. A classic and a must-read. 2. Dr. Mu Li on the paper&hellip;","_links":{"self":[{"href":"https:\/\/nullthought.net\/index.php?rest_route=\/wp\/v2\/posts\/2899","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nullthought.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nullthought.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nullthought.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nullthought.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2899"}],"version-history":[{"count":2,"href":"https:\/\/nullthought.net\/index.php?rest_route=\/wp\/v2\/posts\/2899\/revisions"}],"predecessor-version":[{"id":5009,"href":"https:\/\/nullthought.net\/index.php?rest_route=\/wp\/v2\/posts\/2899\/revisions\/5009"}],"wp:attachment":[{"href":"https:\/\/nullthought.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nullthought.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2899"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nullthought.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}