Transformers explained | The architecture behind LLMs
AI Coffee Break with Letitia

Published on Jan 21, 2024

All you need to know about the transformer architecture: how to structure the inputs, attention (queries, keys, values), positional embeddings, and residual connections. Bonus: an overview of the differences between recurrent neural networks (RNNs) and transformers.
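For code-minded viewers, here is a minimal NumPy sketch of the single-head scaled dot-product self-attention discussed in the video. All sizes (4 tokens, embedding width 8, head width 3) are toy values made up for this example:

import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 3            # toy sizes, made up for illustration

X = rng.normal(size=(seq_len, d_model))       # one embedding vector per token
Wq = rng.normal(size=(d_model, d_head))       # learned projections for queries,
Wk = rng.normal(size=(d_model, d_head))       # keys,
Wv = rng.normal(size=(d_model, d_head))       # and values

Q, K, V = X @ Wq, X @ Wk, X @ Wv              # note the order: vector x matrix

scores = Q @ K.T / np.sqrt(d_head)            # how well each query matches each key
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
out = weights @ V                             # each output is a weighted mix of values
print(out.shape)                              # (4, 3): one new vector per token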
Correction at 9:19: the order of multiplication should be the opposite: x1 (vector) Ɨ Wq (matrix) = q1 (vector). Otherwise we do not get the 1Ɨ3 dimensionality at the end. Sorry for messing up the animation!
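A quick sanity check of the corrected order (toy shapes, assumed for illustration; only the output width of 3 matches the video):

import numpy as np

x1 = np.ones((1, 8))   # one token embedding as a row vector; width 8 is made up
Wq = np.ones((8, 3))   # query projection down to 3 dimensions

q1 = x1 @ Wq           # vector x matrix, as in the correction
print(q1.shape)        # (1, 3): the 1x3 query vector

# The animated order, Wq @ x1, fails here: (8, 3) @ (1, 8) shapes don't align.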

Check this out for a super cool transformer visualisation! šŸ‘ https://poloclub.github.io/transforme...

āž”ļø AI Coffee Break Merch! šŸ›ļø https://aicoffeebreak.creator-spring....

Outline:
00:00 Transformers explained
00:47 Text inputs
02:29 Image inputs
03:57 Next word prediction / Classification
06:08 The transformer layer: 1. MLP sublayer
06:47 2. Attention explained
07:57 Attention vs. self-attention
08:35 Queries, Keys, Values
09:19 Correction: order of multiplication should be the opposite: x1 (vector) Ɨ Wq (matrix) = q1 (vector)
11:26 Multi-head attention
13:04 Attention scales quadratically (see the cost sketch after this outline)
13:53 Positional embeddings
15:11 Residual connections and Normalization Layers
17:09 Masked Language Modelling
17:59 Difference to RNNs
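
About the 13:04 chapter: self-attention compares every token with every other token, so a sequence of n tokens produces an nƗn score matrix. A tiny sketch of that quadratic growth (toy head width, made up for this example):

import numpy as np

d_head = 3                       # toy head width
for n in (4, 8, 16):             # doubling the sequence length ...
    Q = np.ones((n, d_head))
    K = np.ones((n, d_head))
    scores = Q @ K.T             # one score per (query, key) pair
    print(n, scores.size)        # ... quadruples the number of scores: 16, 64, 256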

Thanks to our Patrons who support us in Tier 2, 3, 4: šŸ™
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Kshitij

Our old Transformer explained šŸ“ŗ video: • The Transformer neural network archit...
šŸ“ŗ Tokenization explained: • What is tokenization and how does it ...
šŸ“ŗ Word embeddings: • How modern search engines work – Vect...
šŸ“½ļø Replacing Self-Attention: • Replacing Self-attention
šŸ“½ļø Position embeddings: • Position encodings in Transformers ex...
@SerranoAcademy Transformer series: • The Attention Mechanism in Large Lang...

šŸ“„ Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).

ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€
šŸ”„ Optionally, pay us a coffee to help with our Coffee Bean production! ā˜•
Patreon: / aicoffeebreak
Ko-fi: https://ko-fi.com/aicoffeebreak
ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€ā–€

šŸ”— Links:
AICoffeeBreakQuiz: / aicoffeebreak
Twitter: / aicoffeebreak
Reddit: / aicoffeebreak
YouTube: / aicoffeebreak

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Music šŸŽµ : Sunset n Beachz - Ofshane
Video editing: Nils Trost
