Published On Sep 28, 2021
Swin Transformer paper explained, visualized, and animated by Ms. Coffee Bean. Find out what the Swin Transformer proposes to do better than the Vision Transformer (ViT).
📺 ViT explained: • An image is worth 16x16 words: ViT | ...
📺 Transformer explained: • The Transformer neural network archit...
📺 Positional embeddings (playlist): • Position encodings in Transformers ex...
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
donor, Dres. Trost GbR, Yannik Schneider
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring....
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Patreon: / aicoffeebreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Paper discussed:
📜 Liu, Ze, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. "Swin transformer: Hierarchical vision transformer using shifted windows." arXiv preprint arXiv:2103.14030 (2021). https://arxiv.org/abs/2103.14030
💻 Swin Transformer code on GitHub: https://github.com/microsoft/Swin-Tra...
Outline:
00:00 Problems with ViT / Swin Motivation
04:16 Swin Transformer explained
06:00 Shifted Window based Self-attention
08:58 Positional embeddings in the Swin Transformer
09:29 Task performance of the Swin Transformer
Music 🎵 : Bay Street Millionaires by Squadda B
---------------------
🔗 Links:
AICoffeeBreakQuiz: / aicoffeebreak
Twitter: / aicoffeebreak
Reddit: / aicoffeebreak
YouTube: / aicoffeebreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video and thumbnail contain emojis designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0