Gabriel Mongaras
8.89K subscribers
32:31
Round and Round We Go! What makes Rotary Positional Encodings useful?
Gabriel Mongaras
397 views • 4 days ago
1:13:10
Deterministic Image Editing with DDPM Inversion, DDIM Inversion, Null Inversion and Prompt-to-Prompt
Gabriel Mongaras
1.1K views • 2 months ago
42:25
Attending to Topological Spaces: The Cellular Transformer
Gabriel Mongaras
667 views • 3 months ago
35:52
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Gabriel Mongaras
2.5K views • 3 months ago
52:39
WARP: On the Benefits of Weight Averaged Rewarded Policies
Gabriel Mongaras
737 views • 3 months ago
28:52
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Gabriel Mongaras
754 views • 3 months ago
1:14:43
Mamba 2 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Gabriel Mongaras
7.6K views • 4 months ago
38:55
CoPE - Contextual Position Encoding: Learning to Count What's Important
Gabriel Mongaras
1.3K views • 4 months ago
45:48
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Gabriel Mongaras
833 views • 4 months ago
43:26
xLSTM: Extended Long Short-Term Memory
Gabriel Mongaras
1.9K views • 5 months ago
37:09
KAN: Kolmogorov-Arnold Networks
Gabriel Mongaras
55K views • 5 months ago
30:07
LADD: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Gabriel Mongaras
912 views • 5 months ago
37:00
Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction
Gabriel Mongaras
1.9K views • 6 months ago
32:49
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Gabriel Mongaras
3.6K views • 6 months ago
40:14
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Gabriel Mongaras
2K views • 6 months ago
4:54
Q* AGI Achieved (Apr Fools)
Gabriel Mongaras
786 views • 6 months ago
1:02:30
Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Gabriel Mongaras
4.1K views • 6 months ago
37:08
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Gabriel Mongaras
1K views • 7 months ago
46:25
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits and BitNet
Gabriel Mongaras
5.5K views • 7 months ago
31:15
DoRA: Weight-Decomposed Low-Rank Adaptation
Gabriel Mongaras
1.9K views • 7 months ago
1:02:38
OpenAI Sora and DiTs: Scalable Diffusion Models with Transformers
Gabriel Mongaras
11K views • 8 months ago
33:55
A Decoder-only Foundation Model For Time-series Forecasting
Gabriel Mongaras
4K views • 8 months ago
37:30
Lumiere: A Space-Time Diffusion Model for Video Generation
Gabriel Mongaras
664 views • 8 months ago
28:56
Exphormer: Sparse Transformers for Graphs
Gabriel Mongaras
438 views • 8 months ago
25:56
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Gabriel Mongaras
1.8K views • 8 months ago
40:23
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
Gabriel Mongaras
467 views • 9 months ago
29:38
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Gabriel Mongaras
857 views • 9 months ago
39:02
Translatotron 3: Speech to Speech Translation with Monolingual Data
Gabriel Mongaras
861 views • 9 months ago
44:02
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Gabriel Mongaras
9.6K views • 10 months ago
47:32
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Gabriel Mongaras
2K views • 10 months ago