Gabriel Mongaras
8.89K subscribers
32:31
Round and Round We Go! What makes Rotary Positional Encodings useful?
Gabriel Mongaras
397 views • 4 days ago
1:13:10
Deterministic Image Editing with DDPM Inversion, DDIM Inversion, Null Inversion and Prompt-to-Prompt
Gabriel Mongaras
1.1K views • 2 months ago
42:25
Attending to Topological Spaces: The Cellular Transformer
Gabriel Mongaras
667 views • 3 months ago
35:52
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Gabriel Mongaras
2.5K views • 3 months ago
52:39
WARP: On the Benefits of Weight Averaged Rewarded Policies
Gabriel Mongaras
737 views • 3 months ago
28:52
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Gabriel Mongaras
754 views • 3 months ago
1:14:43
Mamba 2 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Gabriel Mongaras
7.6K views • 4 months ago
38:55
CoPE - Contextual Position Encoding: Learning to Count What's Important
Gabriel Mongaras
1.3K views • 4 months ago
45:48
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Gabriel Mongaras
833 views • 4 months ago
43:26
xLSTM: Extended Long Short-Term Memory
Gabriel Mongaras
1.9K views • 5 months ago
37:09
KAN: Kolmogorov-Arnold Networks
Gabriel Mongaras
55K views • 5 months ago
30:07
LADD: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Gabriel Mongaras
912 views • 5 months ago
37:00
Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction
Gabriel Mongaras
1.9K views • 6 months ago
32:49
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Gabriel Mongaras
3.6K views • 6 months ago
40:14
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Gabriel Mongaras
2K views • 6 months ago
4:54
Q* AGI Achieved (Apr Fools)
Gabriel Mongaras
786 views • 6 months ago
1:02:30
Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Gabriel Mongaras
4.1K views • 6 months ago
37:08
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Gabriel Mongaras
1K views • 7 months ago
46:25
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits and BitNet
Gabriel Mongaras
5.5K views • 7 months ago
31:15
DoRA: Weight-Decomposed Low-Rank Adaptation
Gabriel Mongaras
1.9K views • 7 months ago
1:02:38
OpenAI Sora and DiTs: Scalable Diffusion Models with Transformers
Gabriel Mongaras
11K views • 8 months ago
33:55
A Decoder-only Foundation Model For Time-series Forecasting
Gabriel Mongaras
4K views • 8 months ago
37:30
Lumiere: A Space-Time Diffusion Model for Video Generation
Gabriel Mongaras
664 views • 8 months ago
28:56
Exphormer: Sparse Transformers for Graphs
Gabriel Mongaras
438 views • 8 months ago
25:56
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Gabriel Mongaras
1.8K views • 8 months ago
40:23
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
Gabriel Mongaras
467 views • 9 months ago
29:38
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Gabriel Mongaras
857 views • 9 months ago
39:02
Translatotron 3: Speech to Speech Translation with Monolingual Data
Gabriel Mongaras
861 views • 9 months ago
44:02
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Gabriel Mongaras
9.6K views • 10 months ago
47:32
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Gabriel Mongaras
2K views • 10 months ago