Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Gabriel Mongaras

Published on Jan 23, 2024

Paper here: https://arxiv.org/abs/2401.10774
Demo: https://sites.google.com/view/medusa-llm

Notes: https://drive.google.com/file/d/1eOmi...