Materials

| Paper | Link |
|---|---|
| FlexAttention | |
| FlexAttention (PyTorch Blog) | Website |
| Mercury | |
| KPerfIR | |
| ForestColl | |
| Gemini | |
| XGrammar | |

| Paper | Link |
|---|---|
| FlashAttention 1, 2, 3 | PDF, PDF, PDF |
| PagedAttention (vLLM) | |
| SGLang | |
| FlexAttention | |
| FlashInfer | |
| SpargeAttention | |
| SageAttention 1, 2 | PDF, PDF |

| Paper | Link |
|---|---|
| StreamingLLM & DuoAttention | PDF, PDF |
| MInference | |
| H2O | |
| TOVA/KIVI | PDF, PDF |
| Speculative Decoding | PDF, PDF |
| Multi-token prediction: DeepSeek-V3 | |

| Paper | Link |
|---|---|
| Tuning-Free Multi-Event Long Video Generation | |
| Long Context Tuning for Video Generation | |
| One-Minute Video Generation with Test-Time Training | |
| SKYREELS-V2 | |

| Paper | Link |
|---|---|
| ATP-LLaVA: Adaptive Token Pruning | |
| FastVID: Dynamic Density Pruning | |
| AdaReTaKe: Token Compression | |
| Cocktail: Mixed-Precision Quantization | |
| FastCache: KV-Cache Compression | |

| Paper | Link |
|---|---|
| Adaptive Keyframe Sampling | |
| Re-thinking Temporal Search | |
| Improving LLM Video Understanding with 16 FPS | |
