Materials
Paper | Link |
---|---|
FlexAttention | |
FlexAttention (PyTorch Blog) | Website |
Mercury | |
KPerfIR | |
ForestColl | |
Gemini | |
XGrammar | |

Paper | Link |
---|---|
FlashAttention 1, 2, 3 | PDF, PDF, PDF |
PagedAttention (vLLM) | |
SGLang | |
FlexAttention | |
FlashInfer | |
SpargeAttention | |
SageAttention 1, 2 | PDF, PDF |
Paper | Link |
---|---|
StreamingLLM & DuoAttention | PDF, PDF |
MInference | |
H2O | |
TOVA/KIVI | PDF, PDF |
Speculative Decoding | PDF, PDF |
Multi-token prediction: DeepSeek-V3 | |

Paper | Link |
---|---|
Tuning-Free Multi-Event Long Video Generation | |
Long Context Tuning for Video Generation | |
One-Minute Video Generation with Test-Time Training | |
SkyReels-V2 | |

Paper | Link |
---|---|
ATP-LLaVA: Adaptive Token Pruning | |
FastVID: Dynamic Density Pruning | |
AdaReTaKe: Token Compression | |
Cocktail: Mixed-Precision Quantization | |
FastCache: KV-Cache Compression | |

Paper | Link |
---|---|
Adaptive Keyframe Sampling | |
Re-thinking Temporal Search | |
Improving LLM Video Understanding with 16 FPS | |