Materials
Stanford MLSys Seminar: [Link]
MIT 6.5940 Fall 2024 TinyML and Efficient Deep Learning Computing: [Link]
Subtopic 1: I/O Aware & Exact Attention
Subtopic 2: Sparse Attention
Subtopic 3: Kernel Generation & Compiler
Subtopic 4: Execution Optimization/Serving
Chapter II: Efficient LLM
Subtopic 2: Efficient Inference & Long-context
| Paper | Link |
|---|---|
| StreamingLLM & DuoAttention | PDF, PDF |
| MInference | PDF |
| H2O | PDF |
| TOVA/KIVI | PDF, PDF |
| Speculative Decoding | PDF, PDF |
| Multi-token prediction: DeepSeek-V3 | PDF |
Subtopic 3: Model Compression (Quant & Pruning)
Subtopic 4: Efficient Training
Subtopic 5: Efficient Model Designs
| Paper | Link |
|---|---|
| Switch Transformers/Outrageously Large Neural Networks | PDF, PDF |
| Multi-head Latent Attention (MLA) | PDF |
| Mamba | PDF |
Chapter III: Video Generation
Subtopic 1: SOTA/Baseline Model
Subtopic 2: Optimization Techniques
Subtopic 3: Long Video Generation
| Paper | Link |
|---|---|
| Tuning-Free Multi-Event Long Video Generation | PDF |
| Long Context Tuning for Video Generation | PDF |
| One-Minute Video Generation with Test-Time Training | PDF |
| SkyReels-V2 | PDF |
Subtopic 4: Video Super Resolution
| Paper | Link |
|---|---|
| SeedVR | PDF |
| MGLD-VSR | PDF |
| DynamicScaler | PDF |
Subtopic 1: Diffusion Model/Flow Matching
Subtopic 3: System Optimization Techniques
| Paper | Link |
|---|---|
| Reed-Solomon | PDF |
| Algebraic Soft-Decision Decoding of Reed–Solomon Codes | PDF |
| ZENO | PDF |
Chapter V: MLLM Video Understanding
Subtopic 1: SOTA/Baseline
Subtopic 2: System Optimization Techniques
| Paper | Link |
|---|---|
| ATP-LLaVA: Adaptive Token Pruning | PDF |
| FastVID: Dynamic Density Pruning | PDF |
| AdaReTaKe: Token Compression | PDF |
| Cocktail: Mixed-Precision Quantization | PDF |
| FastCache: KV-Cache Compression | PDF |
Subtopic 3: Attention Kernel Optimization
| Paper | Link |
|---|---|
| AttentionEngine | PDF |
| SpargeAttn | PDF |
| FlexPrefill | PDF |
| MInference 1.0 | PDF |
Subtopic 4: Algorithm Design
| Paper | Link |
|---|---|
| Adaptive Keyframe Sampling | PDF |
| Re-thinking Temporal Search | PDF |
| Improving LLM Video Understanding with 16 FPS | PDF |