Materials

Links

Stanford MLSys Seminar: [Link]

MIT 6.5940 Fall 2024 TinyML and Efficient Deep Learning Computing: [Link]

Subtopic 1: I/O Aware & Exact Attention

Paper	Link
FlashAttention 1, 2, 3	PDF, PDF, PDF
PagedAttention (vLLM)	PDF
SGLang	PDF
FlexAttention	PDF
FlashInfer	PDF
SpargeAttn	PDF
SageAttention 1, 2	PDF, PDF

Subtopic 2: Sparse Attention

Paper	Link
DejaVu	PDF
H2O	PDF
SpAtten	PDF
MoE	PDF
DeepSpeed-MoE	PDF

Subtopic 3: Kernel Generation & Compiler

Paper	Link
TVM	PDF
Ansor	PDF
MLIR	PDF

Subtopic 4: Execution Optimization/Serving

Paper	Link
Alpa	PDF
Orca	PDF
FlexGen	PDF
ZeRO-Offload	PDF
Megatron-LM	PDF
FlashDecoding++	PDF
Sarathi-Serve	PDF

Chapter II: Efficient LLM

Subtopic 1: LLM 101

Paper	Link
Attention is All You Need	PDF
BERT	PDF
GPT-3	PDF
Scaling Laws	PDF
RLHF	PDF
PPO/DPO	PDF, PDF

Subtopic 2: Efficient Inference & Long-context

Paper	Link
StreamingLLM & DuoAttention	PDF, PDF
MInference	PDF
H2O	PDF
TOVA/KIVI	PDF, PDF
Speculative Decoding	PDF, PDF
Multi-token prediction: DeepSeek-V3	PDF

Subtopic 3: Model Compression (Quant & Pruning)

Paper	Link
LLM.int8()/GPTQ	PDF, PDF
AWQ	PDF
LLM-Pruner	PDF
Sheared LLaMA	PDF

Subtopic 4: Efficient Training

Paper	Link
ZeRO	PDF
Megatron-LM	PDF
LoRA & QLoRA	PDF, PDF

Subtopic 5: Efficient Model Designs

Paper	Link
Switch Transformers/Outrageously Large Neural Networks	PDF, PDF
Multi-head Latent Attention (MLA)	PDF
Mamba	PDF

Chapter III: Video Generation

Subtopic 1: SOTA/Baseline Model

Paper	Link
CogVideoX	PDF
HunyuanVideo	PDF
Wan	PDF
Seaweed-7B	PDF

Subtopic 2: Optimization Techniques

Paper	Link
Pruning	PDF
Cache	PDF
Compression	PDF
Sparsity	PDF

Subtopic 3: Long Video Generation

Paper	Link
Tuning-Free Multi-Event Long Video Generation	PDF
Long Context Tuning for Video Generation	PDF
One-Minute Video Generation with Test-Time Training	PDF
SkyReels-V2	PDF

Subtopic 4: Video Super Resolution

Paper	Link
SeedVR	PDF
MGLD-VSR	PDF
DynamicScaler	PDF

Chapter IV: Secure LLM

Subtopic 1: Diffusion Model/Flow Matching

Paper	Link
DDIM	PDF
DDPM	PDF
Score-based	PDF
Flow Matching	PDF

Subtopic 2: Watermarking

Paper	Link
HiDDeN	PDF
Stable Signature	PDF
WatermarkDM	PDF
AquaLoRA	PDF
Tree-Ring	PDF

Subtopic 3: System Optimization Techniques

Paper	Link
ByteScheduler	PDF
PipeDream	PDF
APNN-TC	PDF
QGTC	PDF

Subtopic 4: Encryption

Paper	Link
Reed-Solomon	PDF
Algebraic Soft-Decision Decoding of Reed–Solomon Codes	PDF
ZENO	PDF

Chapter V: MLLM Video Understanding

Subtopic 1: SOTA/Baseline

Paper	Link
Qwen2.5-VL	PDF
Storm	PDF
LLaVA	PDF
LLaMA	PDF
Seed1.5-VL	PDF
Kwai Keye-VL	PDF

Subtopic 2: System Optimization Techniques

Paper	Link
ATP-LLaVA: Adaptive Token Pruning	PDF
FastVID: Dynamic Density Pruning	PDF
AdaReTaKe: Token Compression	PDF
Cocktail: Mixed-Precision Quantization	PDF
FastCache: KV-Cache Compression	PDF

Subtopic 3: Attention Kernel Optimization

Paper	Link
AttentionEngine	PDF
SpargeAttn	PDF
FlexPrefill	PDF
MInference 1.0	PDF

Subtopic 4: Algorithm Design

Paper	Link
Adaptive Keyframe Sampling	PDF
Re-thinking Temporal Search	PDF
Improving LLM Video Understanding with 16 FPS	PDF