COMP 620

Materials

Guest Lecture Materials

Paper | Link
FlexAttention | PDF
FlexAttention (PyTorch Blog) | Website
Mercury | PDF
KPerfIR | PDF
ForestColl | PDF
Gemini | PDF
XGrammar | PDF

Subtopic 1: I/O Aware & Exact Attention

Paper | Link
FlashAttention 1, 2, 3 | PDF, PDF, PDF
PagedAttention (vLLM) | PDF
SGLang | PDF
FlexAttention | PDF
FlashInfer | PDF
SpargeAttention | PDF
SageAttention 1, 2 | PDF, PDF

Subtopic 2: Sparse Attention

Paper | Link
DejaVu | PDF
H2O | PDF
SpAttn | PDF
MoE | PDF
DeepSpeed-MoE | PDF

Subtopic 3: Kernel Generation & Compiler

Paper | Link
TVM | PDF
Ansor | PDF
MLIR | PDF

Subtopic 4: Execution Optimization/Serving

Paper | Link
Alpa | PDF
Orca | PDF
FlexGen | PDF
ZeRO-Offloading | PDF
Megatron-LM | PDF
FlashDecoding++ | PDF
SarathiServe | PDF

Chapter II: Efficient LLM

Subtopic 1: LLM 101

Paper | Link
Attention is All You Need | PDF
BERT | PDF
GPT-3 | PDF
Scaling Laws | PDF
RLHF | PDF
PPO/DPO | PDF, PDF

Subtopic 2: Efficient Inference & Long-context

Paper | Link
Streaming LLM & DuoAttention | PDF, PDF
MInference | PDF
H2O | PDF
TOVA/KIVI | PDF, PDF
Speculative Decoding | PDF, PDF
Multi-token prediction: DeepSeek-V3 | PDF

Subtopic 3: Model Compression (Quant & Pruning)

Paper | Link
LLM.int8()/GPTQ | PDF, PDF
AWQ | PDF
LLM Pruner | PDF
ShearedLlama | PDF

Subtopic 4: Efficient Training

Paper | Link
ZeRO | PDF
Megatron-LM | PDF
LoRA & QLoRA | PDF, PDF

Subtopic 5: Efficient Model Designs

Paper | Link
Switch Transformers/Outrageously Large Neural Networks | PDF, PDF
MLA Attention | PDF
Mamba | PDF

Chapter III: Video Generation

Subtopic 1: SOTA/Baseline Model

Paper | Link
CogVideoX | PDF
HunyuanVideo | PDF
WAN | PDF
Seaweed-7B | PDF

Subtopic 2: Optimization Techniques

Paper | Link
Pruning | PDF
Cache | PDF
Compression | PDF
Sparsity | PDF

Subtopic 3: Long Video Generation

Paper | Link
Tuning-Free Multi-Event Long Video Generation | PDF
Long Context Tuning for Video Generation | PDF
One-Minute Video Generation with Test-Time Training | PDF
SKYREELS-V2 | PDF

Subtopic 4: Video Super Resolution

Paper | Link
SeedVR | PDF
MGLD-VSR | PDF
DynamicScaler | PDF

Chapter IV: Secure LLM

Subtopic 1: Diffusion Model/Flow Matching

Paper | Link
DDIM | PDF
DDPM | PDF
Score-based | PDF
Flow Matching | PDF

Subtopic 2: Watermarking

Paper | Link
HiDDeN | PDF
Stable Signature | PDF
WatermarkDM | PDF
AquaLoRA | PDF
Tree-Ring | PDF

Subtopic 3: System Optimization Techniques

Paper | Link
ByteScheduler | PDF
PipeDream | PDF
APNN-TC | PDF
QGTC | PDF

Subtopic 4: Encryption

Paper | Link
Reed-Solomon | PDF
Algebraic Soft-Decision Decoding of Reed–Solomon Codes | PDF
ZENO | PDF

Chapter V: MLLM Video Understanding

Subtopic 1: SOTA/Baseline

Paper | Link
Qwen2.5-VL | PDF
Storm | PDF
LLaVA | PDF
LLaMA | PDF
Seed1.5 VL | PDF
Kwai Keye-VL | PDF

Subtopic 2: System Optimization Techniques

Paper | Link
ATP-LLaVA: Adaptive Token Pruning | PDF
FastVID: Dynamic Density Pruning | PDF
AdaReTaKe: Token Compression | PDF
Cocktail: Mixed-Precision Quantization | PDF
FastCache: KV-Cache Compression | PDF

Subtopic 3: Attention Kernel Optimization

Paper | Link
AttentionEngine | PDF
SpargeAttn | PDF
FlexPrefill | PDF
MInference 1.0 | PDF

Subtopic 4: Algorithm Design

Paper | Link
Adaptive Keyframe Sampling | PDF
Re-thinking Temporal Search | PDF
Improving LLM Video Understanding with 16 FPS | PDF