Resources & Exploration
Reflective notes on completed study resources, integrated with codebases I am exploring in systems-level AI.
Timeline Progression
Study reviews and codebase exploration grouped by roadmap stages.
Deep Learning Foundations
Linear algebra, the calculus chain rule, and backpropagation math.
3Blue1Brown Neural Networks & Deep Learning Series
A visual mathematical decomposition of multi-layer networks, parameter optimization, and gradient descent trajectories.
Revisiting Sanderson's series shifted my understanding from calling framework APIs to seeing each layer as a geometric transformation of its input. It helped me visualize gradient descent not just as a tuning step, but as a path across a high-dimensional cost surface.
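The picture of gradient descent as a path across a cost surface can be sketched in a few lines. This is a minimal illustration, not code from the series: the two-parameter bowl `cost`, its gradient `cost_grad`, the learning rate, and the starting point are all assumptions chosen so the path is easy to follow.

```python
def gradient_descent(grad, start, lr=0.1, steps=50):
    """Follow the negative gradient from `start`, recording every point visited."""
    point = list(start)
    path = [tuple(point)]
    for _ in range(steps):
        g = grad(point)
        # Step each coordinate downhill, scaled by the learning rate.
        point = [p - lr * gi for p, gi in zip(point, g)]
        path.append(tuple(point))
    return path


# Toy cost surface (an assumption for illustration): f(x, y) = x^2 + 3*y^2
def cost(p):
    x, y = p
    return x * x + 3 * y * y


def cost_grad(p):
    x, y = p
    return [2 * x, 6 * y]


path = gradient_descent(cost_grad, start=(4.0, -2.0))
print("start:", path[0], "cost:", cost(path[0]))
print("end:  ", path[-1], "cost:", cost(path[-1]))
```

Plotting `path` over a contour plot of `cost` would give the kind of trajectory the videos animate. A real network has millions of coordinates instead of two, but the update rule is the same.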
Transformers
Self-Attention mechanics, query/key/value dot products, and multi-head splits.
Attention Is All You Need (Original Paper)
The seminal 2017 paper by Vaswani et al. replacing recurrence (RNNs/LSTMs) entirely with self-attention.
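Reading the original paper forced me to work through the raw tensor dimension calculations. Removing recurrence lets training process all sequence positions in parallel, but attention cost grows quadratically with sequence length and the key/value cache grows with every token of context, which are the memory walls we struggle to optimize in production inference.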
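To make the memory pressure concrete, here is a rough back-of-the-envelope calculation. The model shape (32 layers, 32 heads, head dimension 128) and the 2-byte (fp16) values are hypothetical, and the sketch assumes plain multi-head attention where every head keeps its own keys and values.

```python
def attention_score_elements(seq_len, n_heads):
    """Entries in one layer's attention scores: a (seq_len x seq_len) grid per head."""
    return n_heads * seq_len * seq_len


def kv_cache_bytes(seq_len, n_layers, n_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence (2 tensors per layer)."""
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_HEADS, HEAD_DIM = 32, 32, 128

for seq_len in (1024, 4096, 16384):
    scores = attention_score_elements(seq_len, N_HEADS)
    cache_gib = kv_cache_bytes(seq_len, N_LAYERS, N_HEADS, HEAD_DIM) / 1024**3
    print(f"{seq_len:>6} tokens: {scores:>14,} score entries per layer, "
          f"{cache_gib:.1f} GiB KV cache")
```

Quadrupling the context multiplies the score entries by sixteen but the cache by only four. That is why the two costs tend to be attacked with different techniques.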
Jay Alammar – The Illustrated Transformer
A visual structural analysis of the transformer block, detailing embedding mappings and decoder/encoder projections.
Alammar's diagrams clarified the dimensional transformations that happen within the Multi-Head Attention layer. They helped me visualize how head splits allow the model to attend to different representation subspaces concurrently.
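The head split can be sketched in plain Python with nested lists standing in for tensors. This is a simplified illustration: the learned projections (W_Q, W_K, W_V, W_O) are left out, so each head attends over a raw slice of the input, and the helper names are my own.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for one head; q, k, v are (seq_len x head_dim)."""
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def split_heads(x, n_heads):
    """(seq_len x d_model) -> n_heads matrices, each (seq_len x head_dim)."""
    d_model = len(x[0])
    assert d_model % n_heads == 0, "d_model must divide evenly across heads"
    head_dim = d_model // n_heads
    return [[row[h * head_dim:(h + 1) * head_dim] for row in x]
            for h in range(n_heads)]


def merge_heads(heads):
    """Inverse of split_heads: concatenate head outputs along the feature axis."""
    return [[val for head in heads for val in head[t]]
            for t in range(len(heads[0]))]


def multi_head_self_attention(x, n_heads):
    # Assumption: identity projections, so only the split/attend/merge flow is shown.
    return merge_heads([attention(h, h, h) for h in split_heads(x, n_heads)])


x = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0, 0.0]]
out = multi_head_self_attention(x, n_heads=2)
print(len(out), "positions x", len(out[0]), "features")
```

Each head sees only its own `head_dim` slice, so the two heads here compute different attention weights over the same three positions before their outputs are concatenated back to `d_model` features.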
Tokenization
Byte-Pair Encoding (BPE) vocab construction and byte fallback strategies.
Karpathy/minbpe
Status: Upcoming
Purpose: A clean, educational Python library implementing BPE tokenization.
Why I'm studying it: To study the exact token-merge iterations, regex splitting, and byte-level fallback configurations.
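As a reference point before reading the repository, the core merge loop of byte-level BPE can be sketched as follows. This is my own minimal version, not minbpe's API: the function names are assumptions, and regex pre-splitting and special tokens are left out.

```python
from collections import Counter


def pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`, left to right."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Start from raw UTF-8 bytes (ids 0..255) and learn `num_merges` merges."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break  # nothing left to merge
        pair = counts.most_common(1)[0][0]
        new_id = 256 + step  # new tokens are numbered after the 256 byte values
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


ids, merges = train_bpe("aaabdaaabac", num_merges=1)
print(ids)
print(merges)
```

Because the base vocabulary is all 256 byte values, any input string can be encoded even if no learned merge applies to it.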
GPT Architecture
Decoder-only models, causal masking, and LayerNorm placement.
Andrej Karpathy – Let's Build GPT from Scratch
A practical, line-by-line coding implementation of a character-level decoder-only transformer in PyTorch.
Coding along with Karpathy bridged the gap between equations and floating-point computations. Correcting details in the loss calculation showed me how modeling errors slip past compilers and type checkers: the code still runs, and only the loss value reveals the problem.
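A small illustration of that kind of silent error, using a hypothetical off-by-one in the targets rather than any bug from the video: both functions below run without complaint, but only one scores position t against the token that actually follows it. The toy `tokens` and hand-built `logits_per_pos` are assumptions standing in for a model that predicts the next token perfectly.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def next_token_loss(logits_per_pos, tokens):
    """Correct: the logits at position t are scored against tokens[t + 1]."""
    losses = [cross_entropy(logits_per_pos[t], tokens[t + 1])
              for t in range(len(tokens) - 1)]
    return sum(losses) / len(losses)


def unshifted_loss(logits_per_pos, tokens):
    """Buggy: targets are not shifted, so position t is scored against its own input."""
    losses = [cross_entropy(logits_per_pos[t], tokens[t])
              for t in range(len(tokens) - 1)]
    return sum(losses) / len(losses)


VOCAB_SIZE = 3
tokens = [0, 1, 2, 0]
# Hand-built logits that put nearly all probability on the true next token.
logits_per_pos = [[10.0 if v == tokens[t + 1] else 0.0 for v in range(VOCAB_SIZE)]
                  for t in range(len(tokens) - 1)]

correct = next_token_loss(logits_per_pos, tokens)
buggy = unshifted_loss(logits_per_pos, tokens)
print(f"shifted targets:   {correct:.5f}")
print(f"unshifted targets: {buggy:.5f}")
```

Nothing in the buggy version raises an exception or fails a type check. The only symptom is a number that looks like a poorly trained model.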
Karpathy/nanoGPT
Status: Exploring
Purpose: A compact, fast repository for training medium-scale decoder-only transformers.
Why I'm studying it: To study the training loop structure, the cosine learning-rate schedule, and multi-GPU training with PyTorch DDP.
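A cosine schedule with linear warmup, the general shape I expect to find in the training loop, can be sketched like this. The function name, the default values, and the exact warmup formula are my assumptions for illustration, not copied from the repository.

```python
import math


def cosine_lr(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=100, decay_steps=1000):
    """Linear warmup to max_lr, cosine decay to min_lr, then hold at min_lr."""
    if step < warmup_steps:
        # Ramp up linearly so the first updates are small.
        return max_lr * (step + 1) / warmup_steps
    if step >= decay_steps:
        return min_lr
    progress = (step - warmup_steps) / (decay_steps - warmup_steps)  # 0 -> 1
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))               # 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)


for step in (0, 50, 100, 550, 1000, 2000):
    print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

In a PyTorch loop, the returned value would typically be written into each optimizer parameter group's `lr` entry before the optimizer step.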
Karpathy/llm.c
Status: Future
Purpose: LLM training written in raw C/CUDA without heavy PyTorch dependencies.
Why I'm studying it: To understand the low-level systems engineering of backprop math running directly in hand-written CUDA kernels on the GPU.
ggml-org/llama.cpp
Status: Future
Purpose: Inference engine for LLaMA models in pure C/C++.
Why I'm studying it: To study quantization (INT8/INT4), KV-cache optimizations, and thread-pool orchestration on local processors.
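To fix the basic idea of quantization before reading the real kernels, here is a minimal sketch of symmetric 8-bit quantization with a single scale per group of weights. It is a generic textbook scheme, not llama.cpp's actual block formats, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers and scale."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print("quantized:", q)
print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight now needs one byte plus a shared scale instead of four bytes, at the price of a rounding error of at most half a quantization step per value.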
Model Context Protocol (MCP)
Standardizing tool integrations with JSON-RPC messages over stdio and SSE transports.
modelcontextprotocol/servers
Status: Future
Purpose: Exemplar MCP servers implementing tools and resources.
Why I'm studying it: To explore stdio/SSE transport boundary checks and tool exposure designs.
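As a starting mental model, a tool call over the stdio transport is a JSON-RPC 2.0 request written as one newline-terminated line. The sketch below frames such a request and applies a basic check to incoming lines. The helper names and the `echo` tool are hypothetical, and this is not the official SDK.

```python
import json


def encode_request(request_id, method, params):
    """Serialize a JSON-RPC 2.0 request as one newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    # json.dumps escapes newlines inside strings, so the frame stays on one line.
    return json.dumps(msg, separators=(",", ":")) + "\n"


def decode_line(line):
    """Parse one line and reject anything that is not a JSON-RPC 2.0 object."""
    msg = json.loads(line)
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    return msg


# Hypothetical tool name and arguments, for illustration only.
line = encode_request(1, "tools/call", {"name": "echo", "arguments": {"text": "hi\nthere"}})
print(line, end="")
print(decode_line(line)["params"]["arguments"]["text"])
```

A real server would also validate the method name and the tool's argument schema before running anything, which is the boundary checking I want to see handled in the exemplar servers.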