Technical Articles
In-depth notes documenting my understanding of neural network math, self-attention, tokenization, and decoder-only language models.
Deep Learning Foundations
Linear algebra and the math of backpropagation.
How Backpropagation Actually Works: A Geometric Derivation
Demystifying the backpropagation algorithm. Rather than an opaque code routine, backprop is reverse-mode differentiation: the chain rule applied backward through the network, expressed in matrix calculus.
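To make the chain-rule view concrete, here is a minimal sketch on a hypothetical two-weight scalar network (tanh hidden unit, squared-error loss). The network, the names `forward`, `loss_and_grads`, and `numeric_grad`, and the input values are all assumptions chosen for illustration, not code from the article.

```python
import math

def forward(w1, w2, x):
    """Tiny assumed network: y = w2 * tanh(w1 * x)."""
    h = math.tanh(w1 * x)   # hidden activation
    y = w2 * h              # output
    return h, y

def loss_and_grads(w1, w2, x, target):
    """Return the loss and its gradients with respect to w1 and w2."""
    h, y = forward(w1, w2, x)
    loss = 0.5 * (y - target) ** 2
    # Reverse pass: walk from the loss back to the weights,
    # multiplying local derivatives along the way (chain rule).
    dy = y - target               # dL/dy
    dw2 = dy * h                  # dL/dw2 = dL/dy * dy/dw2
    dh = dy * w2                  # dL/dh  = dL/dy * dy/dh
    dw1 = dh * (1 - h * h) * x    # tanh'(z) = 1 - tanh(z)^2
    return loss, dw1, dw2

def numeric_grad(f, v, eps=1e-6):
    """Central finite difference, used only to sanity-check the gradients."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

if __name__ == "__main__":
    w1, w2, x, t = 0.5, -1.2, 0.8, 0.3
    loss, dw1, dw2 = loss_and_grads(w1, w2, x, t)
    check1 = numeric_grad(lambda w: loss_and_grads(w, w2, x, t)[0], w1)
    check2 = numeric_grad(lambda w: loss_and_grads(w1, w, x, t)[0], w2)
    print(abs(dw1 - check1) < 1e-6, abs(dw2 - check2) < 1e-6)
```

Each gradient line reuses a quantity computed one step closer to the loss, which is the sense in which the pass runs in reverse.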
Transformers
Self-attention mechanics and Query/Key/Value retrieval.
Attention Is Just Differentiable Retrieval
Deconstructing self-attention as a soft database lookup. The Query, Key, and Value matrices map onto familiar retrieval concepts: a query is compared against keys, and the match scores weight the values that are returned.
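The soft-lookup reading can be sketched for a single query in plain Python. This assumes scaled dot-product scoring and a softmax over the scores; the function names `softmax` and `soft_lookup` and the toy vectors are hypothetical, picked only to show the idea.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_lookup(query, keys, values):
    """Return a weighted blend of values, weighted by query-key similarity."""
    d = len(query)
    # Similarity of the query to every key (scaled dot product).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # A hard lookup would return one value; here every value
    # contributes in proportion to its weight.
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
    out, weights = soft_lookup([5.0, 0.0], keys, values)
    print(weights, out)
```

Because the weights come from a softmax rather than an exact match, the result is differentiable with respect to the query, keys, and values.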
Tokenization
Byte-Pair Encoding (BPE) vocabulary construction.
Tokenization Is More Important Than Most People Think
Why many common LLM bugs are actually tokenization errors. A deep dive into BPE merge tables, whitespace handling quirks, and byte-level fallback strategies.
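As a rough illustration of how a BPE merge table gets built, the sketch below learns merges at the character level from a whitespace-split toy corpus. It is a simplification under stated assumptions: real tokenizers typically work on bytes and apply pre-tokenization rules, and the helper names `count_pairs`, `merge_pair`, and `learn_merges` are hypothetical.

```python
from collections import Counter

def count_pairs(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_merges(corpus, num_merges):
    """Greedily merge the most frequent pair, num_merges times."""
    words = {tuple(w): c for w, c in Counter(corpus.split()).items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

if __name__ == "__main__":
    merges, words = learn_merges("low low low lower", 2)
    print(merges)
    print(words)
```

The ordered list of merges is the merge table: encoding new text replays those merges in the same order, which is why small differences in spacing can produce very different token sequences.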
GPT Architecture
Decoder-only models and the limits of next-token prediction.
From Next-Token Prediction to Agents: The Architectural Leap
How does a statistical next-token predictor become an autonomous agent? Analyzing the transition from autoregressive sampling to active execution loops.
Building GPT Changed How I Think About LLMs
Why coding a decoder-only transformer from scratch in PyTorch exposes the core engineering truths hidden behind high-level APIs.