Vamsi Cheruku.

Technical Articles.

In-depth conceptual logs documenting my understanding of neural network computation, self-attention, token boundaries, and language model decoding.

Stage 1

Deep Learning Foundations

Linear algebra and backpropagation math.

May 16, 2026 · math

How Backpropagation Actually Works: A Geometric Derivation

Demystifying the backpropagation algorithm. Rather than an opaque code routine, backprop is reverse-mode differentiation: the chain rule applied backward through the network, written in matrix calculus.

backpropagation calculus gradients deep-learning
2 min read
Stage 2

Transformers

Self-attention mechanics and Q/K/V retrieval.

May 19, 2026 · attention

Attention Is Just Differentiable Retrieval

Deconstructing self-attention as a soft database lookup. The Query, Key, and Value matrices map onto familiar retrieval terms: each query is scored against every key, and those scores weight the values that come back.

attention transformers matrix-multiplication retrieval
3 min read
Stage 3

Tokenization

Byte-Pair Encoding (BPE) vocabulary construction.

May 24, 2026 · tokenization

Tokenization Is More Important Than Most People Think

Why many common LLM bugs are actually tokenization errors. A deep dive into BPE merge tables, whitespace biases, and byte-level fallback strategies.

tokenization bpe minbpe llm-internals
3 min read
Stage 4

GPT Architecture

Decoder-only models and next-token prediction limits.

May 28, 2026 · gpt

From Next-Token Prediction to Agents: The Architectural Leap

How does a statistical next-token predictor become an autonomous agent? Analyzing the transition from autoregressive sampling to active execution loops.

agents llm-internals architectures systems
3 min read
May 26, 2026 · gpt

Building GPT Changed How I Think About LLMs

Why coding a decoder-only transformer from scratch in PyTorch exposes the core engineering truths hidden behind high-level APIs.

gpt transformers pytorch coding
3 min read