Vamsi Cheruku.
Learning Journey

Engineering My Way Into Agentic AI.

A public record of my journey from software engineering to AI systems and agentic AI.

Why This Exists

I'm documenting my journey from software engineering toward AI systems and agentic AI. This is not a tutorial website. It is a public record of what I'm studying, what I'm working to understand, and how my thinking evolves over time.

Learning Roadmap

A linear roadmap detailing my study progression and planned topics.

Stage 1 · Studied

Deep Learning Foundations

Linear algebra transformations, the calculus chain rule, backpropagation math, and vector-Jacobian products.

Stage 2 · Studied

Transformers

Self-attention mechanics, Q/K/V projections framed as database-style retrieval, and multi-head dimension mapping.

Stage 3 · Studied

Tokenization

Byte-Pair Encoding (BPE) vocabulary construction, spacing biases, and UTF-8 byte fallback strategies.

Stage 4 · Exploring

GPT Architecture

Decoder-only autoregressive models, causal masking, LayerNorm (Pre-LN), and parameter initialization.

Stage 5 · Upcoming

Prompt Engineering

System prompting, few-shot conditioning, chain-of-thought orchestration, XML output delimiters, and instruction alignment.

Stage 6 · Future

Tool Calling

Structured JSON Schema definitions, active function-calling loops, and execution control.

Stage 7 · Future

Model Context Protocol (MCP)

Standardizing tool integrations with JSON-RPC over stdio and SSE transports.

Stage 8 · Future

Agent Systems

Autonomous execution loops, ReAct reasoning loops, and multi-agent task fan-out.

Stage 9 · Future

Memory

Short-term execution state vs. long-term semantic retrieval from a vector database.

Stage 10 · Future

Context Engineering

Dynamic context window pruning, token optimization, and semantic context compression.

Stage 11 · Future

Agent Infrastructure

Trajectory trace logs, assertion-based evaluations, and distributed hosting for agent execution.

Recent Articles

May 28, 2026 · gpt

From Next-Token Prediction to Agents: The Architectural Leap

How does a statistical next-token predictor become an autonomous agent? Analyzing the transition from autoregressive sampling to active execution loops.

May 26, 2026 · gpt

Building GPT Changed How I Think About LLMs

Why coding a decoder-only transformer from scratch in PyTorch exposes the core engineering truths hidden behind high-level APIs.

May 24, 2026 · tokenization

Tokenization Is More Important Than Most People Think

Why many common LLM bugs are actually tokenization errors. A deep dive into BPE merge tables, spacing biases, and byte-level fallback strategies.

Recent Notes

2026-05-26 · gpt

LayerNorm Mechanics (Pre-LN vs Post-LN)

Formulas for Pre-LN and Post-LN architectures, and how the two differ in stabilizing transformer training.

2026-05-25 · gpt

Decoder-Only Transformer Configurations

Block layout, causal routing, and standard hyperparameters for decoder-only models.

2026-05-23 · tokenization

UTF-8 Byte Fallback Strategy

How modern tokenizers handle out-of-vocabulary characters using raw byte encoding.

2026-05-22 · tokenization

Byte-Pair Encoding Merge Tables

How Byte-Pair Encoding (BPE) algorithms construct vocabularies and manage token merge tables.