Engineering My Way Into Agentic AI.
A public record of my journey from software engineering to AI systems and agentic AI.
Why This Exists
I'm documenting my journey from software engineering toward AI systems and agentic AI. This is not a tutorial website. It is a public record of what I'm studying, what I'm building understanding around, and how my thinking evolves over time.
Studied Resources
Deep dives and reflections on books, courses, and papers.
Technical Articles
Long-form conceptual logs on systems, calculus, and tensors.
Reference Notes
Mathematical sheets, parameter tables, and architectures.
Learning Roadmap
A linear roadmap detailing my study progression and planned topics.
Deep Learning Foundations
Linear Algebra transformations, Calculus chain-rule, Backpropagation math, Vector-Jacobian Products.
Transformers
Self-Attention mechanics, Q/K/V projections as database-style retrieval, Multi-head dimensional mapping.
Tokenization
Byte-Pair Encoding (BPE) vocabulary construction, spacing biases, and UTF-8 byte fallback strategies.
GPT Architecture
Decoder-only autoregressive models, causal masking, LayerNorm (Pre-LN), and parameter initializations.
Prompt Engineering
System prompting, few-shot conditioning, chain-of-thought orchestration, XML output delimiters, and instruction alignment.
Tool Calling
Structured JSON Schema definitions, active function-calling loops, and execution control.
Model Context Protocol (MCP)
Standardizing tool integrations via JSON-RPC messages over stdio or SSE transports.
Agent Systems
Autonomous execution loops, ReAct reasoning loops, and multi-agent task fan-out.
Memory
Short-term execution state vs long-term semantic vector database retrieval.
Context Engineering
Dynamic context window pruning, token optimization, and semantic context compression.
Agent Infrastructure
Trajectory trace logs, assertion evaluations, and distributed agent execution hosting.
Recent Articles
From Next-Token Prediction to Agents: The Architectural Leap
How does a statistical next-token predictor become an autonomous agent? Analyzing the transition from autoregressive sampling to active execution loops.
Building GPT Changed How I Think About LLMs
Why coding a decoder-only transformer from scratch in PyTorch exposes the core engineering truths hidden behind high-level APIs.
Tokenization Is More Important Than Most People Think
Why many common LLM bugs are actually tokenization errors. A deep dive into BPE merge tables, spacing biases, and byte-level fallback strategies.
Recent Notes
LayerNorm Mechanics (Pre-LN vs Post-LN)
Formulas for Pre-LN and Post-LN architectures, and how the two differ in stabilizing transformer training.
Decoder-Only Transformer Configurations
Block layout, causal routing, and standard hyperparameters for decoder-only models.
UTF-8 Byte Fallback Strategy
How modern tokenizers handle out-of-vocabulary characters using raw byte encoding.
Byte-Pair Encoding Merge Tables
How Byte-Pair Encoding (BPE) algorithms construct vocabularies and manage token merge tables.