Reverse-Mode Auto-Diff (Backprop Graph Math) (math)

Backpropagation is an implementation of reverse-mode automatic differentiation. Instead of deriving symbolic formulas, we evaluate derivatives numerically by traversing a computational graph in reverse order.

Computational Graph Example

Consider the function:

f(x, w, b) = o(w * x + b)

We can break this down into elementary operations (nodes in a graph):

u = w * x
v = u + b
y = o(v)

w ───┐
     ├─► [u] ───► [+] ───► [o] ───► y
x ───┘             ▲
                   │
b ─────────────────┘

Forward Pass

Compute values moving left-to-right through the nodes:

Input: x = 2.0, w = 0.5, b = -1.0
u = 0.5 * 2.0 = 1.0
v = 1.0 + (-1.0) = 0.0
y = Sigmoid(0.0) = 0.5

Backward Pass (Adjoint calculations)

Let the final loss derivative with respect to output y be dL/dy = 1.0. We compute the adjoint values (local gradients) moving right-to-left:

At node y = Sigmoid(v):
- Local derivative: dy/dv = y * (1 - y) = 0.5 * (1 - 0.5) = 0.25
- Accumulate gradient: dL/dv = dL/dy * dy/dv = 1.0 * 0.25 = 0.25
At node v = u + b:
- Local derivatives: dv/du = 1, dv/db = 1
- Accumulate:
  - dL/du = dL/dv * dv/du = 0.25 * 1 = 0.25
  - dL/db = dL/dv * dv/db = 0.25 * 1 = 0.25
At node u = w * x:
- Local derivatives: du/dw = x, du/dx = w
- Accumulate:
  - dL/dw = dL/du * du/dw = 0.25 * 2.0 = 0.5
  - dL/dx = dL/du * du/dx = 0.25 * 0.5 = 0.125

Graph Rule

If a node z branches to multiple paths (e.g. z -> a and z -> b), we sum the incoming gradients during the backward pass:

dL/dz = dL/da * da/dz + dL/db * db/dz

Computational Graph Example

Forward Pass

Backward Pass (Adjoint calculations)

Graph Rule

Share Reference Sheet