Vamsi Cheruku.
attention · 2026-05-18

Query, Key, and Value Projection Dimensions

Dimensional matrix shapes, projections, and dot-product calculations in self-attention layers.

Tags: attention, dimensions, math

Self-attention projects the input sequence three times, into queries, keys, and values. This note summarizes the shapes and the math of those projections.

Projection Equations

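Given an input tensor X of shape [Batch, SequenceLength, d_model], we project it with three learned weight matrices. Here * denotes matrix multiplication over the last dimension of X, applied independently to each batch element: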

Q = X * W_q
K = X * W_k
V = X * W_v
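
The three projections above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the sizes (B = 2, T = 5, d_model = 16, d_k = d_v = 8) and the random initialization are assumptions chosen only to make the shapes visible.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the note).
B, T, C = 2, 5, 16      # batch size, sequence length, d_model
d_k, d_v = 8, 8         # query/key width and value width

rng = np.random.default_rng(0)
X = rng.standard_normal((B, T, C))      # input sequence batch
W_q = rng.standard_normal((C, d_k))     # query projection weights
W_k = rng.standard_normal((C, d_k))     # key projection weights
W_v = rng.standard_normal((C, d_v))     # value projection weights

# The @ operator broadcasts over the batch dimension, so each
# [T, C] slice of X is multiplied by the same [C, d] weight matrix.
Q = X @ W_q
K = X @ W_k
V = X @ W_v

print(Q.shape, K.shape, V.shape)  # (2, 5, 8) (2, 5, 8) (2, 5, 8)
```

The weights carry no batch dimension: the same W_q, W_k, and W_v are shared by every sequence in the batch and every position in the sequence.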

Parameter Shapes and Bounds

For a single attention head:

| Matrix / Tensor | Symbol | Typical Shape | Description |
|-----------------|--------|---------------|-------------|
| Input | X | [B, T, C] | B = batch size, T = sequence length, C = d_model |
| Query Weights | W_q | [C, d_k] | Projector matrix for queries |
| Key Weights | W_k | [C, d_k] | Projector matrix for keys |
| Value Weights | W_v | [C, d_v] | Projector matrix for values |
| Queries | Q | [B, T, d_k] | Output query vectors |
| Keys | K | [B, T, d_k] | Output key vectors |
| Values | V | [B, T, d_v] | Output value vectors |

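Note: In multi-head attention, a common choice is d_k = d_v = d_model / num_heads, so that concatenating the outputs of all heads gives back a width of d_model.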
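
As a quick arithmetic check of that convention, the per-head width can be computed as below. The helper name `head_dim` and the example values (d_model = 512, num_heads = 8) are hypothetical, picked only for illustration.

```python
def head_dim(d_model: int, num_heads: int) -> int:
    """Per-head width under the convention d_k = d_v = d_model / num_heads."""
    if d_model % num_heads != 0:
        # The convention only works when the heads divide d_model evenly.
        raise ValueError("d_model must be divisible by num_heads")
    return d_model // num_heads

d_k = d_v = head_dim(512, 8)
print(d_k)  # 64
```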

Dot-Product Math

The attention weights matrix A is calculated as:

A = Softmax( (Q * K^T) / sqrt(d_k) )

Where:

  • Q * K^T is a batched matrix multiplication of shapes [B, T, d_k] and [B, d_k, T] (K^T transposes only the last two dimensions of K), resulting in a shape of [B, T, T].
  • We apply Softmax over the last dimension (the key positions), so each row of A sums to 1.
  • The final context output matrix O = A * V has shape [B, T, d_v].
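
The steps above can be sketched in NumPy as follows. This is a minimal single-head version under stated assumptions: no masking or dropout, random inputs, and illustrative sizes (B = 2, T = 5, d_k = d_v = 8). The function names `softmax` and `attention` are chosen here and do not come from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    # Transpose only the last two axes of K: [B, T, d_k] -> [B, d_k, T].
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # [B, T, T]
    A = softmax(scores, axis=-1)                       # rows sum to 1
    O = A @ V                                          # [B, T, d_v]
    return A, O

# Illustrative sizes (assumptions).
B, T, d_k, d_v = 2, 5, 8, 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((B, T, d_k))
K = rng.standard_normal((B, T, d_k))
V = rng.standard_normal((B, T, d_v))

A, O = attention(Q, K, V)
print(A.shape, O.shape)  # (2, 5, 5) (2, 5, 8)
```

Row i of A[b] holds the weights that query position i assigns to every key position, which is why the normalization runs along the last axis.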
