A visual guide to TTFW, streaming WPS, end-to-end WPS, and how they're measured.
The request lifecycle
All six metrics at a glance
vs. standard API metrics
The standard LLM performance vocabulary — Time to First Token (TTFT), Inter-Token Latency (ITL), and Tokens Per Second (TPS) — measures throughput at the infrastructure boundary: the API endpoint, the model server, the network edge. The metrics on this page measure at the rendering layer inside the browser, capturing what the person in front of the screen actually perceives. These are Real User Metrics (RUM), not infrastructure metrics.
| API / infra metric | Real-user equivalent | What the gap captures |
|---|---|---|
| **TTFT** (Time to First Token): measured at the HTTP stream | **TTFW** (Time to First Word): measured when text appears on screen | Chunk buffering, markdown parsing, React/DOM render cycles, and browser paint frames. A token arriving at the network layer is not the same as a word appearing in the UI. TTFW is always ≥ TTFT. |
| **ITL** (Inter-Token Latency): avg. gap between successive tokens | `longestStallMs`: worst-case freeze during streaming, measured between rendered word chunks | ITL is an average; it masks the single long freeze that breaks perceived smoothness. A 20 ms average ITL with one 800 ms spike feels broken, but ITL alone won't tell you that. `longestStallMs` surfaces exactly that worst case. |
| **TPS** (Tokens Per Second): model throughput at the API | `wordsPerSecond`: visible words per second (streaming only), measured in the rendering layer | Tokens and words are not the same unit: on average ~1.3 tokens make one word, but code, URLs, and non-English text skew this heavily. WPS counts what the user reads. TPS also doesn't account for frontend buffering: the UI may batch multiple tokens into a single DOM update, making the visible rate lower than the raw token rate. |
| **TPS** (E2E variant): sometimes total tokens / total time, a blended rate including the TTFT wait | `endToEndWordsPerSecond`: words per second of total wait; penalises TTFW latency explicitly | Same intent, but measured at the UI layer in words, not tokens, making it directly interpretable by product teams without needing to know a model's tokenizer. The word unit also scales intuitively with response length as the user experiences it. |
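To make the rendering-layer side of the table concrete, here is a minimal sketch of how TTFW and `longestStallMs` can be derived once you have the timestamps at which word chunks became visible. The function names and the `renderTimestamps` array are illustrative assumptions, not the API of any particular library; in a browser these timestamps would typically come from `performance.now()` calls made when the UI commits text.

```typescript
// Timestamps (ms, e.g. from performance.now()) at which each word chunk
// became visible on screen; requestStart is when the request was issued.
function timeToFirstWord(requestStart: number, renderTimestamps: number[]): number {
  if (renderTimestamps.length === 0) throw new Error("nothing rendered yet");
  return renderTimestamps[0] - requestStart;
}

// Worst-case gap between successive rendered chunks: the single freeze
// that an average like ITL hides.
function longestStallMs(renderTimestamps: number[]): number {
  let worst = 0;
  for (let i = 1; i < renderTimestamps.length; i++) {
    worst = Math.max(worst, renderTimestamps[i] - renderTimestamps[i - 1]);
  }
  return worst;
}
```

For example, render timestamps of `[120, 140, 940, 960]` (ms, with the request at 0) give a TTFW of 120 ms and a longest stall of 800 ms, even though the average gap between chunks looks harmless.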
The key distinction
Interactive example
Move TTFW or word count and watch the derived metrics update instantly.
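The derived metrics in the example reduce to two divisions over the same inputs. A sketch, with field names assumed for illustration:

```typescript
interface StreamStats {
  ttfwMs: number;      // time to first word (request -> first visible word)
  streamingMs: number; // first visible word -> last visible word
  wordCount: number;   // total words rendered
}

// Streaming rate: words per second while text was visibly arriving.
function wordsPerSecond(s: StreamStats): number {
  return s.wordCount / (s.streamingMs / 1000);
}

// End-to-end rate: the same word count spread over the total wait,
// so a long TTFW drags this number down explicitly.
function endToEndWordsPerSecond(s: StreamStats): number {
  return s.wordCount / ((s.ttfwMs + s.streamingMs) / 1000);
}
```

With 150 words streamed over 10 s, `wordsPerSecond` is 15; add a 5 s TTFW and `endToEndWordsPerSecond` drops to 10, which is exactly the penalty the end-to-end metric is designed to expose.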
What drives each metric