# Benchmarks
llm-usage-metrics is engineered for high throughput. In the production benchmark below it runs 3.63–4.60x faster than `ccusage-codex` on cold runs and 17.75–22.78x faster with warm caches, keeping cost reporting under one second on real local data.
## Performance Summary

- Codex-only parity run: 4.60x faster cold and 22.78x faster cached.
- Multi-source OpenAI run: 3.63x faster cold and 17.75x faster cached.
- Sub-second cached reporting in both scenarios (0.746s codex-only, 0.941s multi-source).
## Production benchmark (February 27, 2026)

This benchmark compares:

- `ccusage-codex monthly` (Codex-only baseline)
- `llm-usage monthly --provider openai --source codex` (direct source-to-source parity)
- `llm-usage monthly --provider openai --source pi,codex,gemini,opencode` (multi-source OpenAI scope)
All three commands were executed on the same machine, against real local production data, with repeated timed runs.
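The repeated-timing methodology can be sketched as a simple wall-clock loop. This is only an illustration of what the benchmark script automates internally; substitute the placeholder command with the command under test:

```shell
# Sketch of one repeated-timing loop: 5 wall-clock measurements.
# Replace "sleep 0.2" with the benchmarked command, e.g.
#   llm-usage monthly --provider openai --source codex --json
runs=5
for i in $(seq 1 "$runs"); do
  start=$(date +%s%N)            # nanoseconds since epoch
  sleep 0.2 > /dev/null          # placeholder for the command under test
  end=$(date +%s%N)
  awk -v s="$start" -v e="$end" -v i="$i" \
    'BEGIN { printf "run %d: %.3fs\n", i, (e - s) / 1e9 }'
done
```

Median, mean, min, and max in the tables below are computed over runs collected this way.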
You can reproduce the direct source-to-source benchmark with:

```sh
pnpm run perf:production-benchmark -- \
  --runs 5 \
  --llm-source codex
```

You can reproduce the multi-source OpenAI benchmark with:

```sh
pnpm run perf:production-benchmark -- \
  --runs 5 \
  --llm-source pi,codex,gemini,opencode
```

To export reusable artifacts:

```sh
pnpm run perf:production-benchmark -- \
  --runs 5 \
  --llm-source codex \
  --json-output ./tmp/production-benchmark-openai-codex.json \
  --markdown-output ./tmp/production-benchmark-openai-codex.md

pnpm run perf:production-benchmark -- \
  --runs 5 \
  --llm-source pi,codex,gemini,opencode \
  --json-output ./tmp/production-benchmark-openai-multi-source.json \
  --markdown-output ./tmp/production-benchmark-openai-multi-source.md
```

## Baseline machine
| Spec | Value |
|---|---|
| OS | CachyOS (Linux 6.19.2-2-cachyos) |
| CPU | Intel Core Ultra 9 185H (22 logical CPUs, up to 5.10 GHz) |
| Memory | 62 GiB RAM + 62 GiB swap |
| Storage | NVMe SSD (KXG8AZNV1T02 LA KIOXIA, 953.9 GiB) |
| Node.js | v24.12.0 |
| pnpm | 10.17.1 |
| ccusage-codex | 18.0.8 |
| llm-usage (llm-usage-metrics) | 0.3.4 |
## Cache modes used

- no cache:
  - fresh `XDG_CACHE_HOME` for each run
  - ccusage-codex: `--no-offline`
  - llm-usage: `LLM_USAGE_PARSE_CACHE_ENABLED=0` and no `--pricing-offline`
- with cache:
  - dedicated warm cache directory
  - ccusage-codex: `--offline`
  - llm-usage: `--pricing-offline` with warmed parse cache

For repeatability, `LLM_USAGE_SKIP_UPDATE_CHECK=1` was set for llm-usage benchmark runs.
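The two cache modes can be reproduced with environment variables alone. A sketch, where the warm cache directory path is illustrative and the guard lets it no-op on machines without the CLI installed:

```shell
# Guard: skip when llm-usage is not on PATH.
if command -v llm-usage >/dev/null 2>&1; then
  # "no cache": fresh XDG_CACHE_HOME each run, parse cache disabled.
  fresh_cache=$(mktemp -d)
  XDG_CACHE_HOME="$fresh_cache" \
  LLM_USAGE_PARSE_CACHE_ENABLED=0 \
  LLM_USAGE_SKIP_UPDATE_CHECK=1 \
    llm-usage monthly --provider openai --source codex --json >/dev/null

  # "with cache": one dedicated warm directory, pricing kept offline.
  warm_cache="$HOME/.cache/llm-usage-bench"   # illustrative path
  mkdir -p "$warm_cache"
  XDG_CACHE_HOME="$warm_cache" \
  LLM_USAGE_SKIP_UPDATE_CHECK=1 \
    llm-usage monthly --provider openai --source codex --pricing-offline --json >/dev/null
fi
```

The first warm-cache invocation pays the parse cost; subsequent runs against the same directory hit the cache, which is what the "with cache" rows below measure.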
## Commands benchmarked

```sh
# ccusage-codex
ccusage-codex monthly --json
ccusage-codex monthly --offline --json

# llm-usage-metrics direct parity (codex only)
llm-usage monthly --provider openai --source codex --json
llm-usage monthly --provider openai --source codex --pricing-offline --json

# llm-usage-metrics multi-source (openai provider)
llm-usage monthly --provider openai --source pi,codex,gemini,opencode --json
llm-usage monthly --provider openai --source pi,codex,gemini,opencode --pricing-offline --json
```

## Runtime results (5 runs each): direct source-to-source (`--source codex`)
| Tool | Cache mode | Median (s) | Mean (s) | Min (s) | Max (s) |
|---|---|---|---|---|---|
| `ccusage-codex monthly` | no cache | 16.785 | 17.288 | 16.350 | 19.363 |
| `ccusage-codex monthly --offline` | with cache | 16.995 | 17.594 | 16.462 | 19.909 |
| `llm-usage monthly --provider openai --source codex` | no cache | 3.651 | 3.760 | 3.526 | 4.411 |
| `llm-usage monthly --provider openai --source codex --pricing-offline` | with cache | 0.746 | 0.724 | 0.618 | 0.810 |
Derived from median runtime:
- `llm-usage` vs `ccusage-codex` (no cache): 4.60x faster
- `llm-usage` vs `ccusage-codex` (with cache): 22.78x faster
- `llm-usage` cache effect: 4.89x faster with cache
- `ccusage-codex` cache effect: 0.99x faster with cache
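Each multiplier is simply the ratio of the two median runtimes. For the codex-only table this can be checked directly:

```shell
# Derive the speedup multipliers from the median runtimes above.
speedup() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2fx\n", a / b }'; }

speedup 16.785 3.651    # llm-usage vs ccusage-codex, no cache   -> 4.60x
speedup 16.995 0.746    # llm-usage vs ccusage-codex, with cache -> 22.78x
speedup 3.651  0.746    # llm-usage cache effect                 -> 4.89x
speedup 16.785 16.995   # ccusage-codex cache effect             -> 0.99x
```

The same ratio applies to the multi-source table in the next section.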
## Runtime results (5 runs each): multi-source OpenAI (`--source pi,codex,gemini,opencode`)
| Tool | Cache mode | Median (s) | Mean (s) | Min (s) | Max (s) |
|---|---|---|---|---|---|
| `ccusage-codex monthly` | no cache | 17.297 | 17.463 | 16.760 | 18.634 |
| `ccusage-codex monthly --offline` | with cache | 16.698 | 16.745 | 16.204 | 17.170 |
| `llm-usage monthly --provider openai --source pi,codex,gemini,opencode` | no cache | 4.767 | 4.864 | 4.544 | 5.229 |
| `llm-usage monthly --provider openai --source pi,codex,gemini,opencode --pricing-offline` | with cache | 0.941 | 0.951 | 0.912 | 1.004 |
Derived from median runtime:
- `llm-usage` vs `ccusage-codex` (no cache): 3.63x faster
- `llm-usage` vs `ccusage-codex` (with cache): 17.75x faster
- `llm-usage` cache effect: 5.07x faster with cache
- `ccusage-codex` cache effect: 1.04x faster with cache
## Dataset scope observed during benchmark
These commands do not cover identical scope, so compare runtime with that context.
| Tool | Scope snapshot from benchmark run |
|---|---|
| `llm-usage monthly --provider openai --source codex` | Direct codex-only scope; parity with `ccusage-codex monthly` |
| `llm-usage monthly --provider openai --source pi,codex,gemini,opencode` | Multi-source OpenAI scope across four adapters |
| `ccusage-codex monthly` | Codex-only report (monthly array plus totals) |
## Interpretation

- `llm-usage` remains substantially faster in both the parity and multi-source comparisons.
- `llm-usage` benefits strongly from the parse + pricing cache in repeated runs.
- `ccusage-codex` runtime remains similar between `--no-offline` and `--offline` on this dataset.
- Results are production-real for this machine and data, not universal constants. Re-run on your own workload before drawing broad conclusions.