How we build the numbers, where they come from, and how confident we are.
Projections are directional arithmetic, not forecasts. Benchmarks are dated and sourced. Each derived metric defaults to a documented method; every assumption is editable. Where a number appears on a Ledger page, that page is the canonical source — this page describes how those numbers are built, not what they are at any given moment.
The AI Ledger tracks the AI industry through three interlocking ledgers. The Capital Ledger follows how much is being spent to build AI infrastructure — semiconductors, data centres, networking. The Revenue Ledger tracks what comes back — who earns what from selling AI products and services. The Usage Ledger estimates actual consumption — tokens processed, models served, inference volume. Together they answer: how much is going in, how much is coming out, and how much is actually being used?
Anchored to NVIDIA Data Centre revenue (Tier 1A), which sets the silicon floor. Cross-checked against MSFT, GOOG, META, and AMZN 10-K capital expenditure filings. Silicon represents approximately 55% of total CapEx; the remainder covers power, construction, and networking. Current total system CapEx is published on the Capital Ledger.
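The gross-up from the silicon floor to total system CapEx can be sketched as below. This is a minimal illustration, assuming the ~55% silicon share is applied as a simple divisor; the function name and the input figure are hypothetical and are not published Ledger values.

```python
# Approximate silicon share of total CapEx, as stated on this page.
SILICON_SHARE = 0.55


def total_system_capex(silicon_usd_b: float, silicon_share: float = SILICON_SHARE) -> float:
    """Gross up a silicon spend figure (USD billions) to total system CapEx."""
    if not 0 < silicon_share <= 1:
        raise ValueError("silicon_share must be in (0, 1]")
    return silicon_usd_b / silicon_share


# Hypothetical silicon figure, for illustration only.
silicon = 110.0
total = total_system_capex(silicon)
print(f"total ~ ${total:.0f}B, of which non-silicon ~ ${total - silicon:.0f}B")
```

Because the share is an editable assumption, it is passed as a parameter rather than hard-coded into the calculation.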
Entity-by-entity rollup sourced from earnings calls, SEC filings, and investor presentations. Revenue uses collected-revenue methodology — never run-rated or exit-ARR. Consumer vs. enterprise split informed by The Information reporting and company segment disclosures. Each entity carries its own provenance tier; current quarterly aggregate is on the Revenue Ledger.
Hyperscaler AI cloud + neocloud + Oracle OCI revenue, published both gross-disclosed and net of model-lab pass-through. Each provider's revenue-recognition policy is read from the most recent 10-K to confirm principal (gross) vs. agent (net) treatment under ASC 606 before publication. Direct frontier-API revenue (OpenAI / Anthropic API direct) stays on the Revenue Ledger; only the cloud-margin leg is in Compute. Current quarterly aggregate is on the Compute Ledger.
Multi-model consensus methodology tracking 14 providers. Primary anchors: OpenRouter public throughput statistics and NVIDIA inference throughput disclosures. China accounts for approximately 50% of global volume. No single provider discloses full usage data, so all figures are Med or Low confidence. Current daily-volume range is published on the Usage Ledger.
The Compute Ledger decomposes hyperscaler + neocloud AI revenue into three segments so the page can publish the right number for the right question. Headline aggregates are post-Copilot — Copilot-class revenue is per-seat productivity SaaS, scoped to a future Apps Ledger.
Pass-through rule. Pass-through is the lab revenue-share applied to Hosted model APIs, not to total AI revenue. An earlier version of the page applied the pass-through percentage to gross AI revenue and reported ~$11.5B 2025; that figure was overstated by roughly 10× because most of hyperscaler AI revenue is direct compute spend (Frontier lab + AI workload), not token-API resale. Corrected pass-through is ~$1B 2025; corrected net is ~$44.5B on a sum-of-quarterlies, post-Copilot basis.
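The effect of choosing the wrong base can be sketched as follows. The inputs are hypothetical round numbers chosen only to reproduce the order of magnitude described above (about $1B corrected against about $11.5B in the earlier version); the 25% lab share, the variable names, and the function are assumptions for illustration, not Ledger parameters.

```python
def pass_through(base_usd_b: float, lab_share: float) -> float:
    """Lab revenue-share (USD billions) on whatever base it is applied to."""
    return base_usd_b * lab_share


# Hypothetical inputs, not published Ledger figures.
GROSS_AI_REVENUE = 46.0   # all hyperscaler + neocloud AI revenue
HOSTED_MODEL_APIS = 4.0   # the token-API resale segment only
LAB_SHARE = 0.25          # assumed lab revenue-share

wrong = pass_through(GROSS_AI_REVENUE, LAB_SHARE)    # earlier method: applied to gross
right = pass_through(HOSTED_MODEL_APIS, LAB_SHARE)   # corrected: Hosted model APIs only

print(f"applied to gross:  ${wrong:.1f}B")
print(f"applied to hosted: ${right:.1f}B")
print(f"overstatement:     {wrong / right:.1f}x")
print(f"net of pass-through: ${GROSS_AI_REVENUE - right:.1f}B")
```

The overstatement factor equals the ratio of gross AI revenue to Hosted model APIs revenue, which is why the error was so large when most revenue is direct compute spend.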
Copilot scope-out rule. M365 Copilot, GitHub Copilot, Copilot Studio (MSFT); Gemini-in-Workspace SKUs (GOOGL); equivalent embedded-AI-feature productivity lines elsewhere — these are per-seat SaaS, not Compute. They are deducted from each provider's disclosed AI line and tracked separately for transparency. They flow to a future Apps Ledger when that is built; until then they are tracked but not published on the Compute page. AWS does not have a named productivity Copilot line, so the default deduction is zero unless a specific embedded-AI line is identifiable in the 10-Q. The scope-out uses each issuer's own declared exclusion values so the deduction matches what is in the reported AI run-rate.
Principal/agent verification gate. Before publication, the revenue-recognition policy notes from the most recent 10-K are read for AWS, MSFT (Azure), GOOGL (Cloud), and ORCL (OCI). For each, principal (gross) vs. agent (net) treatment under ASC 606 is confirmed. All four reviewed entities use principal treatment for AI reseller arrangements (verified 2026-05-06). Treatment is re-verified each cycle as 10-Ks are refiled.
Direct frontier-API revenue is OUT of Compute. When Bedrock resells Claude, AWS keeps the cloud margin (Compute, Hosted model APIs net); Anthropic gets the rest (Model revenue). When a customer calls Anthropic directly, 100% is Model revenue with the underlying compute showing on AWS / GCP P&Ls as Frontier lab compute. This avoids ecosystem-level double-counting between Compute and Apps / Model Revenue.
An earlier version of the page sized each provider's AI line using editorial AI-share weights (e.g. AWS AI = 15% of AWS revenue). That approach has been replaced with bottom-up segment sizing — building each provider's number from disclosed sub-segments rather than top-down ratios. Approach by provider:
The Anthropic equity gain on Alphabet's Q1 2026 P&L ($36.9B) is investment income, not Cloud operating revenue (per Fortune, 30 April 2026), so it does not inflate the GCP AI line. The CoreWeave figure includes ~$3.8B of Microsoft-sub-rented capacity that is also recognised as AWS / Azure compute revenue elsewhere; under each entity's revenue-recognition policy both legs are real revenue lines on each company's P&L, so the ecosystem total is gross of that overlap by definition. We surface it separately rather than try to net it out at the Compute layer.
For each provider, the 2025 calendar AI line is published on a sum-of-quarterlies basis: the four reported (or derived) 2025 quarterly AI revenues, summed. We chose this over the alternative annualised run-rate basis (year-end exit-quarter revenue × 4), even where the run-rate number is the figure CEOs prefer to disclose. Why: in a year of accelerating growth, sum-of-Q is materially smaller than annualised. Microsoft's 2025 calendar AI is $25.25B sum-of-Q vs. $28B annualised exit (a gap of roughly 11%); Amazon and Google sum-of-Q ≈ annualised because their growth shape doesn't diverge as much. If the trajectory chart anchored Q1 2026 at $9.25B (= disclosed $37B run-rate ÷ 4) but forced the four 2025 quarters to sum to $28B, Q4 2025 would have to balloon above $9.25B, producing a visible quarter-on-quarter drop into Q1 2026 that contradicts the +123% year-on-year growth narrative.
Sum-of-quarterlies basis lets every major provider's quarterly trajectory grow monotonically into Q1 2026 without distortion. The annualised run-rate value is preserved separately as context for each provider — we do not lose it. The Layer Stack and the trajectory chart both use sum-of-quarterlies.
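The two bases can be compared in a few lines. The quarterly split below is hypothetical: this page gives only the $25.25B sum-of-quarterlies total, the $28B annualised exit, and the $9.25B Q1 2026 anchor, so the individual 2025 quarters are invented to match those three figures. The function names are illustrative.

```python
from typing import Sequence


def sum_of_quarterlies(quarters: Sequence[float]) -> float:
    """Calendar-year revenue as the plain sum of the four quarters."""
    return sum(quarters)


def annualised_exit(quarters: Sequence[float]) -> float:
    """Exit run-rate: the final quarter multiplied by four."""
    return quarters[-1] * 4


def is_monotonic(series: Sequence[float]) -> bool:
    """True if the series never declines from one period to the next."""
    return all(a <= b for a, b in zip(series, series[1:]))


# Hypothetical Q1..Q4 2025 split (USD billions); only the total and exit are sourced.
msft_2025 = [5.50, 6.00, 6.75, 7.00]
q1_2026 = 37.0 / 4  # disclosed $37B run-rate divided by 4

total = sum_of_quarterlies(msft_2025)
exit_rate = annualised_exit(msft_2025)
print(f"sum-of-Q ${total:.2f}B vs annualised exit ${exit_rate:.2f}B "
      f"(gap {exit_rate / total - 1:.1%})")
print("monotonic into Q1 2026:", is_monotonic(msft_2025 + [q1_2026]))
```

Any quarterly split that sums to the annualised figure instead would need a larger average quarter, which is what pushes the late-2025 bars up against the Q1 2026 anchor.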
The Layer Stack visual on the Compute Ledger publishes cohort Apps Revenue (the customer-paid revenue earned by the named entities the Revenue Ledger tracks, ~$17.36B as of 2026-05-06), not an editorially-extended ecosystem total. An earlier version of the page applied a 5× multiplier to bridge the cohort total ($17.4B) to a notional enterprise SaaS AI ARR figure (~$100B); that multiplier had no provenance trail and the ratio swung ±25% on a single editorial constant. The corrected page shows the cohort number as-is and flags the gap between cohort and ecosystem as scope for a future Apps Ledger. All four Layer Stack layers use 2025 lookback actuals on a common time basis (sum-of-quarterlies for compute; fiscal-year 2026, which corresponds to calendar 2025, for silicon), so the multipliers between layers are apples-to-apples.
The Revenue Ledger Sankey routes per-entity revenue across channels (Model Subs / Model API / Hyperscalers / AI Native Apps / Trad. SaaS) and into buyer segments (Consumer / AI Natives / Enterprises & Govs / VC-Investors). An earlier version of the model applied a flat 70/30 Model API / Hyperscalers split to every provider's API revenue. That weighting is roughly right for enterprise customers — who buy Bedrock, Azure OpenAI or Vertex for compliance and procurement reasons — but wrong for AI Natives. Cursor, Glean, Perplexity, Harvey and similar buyers go ~95% direct to OpenAI / Anthropic for cost and latency. Because the cohort is mostly AI Natives, a flat 30% Hyperscalers weight inflated the Hyperscalers channel (to $3.63B). The current model replaces the flat split with per-archetype weights so each entity routes through the channels its real customers actually use.
Every entity in the cohort now carries an archetype tag. The taxonomy:
| Archetype | What it covers | api_pct → Hyperscalers | enterprise_pct → Hyperscalers | API → AI Natives buyer |
|---|---|---|---|---|
| frontier_lab | Foundation-model labs (OpenAI, Anthropic, Google/Gemini, Mistral, xAI, Cohere, DeepSeek, etc.) | 5% | 50% | 70% |
| ai_native | AI-native scale-ups built on frontier APIs (Cursor, Glean, Perplexity, Harvey, ElevenLabs, Suno, Runway, Midjourney, etc.) | 10% | 20% | 30% |
| enterprise_saas | Traditional SaaS layering AI features (Salesforce, ServiceNow, Adobe, Notion, Microsoft Copilot, Databricks, etc.) | 50% | 60% | 0% |
| hyperscaler | Cloud providers (AWS, Azure, GCP) and neoclouds (CoreWeave, Lambda, Crusoe, Nebius) — Sankey treats these as compute providers, not buyers | — | — | — |
| iaas | Token-API aggregators / inference infra (Together, Fireworks, Groq, Replicate, Modal, OpenRouter) | 0% | 50% (default) | 50% |
| consumer_app | Pure consumer chat surfaces — rare standalone; usually rolled under frontier_lab via the subscription share | 0% | 50% (default) | 50% |
The routing reads each provider's archetype tag and applies the matching channel and buyer-segment weights. Untagged entities fall through to a conservative default (15% Hyperscalers / 85% Model API on the API split, 50/50 on the enterprise split) — under-counting Hyperscalers is editorially safer than over-counting.
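A minimal sketch of the API-revenue leg of this routing is shown below, using the api_pct weights from the table and the 15% default for untagged entities. The function and constant names are hypothetical, and raising an error for the hyperscaler archetype is an assumption about how to handle entities the Sankey treats as compute providers rather than buyers. The enterprise split would follow the same pattern with the enterprise_pct column.

```python
from typing import Dict, Optional

# Share of API revenue routed to the Hyperscalers channel, per archetype (from the table).
API_PCT_TO_HYPERSCALERS: Dict[str, float] = {
    "frontier_lab": 0.05,
    "ai_native": 0.10,
    "enterprise_saas": 0.50,
    "iaas": 0.00,
    "consumer_app": 0.00,
}

# Conservative fallback for untagged entities: 15% Hyperscalers / 85% Model API.
DEFAULT_API_PCT = 0.15


def route_api_revenue(api_revenue: float, archetype: Optional[str] = None) -> Dict[str, float]:
    """Split one entity's API revenue between the Hyperscalers and Model API channels."""
    if archetype == "hyperscaler":
        # Assumption: compute providers are not routed as buyers.
        raise ValueError("hyperscaler entities are compute providers and are not routed")
    pct = API_PCT_TO_HYPERSCALERS.get(archetype, DEFAULT_API_PCT)
    hyperscalers = api_revenue * pct
    return {"Hyperscalers": hyperscalers, "Model API": api_revenue - hyperscalers}


# Hypothetical revenue figures, for illustration only.
print(route_api_revenue(100.0, "frontier_lab"))
print(route_api_revenue(100.0))  # untagged entity falls through to the default
```

Keeping the weights in a lookup table rather than in branching logic means a new archetype, or a revised weight, is a one-line change.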
Buyer relabel. The Who-Pays column previously showed Consumer / SME / Enterprise. SME conflated AI-native scale-ups (heavy API consumption, light headcount, no traditional SaaS revenue) with small businesses; the new buckets — Consumer / AI Natives / Enterprises & Govs / VC-Investors — surface what's actually on the demand side of the AI economy. Subscription revenue routes to Consumer; API revenue splits between AI Natives and Enterprises & Govs per archetype; enterprise contracts route to Enterprises & Govs.
Reconciliation against the Compute Ledger. Under the per-archetype model the Revenue Ledger Hyperscalers channel lands at ~$1.7B (gross). After the 20% hyperscaler take-rate, that implies ~$1.36B of cohort lab revenue passing through Hyperscaler resale — comparable to the Compute Ledger Hosted-model-APIs pass-through (~$1B) within Tier 2A noise. The remaining ecosystem Hosted-model-APIs gross (~$4.35B) covers non-cohort enterprise spend — Fortune 500 buyers using Bedrock or Azure OpenAI from outside the tracked-provider cohort — and is documented on the Compute Ledger but is out of scope for the Revenue Ledger cohort.
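The bridge arithmetic in this reconciliation can be written out explicitly. The sketch below uses only the figures quoted above; the function name is illustrative, and no tolerance for "Tier 2A noise" is encoded because the page does not state one.

```python
def lab_revenue_after_take_rate(channel_gross_usd_b: float, take_rate: float) -> float:
    """Lab revenue left after the hyperscaler keeps its take-rate."""
    return channel_gross_usd_b * (1.0 - take_rate)


hyperscalers_channel_gross = 1.7  # USD billions, Revenue Ledger (per-archetype model)
hyperscaler_take_rate = 0.20
compute_pass_through = 1.0        # USD billions, Compute Ledger Hosted-model-APIs pass-through

implied = lab_revenue_after_take_rate(hyperscalers_channel_gross, hyperscaler_take_rate)
gap = implied - compute_pass_through

print(f"implied cohort lab pass-through: ${implied:.2f}B")
print(f"gap vs Compute Ledger:           ${gap:.2f}B")
```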
Any aggregate published on the site must be checked against the parallel figure on adjacent Ledgers — and any gap must be resolved by explicit bridge math, not silent re-tuning. Three reconciliations are surfaced below so readers can read each pair without having to internalise the bridge themselves.
The Compute Ledger trajectory chart shows realised quarterly compute revenue (e.g. MSFT $9.25B in Q1 2026 — $37B run-rate ÷ 4). The Compute Ledger now also surfaces a separate Forward commitments block that reports the multi-year contracted order book — Google's RPO ($242.8B → $467.8B Q4 25 to Q1 26, +$225B QoQ), Anthropic's $200B / 5-year deal with Google, and Anthropic+OpenAI's ~$718B aggregate committed to MAG3 hyperscalers. These two views describe overlapping economic reality on different time bases. Bridge: contracted dollars convert into realised quarterly compute revenue over the contract life of each booking (typically 1–8 years); they are not Q1 26 revenue and would double-count if mixed into the trajectory bars. The trajectory chart and the forward-commitments block are reconciled by being kept on separate surfaces with the relationship explicit.
The Capital Ledger shows realised hyperscaler capex (cumulative 2023–25 ~$745B; 2025 calendar ~$380B). The Compute Ledger forward-commitments block also tracks hyperscaler→lab equity investment — Google→Anthropic cumulative ~$43B, Amazon→Anthropic cumulative ~$33B (including contingent), AWS→OpenAI Feb 2026 $13B with up to $20B more. These dollars are not additive to capex: most of the equity round flows back to the same hyperscaler as a compute commitment (the Anthropic–Google $200B / 5-year deal is funded in part by Google's equity into Anthropic). Bridge: equity investment sits on a separate balance-sheet line from PP&E capex, and we do not add the two figures. The circular flow (hyperscaler equity → lab → hyperscaler compute commitment → realised capex over time) is reconciled by tagging each leg as a separate disclosure rather than presenting a single "AI investment" total.
The M365 Copilot 2025 figure was previously published at $5.4B ARR (15M paid seats × $30/mo × 12). Zitron's 2026-05-06 Azure billing leak revealed actual booked revenue far lower, at roughly a quarter of that figure. The published M365 Copilot number has been restated as a three-band structure:
| Band | 2025 USD | Basis | Tier |
|---|---|---|---|
| List billings | $7.2B | 20M paid seats × $30/mo × 12 (post-Q1 FY26 seat refresh) | 1A |
| Bundling-adjusted incremental | $2.5–3.5B | Net of Copilot pricing absorbed into existing E5 / SKU bundles | 2A |
| Leaked actual (PRIMARY) | $1.2–1.5B (midpoint $1.35B) | Leaked Azure billing per Zitron 2026-05-06 (Q2 FY25 $367M, Q3 FY25 $300M) | 1B |
The published headline value is $1.35B (leaked-actual midpoint), with the list and bundling-adjusted bands carried as comparison context.
Bridge to the Compute Ledger. The Compute Ledger's Copilot scope-out ($6.67B) is Microsoft's own declared exclusion (what Microsoft excludes from the Azure AI revenue line) and sits on a different basis (list-billings level, including M365 Copilot ~$5.4–7.2B, GitHub Copilot ~$1.65B, and Studio / Sales / Service residual). The Trad. SaaS view in the Apps stack uses the leaked actuals; the Compute Ledger uses Microsoft's own scope-out. The two are not the same number and we no longer present them as if they were.
Why this matters. Until the restatement, the M365 Copilot $5.4B figure flowed through to the Layer Stack and to the Apps Revenue → Compute multiplier. The leaked-actual restatement reduces the published Copilot line by ~75% and widens the visible spread between announced AI-product revenue and economically-realised AI-product revenue across the enterprise SaaS stack. That spread is now visible by design.
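The seat arithmetic behind the list-billings band and the size of the restatement can be reproduced as follows. All inputs come from the figures above; the function name is illustrative.

```python
def list_billings(paid_seats: int, price_per_month: float = 30.0) -> float:
    """Annual list billings in USD: seats x monthly list price x 12."""
    return paid_seats * price_per_month * 12


previous_headline = list_billings(15_000_000)   # earlier published ARR figure
refreshed_list = list_billings(20_000_000)      # list-billings band after the seat refresh
leaked_midpoint = (1.2e9 + 1.5e9) / 2           # midpoint of the leaked-actual band

reduction = 1 - leaked_midpoint / previous_headline

print(f"previous headline: ${previous_headline / 1e9:.2f}B")
print(f"refreshed list:    ${refreshed_list / 1e9:.2f}B")
print(f"leaked midpoint:   ${leaked_midpoint / 1e9:.2f}B")
print(f"reduction vs previous headline: {reduction:.0%}")
```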
The Ledger surfaces two distinct revenue measures for AI providers, and they are not interchangeable. Mixing them is the most common mistake when reading frontier-AI financials.
| Measure | Definition | Where it surfaces | Time orientation |
|---|---|---|---|
| ARR (run-rate) | Trailing-quarter revenue × 4. Forward-looking proxy for the next 12 months at the current pace. | Usage Ledger hero stat ("Combined Provider ARR") and the per-provider chart bars and tiles. | Forward-looking |
| Collected revenue (audited) | GAAP-style revenue actually booked over a fiscal year. Backward-looking, audit-grade. | Home page hero ("$22B of AI revenue earned, CY23–25") and Revenue Ledger Sankey; per-provider year-keyed entries on /timeline. | Backward-looking |
The two measures typically diverge by 5–6× for frontier model providers because ARR captures the latest quarter annualised (currently growing rapidly) while collected revenue captures the audited annual figure (which lags). For example, OpenAI's reported ARR was around $25B (Q1 2026 run-rate) while its 2025 collected revenue was around $4.3B — the same company, two different numbers, both correct for what they measure. The Ledger keeps both and labels each surface explicitly.
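The relationship between the two measures can be sketched with the OpenAI figures quoted above. The function names are illustrative; the implied trailing-quarter value is simply the run-rate divided by four, following the ARR definition in the table.

```python
def run_rate_arr(trailing_quarter_revenue: float) -> float:
    """ARR as defined on this page: trailing-quarter revenue x 4."""
    return trailing_quarter_revenue * 4


def divergence(arr: float, collected_revenue: float) -> float:
    """How many times larger the run-rate is than collected annual revenue."""
    if collected_revenue <= 0:
        raise ValueError("collected_revenue must be positive")
    return arr / collected_revenue


openai_arr = 25.0             # USD billions, Q1 2026 run-rate
openai_collected_2025 = 4.3   # USD billions, 2025 collected revenue

implied_quarter = openai_arr / 4
ratio = divergence(openai_arr, openai_collected_2025)

print(f"implied trailing quarter: ${implied_quarter:.2f}B")
print(f"ARR / collected revenue:  {ratio:.1f}x")  # about 5.8x
```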
Year-keyed audited revenue — for example Anthropic's 2026 collected revenue at $6B — surfaces on the timeline page's FY revenue section, not the dashboard tile, by editorial choice. The dashboard tile is intentionally ARR-only because its purpose is "current state at run-rate"; year-keyed audited figures belong on the timeline, where the time orientation is explicit.
For the V2 launch, every datapoint carries a single confidence value — High, Med, or Low — and is rendered as one pill in the colour grammar shared with the Hepburn Advisory site.
| Confidence in data | Pill rendered as | Colour |
|---|---|---|
| High | T1 High | Green |
| Med | T2 Med | Amber |
| Low | T3 Low | Violet |
This is a simplified surface signal. The richer provenance taxonomy below (1A through 4) is the underlying classification framework — every datapoint is classified against that detailed scheme, then mapped down to the High / Med / Low pill shown on the page. A future version of the site will split the two axes (provenance separate from confidence).
Every data point in the AI Ledger carries a provenance tier. This is the underlying classification — it tells you where the number came from and how much weight to give it. The single-axis pill above is derived from this.
| Tier | Label | Definition |
|---|---|---|
| 1A | Authoritative | First-party disclosure — earnings report, regulatory filing, official press release |
| 1B | Corroborated | Two or more independent reporting sources agreeing within ~15% |
| 2A | Derived | Deterministic calculation from Tier 1 inputs (e.g. subtracting known segments from a reported total) |
| 2B | Triangulated | Value squeezed by multiple Tier 1 anchors setting upper and lower bounds |
| 3A | Anchored projection | Trend extrapolation from a Tier 1 or Tier 2 base point |
| 3B | Interpolated | Curve fit between two Tier 1 anchor points |
| 3C | Scenario assumption | Hand-picked rate or ratio — an explicit editorial choice, documented |
| 4 | Editorial | Round-number guess — replace as soon as better data arrives |
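
Two parts of this scheme lend themselves to a short sketch: the Tier 1B agreement test and the mapping from provenance tier to confidence pill. Both are assumptions for illustration. The page says sources must agree "within ~15%" but does not say relative to what, so the spread is measured against the lowest estimate here. The tier-to-pill mapping is inferred from the pill labels (T1 High, T2 Med, T3 Low) and is not a documented rule; in particular, mapping Tier 4 to the Low pill is a guess.

```python
from typing import Dict, Sequence

# ASSUMPTION: pill is chosen by the tier's leading digit; Tier 4 treated as Low.
PILL_BY_TIER_FAMILY: Dict[str, str] = {
    "1": "T1 High",
    "2": "T2 Med",
    "3": "T3 Low",
    "4": "T3 Low",
}


def pill_for(tier: str) -> str:
    """Map a provenance tier such as '1A' or '3C' to a confidence pill label."""
    family = tier.strip()[:1]
    if family not in PILL_BY_TIER_FAMILY:
        raise ValueError(f"unknown provenance tier: {tier!r}")
    return PILL_BY_TIER_FAMILY[family]


def is_corroborated(estimates: Sequence[float], tolerance: float = 0.15) -> bool:
    """Tier 1B test: two or more positive estimates whose spread is within tolerance.

    ASSUMPTION: spread is (max - min) / min.
    """
    if len(estimates) < 2:
        return False
    lowest, highest = min(estimates), max(estimates)
    if lowest <= 0:
        return False
    return (highest - lowest) / lowest <= tolerance


# Hypothetical estimates from two independent reports, for illustration only.
print(is_corroborated([100.0, 110.0]), pill_for("1B"))
print(is_corroborated([100.0, 120.0]), pill_for("3C"))
```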
Primary sources used to construct the ledgers. Each source is classified by category, update frequency, and provenance tier.
| Source | Category | Frequency | Tier |
|---|---|---|---|
| NVIDIA 10-Q / earnings | CapEx anchor (DC revenue) | Quarterly | 1A |
| MSFT/GOOG/META/AMZN 10-K | CapEx cross-check | Annual | 1B |
| MSFT Copilot ARR disclosure | Revenue | Quarterly | 1A |
| OpenAI investor disclosures | Revenue | Irregular | 1B |
| Salesforce Agentforce ARR | Revenue | Quarterly | 1A |
| OpenRouter API | Token throughput | Daily | 2A |
| Epoch AI GPU tracking | China GPU estimates | Periodic | 3C |
| SEC filings (various) | Revenue, CapEx splits | Quarterly | 1A |
| The Information / WSJ | Revenue leaks, ARR estimates | Irregular | 1B |
| CoreWeave S-1 | Utilisation, pricing | One-time | 2A |
Transparency about our limits is part of the methodology. These are known gaps that we track but do not claim to have resolved: