The PTU at the steady-state window · Read

Inside the paper

The PTU construct across five substrates
Sizing arithmetic
PTU versus PAYG break-even
Deployment-zone affinity
Reservation tenor and re-commitment
Model deprecation protection
IP indemnification position
Multi-cloud portability architecture
Renewal posture
Reading list and references

Section i

The PTU construct across five substrates.

The Provisioned Throughput Unit is the converged commercial form for steady-state generative-AI capacity. Five substrates carry the construct in different names. Azure OpenAI sells PTUs (and Provisioned-Managed deployments) inside the Azure subscription. AWS Bedrock sells Provisioned Throughput in model units. Anthropic offers enterprise reserved throughput on the Claude API. OpenAI offers enterprise reservations on the GPT API. Google Vertex AI sells dedicated throughput as a Vertex reservation.

The five constructs share the same procurement shape: a reserved, sustained throughput at a fixed hourly cost, sold against a defined tenor with a discount that grows with commitment length. The mechanics behind that shape differ. Azure PTUs bind to a region and a model version; Bedrock model units bind to a model and a commitment tenor; Anthropic and OpenAI reservations bind to the publisher’s direct API and may carry account-specific terms; Vertex reservations bind to a region and a quota project.

The buyer-side reading therefore treats the PTU as a family of constructs, not as a single construct. The decision tree opens with a substrate selection (informed by the existing commercial envelope, the data-residency posture and the model family the workload requires) and proceeds to a substrate-specific sizing exercise.

Section ii

Sizing arithmetic.

Each substrate publishes a target throughput per reserved unit, conditional on a stated latency profile. Azure publishes tokens-per-minute per PTU across the GPT family. Bedrock publishes input and output token throughput per model unit across the supported models. Vertex publishes throughput per dedicated unit across the supported Vertex foundation models. Anthropic and OpenAI publish enterprise reserved tier capacities on a customer-by-customer basis.

The trailing-period read

The Admodum sizing protocol begins with the deployed workload’s trailing-period sustained tokens-per-minute, drawn from application telemetry not from the publisher’s forecast. The sustained read is the median throughput across the trailing four to eight weeks, with the burst envelope separately characterised by the peak-to-sustained ratio at the relevant time-of-day percentile.

The reservation count

The reservation count is sized so the sustained throughput sits inside the reservation’s steady-state capacity at the workload’s latency tolerance, with the burst envelope handled by either a spillover routing rule to the pay-as-you-go endpoint or by an over-reservation that absorbs the peak. The latency tolerance matters because the published throughput is conditional on a stated time-to-first-token; a tighter latency requirement reduces the effective throughput per reserved unit, which raises the reservation count.

A PTU sized against an aspirational rollout is a reserved unit running at shelfware utilisation.

Section iii

PTU versus PAYG break-even.

Every substrate sells a pay-as-you-go endpoint alongside the reserved construct. The pay-as-you-go endpoint prices each call against the published per-token rate. The reserved endpoint prices a guaranteed throughput against an hourly rate. The break-even between the two is the load-bearing arithmetic.

The break-even calculation, in shape, is identical across the substrates. The reserved hourly rate is divided by the throughput-per-unit (giving a per-token cost at the reserved tier). The reserved per-token cost is compared to the pay-as-you-go per-token cost. The breakeven utilisation is the point at which the reserved per-token cost equals the pay-as-you-go per-token cost. Below the breakeven utilisation, pay-as-you-go is cheaper; above it, the reservation is cheaper.

Hidden costs in burst routing

The arithmetic carries two hidden costs that the headline calculation often misses. First, the spillover-to-PAYG routing for the burst envelope is a real cost; if the burst is sized too conservatively, more traffic spills over than expected and the all-in cost runs above the reservation budget. Second, the deployment-zone affinity (Section IV) constrains where the reservation can be drawn from; cross-region pay-as-you-go traffic may price differently from the in-region reserved traffic.

The Admodum methodology models both costs explicitly. The reservation count is sized so the sustained throughput is inside the reservation, the documented burst envelope is inside the reservation’s headroom, and only the residual outside-percentile burst spills to PAYG.

Section iv

Deployment-zone affinity.

Each PTU substrate binds the reservation to a deployment zone. Azure OpenAI binds PTUs to a region and (for some models) to a specific zone within the region. AWS Bedrock binds Provisioned Throughput to a region. Vertex AI binds dedicated throughput to a region and a quota project. The publisher direct APIs (Anthropic, OpenAI) typically reserve capacity globally inside the publisher’s infrastructure.

The buyer-side implications run on three axes. The data-residency requirement may dictate which regions are eligible (a UK-residency requirement narrows Azure and Vertex to UK-resident regions; the publisher direct APIs may not satisfy the residency requirement at all). The latency profile favours the region closest to the consuming application (which may not be the cheapest region). The failover posture requires either a second reservation in a paired region or a documented spillover plan.

The procurement deliverable is therefore a regional-residency map for each workload, with the primary region carrying the reservation, the secondary region carrying either a smaller reservation or a documented PAYG fallback, and the publisher direct API treated as a tertiary option where the residency and indemnification posture permit.

Section v

Reservation tenor and re-commitment.

The reserved tenor varies by substrate. Azure PTUs are sold in monthly, annual and three-year commitments, with the discount steepening with tenor. AWS Bedrock Provisioned Throughput is sold in one-month and six-month commitments at the standard tier and in annual commitments at the discounted tier. Anthropic and OpenAI enterprise reservations are negotiated on a customer-by-customer basis, typically with annual or multi-year tenors. Vertex dedicated throughput is sold on annual commitments.

The discount curve

The discount curve is non-linear across the substrates. Azure offers the steepest discount for three-year PTUs (typically in the 40 to 60 percent range against PAYG-equivalent tokens). Bedrock’s six-month commitment carries a meaningful but more modest discount, with the one-month tier sitting close to PAYG. Vertex’s annual dedicated-throughput discount sits in a similar range. The publisher direct APIs price the enterprise reservation against negotiated tiers that vary by commit size.

The cancellation posture

The cancellation posture is rarely soft. A reservation is, in most cases, a take-or-pay commitment for the tenor. The buyer-side discipline is to size the reservation against the floor of the expected workload (so the reservation is consumed even on the conservative case) and to layer additional reservations on top as the rollout cadence confirms the upside.

Section vi

Model deprecation protection.

Each substrate publishes a model lifecycle policy that names the support window for each hosted model. Azure publishes GPT model deprecation dates with a migration window. Bedrock publishes a model lifecycle policy with a sunset notice for retired models. Anthropic and OpenAI publish model availability windows for their hosted models. Vertex publishes Vertex model lifecycle entries with migration guidance.

The buyer-side risk is a reservation against a model that is deprecated inside the reservation tenor. The reservation is contractual; the model is not. A deprecation forces a migration to a successor model that may carry different throughput characteristics, different per-token rates and different IP indemnification posture. The migration is a project, not a configuration change.

The Admodum protocol cross-references the reservation tenor against the published lifecycle window for the named model. Where the lifecycle window does not cover the reservation tenor, the procurement requires either a contractual commitment to migration parity (the reservation is honoured against the successor model at no additional cost) or a shorter reservation tenor that aligns to the lifecycle window.

Section vii

IP indemnification position.

The IP indemnification position varies materially across the substrates. Microsoft publishes the Customer Copyright Commitment for Azure OpenAI, with named coverage for output content under defined conditions. AWS publishes the Bedrock IP indemnification with model-by-model coverage variation. Anthropic publishes commercial terms with indemnification language for the Claude API. OpenAI publishes the Copyright Shield for ChatGPT Enterprise and the API. Google publishes the Vertex generative-AI indemnification.

The buyer-side requirement protocol

The Admodum protocol tests four positions per substrate. The coverage scope (output only, or input and output). The carve-outs (prompt-injection, adversarial use, non-default safety settings). The cap (per-claim, aggregate, or none). The survival period (during the reservation tenor, after termination, or limited to active subscriptions).

Where the carve-outs are not tolerable for the buyer’s use case, the procurement either selects a substrate with broader coverage, takes a contractual amendment to expand the coverage, or carries the residual risk as a documented commercial decision approved at the right level inside the buyer organisation.

An IP indemnification is only as broad as the carve-out list.

Section viii

Multi-cloud portability architecture.

The defensive posture against any single-substrate lock-in is portability. The portability architecture sits inside the buyer’s application layer and abstracts the model invocation behind an internal interface that can route to any of the five substrates.

The abstraction interface

The interface accepts a normalised invocation (a message list, a system prompt, a model identifier, the inference parameters) and routes it to a substrate-specific adapter. The adapter translates the normalised invocation into the substrate’s native API call, executes the call, and returns the normalised response. The application above the interface remains substrate-agnostic.

The routing protocol

The routing protocol selects the substrate at invoke time based on the workload’s requirements (data residency, latency tolerance, IP indemnification posture) and the substrate’s current state (reservation utilisation, model availability, regional capacity). A primary substrate carries the steady-state traffic; secondary substrates absorb burst, regional fallback or substrate-specific quality cases.

The architecture preserves the buyer’s switching option without sacrificing the substrate-native commercial benefits. Where a reservation expires or a substrate changes its commercial terms unfavourably, the buyer re-routes the workload without re-engineering the application.

Section ix

Renewal posture.

PTU renewals run on two cadences. Inside a hyperscaler envelope (Azure MACC, AWS EDP, GCP EDP), the PTU renewal interacts with the parent envelope renewal. On a publisher direct API (Anthropic, OpenAI), the enterprise reservation renews on its own contractual schedule.

The envelope-bound PTU is sized as a contributing component to the parent envelope commitment, with the renewal posture documented in the same renewal position paper that covers the wider envelope. The publisher-direct PTU is documented separately, with its own renewal calendar and its own BATNA position.

The BATNA for any PTU is portability. The buyer who has built the abstraction interface and routed the workload across two substrates carries a credible substrate switch as the BATNA; the buyer who has built the workload natively against a single substrate API does not. The Admodum methodology therefore treats the portability architecture as a renewal preparation activity, not as a technical-debt activity.

Section x

Reading list and references.

The AI Vendors PTU paper sits inside the AI Vendors practice reading list. The companion papers extend the methodology to each substrate’s standalone commercial envelope:

Microsoft Azure MACC Design and Defence, on the Azure consumption envelope inside which the Azure OpenAI PTU sits.
AWS Bedrock commitment design, on the Bedrock Provisioned Throughput construct, the model-by-model token economics and the EDP interaction.
Google Cloud EDP design, on the GCP commercial envelope inside which Vertex AI dedicated throughput sits.
AWS EDP commitment ramp design, on the EDP envelope inside which the Bedrock spend is anchored.
Salesforce Agentforce commitment design, an adjacent agentic-AI commercial scope on a different substrate.
Workday Illuminate AI, on the embedded-AI commercial scope inside an existing SaaS subscription.
The AI Vendors practice page at /practices/ai-vendors/ sets out the cross-vendor AI commitment protocol the Admodum AI Vendors practice runs across every engagement.
The AWS, Microsoft and Google Cloud knowledge hubs at /knowledge/aws/, /knowledge/microsoft/ and /knowledge/google-cloud/ aggregate white papers, case studies, blog analysis and FAQ on each substrate’s commercial mechanics.

The methodology in this paper is the methodology Admodum has applied across cross-vendor AI commitments inside the firm’s engagement history. Each engagement is structured as fixed fee, contingency / gainshare or annual retainer, depending on the buyer’s posture at the commitment window.

The PTU at the steady-state window.