Azure OpenAI economics · Microsoft

Key takeaways

Azure OpenAI carries two commercial surfaces: pay-as-you-go (PAYG) token-rate pricing per million input and output tokens, and Provisioned Throughput Units (PTUs), a reserved-capacity model billed at an hourly rate, with one-month and one-year term commitments.
The PAYG token rate varies by model class (GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o-mini, o1, embeddings, fine-tuned variants) and by region; the same model SKU can be priced differently in different Azure regions.
PTUs are the principal lever for predictable, high-throughput workloads. The PTU break-even against PAYG sits at the inflection where the steady-state PAYG run-rate equals the reserved-capacity cost for the term; the inflection is workload-shape sensitive.
Azure OpenAI consumption burns the Microsoft Azure Consumption Commitment (MACC). A buyer with an under-burn MACC position can move AI workload to Azure OpenAI to consume the commitment; the consumption rate against the burn target is the principal Year-N renewal lever.
The principal misuse pattern is the unmeasured fine-tune cost: a fine-tuned model variant carries a model-hosting hourly charge for the lifetime of the deployment, in addition to the inference token charge; the hosting charge frequently exceeds the inference charge on low-volume deployments.

Section i

The two commercial surfaces.

Azure OpenAI carries two commercial surfaces. Pay-as-you-go (PAYG) is the token-rate model: the buyer pays per million input tokens and per million output tokens, with rates that vary by model class, region and deployment SKU (Standard, Provisioned, Batch). PTU (Provisioned Throughput Units) is the reserved-capacity model: the buyer reserves a measured throughput in PTUs, on a one-month or one-year term, and pays the hourly PTU rate for the term.

The PAYG surface is structurally appropriate for variable, low-floor workloads (development, prototyping, low-volume production). The PTU surface is structurally appropriate for predictable, high-throughput workloads where the steady-state consumption justifies the reserved-capacity floor. The wider Azure reserved instances framework reads against the same reserved-versus-PAYG inflection on the infrastructure surface; the wider Azure MACC design is the commitment vehicle that both surfaces burn.

Section ii

The model SKU surface.

The Azure OpenAI model SKU surface spans the GPT-4 family (GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o-mini), the o-series reasoning models (o1, o1-mini), the embedding models (ada, text-embedding-3-small, text-embedding-3-large), the fine-tuning variants of the GPT-3.5 and GPT-4 families, and the image and speech models (DALL-E 3, Whisper, GPT-4o-realtime).

GPT-4 (the historic full-capability model) is typically priced at more than 100 times the GPT-4o-mini rate on input tokens; the Admodum read on the model-selection decision is the workload-fit calculus, not the marquee-model assumption. A workload that runs effectively on GPT-4o-mini for the principal flow, with selective GPT-4o or o1 escalation for the complex flow, captures the order-of-magnitude unit-cost saving and runs on shared infrastructure.
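The workload-fit calculus can be made concrete with a minimal sketch. The per-million-token rates and the 5% escalation share below are hypothetical placeholders, not published Azure prices, and `blended_rate` is an illustrative helper name rather than anything from an Azure SDK.

```python
def blended_rate(escalation_share: float, small_rate: float, large_rate: float) -> float:
    """Average cost per million tokens when a share of traffic escalates to the larger model."""
    if not 0.0 <= escalation_share <= 1.0:
        raise ValueError("escalation_share must be between 0 and 1")
    return (1.0 - escalation_share) * small_rate + escalation_share * large_rate


# Hypothetical rates per million input tokens (placeholders, not published prices).
SMALL_MODEL_RATE = 0.20
LARGE_MODEL_RATE = 5.00

# Principal flow on the small model, 5% of traffic escalated to the large model.
routed = blended_rate(0.05, SMALL_MODEL_RATE, LARGE_MODEL_RATE)
# Marquee-model assumption: everything runs on the large model.
all_large = blended_rate(1.0, SMALL_MODEL_RATE, LARGE_MODEL_RATE)

saving_factor = all_large / routed
print(f"routed: {routed:.2f}, all-large: {all_large:.2f}, factor: {saving_factor:.1f}x")
```

The saving factor is driven mostly by the escalation share: as the share of escalated traffic rises, the blended rate converges on the larger model's rate.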

Section iii

The PTU break-even.

The PTU break-even against PAYG depends on the steady-state consumption shape. The PTU price is paid by the hour for the term; the PAYG price is paid per token consumed. The break-even runs where the steady-state PAYG run-rate equals the PTU hourly cost; below the break-even, PAYG is cheaper; above the break-even, PTU is cheaper.

The break-even is workload-shape sensitive. A workload with a flat 24-hour consumption curve reaches the break-even at a lower peak throughput than a workload with a daytime-peak, nighttime-zero curve, because the PTU floor is paid for the full term regardless of the consumption pattern. The Admodum read on the PTU sizing decision is the consumption-shape modelling, not the peak-throughput sizing. The wider Azure Marketplace burn framework reads the third-party AI consumption that runs against the same MACC.
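The shape sensitivity can be sketched with two workloads that share the same peak, and so would need the same reservation, but differ in their daily curve. Every number below is a hypothetical placeholder: the blended per-million-token rate, the PTU count, the hourly PTU rate and the 30-day month are assumptions for illustration, not Azure prices or sizing guidance.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # simplification for illustration


def payg_monthly_cost(hourly_mtok: list[float], rate_per_mtok: float) -> float:
    """PAYG cost: tokens actually consumed times a blended per-million rate."""
    if len(hourly_mtok) != HOURS_PER_DAY:
        raise ValueError("profile must have 24 hourly values")
    return sum(hourly_mtok) * DAYS_PER_MONTH * rate_per_mtok


def ptu_monthly_cost(ptu_count: int, hourly_rate_per_ptu: float) -> float:
    """PTU cost: paid for every hour of the term, whatever the consumption."""
    return ptu_count * hourly_rate_per_ptu * HOURS_PER_DAY * DAYS_PER_MONTH


def cheaper_surface(payg_cost: float, ptu_cost: float) -> str:
    """Name the cheaper surface for one month at the given costs."""
    return "PTU" if payg_cost > ptu_cost else "PAYG"


# Hypothetical inputs: both workloads peak at 2 million tokens per hour.
RATE_PER_MTOK = 5.0
PTU_COST = ptu_monthly_cost(ptu_count=10, hourly_rate_per_ptu=0.70)

flat_profile = [2.0] * 24                              # constant load, day and night
daytime_profile = [0.0] * 8 + [2.0] * 10 + [0.0] * 6   # ten busy hours, zero otherwise

flat_choice = cheaper_surface(payg_monthly_cost(flat_profile, RATE_PER_MTOK), PTU_COST)
daytime_choice = cheaper_surface(payg_monthly_cost(daytime_profile, RATE_PER_MTOK), PTU_COST)
print(flat_choice, daytime_choice)
```

Under these assumptions the same reservation pays for itself on the flat curve and not on the daytime curve, which is why the sizing input is the consumption shape and not the peak alone.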

Section iv

The fine-tuning hosting trap.

Fine-tuning Azure OpenAI models is structurally attractive: a fine-tuned model variant captures domain-specific patterns at a fraction of the per-token cost of a prompt-engineered base model. The unmeasured cost is the model-hosting hourly charge: a fine-tuned model deployment carries a hosting charge per hour for the lifetime of the deployment, in addition to the inference token charge.

The Admodum read on the fine-tuning decision is the steady-state inference-volume calculus. A fine-tuned deployment at 100,000 inferences per month carries a hosting charge that frequently exceeds the inference-token saving versus a base model with prompt engineering; the break-even sits in the hundreds of thousands to low millions of inferences per month, depending on the model class and the regional pricing. The wider Copilot seat economics spoke reads the parallel calculus on the per-seat surface.
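A minimal sketch of the inference-volume calculus follows. The hosting rate and the per-inference costs are hypothetical placeholders chosen only to illustrate the arithmetic, and the function names are illustrative; real figures depend on the model class and the regional pricing.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, a simplification


def hosting_break_even(hosting_rate_per_hour: float,
                       base_cost_per_inference: float,
                       tuned_cost_per_inference: float) -> float:
    """Monthly inferences at which the token saving equals the hosting charge."""
    saving = base_cost_per_inference - tuned_cost_per_inference
    if saving <= 0:
        raise ValueError("the fine-tuned inference must be cheaper per call to break even")
    return hosting_rate_per_hour * HOURS_PER_MONTH / saving


def net_monthly_saving(inferences: int,
                       hosting_rate_per_hour: float,
                       base_cost_per_inference: float,
                       tuned_cost_per_inference: float) -> float:
    """Token saving minus hosting charge; negative means the fine-tune costs more."""
    token_saving = inferences * (base_cost_per_inference - tuned_cost_per_inference)
    return token_saving - hosting_rate_per_hour * HOURS_PER_MONTH


# Hypothetical figures (placeholders, not published prices).
HOSTING_RATE = 2.00   # hosting charge per hour of deployment
BASE_COST = 0.004     # per inference, base model with a long engineered prompt
TUNED_COST = 0.001    # per inference, fine-tuned model with a short prompt

break_even = hosting_break_even(HOSTING_RATE, BASE_COST, TUNED_COST)
net_at_100k = net_monthly_saving(100_000, HOSTING_RATE, BASE_COST, TUNED_COST)
print(f"break-even inferences per month: {break_even:,.0f}")
print(f"net saving at 100,000 inferences: {net_at_100k:,.2f}")
```

With these placeholder figures the deployment at 100,000 inferences per month runs at a net loss, and the break-even lands in the hundreds of thousands of inferences.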

Section v

The MACC burn against Azure OpenAI.

Azure OpenAI consumption burns the Microsoft Azure Consumption Commitment (MACC) at the published rate, no deduction, no exception. A buyer with an under-burn MACC position carries an option: move marginal AI workload to Azure OpenAI to consume the commitment before the term-end forfeiture clause triggers.

The Year-N renewal lever runs against the burn rate. A buyer who has consumed at the contractual rate negotiates the next MACC term from a position of demonstrated demand; a buyer who has under-consumed has the under-burn liability as a renewal-conversation drag. The wider Microsoft BATNA framework reads against the AWS Bedrock and Google Cloud Vertex AI alternatives that re-set the per-token rate at a market level on renewal.
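The burn position can be tracked with a few lines of arithmetic. The sketch below assumes a linear pro-rata burn target across the term, which is a simplification: an actual MACC may carry a different schedule. The commitment figures and the `MaccPosition` class are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MaccPosition:
    commitment: float     # total commitment for the term
    term_months: int      # length of the term in months
    months_elapsed: int   # months already completed
    consumed: float       # eligible consumption to date

    def pro_rata_target(self) -> float:
        """Consumption expected to date under a linear burn assumption."""
        return self.commitment * self.months_elapsed / self.term_months

    def shortfall(self) -> float:
        """Under-burn against the pro-rata target; zero when on or ahead of target."""
        return max(0.0, self.pro_rata_target() - self.consumed)

    def required_monthly_run_rate(self) -> float:
        """Average monthly consumption needed to exhaust the commitment by term end."""
        remaining_months = self.term_months - self.months_elapsed
        if remaining_months <= 0:
            raise ValueError("the term has ended")
        return max(0.0, self.commitment - self.consumed) / remaining_months


# Hypothetical position: a three-year commitment, one year in.
position = MaccPosition(commitment=12_000_000, term_months=36,
                        months_elapsed=12, consumed=3_000_000)
print(position.pro_rata_target(), position.shortfall(),
      position.required_monthly_run_rate())
```

Comparing the required run-rate with the current monthly consumption shows how much marginal workload would need to move to close the gap before term end.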

Section vi

The buyer artefacts.

The buyer-side artefacts to hold against the Azure OpenAI estate are: the model-usage inventory (every model SKU, every workload, every monthly token volume), the deployment-SKU declaration (Standard versus Provisioned versus Batch, per model), the PTU position (every PTU reservation, every term, every utilisation), the fine-tune register (every fine-tuned variant, every hosting hour, every inference volume), and the MACC burn position (consumption against commitment, by month, by SKU class).
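Two of these artefacts, the PTU position and the fine-tune register, can be held as simple records and screened for the patterns described earlier. This is a sketch only: the field names, the sample entries and both thresholds are hypothetical and would need to be set against the buyer's own estate.

```python
from dataclasses import dataclass


@dataclass
class PtuReservation:
    model: str
    ptu_count: int
    term: str           # for example "1-month" or "1-year"
    utilisation: float  # average share of reserved throughput used, 0.0 to 1.0


@dataclass
class FineTuneEntry:
    variant: str
    hosting_hours: float
    monthly_inferences: int


def underused_reservations(reservations, threshold=0.6):
    """Reservations whose average utilisation falls below the chosen threshold."""
    return [r for r in reservations if r.utilisation < threshold]


def low_volume_fine_tunes(entries, min_inferences=100_000):
    """Hosted fine-tuned variants whose inference volume is below the chosen floor."""
    return [e for e in entries
            if e.hosting_hours > 0 and e.monthly_inferences < min_inferences]


# Hypothetical register contents.
ptu_position = [
    PtuReservation("gpt-4o", ptu_count=50, term="1-year", utilisation=0.82),
    PtuReservation("gpt-4o-mini", ptu_count=25, term="1-month", utilisation=0.35),
]
fine_tune_register = [
    FineTuneEntry("support-triage-v2", hosting_hours=720, monthly_inferences=40_000),
    FineTuneEntry("contract-summary-v1", hosting_hours=720, monthly_inferences=900_000),
]

flagged_ptu = underused_reservations(ptu_position)
flagged_fine_tunes = low_volume_fine_tunes(fine_tune_register)
print([r.model for r in flagged_ptu], [e.variant for e in flagged_fine_tunes])
```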

Azure OpenAI economics is model SKU, deployment SKU, PTU position and MACC burn. The buyer-side artefacts close all four; the Year-N renewal reads against them.

The wider engagement sits in the Microsoft practice and the AI Vendors practice; active renewal moments route to the Renewal Programme; active audit moments route to Audit Defence.
