Bedrock commitment, throughput window

Inside the paper

The Bedrock construct
On-demand versus Provisioned Throughput
Model-by-model pricing
Custom-model unit hours
Agents and Knowledge Bases
EDP interaction
IP indemnification posture
Renewal posture
Reading list and references

Section i

The Bedrock construct.

Amazon Bedrock is the AWS managed foundation-model service. The construct is a single API surface across model families (Anthropic Claude, Mistral, Meta Llama, Amazon Titan, Stability AI Stable Diffusion, Cohere and a small set of additional first-party and third-party models). The buyer accesses each model through a consistent invoke endpoint, and AWS bills the underlying token, throughput or unit-hour consumption to the buyer’s AWS account.

Bedrock is therefore an AWS commercial surface first and a model-by-model offering second. The buyer who commits to Bedrock is committing to a consumption envelope on AWS, not to a single model publisher. The contractual relationship sits inside the AWS Customer Agreement and the AWS Service Terms; the model-by-model end-user licences sit on top, but the procurement counter-party is AWS.

The implication for procurement is direct. The Bedrock commitment is designed at the AWS commercial layer (the EDP, the Marketplace channel, the regional availability), not at the model-by-model layer. The buyer who designs the commitment model by model loses the AWS-side optionality the construct is built to give them.

Section ii

On-demand versus Provisioned Throughput.

Bedrock prices the same invoke call two ways. On-demand per-token billing meters every input and output token at a published per-million-token rate. Provisioned Throughput meters a reserved unit (model units, billed per hour) that guarantees a fixed token throughput regardless of the call volume that flows through it.

The two constructs are not interchangeable. On-demand is the construct for variable workloads where call volume is low or unpredictable, where time-to-first-token tolerance is loose, and where the per-token rate at the workload’s actual call volume is below the per-unit-hour cost of a reserved model unit. Provisioned Throughput is the construct for steady-state workloads where the call volume is predictable, where time-to-first-token tolerance is tight, and where the reserved-unit cost amortised across the workload’s call volume is below the on-demand rate.

The break-even arithmetic

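The buyer-side break-even arithmetic is mechanical. For each named model, the workload’s expected token volume per hour is multiplied by the on-demand per-token rate to give the on-demand hourly cost. The Provisioned Throughput rate (per model unit per hour) is divided by the throughput-per-unit (published per model) to give the per-token cost at the reserved tier when the unit is fully used; at lower utilisation the effective per-token cost rises in proportion, because the unit-hour is paid whether or not tokens flow through it. Provisioned Throughput becomes the lower per-token construct once the on-demand hourly cost at the workload’s sustained throughput exceeds the unit-hour rate, which typically happens as that throughput approaches the reserved unit’s capacity.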

On-demand is the variable workload. Provisioned Throughput is the steady state. The break-even is the procurement input.
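The break-even arithmetic above can be sketched in a few lines. This is a minimal illustration, not an AWS pricing tool: the class `ModelRates`, its field names and every figure in the example are hypothetical, and the on-demand rate is treated as a single blended input/output rate for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical rates for one named model (not AWS list prices)."""
    on_demand_per_million: float  # USD per million tokens, blended input/output
    unit_hour_rate: float         # USD per model unit per hour
    unit_tokens_per_hour: float   # tokens one model unit can sustain per hour


def on_demand_hourly_cost(rates: ModelRates, tokens_per_hour: float) -> float:
    """Expected hourly token volume multiplied by the on-demand rate."""
    return tokens_per_hour * rates.on_demand_per_million / 1_000_000


def reserved_per_million_at_capacity(rates: ModelRates) -> float:
    """Unit-hour rate divided by throughput-per-unit, per million tokens."""
    return rates.unit_hour_rate * 1_000_000 / rates.unit_tokens_per_hour


def break_even_tokens_per_hour(rates: ModelRates) -> float:
    """Hourly volume at which on-demand cost equals one reserved unit-hour."""
    return rates.unit_hour_rate * 1_000_000 / rates.on_demand_per_million


def cheaper_construct(rates: ModelRates, tokens_per_hour: float) -> str:
    """Compare one reserved unit against on-demand at a sustained volume."""
    if tokens_per_hour > rates.unit_tokens_per_hour:
        raise ValueError("volume exceeds one unit; size additional units first")
    if on_demand_hourly_cost(rates, tokens_per_hour) > rates.unit_hour_rate:
        return "provisioned"
    return "on-demand"


# Illustrative figures only.
example = ModelRates(on_demand_per_million=10.0, unit_hour_rate=40.0,
                     unit_tokens_per_hour=6_000_000)
print(break_even_tokens_per_hour(example))    # 4000000.0
print(cheaper_construct(example, 1_000_000))  # on-demand
print(cheaper_construct(example, 5_000_000))  # provisioned
```

With these illustrative figures the break-even sits at two-thirds of the unit's capacity, so the reservation only pays back for a workload that keeps the unit busy most of the time.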

Section iii

Model-by-model pricing.

Bedrock publishes per-million-token rates for input tokens and output tokens separately, by model and (where the model supports it) by context window. The buyer-side cost model must read input and output as distinct line items, not as a single blended token price.

The Claude family carries a tiered input/output schedule that varies across Haiku, Sonnet and Opus tiers, with output tokens priced higher than input tokens for each tier. The Mistral family carries a separate schedule across Large, Small and the open-weight Mistral and Mixtral series. The Meta Llama series carries a schedule that varies by parameter count and by the instruct or chat variant. The Amazon Titan series carries text, image and embedding rates priced under the AWS first-party schedule. The Stable Diffusion series prices per-image rather than per-token. The Cohere Command and Embed families are priced on a per-token schedule similar to the other third-party text models.

The model-selection arithmetic for the buyer is a function of (i) the per-token cost at the workload’s input/output mix, (ii) the model’s quality at the workload’s task, (iii) the context-window cost where a longer window is required, and (iv) the regional availability against the buyer’s data-residency requirement. The model with the lowest published per-token rate is rarely the correct model; the model with the best cost-per-correct-answer at the workload’s task is.
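The cost-per-correct-answer comparison can be sketched as follows. The two candidates, their rates and their accuracy figures are hypothetical; in practice the accuracy would come from the buyer's own evaluation on the workload's task, and context-window and regional constraints would filter the candidate list first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate model; rates and accuracy are illustrative."""
    name: str
    input_per_million: float   # USD per million input tokens
    output_per_million: float  # USD per million output tokens
    accuracy: float            # share of correct answers on the buyer's eval


def cost_per_request(c: Candidate, input_tokens: int, output_tokens: int) -> float:
    """Input and output are priced as distinct line items, then summed."""
    return (input_tokens * c.input_per_million
            + output_tokens * c.output_per_million) / 1_000_000


def cost_per_correct_answer(c: Candidate, input_tokens: int,
                            output_tokens: int) -> float:
    """Request cost divided by the probability the answer is correct."""
    if not 0 < c.accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_request(c, input_tokens, output_tokens) / c.accuracy


candidates = [
    Candidate("model-a", input_per_million=0.5, output_per_million=1.5, accuracy=0.40),
    Candidate("model-b", input_per_million=1.0, output_per_million=3.0, accuracy=0.90),
]
mix = (2_000, 500)  # input tokens, output tokens per request (assumed)

cheapest_rate = min(candidates, key=lambda c: cost_per_request(c, *mix))
best_value = min(candidates, key=lambda c: cost_per_correct_answer(c, *mix))
print(cheapest_rate.name, best_value.name)  # model-a model-b
```

In this example the model with half the published rate loses once its lower accuracy is priced in.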

Section iv

Custom-model unit hours.

Bedrock supports two custom-model constructs. Custom Model Import allows the buyer to bring its own fine-tuned weights (for supported architectures) and host them on Bedrock. Continued Pre-Training and Fine-Tuning allow the buyer to start from a Bedrock-hosted base model and adapt it against the buyer’s training corpus inside AWS.

The custom-model commercial layer prices model-unit hours. A model unit is a reserved compute construct sized for a target throughput; the unit is billed per hour for the duration the custom model is deployed. The buyer therefore pays for the model’s availability (the unit-hour rate) and for the invocation volume (the per-token rate, where the custom model carries a published per-token tariff) separately.

The Admodum methodology for fine-tuned model deployment runs the unit-hour cost against the workload’s sustained call volume. A custom model deployed and lightly used carries a high effective per-token cost because the unit-hour cost is spread across a small token count. A custom model deployed at a saturated throughput against a sustained workload carries a low effective per-token cost. The decision to fine-tune is therefore a decision to commit to a workload volume that justifies the deployment.
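The utilisation effect described above can be made concrete with a short sketch. The function name, the parameters and all figures are assumptions for illustration; the optional per-token tariff is zero unless the custom model carries one.

```python
def effective_cost_per_million(unit_hour_rate: float, units: int,
                               hours_deployed: float, tokens_served: float,
                               tariff_per_million: float = 0.0) -> float:
    """Spread the availability cost across the tokens actually served.

    The availability cost (unit-hour rate x units x hours) is fixed for the
    deployment window; any per-token tariff is added on top.
    """
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    availability_cost = unit_hour_rate * units * hours_deployed
    return availability_cost * 1_000_000 / tokens_served + tariff_per_million


# One hypothetical model unit at USD 20 per hour, deployed for 720 hours.
lightly_used = effective_cost_per_million(20.0, 1, 720, tokens_served=50_000_000)
saturated = effective_cost_per_million(20.0, 1, 720, tokens_served=1_440_000_000)
print(lightly_used, saturated)  # 288.0 10.0
```

The same deployment costs roughly thirty times more per token when lightly used, which is the volume commitment the fine-tuning decision implies.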

Section v

Agents and Knowledge Bases.

Bedrock Agents is the orchestration layer that allows the buyer to define an agent workflow (named tools, an instruction prompt, a model selection) and invoke it through a single Bedrock endpoint. Bedrock Knowledge Bases is the managed retrieval-augmented-generation layer that ingests source documents, indexes them, and serves retrieval at invoke time.

The Agents construct prices the underlying invoke calls at the model’s per-token rate, with no separate agent fee. The orchestration cost is therefore the sum of the model invocations the agent makes inside one user request (the planning call, the tool-routing calls, the model-response call and any follow-up calls). A naively built agent can multiply the underlying token cost by an order of magnitude; a tightly built agent runs at close to the single-call cost.
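A sketch of the summation, under stated assumptions: the call labels follow the paragraph above, while the token counts and per-million rates are invented for illustration and do not describe any particular model or agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One model invocation made inside a single user request."""
    label: str
    input_tokens: int
    output_tokens: int


def request_cost(calls: list[Call], input_per_million: float,
                 output_per_million: float) -> float:
    """Sum the per-token cost of every invocation in the request."""
    total = sum(c.input_tokens * input_per_million
                + c.output_tokens * output_per_million for c in calls)
    return total / 1_000_000


# Hypothetical token counts; context grows as tool results are appended.
single_call = [Call("response", 1_000, 500)]
naive_agent = [
    Call("planning", 3_000, 300),
    Call("tool-routing 1", 4_000, 200),
    Call("tool-routing 2", 4_000, 200),
    Call("tool-routing 3", 4_000, 200),
    Call("response", 6_000, 500),
    Call("follow-up", 6_500, 300),
]

baseline = request_cost(single_call, input_per_million=3.0, output_per_million=15.0)
agent = request_cost(naive_agent, input_per_million=3.0, output_per_million=15.0)
print(f"multiplier: {agent / baseline:.1f}x")
```

Most of the multiplier in this example comes from input tokens, because each step re-sends the accumulated context; trimming what is carried between steps is usually where a tightly built agent recovers its cost.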

The Knowledge Bases construct prices the embedding-model invocations at ingest time, the retrieval calls at invoke time (typically with no separate Bedrock charge for the retrieval call itself; the vector store is metered by its underlying service, whether Amazon OpenSearch Serverless or a third-party store such as Pinecone), and the model invocation that consumes the retrieved context at the model’s standard rate. The per-query economics therefore depend on the retrieval substrate, the embedding-model choice and the retrieval window size. The Admodum protocol benchmarks the per-query cost across the substrate options before the Knowledge Base is committed.
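A per-query benchmark of this kind can be sketched as below. The substrate names, the per-query retrieval costs and the rates are all hypothetical placeholders; ingest-time embedding is treated as a one-off cost and left out of the per-query figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substrate:
    """Hypothetical vector store with an amortised per-query cost."""
    name: str
    retrieval_cost_per_query: float  # USD, assumed figure


def per_query_cost(substrate: Substrate, *, query_tokens: int, top_k: int,
                   chunk_tokens: int, output_tokens: int,
                   embed_per_million: float, input_per_million: float,
                   output_per_million: float) -> float:
    """Embed the query, retrieve top_k chunks, then generate an answer."""
    embed_cost = query_tokens * embed_per_million / 1_000_000
    context_tokens = query_tokens + top_k * chunk_tokens  # retrieval window
    generation_cost = (context_tokens * input_per_million
                       + output_tokens * output_per_million) / 1_000_000
    return embed_cost + substrate.retrieval_cost_per_query + generation_cost


workload = dict(query_tokens=50, top_k=5, chunk_tokens=300, output_tokens=400,
                embed_per_million=0.1, input_per_million=3.0,
                output_per_million=15.0)
substrates = [Substrate("store-a", 0.0), Substrate("store-b", 0.002)]

for s in sorted(substrates, key=lambda s: per_query_cost(s, **workload)):
    print(s.name, round(per_query_cost(s, **workload), 6))
```

Re-running the comparison with a larger `top_k` shows how quickly the retrieval window, rather than the substrate, can come to dominate the per-query cost.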

Section vi

EDP interaction.

Bedrock spend draws against the AWS EDP commitment in the same way as any other AWS service spend, subject to the eligible-service-catalogue read at the EDP signature date. Bedrock has been included in the EDP eligible catalogue since the service’s general availability and continues to draw against the EDP commitment.

The inclusion read

The buyer with an active EDP must verify Bedrock’s inclusion in the eligible-service catalogue as defined in its specific EDP order. The inclusion is published, but the order-level confirmation is the contractual position; an inclusion at signature does not imply inclusion at every subsequent re-baseline. The buyer with a forthcoming EDP must structure the eligible-service catalogue to capture Bedrock and adjacent generative-AI services explicitly.

The joint-commitment design

Where the buyer is committing meaningful generative-AI spend, the EDP and the Bedrock consumption forecast are designed as a single commitment. The forecast covers Bedrock model invocations, custom-model unit hours, agent orchestration overhead and knowledge-base substrate cost across the EDP term. The forecast is then negotiated as a contributing component to the EDP commitment level, with a contingency for under-commit and an over-commit allowance for the upside case.

The risk the buyer is managing is the unspent generative-AI commitment. Bedrock workloads are notoriously hard to forecast in the first year of an EDP, because the rollout cadence depends on internal product and procurement reviews that frequently slip. The mitigation is to size the Bedrock contribution to the EDP at the conservative end of the forecast and to retain the upside as Marketplace pull-through or other AWS-native spend.
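The sizing rule can be illustrated with a simplified sketch. It treats the Bedrock contribution in isolation, whereas a real EDP measures shortfall against the total commitment and other AWS spend may absorb part of it; the scenario names and figures are hypothetical.

```python
def size_contribution(forecast: dict[str, float]) -> dict[str, float]:
    """Commit at the conservative end and keep the rest as uncommitted upside."""
    committed = min(forecast.values())
    retained_upside = max(forecast.values()) - committed
    return {"committed": committed, "retained_upside": retained_upside}


def shortfall(committed: float, actual: float) -> float:
    """Unspent commitment if actual spend lands below the committed figure."""
    return max(0.0, committed - actual)


# Hypothetical annual Bedrock spend scenarios, in USD.
forecast = {"conservative": 1_000_000.0, "base": 1_800_000.0, "upside": 3_000_000.0}
sizing = size_contribution(forecast)

# Suppose the rollout slips and actual spend lands at USD 1.2m.
actual = 1_200_000.0
print(shortfall(sizing["committed"], actual))  # 0.0
print(shortfall(forecast["base"], actual))     # 600000.0
```

Committing at the base case would have left USD 0.6m unspent in this scenario; committing at the conservative end leaves none, at the price of less committed volume to negotiate with.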

Section vii

IP indemnification posture.

AWS publishes an IP indemnification position for Bedrock that covers the buyer against third-party intellectual-property claims arising from the generated output of a covered model. The coverage is not uniform across all Bedrock models; it varies by model publisher, by model version and by the indemnification programme the model is enrolled in.

The buyer-side requirement protocol therefore tests three positions per named model. Is the model covered by the AWS IP indemnification on the date of the buyer’s expected first invocation? What is the contractual scope of the coverage (output content only, or output and input)? What are the cap, the carve-outs (prompt-injection, deliberate adversarial use) and the survival period?

The contractual position runs against the AWS Service Terms section that addresses generative-AI output. The buyer’s commercial counsel should examine the section against the buyer’s use cases and confirm that the carve-outs are tolerable. Where the carve-outs are not tolerable, the buyer either selects a model with broader coverage, takes a contractual amendment, or carries the residual risk as a documented commercial decision.

Section viii

Renewal posture.

Bedrock does not have a standalone renewal cycle in the way the AWS EDP does. The Bedrock commitment renews implicitly inside the EDP cycle; Provisioned Throughput reservations renew on their own commitment schedule (AWS documents no-commitment hourly, one-month and six-month terms; the buyer should confirm the terms on offer for the named model). The buyer therefore manages two renewal cadences: the EDP envelope renewal, and the Provisioned Throughput re-commitment inside the term.

The Provisioned Throughput re-commitment is a tactical decision. The reservation buys a guaranteed throughput at a discounted per-token rate; the buyer re-commits when the workload’s sustained throughput justifies the reservation. The discount is meaningful (the longer term carries a steeper discount), but the lock-in is meaningful too; a model deprecation inside the reservation term forces a migration that may not align to the buyer’s product cycle.

Model deprecation protection

AWS publishes a model lifecycle policy that names the support window for each Bedrock-hosted model. The buyer who is reserving Provisioned Throughput against a named model must verify the support window covers the reservation term; the buyer who is fine-tuning a base model must verify the base model remains supported through the deployment window the fine-tune is sized against.
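The support-window check reduces to a date comparison. In this sketch the function names are illustrative and the dates are invented; the end-of-support date would be read from the AWS model lifecycle page for the named model.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reservation_covered(start: date, term_months: int,
                        end_of_support: date) -> bool:
    """True if the model stays supported through the whole reservation term."""
    return add_months(start, term_months) <= end_of_support


# Hypothetical six-month reservation starting 15 January 2025.
start = date(2025, 1, 15)
print(add_months(start, 6))                               # 2025-07-15
print(reservation_covered(start, 6, date(2025, 6, 30)))   # False
print(reservation_covered(start, 6, date(2025, 12, 31)))  # True
```

The same check applies to a fine-tune: substitute the planned deployment window for the reservation term and the base model's end-of-support date for the comparison.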

Multi-vendor portability

The defensive posture against any single foundation-model lock-in is portability. The Admodum portability protocol abstracts the invocation surface behind an internal interface that can route a call to Bedrock, to the same model on the publisher’s native API, or to a comparable model on a second cloud (Vertex AI on Google Cloud or the Azure OpenAI service inside an Azure MACC). The abstraction preserves the buyer’s switching option without sacrificing the AWS-native commercial benefits.

A foundation-model commitment without a portability protocol is a commitment without a renewal posture.
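The shape of such an internal interface can be sketched as follows. This is an assumption about how the abstraction might look, not a description of the Admodum protocol's implementation: `ModelBackend`, `StubBackend` and `Router` are hypothetical names, and the stubs stand in for real SDK clients so that the example makes no network calls.

```python
from typing import Protocol


class ModelBackend(Protocol):
    """Internal interface every provider adapter must satisfy."""
    name: str

    def invoke(self, prompt: str) -> str: ...


class StubBackend:
    """Stand-in for a real adapter (Bedrock, publisher API, second cloud)."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available

    def invoke(self, prompt: str) -> str:
        if not self.available:
            raise RuntimeError(f"{self.name} unavailable")
        return f"[{self.name}] {prompt}"


class Router:
    """Try backends in preference order; fall through on failure."""

    def __init__(self, backends: list[ModelBackend]) -> None:
        self.backends = backends

    def invoke(self, prompt: str) -> str:
        errors = []
        for backend in self.backends:
            try:
                return backend.invoke(prompt)
            except RuntimeError as exc:
                errors.append(str(exc))
        raise RuntimeError("all backends failed: " + "; ".join(errors))


router = Router([
    StubBackend("bedrock", available=False),  # simulate an outage or exit
    StubBackend("publisher-native"),
    StubBackend("second-cloud"),
])
print(router.invoke("hello"))  # [publisher-native] hello
```

Because application code depends only on `Router.invoke`, changing the preference order or removing a provider is a configuration change rather than a rewrite, which is what keeps the switching option credible at renewal.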

Section ix

Reading list and references.

The AWS Bedrock paper sits inside an AWS reading list and an adjacent multi-vendor AI reading list. The companion papers extend the methodology to adjacent commercial mechanics:

AWS EDP Commitment Design, on the EDP construct, the ramp economics, the eligible-service catalogue and the Marketplace pull-through.
Microsoft Azure MACC Design and Defence, on the parallel Azure commercial envelope and the OpenAI service interaction.
Google Cloud EDP Design, on the GCP EDP and the Vertex AI commitment scope.
Salesforce Agentforce commitment design, an adjacent agentic-AI commercial scope on a different substrate.
Workday Illuminate AI, on the embedded-AI commercial scope inside an existing SaaS subscription.
The AWS practice page at /practices/aws/ sets out the six-point commercial cycle protocol the Admodum AWS practice runs across every Bedrock and EDP engagement.
The AI vendors practice page at /practices/ai-vendors/ sets out the cross-vendor AI commitment protocol.
The AWS knowledge hub at /knowledge/aws/ aggregates white papers, case studies, blog analysis and FAQ on AWS commercial mechanics.

The methodology in this paper is the one Admodum has applied across more than twenty AWS commercial engagements. Each engagement is structured as fixed fee, contingency or gainshare, or annual retainer, depending on the buyer’s posture at the commitment window.
