White paper xxiii · AWS · Full text

The Bedrock commitment at the throughput window.

Twenty-two pages on the AWS Bedrock commercial scope. On-demand versus Provisioned Throughput, model-by-model token economics, custom-model unit hours, the agents and knowledge-bases layer, the EDP interaction, the IP indemnification posture and the renewal positioning the AWS practice runs across every Bedrock engagement.

Author: Marcus T. Bennett
Pages: 22
Published: November 2025
Updated: April 2026
Reading time: 30 minutes
Independent. Buyer-side. Not a partner, reseller, or affiliate of AWS or any other software vendor.

Inside the paper

  1. The Bedrock construct
  2. On-demand versus Provisioned Throughput
  3. Model-by-model pricing
  4. Custom-model unit hours
  5. Agents and Knowledge Bases
  6. EDP interaction
  7. IP indemnification posture
  8. Renewal posture
  9. Reading list and references

Section i

The Bedrock construct.

Amazon Bedrock is the AWS managed-foundation-model service. The construct is a single API surface across model families (Anthropic Claude, Mistral, Meta Llama, Amazon Titan, Stable Diffusion, Cohere and a small set of additional first-party and third-party models). The buyer accesses the model through a consistent invoke endpoint and AWS bills the underlying token, throughput or unit-hour consumption to the buyer’s AWS account.

Bedrock is therefore an AWS commercial surface first and a model-by-model offering second. The buyer who commits to Bedrock is committing to a consumption envelope on AWS, not to a single model publisher. The contractual relationship sits inside the AWS Customer Agreement and the AWS Service Terms; the model-by-model end-user licences sit on top, but the procurement counter-party is AWS.

The implication for procurement is direct. The Bedrock commitment is designed at the AWS commercial layer (the EDP, the Marketplace channel, the regional availability), not at the model-by-model layer. The buyer who designs the commitment model by model loses the AWS-side optionality the construct is built to give them.

Section ii

On-demand versus Provisioned Throughput.

Bedrock prices the same invoke call two ways. On-demand per-token billing meters every input and output token at a published per-million-token rate. Provisioned Throughput meters a reserved unit (model units, billed per hour) that guarantees a fixed token throughput regardless of the call volume that flows through it.

The two constructs are not interchangeable. On-demand is the construct for variable workloads where call volume is low or unpredictable, where time-to-first-token tolerance is loose, and where the per-token rate at the workload’s actual call volume is below the per-unit-hour cost of a reserved model unit. Provisioned Throughput is the construct for steady-state workloads where the call volume is predictable, where time-to-first-token tolerance is tight, and where the reserved-unit cost amortised across the workload’s call volume is below the on-demand rate.

The break-even arithmetic

The buyer-side break-even arithmetic is mechanical. For each named model, the workload’s expected token volume per hour is multiplied by the on-demand per-token rate to give the on-demand hourly cost. The Provisioned Throughput rate (per model unit per hour) is divided by the throughput-per-unit (published per model) to give the per-token cost at the reserved tier when the unit is fully used; at lower utilisation the effective per-token cost rises in proportion. Provisioned Throughput becomes the lower per-token construct once the workload’s sustained throughput is high enough that the on-demand hourly cost exceeds the hourly rate of the reserved units.
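The break-even arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the `RateCard` type, the function names and every figure are hypothetical placeholders, and a single blended on-demand rate is assumed for brevity where a real model would carry separate input and output rates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Hypothetical rate card for one model. All figures are placeholders."""
    on_demand_per_million: float   # USD per 1M tokens (blended, an assumption)
    unit_rate_per_hour: float      # USD per model unit per hour
    unit_capacity_per_hour: float  # tokens one model unit sustains per hour


def on_demand_hourly_cost(card: RateCard, tokens_per_hour: float) -> float:
    """Expected hourly token volume multiplied by the on-demand rate."""
    return tokens_per_hour / 1_000_000 * card.on_demand_per_million


def reserved_hourly_cost(card: RateCard, tokens_per_hour: float) -> float:
    """Hourly cost of the whole model units needed to carry the volume."""
    units = max(1, math.ceil(tokens_per_hour / card.unit_capacity_per_hour))
    return units * card.unit_rate_per_hour


def break_even_tokens_per_hour(card: RateCard) -> float:
    """Hourly volume at which one reserved unit costs the same as on-demand."""
    return card.unit_rate_per_hour / card.on_demand_per_million * 1_000_000


def cheaper_construct(card: RateCard, tokens_per_hour: float) -> str:
    """Name the lower-cost construct at a given sustained hourly volume."""
    reserved = reserved_hourly_cost(card, tokens_per_hour)
    on_demand = on_demand_hourly_cost(card, tokens_per_hour)
    return "provisioned" if reserved < on_demand else "on-demand"


card = RateCard(on_demand_per_million=10.0,
                unit_rate_per_hour=40.0,
                unit_capacity_per_hour=8_000_000)
print(break_even_tokens_per_hour(card))    # 4000000.0
print(cheaper_construct(card, 2_000_000))  # on-demand
print(cheaper_construct(card, 6_000_000))  # provisioned
```

If the break-even volume exceeds the unit's capacity, a single reserved unit never undercuts on-demand at those rates, which is itself a useful procurement finding.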

On-demand is the variable workload. Provisioned Throughput is the steady state. The break-even is the procurement input.

Section iii

Model-by-model pricing.

Bedrock publishes per-million-token rates for input tokens and output tokens separately, by model and (where the model supports it) by context window. The buyer-side cost model must read input and output as distinct line items, not as a single blended token price.

The Claude family carries a tiered input/output schedule that varies across Haiku, Sonnet and Opus tiers, with output tokens priced higher than input tokens for each tier. The Mistral family carries a separate schedule across Large, Small and the open-weight Mistral and Mixtral series. The Meta Llama series carries a schedule that varies by parameter count and by the instruct or chat variant. The Amazon Titan series carries text, image and embedding rates priced under the AWS first-party schedule. The Stable Diffusion series prices per-image rather than per-token. Cohere prices the Command and Embed families on a per-token schedule similar to the other third-party text models.

The model-selection arithmetic for the buyer is a function of (i) the per-token cost at the workload’s input/output mix, (ii) the model’s quality at the workload’s task, (iii) the context-window cost where a longer window is required, and (iv) the regional availability against the buyer’s data-residency requirement. The model with the lowest published per-token rate is rarely the correct model; the model with the best cost-per-correct-answer at the workload’s task is.
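A rough sketch of the cost-per-correct-answer comparison follows. It assumes, as a simplification, that an incorrect answer is wasted spend, so the cost per correct answer is the request cost divided by the measured accuracy. The model names, rates and accuracy figures are hypothetical; real inputs would come from the published rate card and the buyer's own task evaluation.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_rate_per_million: float,
                     output_rate_per_million: float) -> float:
    """Price input and output tokens as distinct line items."""
    return (input_tokens * input_rate_per_million
            + output_tokens * output_rate_per_million) / 1_000_000


def cost_per_correct_answer(request_cost: float, accuracy: float) -> float:
    """Simplifying assumption: wrong answers are wasted spend."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return request_cost / accuracy


def rank(candidates: dict, input_tokens: int, output_tokens: int) -> list:
    """Order candidate models from lowest to highest cost per correct answer."""
    scored = {
        name: cost_per_correct_answer(
            cost_per_request(input_tokens, output_tokens, in_rate, out_rate),
            accuracy)
        for name, (in_rate, out_rate, accuracy) in candidates.items()
    }
    return sorted(scored, key=scored.get)


# Hypothetical candidates: (input rate, output rate, accuracy on the task).
candidates = {
    "model_a": (1.0, 2.0, 0.50),  # lowest published rate
    "model_b": (1.5, 3.0, 0.90),  # dearer per token, stronger on the task
}
print(rank(candidates, input_tokens=2000, output_tokens=500))
```

With these placeholder figures the model with the lower published rate ranks second, because its lower accuracy outweighs the per-token saving.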

Section iv

Custom-model unit hours.

Bedrock supports two custom-model constructs. Custom Model Import allows the buyer to bring its own fine-tuned weights (for supported architectures) and host them on Bedrock. Continued Pre-Training and Fine-Tuning allow the buyer to start from a Bedrock-hosted base model and adapt it against the buyer’s training corpus inside AWS.

The custom-model commercial layer prices model-unit hours. A model unit is a reserved compute construct sized for a target throughput; the unit is billed per hour for the duration the custom model is deployed. The buyer therefore pays for the model’s availability (the unit-hour rate) and for the invocation volume (the per-token rate, where the custom model carries a published per-token tariff) separately.

The Admodum methodology for fine-tuned model deployment runs the unit-hour cost against the workload’s sustained call volume. A custom model deployed and lightly used carries a high effective per-token cost because the unit-hour cost is spread across a small token count. A custom model deployed at a saturated throughput against a sustained workload carries a low effective per-token cost. The decision to fine-tune is therefore a decision to commit to a workload volume that justifies the deployment.
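The utilisation effect can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name, the unit-hour rate and the token volumes are hypothetical, and the optional per-token tariff defaults to zero because not every custom model carries one.

```python
def effective_cost_per_million(unit_hour_rate: float, units: int,
                               hours_deployed: float, tokens_served: float,
                               per_million_token_rate: float = 0.0) -> float:
    """Spread the availability cost over the tokens actually served,
    then add any per-token tariff the custom model carries."""
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    availability_cost = unit_hour_rate * units * hours_deployed
    return availability_cost / tokens_served * 1_000_000 + per_million_token_rate


# One model unit deployed for a 720-hour month at a placeholder rate.
light = effective_cost_per_million(20.0, 1, 720, tokens_served=10_000_000)
saturated = effective_cost_per_million(20.0, 1, 720, tokens_served=1_440_000_000)
print(round(light, 2), round(saturated, 2))
```

The same deployment costs the same per month in both cases; only the denominator changes, which is why the workload volume is the input that decides whether the fine-tune is justified.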

Section v

Agents and Knowledge Bases.

Bedrock Agents is the orchestration layer that allows the buyer to define an agent workflow (named tools, an instruction prompt, a model selection) and invoke it through a single Bedrock endpoint. Bedrock Knowledge Bases is the managed retrieval-augmented-generation layer that ingests source documents, indexes them, and serves retrieval at invoke time.

The Agents construct prices the underlying invoke calls at the model’s per-token rate, with no separate agent fee. The orchestration cost is therefore the sum of the model invocations the agent makes inside one user request (the planning call, the tool-routing calls, the model-response call and any follow-up calls). A naively built agent can multiply the underlying token cost by an order of magnitude; a tightly built agent runs at close to the single-call cost.
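The multiplication effect can be made concrete by summing the calls inside one user request. The sketch below is illustrative only: the `Call` type, the token counts and the rates are hypothetical, and a real trace would come from the agent's own invocation logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One model invocation inside a user request (hypothetical trace entry)."""
    label: str
    input_tokens: int
    output_tokens: int


def request_cost(calls, input_rate_per_million: float,
                 output_rate_per_million: float) -> float:
    """Sum every model invocation made while serving one user request."""
    total = sum(c.input_tokens * input_rate_per_million
                + c.output_tokens * output_rate_per_million for c in calls)
    return total / 1_000_000


single = [Call("direct answer", 1000, 500)]
naive_agent = [
    Call("planning", 3000, 500),
    Call("tool routing 1", 4000, 300),
    Call("tool routing 2", 4000, 300),
    Call("tool routing 3", 4000, 300),
    Call("model response", 6000, 500),
    Call("follow-up", 6500, 300),
]

rates = dict(input_rate_per_million=2.0, output_rate_per_million=10.0)
multiplier = request_cost(naive_agent, **rates) / request_cost(single, **rates)
print(round(multiplier, 1))  # 11.0
```

Most of the overhead in this placeholder trace comes from input tokens, because each step re-sends the accumulated context; trimming what is carried between steps is usually the first lever.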

The Knowledge Bases construct prices the embedding-model invocations at ingest time, the retrieval calls at invoke time (typically zero-cost on AWS-native vector stores; metered on Pinecone or OpenSearch Serverless against the underlying service), and the model invocation that consumes the retrieved context at the model’s standard rate. The per-query economics therefore depend on the retrieval substrate, the embedding-model choice and the retrieval window size. The Admodum protocol benchmarks the per-query cost across the substrate options before the Knowledge Base is committed.
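A per-query benchmark across substrate options might look like the following sketch. The substrate names, retrieval costs, chunk sizes and rates are all hypothetical placeholders; the amortised ingest figure stands in for the embedding cost at ingest time spread over the expected query count.

```python
def per_query_cost(*, retrieval_cost: float, chunks: int, tokens_per_chunk: int,
                   question_tokens: int, output_tokens: int,
                   input_rate_per_million: float, output_rate_per_million: float,
                   amortised_ingest_cost: float = 0.0) -> float:
    """Ingest share + retrieval call + model invocation over the retrieved context."""
    context_tokens = chunks * tokens_per_chunk
    model_cost = ((question_tokens + context_tokens) * input_rate_per_million
                  + output_tokens * output_rate_per_million) / 1_000_000
    return amortised_ingest_cost + retrieval_cost + model_cost


def benchmark(substrates: dict, **workload) -> list:
    """Return (substrate, per-query cost) pairs, cheapest first."""
    rows = [(name, per_query_cost(retrieval_cost=cost, **workload))
            for name, cost in substrates.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical retrieval cost per query for two candidate vector stores.
substrates = {"substrate_a": 0.0, "substrate_b": 0.0004}
workload = dict(chunks=5, tokens_per_chunk=400, question_tokens=200,
                output_tokens=300, input_rate_per_million=3.0,
                output_rate_per_million=15.0)

for name, cost in benchmark(substrates, **workload):
    print(name, round(cost, 6))
```

Varying `chunks` in the workload shows how the retrieval window size feeds directly into the model's input-token line.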

Section vi

EDP interaction.

Bedrock spend draws against the AWS EDP commitment in the same way as any other AWS service spend, subject to the eligible-service-catalogue read at the EDP signature date. Bedrock has been included in the EDP eligible catalogue since the service’s general availability and continues to draw against the EDP commitment.

The inclusion read

The buyer with an active EDP must verify Bedrock’s inclusion in the eligible-service catalogue as defined in its specific EDP order. The inclusion is published, but the order-level confirmation is the contractual position; an inclusion at signature does not imply inclusion at every subsequent re-baseline. The buyer with a forthcoming EDP must structure the eligible-service catalogue to capture Bedrock and adjacent generative-AI services explicitly.

The joint-commitment design

Where the buyer is committing meaningful generative-AI spend, the EDP and the Bedrock consumption forecast are designed as a single commitment. The forecast covers Bedrock model invocations, custom-model unit hours, agent orchestration overhead and knowledge-base substrate cost across the EDP term. The forecast is then negotiated as a contributing component to the EDP commitment level, with a contingency for under-commit and an over-commit allowance for the upside case.

The risk the buyer is managing is the unspent generative-AI commitment. Bedrock workloads are notoriously hard to forecast in the first year of an EDP, because the rollout cadence depends on internal product and procurement reviews that frequently slip. The mitigation is to size the Bedrock contribution to the EDP at the conservative end of the forecast and to retain the upside as Marketplace pull-through or other AWS-native spend.
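The sizing logic can be sketched as a scenario comparison. The figures and scenario names below are hypothetical, and the calculation treats the Bedrock contribution in isolation, which is a simplification: inside a real EDP, other eligible spend may absorb part of a Bedrock shortfall.

```python
def shortfall(committed: float, actual: float) -> float:
    """Unspent commitment if actual spend lands below the committed level."""
    return max(0.0, committed - actual)


def worst_case_shortfall(committed: float, scenarios: dict) -> float:
    """Largest unspent commitment across the forecast scenarios."""
    return max(shortfall(committed, actual) for actual in scenarios.values())


def retained_upside(committed: float, scenarios: dict) -> float:
    """Spend above the commitment in the best case, left uncommitted."""
    return max(0.0, max(scenarios.values()) - committed)


# Hypothetical annual Bedrock spend forecast, in USD.
scenarios = {"conservative": 1_000_000.0, "base": 1_800_000.0, "upside": 3_000_000.0}

for label in ("conservative", "base"):
    committed = scenarios[label]
    print(label,
          worst_case_shortfall(committed, scenarios),
          retained_upside(committed, scenarios))
```

Committing at the conservative figure leaves no unspent exposure in any scenario and keeps the remainder available as uncommitted spend; committing at the base figure exposes the difference if the rollout slips.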

Section vii

IP indemnification posture.

AWS publishes an IP indemnification position for Bedrock that covers the buyer against third-party intellectual-property claims arising from the generated output of a covered model. The coverage is not uniform across all Bedrock models; it varies by model publisher, by model version and by the indemnification programme the model is enrolled in.

The buyer-side requirement protocol therefore tests three positions per named model. Is the model covered by the AWS IP indemnification on the date of the buyer’s expected first invocation? What is the contractual scope of the coverage (output content only, or output and input)? What are the cap, the carve-outs (prompt-injection, deliberate adversarial use) and the survival period?

The contractual position runs against the AWS Service Terms section that addresses generative-AI output. The buyer’s commercial counsel should examine the section against the buyer’s use cases and confirm that the carve-outs are tolerable. Where the carve-outs are not tolerable, the buyer either selects a model with broader coverage, takes a contractual amendment, or carries the residual risk as a documented commercial decision.

Section viii

Renewal posture.

Bedrock does not have a standalone renewal cycle in the way the AWS EDP does. The Bedrock commitment renews implicitly inside the EDP cycle; Provisioned Throughput reservations renew on their own commitment schedule (one-month, six-month and one-year reserved terms). The buyer therefore manages two renewal cadences: the EDP envelope renewal, and the Provisioned Throughput re-commitment inside the term.

The Provisioned Throughput re-commitment is a tactical decision. The reservation buys a guaranteed throughput at a discounted per-token rate; the buyer re-commits when the workload’s sustained throughput justifies the reservation. The discount is meaningful (the longer term carries a steeper discount), but the lock-in is meaningful too; a model deprecation inside the reservation term forces a migration that may not align with the buyer’s product cycle.

Model deprecation protection

AWS publishes a model lifecycle policy that names the support window for each Bedrock-hosted model. The buyer who is reserving Provisioned Throughput against a named model must verify the support window covers the reservation term; the buyer who is fine-tuning a base model must verify the base model remains supported through the deployment window the fine-tune is sized against.
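The support-window check reduces to a date comparison. The sketch below assumes the buyer has read an end-of-support date for the model from the published lifecycle policy; the dates shown are hypothetical, and `add_months` is a small helper written here because the Python standard library has no month arithmetic on `date`.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def support_covers_term(reservation_start: date, term_months: int,
                        end_of_support: date) -> bool:
    """True when the reservation ends on or before the model's end of support."""
    return add_months(reservation_start, term_months) <= end_of_support


# Hypothetical dates; the real end-of-support date comes from the lifecycle policy.
start = date(2026, 8, 15)
end_of_support = date(2027, 1, 31)
print(support_covers_term(start, 1, end_of_support))  # True
print(support_covers_term(start, 6, end_of_support))  # False
```

The same check applies to a fine-tune by substituting the planned deployment window for the reservation term.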

Multi-vendor portability

The defensive posture against any single foundation-model lock-in is portability. The Admodum portability protocol abstracts the invocation surface behind an internal interface that can route a call to Bedrock, to the same model on the publisher’s native API, or to a comparable model on a second cloud (Vertex AI on Google Cloud or the Azure OpenAI service inside an Azure MACC). The abstraction preserves the buyer’s switching option without sacrificing the AWS-native commercial benefits.
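One way to picture the abstraction is a thin internal interface with interchangeable backends. The sketch below is an assumption about shape, not the protocol itself: `ModelBackend`, `StubBackend` and `Router` are hypothetical names, and the stub stands in for real adapters that would wrap each vendor's SDK.

```python
from typing import Protocol


class ModelBackend(Protocol):
    """Internal interface every provider adapter is assumed to implement."""
    name: str

    def invoke(self, prompt: str) -> str: ...


class StubBackend:
    """Stand-in for a real adapter (Bedrock, publisher API, second cloud)."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def invoke(self, prompt: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} unavailable")
        return f"[{self.name}] {prompt}"


class Router:
    """Try backends in preference order and return the first success."""

    def __init__(self, backends: list) -> None:
        self.backends = backends

    def invoke(self, prompt: str) -> str:
        errors = []
        for backend in self.backends:
            try:
                return backend.invoke(prompt)
            except Exception as exc:  # broad catch is deliberate for a routing sketch
                errors.append(f"{backend.name}: {exc}")
        raise RuntimeError("all backends failed: " + "; ".join(errors))


router = Router([StubBackend("bedrock", healthy=False),
                 StubBackend("publisher-api")])
print(router.invoke("hello"))  # [publisher-api] hello
```

Because callers depend only on the internal interface, changing the preference order is a configuration change rather than an application rewrite, which is what keeps the switching option credible at renewal.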

A foundation-model commitment without a portability protocol is a commitment without a renewal posture.

Section ix

Reading list and references.

The AWS Bedrock paper sits inside an AWS reading list and an adjacent multi-vendor AI reading list. The companion papers extend the methodology to adjacent commercial mechanics.

The methodology in this paper is the methodology Admodum has applied across more than twenty AWS commercial engagements. Each engagement is structured as fixed fee, contingency / gainshare or annual retainer, depending on the buyer’s posture at the commitment window.

Next in the series

Paper xxiv. AI vendors PTU design.

The Provisioned Throughput Units commercial construct across Azure OpenAI, AWS Bedrock and the publisher native APIs. Multi-cloud portability, model deprecation protection, IP indemnification across the three substrates.

Companion programme

Bring an advisor. Renewal Programme.

The methodology in this paper runs inside the Renewal Programme on a fixed-fee, contingency or annual-retainer basis. The EDP and Bedrock envelope is the procurement deliverable; the Programme is the operational envelope inside which it is built.

Independence
Admodum is not a partner, reseller, or affiliate of AWS, Anthropic, Meta, Mistral, Cohere, or of any other software vendor. No reseller margin, no referral commission, no deployment-partner subcontract.