Three pricing models coexist in the Terraform and OpenTofu management platform market, and the choice between them shapes everything downstream — invoice predictability, engineering practices, incident behavior, vendor incentives, and the implementation architecture of the platform itself. This post is a buyer's evaluation framework: it defines each model, walks through the dimensions that matter, and shows how each model performs on each dimension. The asymmetries that emerge are useful inputs for platform selection.
The three pricing models
Each pricing model has two separable concerns: the pricing metric — the unit the bill scales on — and the tier — the feature bundle the customer selects, which determines what features are available and at what limits. This post is about pricing metrics: how each metric works, what it bills for, and what structural consequences each one has. Tier packaging (which features sit at which level, concurrency caps, support levels, contract minimums) is a related but separate buyer concern.
Concurrency-based pricing. Used by Spacelift. The pricing metric is parallel run slots — the bill scales with the number of concurrent workers the customer purchases. The same shape appears in CI products that bill on "concurrent workers" or "build agents" — CircleCI, Buildkite, and Jenkins-as-a-service offerings price this way.
Resources-under-management (RUM) pricing. Used by HCP Terraform across all non-grandfathered customers — every new account and every renewal moves onto this model. The pricing metric is resources × hours: every managed item in the Terraform state, including child resources, contributes to the hourly bill. An AWS security group with nine ingress/egress rules is the aws_security_group itself plus nine aws_security_group_rule resources — ten billable resources for what feels like one firewall configuration.
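The counting rule in the security group example can be sketched in a few lines. This is an illustration only: the `count_billable_resources` helper and the flat `<type>.<name>` address format are assumptions made for the example, not HCP Terraform's actual metering logic.

```python
from collections import Counter


def count_billable_resources(addresses):
    """Count managed resources by type from a flat list of resource addresses.

    Assumes addresses look like "<type>.<name>" with no module prefix
    (a simplification for this sketch).
    """
    by_type = Counter(address.split(".", 1)[0] for address in addresses)
    return sum(by_type.values()), dict(by_type)


# One security group plus nine standalone rule resources, as in the example above.
state_addresses = ["aws_security_group.web"] + [
    f"aws_security_group_rule.web_{i}" for i in range(9)
]

total, by_type = count_billable_resources(state_addresses)
resource_hours_30_days = total * 24 * 30  # resources x hours over a 30-day month

print(total)                   # 10
print(resource_hours_30_days)  # 7200
```

The point of the sketch is that the bill scales on the second number, resource-hours, which accrues whether or not any of the ten resources ever changes.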
Usage-based, per-run pricing. Used by Scalr. The pricing metric is billable runs — a vendor-designed subset of all runs intended to align the bill with delivered infrastructure work; on Scalr, drift-detection runs are one example of a category carved out as non-billable. The same shape appears in usage-based SaaS more broadly — AWS Lambda (per invocation), Stripe (per transaction), Twilio (per message).
Dimensions to evaluate
1. Bill predictability
The classic procurement concern: can finance forecast the bill, and is the worst case bounded?
- Concurrency-based: Flat invoice once slot count is fixed for the contract. Predictable in both the normal case and the worst case. Strongest of the three on this axis.
- RUM: Variable with resource sprawl. Each new cloud resource added to managed state inflates the bill, including small resources customers may not realize count (security group rules, IAM policy attachments, route table entries). Hard to forecast as infrastructure grows.
- Usage-based, per-run: Variable in raw form. Predictability depends on whether the contract includes a negotiated cap on annual spend.
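The three predictability profiles above can be compared with a rough model. Every price, volume, and cap below is a placeholder chosen to make the arithmetic easy to follow, not a quote from any vendor, and the function names are hypothetical.

```python
def concurrency_bill(slots, price_per_slot):
    """Flat bill: depends only on the purchased slot count."""
    return slots * price_per_slot


def rum_bill(resources, hours, price_per_resource_hour):
    """Bill grows with every resource that sits in state, for every hour."""
    return resources * hours * price_per_resource_hour


def per_run_bill(billable_runs, price_per_run, annual_cap=None):
    """Bill grows with billable runs, bounded only if a cap was negotiated."""
    raw = billable_runs * price_per_run
    return raw if annual_cap is None else min(raw, annual_cap)


# Placeholder scenario: a busy year doubles run volume.
flat = concurrency_bill(slots=4, price_per_slot=5_000)
normal_year = per_run_bill(billable_runs=10_000, price_per_run=2)
busy_uncapped = per_run_bill(billable_runs=20_000, price_per_run=2)
busy_capped = per_run_bill(billable_runs=20_000, price_per_run=2, annual_cap=30_000)

print(flat, normal_year, busy_uncapped, busy_capped)  # 20000 20000 40000 30000
```

Under these assumed numbers the concurrency bill never moves, the uncapped per-run bill doubles with activity, and the capped per-run bill stops at the negotiated ceiling, which is the worst-case bound finance is asking about.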
2. Alignment with work delivered
What is the customer actually paying for, relative to what they're getting?
- Concurrency-based: Paying for capacity (the right to do N things in parallel), not work performed. Idle slots cost the same as busy ones.
- RUM: Paying for storage of state — every managed resource continues to bill while it sits in state, regardless of whether it changes. A workspace can sit unmodified for a year and bill at the same rate as one with daily applies.
- Usage-based, per-run: Paying for work performed. The vendor designs which categories of runs count as billable to align the bill with delivered work.
3. Capacity planning burden
How much work does the pricing model push from the vendor to the buyer?
- Concurrency-based: Significant. The customer forecasts peak concurrency, monitors utilization, and engages procurement to purchase additional slots as needs increase. Forecasting is imperfect, which leaves the customer with either queueing (when slots are under-provisioned) or idle capacity (when over-provisioned).
- RUM: Moderate. The customer monitors resource counts as infrastructure grows. Fewer surprises than concurrency because resource growth is gradual, but multi-account / multi-region topologies inflate the count in non-obvious ways.
- Usage-based, per-run: Minimal. No slot count to forecast and no usage-driven breakpoint to anticipate. Budget forecasting is based on run volume, which scales with team activity in a directly modelable way.
4. Incident response behavior
How does each model behave during incidents or release windows, when many fixes typically need to ship across many workspaces in parallel?
- Concurrency-based: The cap bites hardest at the worst moment. During an incident, parallel fixes across multiple workspaces queue behind the slot cap. New workers can be provisioned quickly, but additional slots must be purchased, so the bottleneck is procurement, executed under pressure with the incident still active.
- RUM: Concurrency is typically capped as a packaging choice rather than driven by the pricing metric. Adding more parallel runs isn't something the customer can buy directly, since the metric scales on resources rather than slots. To get more concurrency, the customer has to cross into the next tier, which typically carries a step-change in cost (often a steep one relative to the previous tier). The cap throttles incidents identically to concurrency-based pricing, just for a different structural reason.
- Usage-based, per-run: No concurrency cap from pricing; scales with demand. (Anti-abuse limits exist but are raised free on request and don't act as a hard ceiling on incident response.)
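The effect of a slot cap during an incident can be approximated with simple wave arithmetic. This is a simplified sketch: the `incident_wall_clock` helper is hypothetical, it assumes every fix run takes the same time, and it treats "no cap" as meaning the platform's limit is at least as large as the number of fixes.

```python
import math


def incident_wall_clock(fixes, minutes_per_run, slot_cap=None):
    """Minutes until all fix runs finish, assuming equal-length runs.

    With a slot cap, runs execute in waves of at most `slot_cap` at a time.
    With slot_cap=None, all runs are assumed to start together.
    """
    if fixes == 0:
        return 0
    parallel = fixes if slot_cap is None else min(fixes, slot_cap)
    waves = math.ceil(fixes / parallel)
    return waves * minutes_per_run


# Illustrative numbers: 20 workspaces need a fix, each run takes 10 minutes.
print(incident_wall_clock(20, 10, slot_cap=4))  # 50
print(incident_wall_clock(20, 10))              # 10
```

With these assumed numbers, a cap of 4 slots stretches the remediation from one 10-minute wave to five, and buying out of the queue means a procurement step mid-incident.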
5. Alignment with customer value
How well does each pricing metric track the value the customer is actually getting from the platform?
- Concurrency-based: Vendor revenue scales with concurrency demand, but customer value from concurrency varies with workload — idle slots during off-peak hours still cost the customer while delivering no value, and busy slots during peak hours may not deliver proportionally more value either. The metric tracks reserved capacity, not delivered work, so vendor revenue diverges from customer value whenever utilization is uneven.
- RUM: Vendor revenue scales with the count of resources in state. State size correlates loosely with value — a workspace managing more infrastructure usually delivers more value — but the correlation is weak, and the metric creates pressure against good architectural practices (consolidation, moving ephemeral resources out of state, lean module design) that reduce state size without reducing delivered value. Vendor revenue grows when customer practice is anti-optimal.
- Usage-based, per-run: Vendor revenue scales with billable runs. Alignment with customer value depends on how "billable" is defined. A vendor that bills every run, including drift checks and pre-check failures, falls back into a per-call dynamic where activity volume matters more than delivered work. A vendor that restricts billable runs to those that delivered real infrastructure changes routes revenue toward the delivered-work axis, which tracks customer value more closely than the other two metrics.
6. Implementation efficiency
How efficiently does each model use the underlying compute?
- Concurrency-based: Slot-based billing creates a structural incentive toward a 1:1 worker:run implementation — if multiple runs could share a worker, the customer could buy fewer slots and run more concurrency on the same hardware, undermining the metric. Spacelift's private workers, for example, each run one task at a time. Terraform and OpenTofu runs are dominantly I/O-bound — they spend most of their wall-clock time waiting on cloud-provider APIs, during which the worker's CPU and RAM sit idle. Workers are sized for the worst-case run and pay cold-start cost per slot. Customers running self-hosted workers pay the underlying compute bill while the hardware is mostly idle.
- RUM: Implementation neutral. No ratio constraint between runs and workers.
- Usage-based, per-run: Free to multiplex multiple runs onto a single agent. Scalr's self-hosted agents each add capacity for 5 concurrent runs, a ratio chosen for reliability (beyond it, contention on the file system, plugin cache, and ephemeral ports starts to bite). The same hardware delivers 5x the concurrency of a 1:1 model, with cold-start cost amortized across runs.
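The hardware difference between a 1:1 model and a multiplexed one comes down to a ceiling division. A minimal sketch, assuming a hypothetical peak of 20 concurrent runs and using the 5-runs-per-agent ratio mentioned above; the `workers_needed` helper is illustrative, not part of any vendor's tooling.

```python
import math


def workers_needed(peak_concurrent_runs, runs_per_worker):
    """Number of workers required to serve a given peak concurrency."""
    return math.ceil(peak_concurrent_runs / runs_per_worker)


peak = 20  # assumed peak concurrency for illustration

print(workers_needed(peak, runs_per_worker=1))  # 20
print(workers_needed(peak, runs_per_worker=5))  # 4
```

Because the runs are mostly waiting on cloud-provider APIs, the four multiplexed workers in this example spend less of their time idle than the twenty single-run workers would.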
Aggregating across dimensions:
- Concurrency-based pricing has one clear advantage (bill predictability for customers who can't access a contractual cap) and a set of structural disadvantages across the other dimensions, most notably incident-response throughput and the 1:1 worker:run efficiency cost.
- RUM pricing has no clear advantage and adds resource-counting pressure on top of the shared concurrency-cap limits.
- Usage-based, per-run pricing has structural advantages on most dimensions, with the caveat that bill predictability requires a negotiated cap.
The single dimension where concurrency-based pricing has a real advantage applies to a specific buyer segment (small / self-service / customers who don't negotiate). The other dimensions — capacity planning burden, incident response, alignment with customer value, implementation efficiency, and alignment with work delivered — favor usage-based pricing structurally.
For most buyer profiles, the implication is: pick a platform with usage-based, per-run pricing. Confirm the vendor will cap annual spend if predictability matters for procurement. Confirm how the vendor defines billable runs — that definition determines whether the pricing metric actually aligns the invoice with delivered work or just rebrands per-call pricing.
Where Scalr lands
Scalr's pricing metric is billable runs, not concurrent slots or managed resources. Drift-detection runs are carved out as non-billable; only the runs that fix detected drift are billed, which keeps the metric aligned with delivered infrastructure work rather than activity volume. For customers who want invoice predictability, Scalr will cap annual fees under straightforward conditions. Concurrency itself is a fraud-and-abuse control, not a commercial gate: the Scalr-managed runner pool starts at a per-account allowance and is raised free of charge on request, and self-hosted agents each add capacity for 5 concurrent runs on top.
That 5-runs-per-agent ratio isn't arbitrary; it's the engineering response to the I/O-bound waste problem described under dimension 6 (implementation efficiency). Because Scalr's billing isn't tied to a 1:1 worker:run ratio, the platform multiplexes multiple runs onto a single agent: sharing CPU during cloud-API waits, paying cold-start cost once per agent instead of once per slot, and absorbing burst into existing capacity without waiting for new workers to spin up. The ceiling of 5 exists because beyond it, contention on local resources starts to hurt reliability. The number isn't the point; the point is that Scalr's pricing model lets the engineering decision be driven by what's actually reliable, not by what's billable.
For teams evaluating platforms across the six dimensions above, this matters: the choice of pricing model isn't just about the invoice. It shapes the engineering architecture, the incident behavior, and the practices the platform's economics will reward.