Dream Engine — pricing
$0.0005 per generated frame. Flat rate, every customer. ~$0.0245 per standard 49-frame rollout. Paid out of a prepaid credit balance topped up via Stripe Checkout — no tiers, no commitments, no warm-pool minimum.
This page is the single source of truth — when prices change, this file updates first, the website second, customer emails third. Internal margin policy is at the bottom.
How it works
- The customer mints (or receives) an API key. Each key has a prepaid credit balance, denominated in USD cents.
- The customer tops up the balance with a one-time Stripe Checkout payment — `client.billing.topup(amount_usd=25.00)` returns a hosted checkout URL.
- Each `predict` call pre-debits the predicted cost (frames × $0.0005, exact to the mil — see "Sub-cent precision" below) from the balance before the engine runs.
- Engine errors trigger an automatic refund. Insufficient balance returns HTTP 402 before any GPU work fires.
```python
import dream

client = dream.Client()  # reads DREAM_API_KEY
print(client.billing.balance().balance_usd)  # e.g. 47.75

# Run a rollout — debits 245 mils ($0.0245) on the way through.
r = client.models.get("dreamdojo-2b-gr1").predict(
    start_frame=img, actions=acts,
)

# Top up when low.
session = client.billing.topup(amount_usd=25.00)
print("Open this in a browser to pay:", session.url)
```
What you get
- Flat per-frame pricing — $0.0005, regardless of resolution, batch size, or volume.
- Three-metric quality gate enforced (PSNR / SSIM / LPIPS — see `docs/RESULTS.md`).
- Per-frame transparency on every response (`X-DreamEngine-Frames` and `X-DreamEngine-Estimated-Charge-USD` headers, surfaced as `rollout.cost_usd`).
- Status page (`/v1/status`) with rolling 24h P50 / P99 latency.
- Automatic refunds on engine error — you only pay for frames the engine successfully delivered.
- All currently shipped optimisations enabled by default (Fused QKV, LUT conditioning, TeaCache, T5 cache, guidance=0 short-circuit).
- A typed Python SDK that surfaces 402 as `dream.InsufficientCreditsError`, carrying both the current balance and the requested amount.
Sub-cent precision
The credits ledger stores balances in mils (1 mil = $0.0001 = 1/100 of a cent), so per-frame charges are exact, not rounded. At $0.0005/frame:
- 1 frame = exactly 5 mils ($0.0005)
- 49 frames (canonical DreamDojo rollout) = exactly 245 mils ($0.0245)
- $5.00 = exactly 50,000 mils
This matters at scale: pre-0.2.1 the engine rounded the 49-frame rollout to 2¢ ($0.02), under-charging by 18% per call. Post-0.2.1 every rollout is billed exactly. Across 1M rollouts/year that's ~$4,500 of revenue we used to lose to rounding.
Customers see `balance_mils` (exact) plus `balance_cents` and `balance_usd` (derived display values) on every ledger response.
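As a concrete sketch of the ledger arithmetic above — the names `MILS_PER_FRAME`, `debit_mils`, and `to_display` are illustrative, not the engine's actual internals:

```python
MILS_PER_FRAME = 5  # $0.0005/frame; 1 mil = $0.0001

def debit_mils(num_frames: int) -> int:
    """Exact per-rollout charge in mils — integer maths, so no float rounding."""
    return num_frames * MILS_PER_FRAME

def to_display(balance_mils: int) -> dict:
    """Derived display values a ledger response carries alongside balance_mils."""
    return {
        "balance_mils": balance_mils,        # exact
        "balance_cents": balance_mils / 100,  # display only
        "balance_usd": balance_mils / 10_000, # display only
    }

assert debit_mils(49) == 245                      # canonical rollout = $0.0245 exactly
assert to_display(50_000)["balance_usd"] == 5.0   # $5.00 = 50,000 mils
```

Keeping the ledger in integer mils is what makes the 49-frame charge exact rather than a 2¢ or 3¢ approximation.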
Top-up bounds
| Limit | USD |
|---|---|
| Minimum top-up | $5 |
| Maximum top-up | $10,000 |
The SDK validates these client-side before any HTTP call. The engine enforces the same bounds server-side. Need a larger top-up? Email hello@dreamengines.run.
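A minimal sketch of the client-side bounds check described above — `validate_topup` and the constant names are hypothetical stand-ins for whatever the SDK actually does internally:

```python
MIN_TOPUP_USD = 5.00
MAX_TOPUP_USD = 10_000.00

def validate_topup(amount_usd: float) -> None:
    """Reject out-of-bounds top-ups before any HTTP call is made."""
    if not (MIN_TOPUP_USD <= amount_usd <= MAX_TOPUP_USD):
        raise ValueError(
            f"Top-up must be between ${MIN_TOPUP_USD:.2f} and "
            f"${MAX_TOPUP_USD:,.2f}; got ${amount_usd:,.2f}"
        )

validate_topup(25.00)  # within bounds — no exception
```

The engine re-checks the same bounds server-side, so the client-side check only exists to fail fast with a readable error.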
Rate limits
A per-key token bucket catches abuse on top of the credits ledger. Default knobs on a fresh key:
| Setting | Default |
|---|---|
| Refill rate (tokens/sec) | 2.0 |
| Burst capacity (tokens) | 10 |
When the bucket empties the engine returns 429 with Retry-After; the SDK retries automatically. Need higher limits for a planning loop? Email and we'll dial up your key.
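The per-key bucket can be sketched as a standard token bucket with the defaults above (refill 2.0 tokens/sec, burst 10) — illustrative only, not the engine's implementation:

```python
import time

class TokenBucket:
    """Token bucket: starts full at `burst`, refills at `refill_qps` tokens/sec."""

    def __init__(self, refill_qps: float = 2.0, burst: int = 10):
        self.refill_qps = refill_qps
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.refill_qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # here the engine would answer 429 with Retry-After

bucket = TokenBucket()
assert all(bucket.allow() for _ in range(10))  # a fresh key absorbs the full burst
assert not bucket.allow()                      # the 11th immediate request is throttled
```

After the burst drains, sustained throughput settles at the refill rate (2 requests/sec by default), which is why a planning loop usually needs the knobs dialed up.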
What you don't get yet (post-v1)
- Streaming output (`/v1/predict/stream` SSE) — gated on the v0.6 streaming runner.
- Continuous-batching scheduler (vLLM-style) — concurrent requests serialise on the GPU today.
- Multi-region failover — single Modal region for v1.
- Self-serve signup — onboarding is manual, via `scripts/create_api_key.py`, for the first cohort.
- WebRTC realtime / fal-WMA bridge.
Reference: how a customer's bill computes
debit_mils_for_predict
  = num_frames × 5 mils        (1 frame = $0.0005 = 5 mils; 1 mil = $0.0001)

e.g. 49 frames → 49 × 5 = 245 mils = $0.0245, billed exactly with no rounding.
A planner doing visual MPC at K=8 fused candidates, 10 decisions/sec for 1 hour:
1 hour × 10 decisions/sec × 8 candidates × 49 frames = 14,112,000 frames
14,112,000 × $0.0005 = $7,056
By comparison, self-hosting that same workload directly on Modal H100s requires 14,112,000 / 49 = 288,000 rollouts × ~3 s engine_wall = ~240 hours of GPU time = ~$948 of raw Modal compute — but spread across weeks of engineering effort to set up, optimise, monitor, and harden.
Markup: ~7.4× over self-hosted compute. That's what the customer pays for the bundle of optimised inference + zero ops + per-frame metering + customer support. If a customer needs significant volume on committed terms, email and we'll figure it out per-customer rather than via a self-serve discount tier.
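The worked example above can be checked in a few lines; every figure comes from this page:

```python
# Planner workload: 1 h × 10 decisions/s × 8 fused candidates × 49-frame rollouts.
frames = 1 * 3600 * 10 * 8 * 49
assert frames == 14_112_000

customer_bill = frames * 0.0005          # flat $0.0005/frame
assert customer_bill == 7056.0

rollouts = frames // 49                  # 288,000 rollouts
gpu_hours = rollouts * 3 / 3600          # ~3 s engine wall each → ~240 h
self_host = gpu_hours * 3.95             # Modal H100 @ $3.95/hr → ~$948
markup = customer_bill / self_host       # ~7.4×
print(f"self-host ≈ ${self_host:.0f}, markup ≈ {markup:.1f}x")
```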
Internal margin policy (not customer-facing)
Cost basis (Modal H100 SXM @ $3.95/hr)
| Component | Time | Cost |
|---|---|---|
| Engine wall (warm) | 2.62 s | $0.0029 |
| HTTP transit + mp4 encode | ~1.4 s | $0.0015 |
| Marginal cost / rollout (warm) | ~4 s | $0.0044 |
| Marginal cost / frame | — | $0.00009 |
| Cold-start amortisation (~70 s / 100 rollouts/session) | ~0.7 s/rollout | +$0.00077 |
| Allocated cost / frame | — | ~$0.0001 |
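A quick sketch reproducing the cost-basis arithmetic in the table (all inputs are the table's own figures; variable names are mine):

```python
USD_PER_SEC = 3.95 / 3600                 # Modal H100 SXM @ $3.95/hr

warm_wall = 2.62 * USD_PER_SEC            # ≈ $0.0029
transit   = 1.4 * USD_PER_SEC             # ≈ $0.0015
marginal_rollout = warm_wall + transit    # ≈ $0.0044
cold_amort = (70 / 100) * USD_PER_SEC     # ~70 s load / 100 rollouts ≈ +$0.00077

allocated_per_frame = (marginal_rollout + cold_amort) / 49
print(f"allocated ≈ ${allocated_per_frame:.6f}/frame")  # ≈ $0.000106, ~$0.0001 rounded
```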
Margin
| Item | Value |
|---|---|
| Selling price / frame | $0.0005 |
| Allocated cost / frame | ~$0.0001 |
| Gross margin | 80% |
The flat-pricing model trades the slim per-frame discount we used to give Scale-tier customers ($0.0004) for simplicity: one number, one billing flow, no tier-eligibility footnotes. At expected v1 volume (<100M frames/mo across the cohort) the lost margin is in the noise; at high volume we negotiate per-customer (see "Re-price triggers" below).
Margin floor
If allocated cost ever exceeds 30% of selling price (margin drops below 70%), we either:
- Re-optimise the engine (next levers: VAE compile, Conv FP8, self-forcing checkpoint training).
- Renegotiate Modal pricing or move workloads to cheaper providers (nebius, runpod) when commercially viable.
- Raise prices, with 30 days' notice to customers.
Current state: allocated cost is ~20% of selling price, for ~80% gross margin. Healthy.
Cold-start economics
We ship with min_containers=0 and a tightened scaledown_window=60s (was 600s). First request after idle pays the ~70 s cosmos load tax. Switching to an always-warm container costs $3.95/hr × 24 × 30 ≈ $2,840 / month / container of pure GPU rent. We'll flip when:
- A paying customer complains about cold-start latency in writing, OR
- The cohort's blended traffic crosses the break-even point ($2,840 / $0.0005 = 5.68M frames/month attributable to the gap), OR
- One large customer commits to >2M frames/month and asks for warm-pool as part of the deal.
The 60s scaledown is a conscious bet that bursty traffic is the common case for early customers (a single eval batch, then quiet); idle dollars dominated at 600s.
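The break-even arithmetic behind the second trigger, sketched from this page's figures (the page quotes ~$2,840/mo and 5.68M frames after rounding; exact inputs give slightly higher numbers):

```python
warm_month_usd = 3.95 * 24 * 30              # $2,844/mo GPU rent per always-warm container
breakeven_frames = warm_month_usd / 0.0005   # frames/mo whose revenue covers that rent
print(f"break-even ≈ {breakeven_frames / 1e6:.2f}M frames/month")
```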
Re-price triggers
- A new Modal GPU class (B200, H200, …) shifts the cost basis — re-derive marginal cost, decide whether to pass savings on or pocket margin.
- A new optimisation lands and drops cost by ≥ 20% — 50/50 split between price reduction (customer-facing) and margin (re-investment in engineering).
- Competitive pressure from fal / Together / Anyscale serving cosmos directly.
- A customer asks for a committed-use discount (>1M frames/month for 12 months) — handle per-customer; flat list price stays.
History
| Date | Change | Reason |
|---|---|---|
| 2026-05-05 | Tiers dropped — flat $0.0005/frame, prepaid credits. | Cleaner billing story for early users. Frees us to advertise "$0.0245/rollout, no commitments" without footnotes about tier eligibility. Scale-tier discount ($0.0004) absorbed back into list price. |
| 2026-05-04 | v1 pricing published — Pro $0.0005/frame, free 1K/mo | Track C launch (now superseded). |