Dream Engines

Action conditioning

The conditioning tensor is what makes a Dream Engine rollout yours. It's the planned trajectory the world model conditions its generation on — change it, change the rollout.

For action-conditioned specs (DreamDojo and similar), this tensor encodes a robot action sequence. Future specs may use different modalities (camera trajectories, text prompts, language tokens); the wire shape (T, action_dim) float32 stays the same — only the semantics of action_dim change.

Shape and dtype

Always (T, action_dim) float32. Per-spec values:

PYTHON
model = client.models.get("…")
print(model.action_dim) # spec's expected dim
print(model.chunk_size) # T must be a multiple of this

For DreamDojo · GR-1 today: action_dim=384, chunk_size=12, canonical T=48. The SDK validates these at the boundary — mismatches raise dream.InputValidationError before the request hits the network.
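As a concrete shape check, here is a minimal NumPy sketch that builds a canonically shaped tensor using the DreamDojo · GR-1 values above (hardcoded here for illustration; real code should read them off the model object as shown):

```python
import numpy as np

# DreamDojo · GR-1 values from this page, hardcoded for illustration.
# In real code, use model.action_dim and model.chunk_size instead.
ACTION_DIM = 384
CHUNK_SIZE = 12
T = 48  # canonical rollout length

actions = np.zeros((T, ACTION_DIM), dtype=np.float32)

assert actions.shape == (T, ACTION_DIM)
assert actions.dtype == np.float32
assert T % CHUNK_SIZE == 0  # T must be a multiple of chunk_size
```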

DreamDojo · GR-1: what's in those 384 dimensions

The action_dim is set by the embodiment, not by the engine. For GR-1-class specs (per NVIDIA's GR00T pipeline), the 384 floats decompose as:

Slot        Dim   Description
[0:7]       7     left-arm joint angles (rad)
[7:14]      7     right-arm joint angles (rad)
[14:20]     6     torso + neck joints
[20:101]    81    left-hand dexterous joints (15 fingers × normalized + grip)
[101:182]   81    right-hand dexterous joints
[182:283]   101   left-hand absolute positions (Fourier 5-finger)
[283:384]   101   right-hand absolute positions

Different embodiments (G1, AGIBOT, YAM) use the same 384-dim layout but fill different slices — slots outside the embodiment's joint count are zero-padded.
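That zero-padding convention can be sketched in plain NumPy, assuming a hypothetical embodiment that only drives the two 7-DoF arm slots of the GR-1 layout:

```python
import numpy as np

ACTION_DIM = 384  # shared layout width (from the table above)
T = 48

rng = np.random.default_rng(0)
# Hypothetical per-embodiment joints: only the arm slots [0:7] and [7:14].
left_arm = rng.uniform(-1.0, 1.0, size=(T, 7)).astype(np.float32)
right_arm = rng.uniform(-1.0, 1.0, size=(T, 7)).astype(np.float32)

actions = np.zeros((T, ACTION_DIM), dtype=np.float32)  # zero-padded base
actions[:, 0:7] = left_arm
actions[:, 7:14] = right_arm

# Slots outside the embodiment's joint count stay zero.
assert not actions[:, 14:].any()
```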

You don't need to memorize the slicing; any of these three routes works:

  1. Get actions from a real teleop dataset — use NVIDIA's nvidia/PhysicalAI-Robotics-GR00T-Teleop-GR1 HuggingFace dataset, which already ships actions in the right format.
  2. Query a robot policy — most policies trained on GR00T data output the canonical 384-dim vector directly.
  3. Generate candidate actions for MPC — sample around a real trajectory; the in-distribution prior keeps the rollout coherent.
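Route 3 can be sketched with plain NumPy: perturb a base trajectory with small Gaussian noise to get a batch of in-distribution candidates. The base here is random stand-in data; in practice, load a real teleop trajectory from the dataset above.

```python
import numpy as np

rng = np.random.default_rng(0)
T, ACTION_DIM, N_CANDIDATES = 48, 384, 16

# Stand-in for a real teleop trajectory (load one in practice).
base = rng.uniform(-1.0, 1.0, size=(T, ACTION_DIM)).astype(np.float32)

# Small Gaussian perturbations keep candidates near the real trajectory,
# so the world model stays in-distribution.
noise = rng.normal(0.0, 0.05, size=(N_CANDIDATES, T, ACTION_DIM))
candidates = (base[None] + noise).astype(np.float32)

assert candidates.shape == (N_CANDIDATES, T, ACTION_DIM)
```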

Magnitude

Real teleop actions land in roughly [-1.5, +1.5]. The synthetic example in dream.examples.dreamdojo_grasp() uses a 0.05-amplitude sinusoid — well within distribution but visually boring (the rollout shows a small, smooth wobble).

If your actions land far outside the trained distribution, the rollout quality degrades quickly — the model extrapolates, but the further out you go the more unphysical the result.
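A quick sanity check along these lines (a hypothetical helper, not part of the SDK) reports the fraction of entries outside the rough teleop range:

```python
import numpy as np

def check_magnitude(actions: np.ndarray, lo: float = -1.5, hi: float = 1.5) -> float:
    """Fraction of entries outside the rough teleop range [-1.5, +1.5].

    The range comes from the docs above; this is an illustrative sketch.
    """
    outside = (actions < lo) | (actions > hi)
    return float(outside.mean())

actions = np.full((48, 384), 0.05, dtype=np.float32)  # sinusoid-scale amplitude
assert check_magnitude(actions) == 0.0  # fully in range

bad = actions.copy()
bad[0, 0] = 10.0  # one wildly out-of-range entry
assert check_magnitude(bad) > 0.0
```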

Action-dim mismatch

The most common debugging trap: action_dim isn't the same across specs, so an array sized for one spec fails validation on another.

PYTHON
import numpy as np

# Array sized for the wrong spec
actions = np.zeros((48, 64), dtype=np.float32) # this spec expects action_dim=384
model = client.models.get("dreamdojo-2b-gr1")
model.predict(start_frame=img, actions=actions)
# → dream.InputValidationError:
# "actions action_dim mismatch: model expects 384, got array with shape (48, 64)"

The SDK catches this client-side. Always check model.action_dim before constructing the array.
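One way to make that habit structural is to size the array off the model's reported dim rather than a hardcoded constant. A hypothetical helper (not part of the SDK):

```python
import numpy as np

def make_actions(T: int, action_dim: int) -> np.ndarray:
    # Hypothetical helper: derive the shape from the model's reported
    # action_dim, e.g. make_actions(48, model.action_dim).
    return np.zeros((T, action_dim), dtype=np.float32)

gr1_actions = make_actions(48, 384)
assert gr1_actions.shape == (48, 384)
assert gr1_actions.dtype == np.float32
```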

Determinism

The rollout is conditioned on (seed, start_frame, actions). Same three → bit-identical mp4 on the same hardware. Two different action sequences with the same start_frame + seed produce two different rollouts; that's exactly the property predict_batch exploits for visual MPC.
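Because the rollout is a pure function of that tuple, hashing the tuple gives a stable cache or dedup key. An illustrative sketch (not an SDK feature):

```python
import hashlib
import numpy as np

def rollout_key(seed: int, start_frame: np.ndarray, actions: np.ndarray) -> str:
    # Hash the full conditioning tuple; identical inputs → identical key,
    # so the key can stand in for the (deterministic) rollout in a cache.
    h = hashlib.sha256()
    h.update(str(seed).encode())
    h.update(start_frame.tobytes())
    h.update(actions.tobytes())
    return h.hexdigest()

frame = np.zeros((8, 8, 3), dtype=np.uint8)
a1 = np.zeros((48, 384), dtype=np.float32)
a2 = np.full((48, 384), 0.05, dtype=np.float32)

assert rollout_key(0, frame, a1) == rollout_key(0, frame, a1)  # same inputs
assert rollout_key(0, frame, a1) != rollout_key(0, frame, a2)  # different actions
```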