Action conditioning
The conditioning tensor is what makes a Dream Engine rollout yours. It's the planned trajectory the world model conditions its generation on — change it, change the rollout.
For action-conditioned specs (DreamDojo and similar), this tensor
encodes a robot action sequence. Future specs may use different
modalities (camera trajectories, text prompts, language tokens); the
wire shape (T, action_dim) float32 stays the same — only the
semantics of action_dim change.
Shape and dtype
Always (T, action_dim) float32. Per-spec values:
```python
model = client.models.get("…")
print(model.action_dim)  # spec's expected dim
print(model.chunk_size)  # T must be a multiple of this
```

For DreamDojo · GR-1 today: `action_dim=384`, `chunk_size=12`,
canonical `T=48`. The SDK validates these at the boundary —
mismatches raise `dream.InputValidationError` before the request
hits the network.
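Since validation is client-side, it costs nothing to check shapes locally before building the request. A minimal sketch, using the GR-1 values quoted above (in real code, read `action_dim` and `chunk_size` from the `model` object rather than hard-coding them):

```python
import numpy as np

# DreamDojo · GR-1 values from above; normally taken from
# model.action_dim and model.chunk_size at runtime
action_dim, chunk_size = 384, 12
T = 48  # canonical rollout length

# T must be a whole number of chunks, and the array must be float32
assert T % chunk_size == 0, f"T={T} not a multiple of chunk_size={chunk_size}"
actions = np.zeros((T, action_dim), dtype=np.float32)
```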
DreamDojo · GR-1: what's in those 384 dimensions
The action_dim is set by the embodiment, not by the engine. For GR-1-class specs (per NVIDIA's GR00T pipeline), the 384 floats decompose as:
| Slot | Dim | Description |
|---|---|---|
| [0:7] | 7 | left-arm joint angles (rad) |
| [7:14] | 7 | right-arm joint angles (rad) |
| [14:20] | 6 | torso + neck joints |
| [20:101] | 81 | left-hand dexterous joints (15 fingers × normalized + grip) |
| [101:182] | 81 | right-hand dexterous joints |
| [182:283] | 101 | left-hand absolute positions (Fourier 5-finger) |
| [283:384] | 101 | right-hand absolute positions |
Different embodiments (G1, AGIBOT, YAM) use the same 384-dim layout but fill different slices — slots outside the embodiment's joint count are zero-padded.
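To make the table concrete, here is how one timestep's 384-dim vector slices up. The slice boundaries come from the table above; the variable names are ours for illustration, not an SDK API:

```python
import numpy as np

a = np.zeros(384, dtype=np.float32)  # one timestep of the action sequence

left_arm   = a[0:7]      # 7 left-arm joint angles (rad)
right_arm  = a[7:14]     # 7 right-arm joint angles (rad)
torso_neck = a[14:20]    # 6 torso + neck joints
left_hand  = a[20:101]   # 81 left-hand dexterous-joint dims
right_hand = a[101:182]  # 81 right-hand dexterous-joint dims
left_pos   = a[182:283]  # 101 left-hand absolute positions
right_pos  = a[283:384]  # 101 right-hand absolute positions

# The seven slots tile the full vector exactly
sizes = [s.size for s in (left_arm, right_arm, torso_neck,
                          left_hand, right_hand, left_pos, right_pos)]
assert sum(sizes) == 384
```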
You don't need to memorize the slicing. Three common sources:
- Get actions from a real teleop dataset — use NVIDIA's nvidia/PhysicalAI-Robotics-GR00T-Teleop-GR1 HuggingFace dataset, which already ships actions in the right format.
- Query a robot policy — most policies trained on GR00T data output the canonical 384-dim vector directly.
- Generate candidate actions for MPC — sample around a real trajectory; the in-distribution prior keeps the rollout coherent.
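For the MPC case, candidate generation can be as simple as Gaussian perturbations of a nominal plan. A sketch — the noise scale is our choice for illustration, not something the SDK prescribes:

```python
import numpy as np

rng = np.random.default_rng(0)
T, action_dim, K = 48, 384, 8  # K candidate sequences to roll out

# Nominal trajectory: in practice a real teleop or policy output,
# here a zero placeholder with the right shape and dtype
nominal = np.zeros((T, action_dim), dtype=np.float32)

# Small perturbations keep candidates in-distribution, so the
# rollouts stay physically coherent
sigma = 0.02
noise = rng.normal(0.0, sigma, size=(K, T, action_dim)).astype(np.float32)
candidates = nominal[None, :, :] + noise
```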
Magnitude
Real teleop actions land in roughly [-1.5, +1.5]. The synthetic
example in dream.examples.dreamdojo_grasp() uses a 0.05-amplitude
sinusoid — well within distribution but visually boring (the rollout
shows a small, smooth wobble).
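A 0.05-amplitude sinusoid in the right shape takes only a few lines. This is our sketch of what such a synthetic input looks like, not the actual dream.examples.dreamdojo_grasp() implementation:

```python
import numpy as np

T, action_dim = 48, 384
t = np.arange(T, dtype=np.float32)[:, None]                    # (T, 1)
phase = np.linspace(0.0, np.pi, action_dim, dtype=np.float32)  # (action_dim,)

# One slow cycle over the rollout at 0.05 amplitude: comfortably
# inside the roughly [-1.5, +1.5] range of real teleop actions
actions = (0.05 * np.sin(2.0 * np.pi * t / T + phase)).astype(np.float32)
```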
If your actions land far outside the trained distribution, the rollout quality degrades quickly — the model extrapolates, but the further out you go the more unphysical the result.
Action-dim mismatch
The most common debugging trap: action_dim isn't the same across embodiments.
```python
# Wrong embodiment
actions = np.zeros((48, 64), dtype=np.float32)  # G1's native actions are 64-dim, not 384
model = client.models.get("dreamdojo-2b-gr1")
model.predict(start_frame=img, actions=actions)
# → dream.InputValidationError:
#   "actions action_dim mismatch: model expects 384, got array with shape (48, 64)"
```

The SDK catches this client-side. Always check `model.action_dim`
before constructing the array.
Determinism
The rollout is conditioned on (seed, start_frame, actions). Same
three → bit-identical mp4 on the same hardware. Two different action
sequences with the same start_frame + seed produce two different
rollouts; that's exactly the property predict_batch exploits for
visual MPC.