# Allocation Mode
This document describes AReaL’s allocation mode system, which controls how GPUs are distributed between inference and training backends during distributed RL training.
## Overview
Each engine component (actor, critic, rollout, ref, teacher) has its own backend configuration field that specifies:

- Which backend to use (SGLang or vLLM for inference; FSDP, Megatron, or Archon for training)
- The parallelization strategy
- The total number of GPUs required
AReaL parses each backend string into a `ModelAllocation` object that drives resource allocation for that specific engine.
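For illustration, parsing a simple (non-hybrid) backend string could look like the sketch below. `parse_backend` is a hypothetical stand-in for AReaL's actual `ModelAllocation` parser, not its real API:

```python
import re

def parse_backend(spec: str):
    """Hypothetical sketch of parsing a simple backend string.

    Handles plain specs like "fsdp:d4t2"; the hybrid MoE syntax
    "megatron:(attn:...|ffn:...)" would need additional handling.
    """
    backend, sep, dims_str = spec.partition(":")
    if not sep or not backend:
        # Bare dimension strings like "d4t2" are rejected.
        raise ValueError(f"missing backend prefix in {spec!r}")
    # Each dimension is a one-letter abbreviation followed by its size.
    dims = {key: int(size) for key, size in re.findall(r"([dtpce])(\d+)", dims_str)}
    return backend, dims

print(parse_backend("fsdp:d4t2"))  # ('fsdp', {'d': 4, 't': 2})
```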
## Configuration
### Per-Engine Backend Fields
Each engine in the YAML config has its own `backend` field:

```yaml
# Rollout (inference) engine
rollout:
  backend: "sglang:d4t2"

# Actor (training) engine
actor:
  backend: "fsdp:d8"

# Critic engine (falls back to actor.backend if empty)
critic:
  backend: ""

# Ref engine (falls back to actor.backend if empty)
ref:
  backend: ""
```
When `critic.backend` or `ref.backend` is empty, it automatically inherits from `actor.backend`.
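The fallback is plain string inheritance; a minimal sketch with a hypothetical helper name:

```python
def resolve_backend(engine_backend: str, actor_backend: str) -> str:
    """Empty critic/ref backend strings inherit from the actor."""
    return engine_backend or actor_backend

print(resolve_backend("", "fsdp:d8"))                 # fsdp:d8 (inherited)
print(resolve_backend("megatron:d2p2t4", "fsdp:d8"))  # explicit value wins
```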
**Note:** The top-level `allocation_mode` config field is deprecated and retained only for backward compatibility with legacy SPMD launchers (local/ray/slurm). It is ignored by the single-controller scheduler. Use the per-engine `backend` fields shown above instead.
### Backend String Syntax
```
<backend>:<parallelism_dims>
```
For example, `fsdp:d4t2` means: use the FSDP backend with data parallelism 4 and tensor parallelism 2.
### Parallelism Dimensions
| Dimension | Abbreviation | Description | Valid For |
|---|---|---|---|
| Data | `d` | Number of model replicas | All backends |
| Tensor | `t` | Split operations across GPUs | All backends |
| Pipeline | `p` | Split layers across GPUs in stages | Megatron, Archon |
| Context | `c` | Split sequence length across GPUs | All backends |
| Expert | `e` | Split MoE experts across GPUs | Megatron, Archon |
Dimensions are specified as `<abbrev><size>`, e.g., `d4t2` means data parallel size 4 and tensor parallel size 2.
### Calculating GPU Requirements
The total GPUs for a component is computed as:

```
world_size = dp × tp × pp × cp
```
Expert parallelism (e) does not increase world size—it redistributes how experts are
placed within the existing GPU mesh.
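As a sanity check, the per-engine GPU count can be computed from the parsed dimensions. A minimal sketch (hypothetical helper, not AReaL's API) that deliberately excludes `e`:

```python
def engine_world_size(dims: dict) -> int:
    """Total GPUs for one engine: dp * tp * pp * cp.

    Expert parallelism ("e") redistributes experts within the existing
    mesh, so it is intentionally left out of the product.
    """
    return (dims.get("d", 1) * dims.get("t", 1)
            * dims.get("p", 1) * dims.get("c", 1))

print(engine_world_size({"d": 2, "p": 2, "t": 4}))          # 16
print(engine_world_size({"d": 2, "p": 2, "t": 4, "e": 4}))  # still 16
```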
### Examples
| Backend String | GPUs per Engine | Notes |
|---|---|---|
| `fsdp:d8` | 8 | 8 data-parallel replicas |
| `sglang:d2t4` | 8 | 2 instances × 4 TP GPUs |
| `megatron:d2p2t4` | 16 | 2 DP × 2 PP × 4 TP |
| `megatron:d2p2t4e4` | 16 | Same mesh, 4-way expert parallelism |
### Full Config Example
```yaml
# 16-GPU setup: 8 inference + 8 training
rollout:
  backend: "sglang:d2t4"  # 2 × 4 = 8 GPUs
actor:
  backend: "fsdp:d4t2"    # 4 × 2 = 8 GPUs
```
## Backend Selection
### Inference Backends
| Backend | Supported Dimensions |
|---|---|
| SGLang | `d`, `t`, `p` |
| vLLM | `d`, `t`, `p` |
For inference, `d` represents the number of independent server instances, and each instance uses `t` × `p` GPUs.
Note that the internal backend configurations do not affect how AReaL allocates GPUs. Given `rollout.backend: "sglang:d4t4"`, you can also configure `sglang.dp_size=4`, `sglang.ep_size=4`, and `sglang.enable_dp_attention=True`. In this case, AReaL launches 4 model replicas, each with 4 GPUs. Within each instance, SGLang will still use DP attention and expert parallelism to distribute computation across the attention and expert layers.
### Training Backends
| Backend | Supported Dimensions | Use Case |
|---|---|---|
| FSDP | `d`, `t`, `c` | Default for simple parallelism |
| Megatron | `d`, `t`, `p`, `c`, `e` | Required for pipeline or expert parallelism |
| Archon | `d`, `t`, `p`, `c`, `e` | Alternative to Megatron (experimental) |
**Important:** An explicit backend prefix is required in all allocation strings. Bare dimension strings (e.g., `d4t2`) are no longer accepted. Always specify the backend explicitly: `fsdp:d4t2`, `megatron:d2p2t4`, `sglang:d4t2`.
## MoE Hybrid Parallelism
For Mixture-of-Experts models, Megatron/Archon supports different parallelism strategies for attention and FFN (expert) modules using the hybrid syntax:

```
megatron:(attn:<attn_dims>|ffn:<ffn_dims>)
```
This enables MoE Parallel Folding, which reduces the minimum GPU requirement for combined context and expert parallelism.
### Constraints
- Pipeline parallel size (`p`) must be identical for `attn` and `ffn`
- World sizes must match (if `d` is omitted in `ffn`, it is derived automatically)
- Expert parallel (`e`) is only valid in the `ffn` section
### Example
```yaml
actor:
  backend: "megatron:(attn:d4p2t2c2|ffn:d2p2t4e2)"
```
| Module | dp | pp | tp | cp | ep | World Size |
|---|---|---|---|---|---|---|
| attn | 4 | 2 | 2 | 2 | - | 32 |
| ffn | 2 | 2 | 4 | - | 2 | 32 |
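The constraints above can be checked mechanically. The sketch below uses a hypothetical `check_hybrid` helper (not AReaL's API) and assumes, consistent with the world sizes in the table above, that `e` contributes to the `ffn` mesh size under parallel folding:

```python
def _size(dims: dict) -> int:
    # Product over all specified dimensions, including "e" for ffn meshes.
    return (dims.get("d", 1) * dims.get("t", 1) * dims.get("p", 1)
            * dims.get("c", 1) * dims.get("e", 1))

def check_hybrid(attn: dict, ffn: dict) -> None:
    """Hypothetical validator for megatron:(attn:...|ffn:...) specs."""
    if "e" in attn:
        raise ValueError("expert parallel (e) is only valid in the ffn section")
    if attn.get("p", 1) != ffn.get("p", 1):
        raise ValueError("pipeline size must be identical for attn and ffn")
    if "d" not in ffn:
        # Derive the omitted ffn data-parallel size from the attn world size.
        rest = _size({k: v for k, v in ffn.items() if k != "d"})
        if _size(attn) % rest:
            raise ValueError("world sizes cannot be matched")
        ffn["d"] = _size(attn) // rest
    if _size(attn) != _size(ffn):
        raise ValueError("attn and ffn world sizes differ")

# megatron:(attn:d4p2t2c2|ffn:d2p2t4e2): both meshes span 32 GPUs
check_hybrid({"d": 4, "p": 2, "t": 2, "c": 2},
             {"d": 2, "p": 2, "t": 4, "e": 2})
```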
## See Also
- Fine-tuning Large MoE Models - Tutorial for the Megatron backend
- Archon: PyTorch-Native Training Engine - Tutorial for the Archon backend