LoRA Reference

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained weights and adds trainable low-rank matrices alongside them, typically in linear layers. Compared with full-parameter fine-tuning, this substantially reduces memory usage and compute cost, which makes RL fine-tuning of large models far more practical on limited hardware.
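
To make the savings concrete, the sketch below counts trainable parameters for a single linear layer under full fine-tuning and under LoRA. The function name and the layer sizes are illustrative assumptions, not AReaL code, and biases are ignored for simplicity.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one linear layer.

    Full fine-tuning trains the whole d_out x d_in weight matrix.
    LoRA freezes it and trains two factors: A (rank x d_in) and B (d_out x rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Illustrative sizes only: a square 4096 x 4096 projection with rank 16.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=16)
print(f"full: {full:,}  lora: {lora:,}")  # full: 16,777,216  lora: 131,072
```

For this layer the adapter trains under 1% of the parameters that full fine-tuning would, and gradients and optimizer state shrink by the same factor for that layer.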

In AReaL, this is especially useful for:

running reinforcement learning on very large models, including 70B+ models, on relatively modest hardware such as 8 x 80 GB GPUs,
enabling larger batch sizes, because LoRA reduces training memory pressure,
simplifying transfer and deployment, because only the LoRA adapters need to be saved and shipped,
[Future] fine-tuning multiple LoRA adapters in parallel for better hardware utilization (see RFC #609).

This guide explains how to enable LoRA in RL training and configure the related parameters.

Backend Support

The current LoRA support matrix in AReaL is:

Training engine	vLLM	SGLang
FSDP2	✅	✅
Megatron	✅	❌
Archon	❌	❌

Example scripts:

Engine	Example script
FSDP2	`examples/math/gsm8k_grpo_lora.yaml`
Megatron	`examples/math/gsm8k_grpo_megatron_lora.yaml`
Megatron MoE	`examples/math/gsm8k_grpo_megatron_lora_moe.yaml`

For Megatron + vLLM, AReaL now supports:

LoRA fine-tuning on MoE architectures such as Qwen3 MoE, with XCCL-based LoRA weight updates.
Cross-node LoRA training when the Megatron and rollout groups span multiple nodes.

Core LoRA Parameters

Parameter	What it controls	Typical values
`use_lora`	Enables LoRA fine-tuning mode.	`true` / `false`
`lora_rank` (`r`)	Rank of the low-rank adapters. Higher rank increases capacity and memory/compute cost.	`8`, `16`, `32`, `64`
`lora_alpha`	LoRA scaling factor. Effective adapter scale is commonly thought of as proportional to `alpha / r`.	`16`, `32`, `64`
`target_modules`	Which model submodules receive LoRA adapters. This is the most important architecture-specific setting.	e.g. `[all-linear]`
`peft_type`	PEFT method type. In AReaL configs, set this to `lora`.	`lora`
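
As a rough illustration of how these parameters relate, the sketch below groups them in a small dataclass and derives the adapter scale under the common `alpha / r` convention. `LoRASettings` and its defaults are hypothetical and exist only for this example; they are not AReaL's config class, so check the example YAML files listed above for the real layout.

```python
from dataclasses import dataclass, field


@dataclass
class LoRASettings:
    """Hypothetical container mirroring the parameters in the table above.

    This is not AReaL's config class; it only shows how the values relate.
    """

    use_lora: bool = True
    lora_rank: int = 16
    lora_alpha: int = 32
    target_modules: list[str] = field(default_factory=lambda: ["all-linear"])
    peft_type: str = "lora"

    def __post_init__(self) -> None:
        # Basic sanity checks for this sketch only.
        if self.lora_rank <= 0:
            raise ValueError("lora_rank must be a positive integer")
        if self.peft_type != "lora":
            raise ValueError("this sketch only covers peft_type='lora'")

    @property
    def scaling(self) -> float:
        """Adapter scale under the common alpha / r convention."""
        return self.lora_alpha / self.lora_rank


settings = LoRASettings(lora_rank=16, lora_alpha=32)
print(settings.scaling)  # 2.0
```

Under this convention, raising `lora_rank` while leaving `lora_alpha` fixed lowers the scale, so the two values are usually adjusted together.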

Practical Notes

Start with `r=16` or `r=32` for most models, then tune upward only if needed.
Keep `target_modules` consistent with the module names used by your model architecture.
For the Megatron backend, LoRA requires `megatron-bridge` instead of `mbridge`.
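
One way to sanity-check the `target_modules` note is to compare the configured names against the model's module names before training. The helper below is a hypothetical sketch, not part of AReaL, and its suffix-matching rule is an assumption of this example; the module names are made up.

```python
def unmatched_targets(module_names: list[str], target_modules: list[str]) -> list[str]:
    """Return the entries of target_modules that match no module name.

    A target matches when a dotted module name equals it or ends with
    ".<target>" (suffix matching is an assumption of this sketch).
    The special value "all-linear" is skipped because it is not a module name.
    """
    missing = []
    for target in target_modules:
        if target == "all-linear":
            continue
        if not any(n == target or n.endswith("." + target) for n in module_names):
            missing.append(target)
    return missing


# Made-up module names in the style of a decoder-only transformer.
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.up_proj",
]
print(unmatched_targets(names, ["q_proj", "v_proj", "query_key_value"]))
# ['query_key_value']
```

With a PyTorch model, the list of names could be collected from `model.named_modules()`; a non-empty result suggests the configured names follow a different architecture's naming scheme.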
