
Taking the Pulse of Agentic AI from the Developer Community at the End of Q1 2026

· 14 min read
Xia Xiaoya
Senior Researcher

Today, I want to share some observations on the Agentic AI ecosystem from the vantage point of 2026's first quarter—technical trends read from popular projects, portraits of AI developers, and the subtle relationship between developers and AI tools. This is not meant to be comprehensive; we welcome the community to share more observations and reflections.


Agentic Ecosystem in 2026

This year, everyone seems to be in a state where FOMO and excitement intertwine. There's a sense that AI application deployment has reached an unprecedented acceleration point—perhaps even a tipping point. But is this tipping point real or emotionally amplified? Let's calibrate our intuition with two metrics.

This chart shows the top 20 projects by OpenRank last month and the top 20 by Star growth this year—the most active and most-watched projects. I've highlighted LLM-related projects, and unsurprisingly, OpenClaw occupies the #1 and #2 spots on both lists.

Developer attention has completely flowed toward the Agent ecosystem, although the Star count list includes many awesome-collection type projects (which naturally attract more attention). Just looking at the project names, you can feel they're permutations of a few words: OpenClaw, Skills, Claude, Claude Skills, OpenClaw Skills. The actual developer effort reflected in activity metrics is somewhat more honest, but even so, LLM-related projects account for about 40%.

Expanding the scope to the top 1000 most-watched repositories, after rough labeling, we can see 81% are Agent-related. The most frequently tagged keywords in project Topics are: Agent, Claude, LLM, Code, Skill.

Looking back over the past few years, you can feel the rotation of technological ecosystem dominance from the naming of popular projects emerging at different stages. Popular projects created around 2023-2024 were mostly related to GPT and Llama, such as AutoGPT, MetaGPT, Ollama, llama.cpp. As time turns, there are always technologies that serve as unavoidable coordinates. In 2025, that coordinate was called Claude Code, and thus projects like Clawdbot (later OpenClaw) and Claude-Mem emerged.

Based on the currently most popular and active projects, we've compiled the latest map of the Agentic AI ecosystem, covering more than 50 projects. Many should look familiar, while some are new faces. Let's follow a few specific projects to examine current technical trends.


From Context Management to Complexity Harness

The optimizations we made under the capability constraints of the foundation models were essentially about managing information in the model's attention window: feeding more effective prompts to the model, invoking tools like browsers, connecting external background knowledge the model needs (RAG), and maintaining memory across multi-turn conversations. This path accumulated into a practice called "Context Engineering."

Claude-Mem and Context7 are two open-source tools created around the middle of last year, each now with tens of thousands of Stars. They found different entry points, but essentially solve the same problem: giving the model more effective background knowledge and making sure it doesn't forget it.

Claude-Mem is a Claude Code plugin. It uses a model to compress the conversation output produced while Claude Code executes tasks, then supplies the compressed record as context for future conversations, giving the Coding Agent a longer conversational memory.

Context7 offers both an MCP service and a Skill loading mode. Each time a task runs, it fetches the latest documentation for the dependency libraries involved, so the Coding Agent doesn't write code against outdated APIs.

But "Context Engineering" as a term is starting to feel insufficient this year, because the problem is no longer just "is there enough information," but "will the Agent lose control?" Developers have likely experienced this: during autonomous task execution, the Agent either crashes the entire system or stops halfway without saying anything.

Oh-My-OpenAgent (formerly oh-my-opencode, a plugin for OpenCode) calls itself the "strongest Agent Harness" in its project description. It built a continuous execution Enforcer called "Sisyphus": as long as TODO tasks aren't complete, it forces the Agent to keep restarting or finding new paths until 100% achievement—like Sisyphus endlessly pushing the stone up the mountain.
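The "keep going until the TODO list is done" idea can be sketched in a few lines. This is not Oh-My-OpenAgent's actual code: `enforce_until_done`, `run_agent_step`, the dict-based TODO list, and the `max_attempts` cap are all hypothetical names chosen for illustration, and the cap is a safety addition so the sketch always terminates.

```python
from typing import Callable, Dict


def enforce_until_done(
    todos: Dict[str, bool],
    run_agent_step: Callable[[str, int], bool],
    max_attempts: int = 5,
) -> Dict[str, int]:
    """Re-run the agent on each unfinished TODO until it reports success.

    Returns the number of attempts spent per task. `max_attempts` is a
    safety cap added for this sketch so the loop always terminates.
    """
    attempts: Dict[str, int] = {}
    for task in todos:
        attempts[task] = 0
        while not todos[task] and attempts[task] < max_attempts:
            attempts[task] += 1
            # Pass the attempt number so the agent can try a different path.
            todos[task] = run_agent_step(task, attempts[task])
    return attempts


def flaky_agent(task: str, attempt: int) -> bool:
    """Stand-in agent (hypothetical): succeeds only from the second attempt."""
    return attempt >= 2


todos = {"write tests": False, "fix lint": False}
attempts = enforce_until_done(todos, flaky_agent)
```

With the stand-in agent, every task fails once, is retried, and then completes, which is the discipline a harness adds on top of context alone.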

So I understand Harness as providing background knowledge while further constraining the Agent's behavioral boundaries—not just letting the Agent know "what is," but making clear "what it can touch" and "what it can't," and knowing what to do when stuck. Context Engineering manages input quality; Harness Engineering manages execution discipline.


Software Development Shifts from Human-Centric to Agent-Centric

This trend can already be felt from the projects above: these newly emerging tools are designed not to serve developers, but with the Agent as the execution subject. Interestingly, what humans have accumulated in software development practices is now migrating to Agents. Developers need to consult the latest documentation—so do Agents; developers need to collaborate in teams—Agents are starting to need that too.

Vibe-Kanban brings traditional task boards to the Agent team collaboration scenario, turning it into the Agent's command center. Each task creates an entry with clear acceptance criteria (AC) on the board. Agents execute against AC, while human engineers do task preview and Diff Review through an integrated UI. This is essentially a Harness too—just constraining not individual Agent execution behavior, but the entire development process.

A fitting analogy: model-driven code generation is a powerful but directionless horse; Harness is the equipment composed of constraints, guardrails, and feedback mechanisms; humans are riders, responsible for giving direction, not running themselves.


The Agent "Evolution" Proposition—Lobsters, Cats, and Bees

Agents are clearly no longer satisfied with fixed process orchestration—self-evolution is the new proposition. OpenClaw started the "raising lobsters" trend first, and soon a new batch of cats and lobsters appeared. These projects, inspired by OpenClaw, each made tradeoffs in different dimensions.

nanoclaw was launched in late January 2026 by indie developer Cohen, built entirely on Anthropic Claude Agent SDK with a core engine of about 4000 lines of code. Its design philosophy is security-first—all Agents run in isolated containers, using Apple Container on macOS and Docker on Linux, with Bash commands running in containers rather than on the host machine. Andrej Karpathy specifically mentioned it on social media: "The codebase is small enough that both I and AI can understand it, so it feels manageable, auditable, and flexible." This sentence precisely captures what this batch of lightweight frameworks is betting on: understandability itself is a security guarantee.

nanobot goes even more extreme. From HKU's Data Intelligence Lab (HKUDS), about 4000 lines of Python code—99% less than OpenClaw. It strips away all non-core modules, keeping only the ReAct reasoning loop, tool calling, and message queue. It even removed the litellm external dependency in subsequent versions, switching to native SDK for direct model connection—the shorter the supply chain, the smaller the risk.

CoPaw takes the opposite approach. Open-sourced by Alibaba Cloud's AgentScope team, it takes the feature-complete route. Built-in active heartbeat mechanism—not just passively responding to user messages, but proactively triggering tasks at set times. Memory is stored locally, with user preferences and historical tasks continuously accumulating. Supports DingTalk, Feishu, Discord, iMessage, and other channels, with a continuously expanding Skills ecosystem. If nanoclaw and nanobot are doing subtraction, CoPaw is seriously answering "what a complete personal AI assistant should look like."

Early this year, another open-source framework named Aden Hive appeared, answering a deeper question: Can the orchestration framework itself self-evolve?

The fundamental difference from traditional frameworks like LangChain and AutoGPT isn't in functionality, but in that it doesn't require developers to predefine agent execution flows. Its approach: describe goals in natural language, have a Coding Agent (Queen Bee) generate the Agent execution graph and connection code; once running, if failures occur, the framework captures failure data and calls the Coding Agent again to analyze causes, modify structure, and redeploy. This closed loop requires no human intervention. This is a serious bet on generative orchestration. It bets that task complexity often can't be predefined—rather than exhaustively enumerating all cases at design time, let the system continuously grow from feedback during real execution.
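To make the closed loop concrete, here is a minimal sketch of "generate, run, capture failure, regenerate." It is an illustration under stated assumptions, not Aden Hive's API: `EvolvingOrchestrator`, `generate_graph`, `execute_graph`, and the list-of-step-names graph are hypothetical, and the toy generator stands in for the Coding Agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunResult:
    ok: bool
    failure_log: Optional[str] = None


@dataclass
class EvolvingOrchestrator:
    # Hypothetical hooks: a "coding agent" that proposes a graph from the goal
    # plus past failures, and a runtime that executes the graph.
    generate_graph: Callable[[str, List[str]], List[str]]
    execute_graph: Callable[[List[str]], RunResult]
    max_rounds: int = 3
    failures: List[str] = field(default_factory=list)

    def run(self, goal: str) -> Optional[List[str]]:
        for _ in range(self.max_rounds):
            graph = self.generate_graph(goal, self.failures)
            result = self.execute_graph(graph)
            if result.ok:
                return graph
            # Feed the captured failure back into the next generation round.
            self.failures.append(result.failure_log or "unknown failure")
        return None


def toy_generator(goal: str, failures: List[str]) -> List[str]:
    steps = ["fetch", "summarize"]
    if any("rate limit" in f for f in failures):
        steps.insert(1, "retry_with_backoff")
    return steps


def toy_executor(graph: List[str]) -> RunResult:
    if "retry_with_backoff" not in graph:
        return RunResult(ok=False, failure_log="rate limit hit in fetch")
    return RunResult(ok=True)


orchestrator = EvolvingOrchestrator(toy_generator, toy_executor)
final_graph = orchestrator.run("summarize today's issues")
```

The first round fails, the failure is recorded, and the second round produces a graph with an extra step. The structure changes in response to execution feedback rather than being enumerated up front.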

Whether Agents as personal assistants or Agent orchestration frameworks themselves, self-evolution is transitioning from a bonus feature to a design starting point.


Model "Big Three" Each Build Complete Ecosystem Tools

The top model companies are each laying out their open-source ecosystem tools and standards. MCP, Skills, Agents.md—one after another they land, and third-party tools can barely keep up digesting them.

An interesting phenomenon is the blurring boundary between Coding Agent and General Agent. After ChatGPT appeared, people searched for a long time before finding viable landing scenarios beyond Chatbot—Coding was among the first to be validated. But when tools like Claude Code reach a certain level, they naturally expand outward, not wanting to just be code-writing tools. OpenClaw was born under this expectation—using the IM window people are most familiar with as a carrier, attempting to carry more general Agent capabilities.


Project Story: One-Person Company? Zero-Person Company!

Just as the OPC (One Person Company) concept was being hotly discussed, a project called Paperclip that appeared in early March has pushed this further. The concept it's hyping: Zero-Person Company. In just over 20 days, Stars grew from 0 to 40,000.

Paperclip's positioning is very direct:

"If OpenClaw is an employee, Paperclip is the company."

Its usage logic has three steps: set goals, recruit a team, press start.

The goal could be "grow this AI note-taking app to $1M monthly revenue"; the team could be Claude as CEO, Cursor as CTO, Codex for engineering, OpenClaw for marketing; once started, this company begins running itself.

Even more interesting is its governance design. Agents can't hire new Agents themselves—this needs your approval; CEO can't unilaterally execute strategies—needs your confirmation. Paperclip positions you as the board—you can pause, override, reassign, or terminate any Agent at any time. Autonomy is a privilege you grant, not an Agent's default power.
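The governance idea, autonomy as something granted rather than assumed, can be sketched as an approval gate. The class and method names below (`Board`, `Company`, `hire`, `pause`) are hypothetical and only illustrate the pattern; they are not Paperclip's real interface.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Board:
    """The human owner: nothing gated happens without an explicit approval."""
    approved: Set[str] = field(default_factory=set)

    def approve(self, action: str) -> None:
        self.approved.add(action)


@dataclass
class Company:
    board: Board
    agents: List[str] = field(default_factory=list)
    paused: Set[str] = field(default_factory=set)

    def hire(self, new_agent: str) -> bool:
        """An agent may request a hire, but it only happens once approved."""
        if f"hire:{new_agent}" not in self.board.approved:
            return False
        self.agents.append(new_agent)
        return True

    def pause(self, agent: str) -> None:
        """The board can pause any agent at any time, with no approval step."""
        self.paused.add(agent)


board = Board()
company = Company(board=board, agents=["ceo"])
first_try = company.hire("marketer")   # denied: no approval yet
board.approve("hire:marketer")
second_try = company.hire("marketer")  # allowed after the board approves
company.pause("ceo")
```

The asymmetry is the point: agent-initiated actions go through a gate, while board-initiated actions take effect immediately.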

In the OPC era, one person can do many things. But the question Paperclip is asking: if even that "one person's" execution work can be outsourced to Agents, what role remains for you? Probably just one word: Board.


The AI Era's "Developers and AI"

Having covered projects, let's look at the other side: the people behind these projects.

Developers: Concentrated in Head Projects, But from Diverse Backgrounds

In February 2026, across the top 50+ Agentic projects, there were approximately 21,000 independently active developers. But the “21,000” figure is somewhat misleading, because they are not evenly distributed across these projects: active developers in OpenClaw and Claude Code alone account for nearly half of the total.

Activity distribution is similarly highly concentrated. This is the familiar power law phenomenon in open-source communities, but it's particularly extreme in this ecosystem: top developer activity scores reach 81, while 95% of developers have activity under 1—a minority driving most substantive progress.

There are several noteworthy numbers in these developers' background composition. Among the 4,232 developers who filled in company information, those from big companies like FAANG and BAT account for less than 10%. More are independent developers and startup people—this ecosystem is not currently dominated by big company engineers.

Geographically, among the 6,295 developers who filled in country information, US developers account for 30%, and Chinese developers account for 10%.


Developers: Young and Cross-Disciplinary, "Builders," "Founders," and "Digital Nomads"

We focused on the top 100 most active developers. They're significantly younger, or at least arrived at the developer community later—the median account creation time is January 2018. If you include long-tail developers, the median becomes December 2013. These two numbers together tell us one thing: a significant portion of top active contributors entered the developer community after the Kubernetes era, and their technical intuition and infrastructure cognition differ noticeably from cloud-native veterans.

Even more extreme: among the 100, one-quarter (25 developers) registered on GitHub after 2023, which suggests they started coding only after LLMs truly went mainstream. ComfyUI author comfyanonymous and Aden Hive author RichardTang-Aden are among them. They're not developers "changed" by the AI wave; they're developers "summoned" by it. Before this, they might not have considered themselves developers at all.

Here are several representative developers. In their self-descriptions, they are designers, musicians, self-taught developers, prompt engineers, hackers, and digital nomads. Their commonality isn't technical background—it's that verb: build.


Developers and AI: Replacement or Symbiosis? Let's Look at the Numbers

This question is hard to answer directly, but numbers can provide clues. Searching GitHub for Claude-attributed Commits yields over 20 million results. Using the same search method: Cursor about 1 million, Copilot 700K, Gemini 450K, Codex even lower. The difference between Claude and others is a full order of magnitude.

Of course, this data has obvious limitations—this is fuzzy search, and many AI-participated Commits won't be attributed at all, and attribution habits vary by tool and team culture. But even discounting, this order-of-magnitude difference still tells us one thing: Claude-series tools are embedded quite deeply in actual code submission pipelines.
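For a quick sanity check on "a full order of magnitude," the ratios can be computed directly from the rounded search counts quoted above. The figures are the approximate ones from the text ("over 20 million," "about 1 million," and so on), so the results are rough multiples, not precise measurements.

```python
# Approximate counts of tool-attributed commits, as quoted in the text.
commit_counts = {
    "Claude": 20_000_000,
    "Cursor": 1_000_000,
    "Copilot": 700_000,
    "Gemini": 450_000,
}

claude = commit_counts["Claude"]
ratios = {
    tool: claude / count
    for tool, count in commit_counts.items()
    if tool != "Claude"
}

for tool, ratio in ratios.items():
    print(f"Claude vs {tool}: about {ratio:.0f}x")
# Claude vs Cursor: about 20x
# Claude vs Copilot: about 29x
# Claude vs Gemini: about 44x
```

Even the smallest gap is roughly 20x, so the conclusion survives a fairly heavy discount for attribution noise.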

Beyond code generation, Review is another link being taken over by Agents. Copilot and CodeRabbit have completed hundreds of thousands of code Reviews in less than three months this year. The significance of this number isn't just scale, but that Review was previously considered highly dependent on human judgment—it requires understanding context, intent, and team norms. How well Agents can do this is still hard to determine, but they're already doing it.

Among all Agent landing scenarios, Coding is one of the few that has truly completed commercial validation. Other scenarios are still telling stories; Coding Agents are already collecting money.


2026 Coding Agent Landscape: Prompting, Generation, Review, to Requirements Management

We've compiled a landscape of currently popular Coding Agents. Pure code completion is basically a thing of the past, but Copilot is still holding on. While it can't match Claude at writing code, as GitHub's native AI collaboration tool it still leads in code review.

Due to time constraints, we didn't do deeper research this time. There's an interesting question: do PRs using Review Agents get merged significantly faster than those without? Intuitively yes, but "significantly" to what extent, and in what types of projects is it most obvious—this deserves serious data analysis.

The more interesting part of the landscape is that some projects are already exploring earlier stages of the software development lifecycle—requirements management. Besides the aforementioned Vibe Kanban, Dane in the Mastra project is another fascinating bot. It can connect to various community channels—Slack, Discord, or mailing lists—extract or abstract project requirements from discussions, and directly file Issues in repositories.


Finally: Amidst AI FOMO, Openness, Sharing, and Collaboration Remain Developers' Spiritual Home

👆This sentence is a personal feeling written at the end.

Peter Steinberger is a tireless open-source builder and creator in the AI era. Before OpenClaw, he had already open-sourced 50+ projects. OpenClaw rekindled everyone's enthusiasm in this exhausted era, largely because it's an open-source project—not just spiritually, but because open-source means it can run locally, means data has some degree of privacy, means you can optimize or fork the project.

Under the AI FOMO wave, models iterate, products iterate, funding iterates. But openness, sharing, and collaboration have never truly gone out of style in the developer community. This is perhaps one of the few things in this ecosystem that doesn't need to wait for "the next version."

Agentic Learning

· 4 min read
inclusionAI
Ant Group

Introduction

Agents exhibit powerful capabilities by interacting with the external environment and making decisions based on the feedback they receive. For complex problems, an agent often needs multi-turn interactions with the environment to reach a solution. The complexity and dynamism of environments, coupled with the need for multi-turn interactions, pose numerous challenges in training agents.

We introduce AgenticLearning, an open-source agent training paradigm designed to empower researchers to train and evaluate autonomous agents effectively. AgenticLearning offers a framework for multi-turn interactions with the environment, enabling models to learn how to interact with the environment and make decisions based on its feedback, thereby enhancing the models' ability to leverage the environment to solve complex problems.

| Advancements | Models | Tools | Environment | Training Framework |
| --- | --- | --- | --- | --- |
| RAG-R1 | Qwen2.5-7b-instruct | offline retrieval, online search | AWorld | LLaMA-Factory, verl, AReaL |
| FunReason | Qwen2.5-7b-Coder-instruct | BFCL | AWorld | LLaMA-Factory, verl |

News

[2025/07/01] 🔥🔥🔥RAG-R1 We propose RAG-R1, a deepsearch training framework that incentivizes the search and reasoning capabilities of LLMs through multi-query parallelism.

[2025/05/16] 🔥🔥🔥FunReason We propose FunReason, a novel framework that enhances LLMs' function calling capabilities through an automated data refinement strategy and a Self-Refinement Multiscale Loss approach.

Advancements

Deepsearch

RAG-R1

  • Tools: Search Engines (offline or online)
  • LLM: Qwen2.5-7b-instruct

RAG-R1-framework

Overall framework of RAG-R1.

RAG-R1-result

Performance comparisons on QA benchmarks under the EM metric. The best and second best results are bold and underlined, respectively.

FunctionCall

FunReason

  • Tools: Real Human Function calling (BFCLv2 live&non-live)
  • LLM: Qwen2.5-7b-Coder-instruct

FunReason is a framework designed to enhance LLMs' function calling capabilities, achieving GPT-4o-comparable performance on BFCL, surpassing RL-based methods, mitigating catastrophic forgetting on HumanEval and MBPP, and using a data refinement strategy where natural CoT data outperforms artificial ones.

FunReason-Performance

Data refinement pipeline of FunReason.

Overview of FunReason's data refinement pipeline. The pipeline consists of five stages: Function Call Classification, Query and Tool Identification, CoT Identification, Function and Parameter Identification, and Format Identification. Each stage ensures specific aspects of data quality, with failing examples either being discarded or regenerated.
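The staged filter described above can be sketched as a small loop. Only the five stage names come from the description; everything else is an assumption made for illustration: the per-stage checks are placeholders that test whether a field is non-empty, the field names are invented, and "one regeneration attempt per failing stage" is a simplification, not FunReason's actual policy.

```python
from typing import Callable, Dict, List, Optional, Tuple

Example = Dict[str, str]
Check = Callable[[Example], bool]
Regenerate = Callable[[Example, str], Optional[Example]]

# Stage names come from the pipeline description; the field each stage
# inspects is a placeholder chosen for this sketch.
STAGE_FIELDS: List[Tuple[str, str]] = [
    ("Function Call Classification", "category"),
    ("Query and Tool Identification", "query"),
    ("CoT Identification", "cot"),
    ("Function and Parameter Identification", "call"),
    ("Format Identification", "format"),
]


def make_check(field_name: str) -> Check:
    """Placeholder check: the stage passes if its field is non-empty."""
    return lambda ex: bool(ex.get(field_name))


def refine(examples: List[Example], regenerate: Regenerate) -> List[Example]:
    stages = [(name, make_check(f)) for name, f in STAGE_FIELDS]
    kept: List[Example] = []
    for ex in examples:
        candidate: Optional[Example] = ex
        for name, check in stages:
            if check(candidate):
                continue
            # Failed this stage: allow one regeneration attempt, then re-check.
            candidate = regenerate(candidate, name)
            if candidate is None or not check(candidate):
                candidate = None  # discard the example
                break
        if candidate is not None:
            kept.append(candidate)
    return kept


def toy_regenerate(ex: Example, stage: str) -> Optional[Example]:
    """Stand-in regenerator: can only repair a missing chain of thought."""
    if stage == "CoT Identification":
        return {**ex, "cot": "regenerated reasoning"}
    return None


complete = {
    "category": "single",
    "query": "weather in Paris",
    "cot": "the user needs get_weather",
    "call": "get_weather(city='Paris')",
    "format": "json",
}
missing_cot = {**complete, "cot": ""}
missing_call = {**complete, "call": ""}
refined = refine([complete, missing_cot, missing_call], toy_regenerate)
```

Here the example with a missing CoT is repaired and kept, while the one with a missing call cannot be regenerated and is dropped.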

FunReason-Performance

Performance of FunReason.

Citation

Please cite our repo if our works are helpful for your research.

@article{RAG-R1,
title={RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism},
author={Zhiwen Tan and Jiaming Huang and Qintong Wu and Hongxuan Zhang and Chenyi Zhuang and Jinjie Gu},
journal={arXiv preprint arXiv:2507.02962},
year={2025}
}

@article{FunReason,
title={FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement},
author={Bingguang Hao and Maolin Wang and Zengzhuang Xu and Cunyin Peng and Yicheng Chen and Xiangyu Zhao and Jinjie Gu and Chenyi Zhuang},
journal={arXiv preprint arXiv:2505.20192},
year={2025}
}

Contact

For any question or feedback, please reach out to us at ender.tzw@antgroup.com or chenyi.zcy@antgroup.com

License

This project is licensed under the MIT License - see the LICENSE file for details.

AReaL: Ant Reasoning Reinforcement Learning for LLMs

· 11 min read
inclusionAI
Ant Group

| Paper | Documentation | Ask DeepWiki | 🤗 Models & Data | WeChat Group |

AReaL (Ant Reasoning RL) is an open-source fully asynchronous reinforcement learning training system for large reasoning models developed at the RL Lab, Ant Research. Built upon the open-source project ReaLHF, we are fully committed to open-source by providing training details, data, and infrastructure required to reproduce results along with the model itself. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable. We hope you enjoy our project just like how you enjoy real-world milk tea (cheers).

AReaL Highlights

  • 🔥 [NEW] Asynchronous RL: With algorithm-system co-design, AReaL supports fully asynchronous RL for the fastest training! Experimental support for multi-turn agentic RL is also provided.
  • 🛠️ Open & Reproducible: We continuously release all code, datasets, and training recipes for RL training of LLMs.
  • 🚀 Scalability: AReaL can seamlessly adapt to different computational resource settings, ranging from a single node to 1K GPUs.
  • 🔪 Cutting-Edge Performance: AReaL can produce models with cutting-edge reasoning capabilities in math and coding. We are also actively working on agentic tasks.

News

[2025/06/03] (v0.3, boba²) We release boba² (double-boba) for fully asynchronous RL training, which achieves a 2.77x speedup while obtaining on-par or even better training performance compared to synchronous systems. Moreover, asynchronous RL makes it extremely easy to set up multi-turn agentic RL training! Check out our v0.3 overview blog and the research paper.

[2025/03/31] (v0.2, boba) Here comes our next milestone release - boba! Please call it A-ReaL-boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our v0.2 technical blog.

[2025/02/24] (v0.1) Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our v0.1 technical blog.

Release Highlights

In our AReaL-boba² (A-ReaL-double-boba) release, we highlight the top 3 most important features:

  • A fully asynchronous RL training pipeline with system and RL algorithm co-design, achieving over 2.77x speedup without any performance drop. Check the benchmark scripts and instructions here.

  • SOTA coding models, i.e., a 14B model with a 69.1 score on LCB-v5. To reproduce, check the configs and instructions.

  • Experimental support for multi-turn agentic RL training. Check our complete example.

For the complete system design and more training details, please check our v0.3 blog and our research paper.

Jump to the quickstart section if you want to quickly run an experiment and get your hands dirty! 😈

Overview of Asynchronous RL Training

During the synchronous RL training process, a generation step must wait until the longest sequence completes within the batch of LLM outputs. Due to the varying output lengths for LRMs, a synchronous RL system suffers from massive GPU idle time, leading to training inefficiency. Some recent works (DeepCoder, Intellect) propose overlapping a single training step with a single generation step to accelerate training. However, the largest bottleneck remains unchanged: the samples within a batch are still from the same model version, leading to waiting and GPU idle time.

Synchronous vs One-step Overlap RL

Fig.1. Left: Execution timeline of synchronous RL training. Right: Execution timeline of one-step overlap RL system.

AReaL adopts a fully asynchronous RL training framework that completely decouples generation from training. In AReaL, LLM generation runs in a streaming manner, with each rollout worker continuously producing outputs without waiting. Meanwhile, trainer workers perform parallel model updates upon receiving training batches.

Asynchronous RL Training

Fig 2. Execution timeline of our fully asynchronous RL system.
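To see why waiting for the longest sequence hurts, here is a toy model of the generation side only. It is a sketch under simplifying assumptions, not AReaL's scheduler: each worker generates one sequence per step, the `times` matrix is made-up data, and training time and weight synchronization are ignored.

```python
from typing import List


def sync_makespan(times: List[List[float]]) -> float:
    """times[step][worker]: every step waits for its slowest worker."""
    return sum(max(step) for step in times)


def streaming_makespan(times: List[List[float]]) -> float:
    """Each worker starts its next sequence as soon as it finishes one."""
    n_workers = len(times[0])
    return max(sum(step[w] for step in times) for w in range(n_workers))


def idle_fraction(times: List[List[float]], makespan: float) -> float:
    """Share of total worker time spent waiting rather than generating."""
    total_work = sum(sum(step) for step in times)
    capacity = makespan * len(times[0])
    return 1.0 - total_work / capacity


# Made-up generation times: 3 steps x 3 workers, one long sequence per step.
times = [
    [1.0, 2.0, 8.0],
    [7.0, 1.0, 2.0],
    [2.0, 9.0, 1.0],
]

sync_total = sync_makespan(times)            # 8 + 7 + 9 = 24.0
stream_total = streaming_makespan(times)     # max(10, 12, 11) = 12.0
sync_idle = idle_fraction(times, sync_total)
stream_idle = idle_fraction(times, stream_total)
```

In this toy case the synchronous schedule leaves workers idle for more than half of the time, while the streaming schedule keeps them busy over 90% of the time. The real gap depends on the actual length distribution.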

AReaL follows a system-algorithm co-design principle: on the system side, AReaL efficiently syncs model parameters and carefully controls the staleness of each training sample; on the algorithm side, AReaL improves the objective of PPO to make async-RL stable.
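The system-side half of that sentence, controlling staleness, can be illustrated with a simple filter. This is a sketch of the general idea, not AReaL's implementation: `Rollout`, `select_fresh`, and the `max_staleness` parameter are hypothetical names, and the algorithm-side change to the PPO objective is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    text: str
    policy_version: int  # version of the weights that generated this sample


def select_fresh(
    rollouts: List[Rollout], trainer_version: int, max_staleness: int
) -> List[Rollout]:
    """Keep samples generated at most `max_staleness` versions ago."""
    return [
        r for r in rollouts
        if trainer_version - r.policy_version <= max_staleness
    ]


buffer = [
    Rollout("a", policy_version=7),
    Rollout("b", policy_version=9),
    Rollout("c", policy_version=10),
]
# Trainer is at version 10 and tolerates samples up to 2 versions old.
fresh = select_fresh(buffer, trainer_version=10, max_staleness=2)
```

A larger tolerance keeps more rollouts usable and raises throughput, at the cost of training on data that is further off-policy, which is why the objective also has to be adapted.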

We compare the scalability of asynchronous RL training based on our AReaL-boba² system with classical synchronous RL training (we adopt the fastest open-source system veRL, main branch on 05/07/2025) across different model sizes and different numbers of H800 GPUs. AReaL demonstrates much improved scaling capabilities with respect to training throughput. This is also partially due to AReaL decoupling training and generation, leading to much fewer GPU memory fragments.

Scaling Comparison

Fig.3 The scaling trend of asynchronous RL (based on AReaL-boba²) and classical synchronous RL (based on veRL) with different model sizes. Dotted lines indicate ideal linear scaling.

SOTA Code Generation Model by AReaL-boba²

We use Qwen3 as our base model. After asynchronous RL training, we achieve SOTA results on LiveCodeBench, Codeforces, and CodeContests benchmarks.

| Model (8B) | LiveCodeBench v5 (2024.10-2025.2) | Codeforces | CodeContests |
| --- | --- | --- | --- |
| Qwen3-8B | 58.8 | 1879/96.7% | 31.4 |
| DeepSeek-R1-0528-Qwen3-8B | 58.4 | 1945/97.3% | 31.0 |
| 🤗 AReaL-boba²-8B-Open | 62.0 | 1933/97.2% | 41.4 |
| 🤗 AReaL-boba²-8B | 63.0 | 1962/97.5% | 40.8 |

| Model (14B) | LiveCodeBench v5 (2024.10-2025.2) | Codeforces | CodeContests |
| --- | --- | --- | --- |
| Qwen3-14B | 65.4 | 1978/97.7% | 38.3 |
| DeepCoder-14B-Preview | 60.6 | 1936/95.3% | 40.1 |
| 🤗 AReaL-boba²-14B-Open | 67.3 | 1990/97.8% | 46.2 |
| 🤗 AReaL-boba²-14B | 69.1 | 2044/98.2% | 46.1 |

| Larger Models | LiveCodeBench v5 (2024.10-2025.2) | Codeforces | CodeContests |
| --- | --- | --- | --- |
| Qwen3-235B | 70.7 | 2056 | - |
| DeepSeek-R1 | 64.3 | 2029 | - |
| OpenAI-o3-mini (Medium) | 66.3 | 2036 | - |

Table 1: Coding Task Performance Comparison. AReaL-boba²-8B/14B-Open denotes training results on open-source data. AReaL-boba²-8B/14B models are trained with an additional small amount of internal data and achieve SOTA performance on LiveCodeBench, Codeforces & CodeContests.

We highlight the tutorials and code walkthroughs about the following key features for asynchronous training:

RL Training for Multi-turn Agent

AReaL-boba² allows you to independently customize the dataset, rollout behavior, and the training algorithm, without needing to modify the heavy system-level code.

In particular, we show a simple example to develop a multi-turn math agent for RL training. Please see the learning curve below and reference the step-by-step guide if you want to implement your own agentic RL project.

Getting Started

Obtain the training data:

For code training data, a simple preprocessing script is provided in examples/data_preprocess/preprocess_training_data.py:

python3 examples/data_preprocess/preprocess_training_data.py --data_path $original_data_path --output_path $training_data_path

Train Qwen3 1.7B locally (Remember to modify dataset.path in the script below):

bash examples/run_async_ppo.sh

Evaluation:

cd evaluation
# Evaluate the model on the coding benchmarks
python eval_and_aggregate.py \
  --model_path ${MODEL_PATH} \
  --output_path ${OUTPUT_PATH} \
  --max_gen_tokens 32768 \
  --data_names codeforces,lcb_v5 \
  --prompt_type qwen3-think-pure \
  --temperature 1.0

Resources

Quickstart

Benchmark and Reproduction

Customization Guide

System Code Walkthrough

Future Plan

AReaL is under active development. We plan to have minor releases weekly and major releases monthly. Community engagement and contributions are extremely welcome. We are also hiring interns and full-time employees with open positions in both the US and China.

For the research and development plan already in place, please see the following list:

System Development

  • Support for SGLang
  • RL training with coding problems
  • Asynchronous generation and RL training
  • Optimizations for distributed training: expert parallel for MOE and zero-bubble pipelining
  • RL for vision-language models (VLM)
  • Multi-turn agentic RL
  • Function calling and tool use

Algorithm Development

  • RL training recipes for 1.5B and 7B models
  • A complete RL training recipe for 32B models
  • Sample-efficient multi-task RL algorithms
  • Agentic capabilities with end-to-end RL
  • Stable RL training for larger MOE models

Acknowledgement

We would like to note that major contributors are from the RL Lab at Ant Research and the Institute for Interdisciplinary Information Sciences, Tsinghua University.

Our team has also received invaluable assistance from the Data Intelligence Lab at Ant Research for data support and from the Super Computing Technology (SCT) team at Ant Group, particularly in the realm of large-scale cluster operations and maintenance.

We also appreciate all the pioneering works from the community, particularly the ReaLHF project from OpenPsi Inc. and other projects, including but not limited to DeepScaleR, Open-Reasoner-Zero, OpenRLHF, VeRL, SGLang, QwQ, Light-R1 and DAPO.

Citation

@inproceedings{mei2025real,
author = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
title = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},
booktitle = {Proceedings of the Eighth Conference on Machine Learning and Systems,
MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},
publisher = {mlsys.org},
year = {2025},
}
@misc{fu2025areal,
title={AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},
author={Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},
year={2025},
eprint={2505.24298},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.24298},
}