12 posts tagged with "insights"

Agentic AI 2026: When the Hackathon Fever Cools Down

June 1, 2026 · 32 min read

Senior Researcher

After the party cools down, we still want an inclusive AGI that more people can use.

Opening

Over the past year, we often described the LLM developer ecosystem as a “hackathon in the real world.” The phrase fits. It has energy, speed, luck, and flashes of talent. It also has noise, repeated work, short-lived projects, and repos that become famous overnight only to be covered by the next wave a few days later.

By mid-2026, the feeling is different. The change is no longer just “a few more Agent projects.” Something deeper is shifting: the way software is made is starting to move.

In the past, people used software. Software was designed around human hands, eyes, and attention: buttons, forms, editors, and chat boxes. Now agents are becoming a new kind of software user. They read files, call APIs, run commands, open PRs, write tests, review code, and wait for human approval before moving on. They do not always sit inside a chat box. They do not only answer questions. They are entering the inner workflow of software.

So the most useful question is not whether Agentic AI has a bubble. Of course it does, and it will have more. The better questions are: when the hackathon fever cools down, where will software go? What will developers become? What role is left for open source? And why do we need an inclusive AGI future?

Signals From Platforms

Models have not made software smaller. They have widened its boundary.

If we only watch product launches, it is easy to think that models are the whole story of AI. But when we look at GitHub and Hugging Face together, a different picture appears.

GitHub tells us what developers are building. Octoverse 2025 shows that GitHub has more than 180 million developers. In 2025, it added more than 36 million new developers, about one new developer every second. From January to April 2026, OpenDigger events recorded 13.016 million active developers and 27.107 million active repositories. Software production has not shrunk because models got stronger. It is still expanding.

The more interesting signal is automated accounts. In the first four months of 2017, only 112 bot or app actors were active in the GitHub event stream. In the same period in 2026, the number reached 17,285. That is 154 times larger across the ten-year window. The first four months of 2026 alone already doubled the same period in 2025. Today, open-source collaboration can no longer be imagined as “human developers working on GitHub, with a few CI bots doing chores on the side.” Automated accounts are entering the software production chain. They are becoming part of the collaboration network.

Hugging Face gives another signal. It shows how models are published, downloaded, changed, and reproduced. The number of public models has reached 2 million. It grew by more than 100% in the past year, and more than 540,000 models were added by May 2026 alone. A model platform is no longer just a display case for research models. It looks more like a busy factory. Some people publish foundation models. Some fine-tune them. Some quantize them. Some convert formats. Some upload adapters. Some move the same capability to different devices and runtime environments.

The GitHub collaboration list and star-growth list tell a similar story. AI projects have entered the core engineering world. Developer attention and curiosity are strongly focused on repos around new words like Agent, Claude Code, and Skills. But real collaboration still happens around the long-term foundations and complex systems of software. In other words, models have not eaten software. They are rewriting the division of labor inside software.

Models handle understanding, generation, reasoning, and tool use. Software puts models into reliable workflows. It manages data, permissions, state, cost, audit, and delivery. The closer models get to real work, the more software is needed to define boundaries, save process, connect systems, and handle failure. Models have not made software smaller. They have widened its boundary.

The Agentic AI Ecosystem Architecture

The ecosystem has moved from an LLM toolchain to a full execution stack for Agentic AI.

When we built the LLM developer landscape last year, the main question was still: which projects should be on the map? At that time, the ecosystem was slowly forming layers around LLM SDKs, RAG, agent frameworks, application platforms, and inference infrastructure. People cared about how to connect models, build RAG, write an agent, and run inference services.

By 2026, this question became harder. There are too many projects. They change too fast. Even many older projects are redefining themselves.

OpenClaw launched in November 2025 and passed 200,000 stars in February 2026. It took only 84 days. React, the software foundation that shaped modern frontend development, took almost ten years to reach the same number. This does not mean OpenClaw already has React’s long-term engineering impact. It means GitHub’s attention system, the speed of spread in the AI era, and developer expectations for agentic software have all changed.

Projects appear too fast, and attention moves too fast. A static map can easily capture only one moment. So this time we separated two jobs. The landscape map makes judgments and selects a set of representative projects worth watching. The dynamic leaderboard checks the temperature. It tracks Agentic AI projects that developers have actually worked on, and shows which projects suddenly became hot, which ones kept their heat, and which ones started to show real collaboration.

This dynamic leaderboard, built with OpenDigger, is now live on the inclusionAI website: https://www.inclusion-ai.org/insight

The latest monthly OpenRank Top 10 looks like a cross-section of the ecosystem. Claude Code, Codex, OpenCode, and Gemini CLI sit near the task-entry layer. vLLM, SGLang, TensorRT-LLM, and PyTorch sit near the infrastructure layer. Entry projects are closer to developers and users, so issues and PRs are busier. Infrastructure projects may have less scattered participation, but the collaboration density is high. What is heating up is not a single entry point. It is the whole execution system.

The structural change is clear. In 2025, the landscape was still sorting SDKs, RAG, ChatUI, and MLOps. In 2026, leading projects are being rearranged around the task execution system of agents.

The landscape from last May still felt very much like an LLM toolchain. We were mostly looking at which SDK to use for LLM apps, which RAG and vector database to connect data, and which inference framework to run models. By May 2026, the focus had moved from “how to write an agent” to “how agents enter task execution.”

This makes the three-layer architecture of Agentic AI in 2026 easier to read. At the top are human-agent collaboration and task entry. In the middle are token supply and scheduling. At the bottom are the model capabilities themselves.

Agent Infra decides who AI can work for. It includes the entry points users meet directly, such as coding agents. It also includes agent runtime infrastructure, such as context, tool use, sandbox, and AgentOps. It also includes agent builders, orchestrators, and operators.

Model Infra decides whether this can scale. It includes data, training, inference serving, deployment, scheduling, and operations. If an agent only answers one sentence, the lower-level cost can be ignored. But once it reads code, searches information, calls tools, waits for feedback, and keeps going, tokens stop being chat consumption. They become production material.

Models decide where the capability boundary is. The model layer still matters. But it can no longer be understood only by asking who has the higher benchmark. Frontier and foundation models explore the upper bound. Small edge models solve low-cost, low-latency, and privacy-heavy cases. Specialized models adapt to code, finance, and services. Real tasks will look more like model portfolios than one model ruling everything.

These three layers are not a simple supply chain. They push one another forward. When agents try to do longer tasks, Model Infra is forced to lower inference cost, increase context throughput, and improve observability. When Model Infra improves, small models and specialized models become easier to use in production. When the model layer improves in reasoning, tool use, and multimodality, agents are pushed further out of the chat box.

Agent Infra: Redefining How Software Is Used

Agent Infra is the busiest layer in this dataset, and also the easiest to misread. On the surface, products like OpenClaw, Claude Code, and Codex are fighting for attention. Deeper down, this layer is redefining how software is used. Agent Infra is bringing silicon-based executors into the software world.

Coding agents are the first real large-scale entry point for Agentic AI, because code is a natural place for agents to work. It has files, tests, logs, version control, diffs, PRs, reviews, and rollback. Almost all the feedback loops machines need already exist in the code world.

In the past, software assumed that its user was human. We designed UI for people. We wrote docs for people. We gave people buttons and forms. In the agent era, software has a new user. This user may not need a beautiful interface. It needs stable APIs, tool protocols, permission boundaries, readable state, executable commands, verifiable results, and rollback.

An agent without context is only briefly awake. Without tools, it can only give advice. Without permission control, it cannot enter real systems. Without sandbox and rollback, companies will not let it act. Without observability and evaluation, humans cannot know why it failed.

Software is being repackaged as a runtime environment that agents can enter. APIs, MCP, tool protocols, context, sandbox, and observability are becoming new basic parts.

We gave the descriptions and READMEs of 226 projects to a model and built a multi-label classification. The result is telling. Coding Agent has 78 projects and 14,019 participants. MCP has 59 projects and 6,651 participants. Memory has 70 projects and 7,609 participants. Observability has 71 projects and 5,463 participants. Gateway has 31 projects and 2,637 participants. These labels overlap, so we should not add them up. But the horizontal view is enough: attention is moving from “building a product that can chat or write code” to putting context, tools, gateways, sandboxes, and observability into a runtime.

Model Infra: The Main Job Is Supplying Tokens at Scale

The inference layer is splitting into several jobs. High-throughput engines such as vLLM and SGLang run models faster, more reliably, and more cheaply. Edge and local inference projects such as llama.cpp bring models to personal computers, private environments, and edge devices. Data-center schedulers such as Dynamo and Ray Serve handle multi-model, multi-tenant, multi-GPU, and multi-region operation. Gateways and proxies such as LiteLLM and OpenRouter handle model routing, fallback, unified interfaces, cost tracking, and audit.

Post-training is another key part of this layer. Projects like AReaL and Slime show the rise of reasoning training and Agentic RL. In the agent era, RL is not only about making models answer better. It is also about making them better at using tools, following constraints, keeping state in long tasks, and knowing when to stop and ask a human.

The future cost advantage of AI will not come only from cheaper models. It will also come from better token supply-chain management. The value of Model Infra is to orchestrate these capabilities like electricity, logistics, and databases. Whoever can make tokens stable, cheap, observable, and governable will own a real production infrastructure in the agent era.

This is similar to the early evolution of cloud computing. At first, people cared about whether they could run services at all. Later, the real competition became scheduling, elasticity, observability, cost, SLA, supply chain, and developer experience. Model Infra is moving along the same path.

Harder evidence comes from the community itself. In mid-May 2026, leading serving projects still had a high build tempo. Release notes, roadmap issues, and bug reports kept repeating words like PD disaggregation, KV cache, router, scaling, fault tolerance, and health check.

SGLang issue #21846 names its latest roadmap “Distributed KVCache System For Agentic Workload.” It says clearly that agentic workloads are driving fast growth in KV cache storage and transfer volumes, and that the current PD disaggregation and HiCache designs are hitting limits. Agents have changed the consumption structure of tokens.

Dynamo issue #5506, the H1 2026 roadmap, focuses on request scheduling, KV cache reuse, worker scaling, and service availability in Kubernetes and multi-node environments. The serving battle is expanding from a single inference engine to an inference system.

Another issue, SGLang issue #20252, records a very real large-scale deployment failure: qwen3-32b-fp8, with 90 prefill and 30 decode workers, running on an H20 cluster. After some prefill nodes restarted or migrated, decode kept retrying, health checks failed, the router removed workers, traffic moved to the remaining nodes, and under high QPS the system ended in cascading failures and 503s. The lesson is simple: running the model is only the first step. The harder industrial problem is whether a single node’s instability will be amplified by routing, health checks, and traffic shifting into a global failure. In production, the real pain is stability.

Model: There Is No Single Winner

The model layer is still the source of capability for the whole ecosystem. But it can no longer be understood by asking only who has more parameters or a higher benchmark.

Hugging Face derivative and download data reminds us that a model’s life does not stop at release. Model families such as Qwen and Gemma matter not only because the models are strong, but also because people fine-tune them, quantize them, convert them, distill them, and move them to edge devices and application scenarios. Models are starting to have downstream ecosystems like open-source software packages: people fork them, patch them, make lightweight versions, build domain versions, and create compatibility layers.

Hugging Face shows signals of model release, download, and reproduction.

OpenRouter’s token usage leaderboard breaks the story of a single champion model. Coding may use one model. Long-context research may use another. Low-cost batch processing may use another. Voice, image, and video may use others. Local privacy scenarios may need a different stack again. Real usage is unlikely to settle on one permanent winner. Users route across price, speed, context length, tool use, coding ability, and free quotas.

Real token usage shows the reality of many models, many providers, and many routes.

This view is already supported by research and engineering practice. RouteLLM, from ICLR 2025, runs a clear comparison: first judge the difficulty of the request, then decide whether to use a strong model or a cheaper model, instead of sending every request to the most expensive model. On some benchmarks, this routing approach keeps quality close to the strong model while cutting cost to less than half. IPR, from EMNLP Industry 2025, tests this idea in a large cloud platform deployment. It routes prompts across Claude models and reaches the quality of the strongest Claude model while reducing cost by 43.9%.

The open-source community is moving in the same direction. A GitHub Search API check in LiteLLM, one of the hottest model API routing projects, shows that routing terms related to cost, budget, and spend governance appear often in issues and PRs: 23 results for cost based routing, 18 for lowest cost, 37 for budget routing, and 76 for spend tracking as of May 22, 2026. Engineering teams are already asking: how do we route requests to models that are more suitable, cheaper, and easier to govern, while still keeping quality and stability?

Trends and Project Stories

Technical shifts also show up in how projects describe themselves.

Description Signals: Projects Are Rewriting Their Self-Introductions

Classifications and leaderboards reveal structure, but they can still feel abstract. Many more interesting changes are hidden in a project’s own description, and in the repeated “what we are not” lines in READMEs.

After aligning historical landscape snapshots with current project data, we found that 96 out of 226 projects had changed their descriptions. The most visible new words include agent, harness, context, workflow, and MCP. These wording changes show projects looking for a new place in the ecosystem.

These changes move toward the agent execution stack along several common paths.

One common path is Workflow Builder → Agent Orchestrator. Projects such as Dify, Flowise, Langflow, and Activepieces used to answer the question “how do I build an LLM app or automation workflow?” Now they increasingly talk about agentic orchestration. Deer-flow is a sharp example. It used to describe itself as a community-driven Deep Research framework, with web search, crawling, and Python execution. Now it calls itself a long-horizon SuperAgent harness that can research, code, and create. Deep Research is moving from “help me look things up” to “execute long-running tasks.”
Another path is RAG / Data / Vector DB → Context / Memory Infra. RAGFlow, Chroma, DataChain, and Letta show that RAG and data are growing beyond “adding knowledge to models.” They are becoming long-term context, memory layers, and searchable workbenches for agents. DataChain moves from ETL, analytics, and versioning to a context layer for unstructured data. Chroma moves from an embedding database to search infrastructure for AI. Letta moves from memory services to a platform for stateful agents. These are all part of the same hidden line.
A third path is closer to the user entry point: Chatbot / AI Client → Agent Workspace / Personal Assistant. lobehub/lobehub is worth a closer look. Its old description called it Lobe Chat, a modern-design AI chat framework. Now it says users can find, build, and collaborate with agent teammates. Chatbot projects like LobeChat are actively escaping the name and imagination of the “chat box.” It is not just saying “we are also agents.” It is rewriting the product from Chat to Hub: from human-model conversation to humans living, working, and dividing tasks with agent teammates.
A path closer to developer tools is Dev Tool / IDE / Terminal → Agentic Dev Environment. Warp, Daytona, Coder, Continue, and Cline are turning developer tools into work environments for agents. Warp moves from an AI-powered terminal to an agentic development environment. Daytona moves from a dev environment manager to secure and elastic infrastructure for running AI-generated code. Continue moves from an AI code assistant to source-controlled AI checks and quality gates in CI. The shift is important: the software entry point is moving from “a person opens an IDE and writes code” to “a person defines the goal, and an agent executes in a controlled environment.”
At the framework layer, many projects are moving from Framework → Agent Harness. LangChain, deepagents, Mastra, Agno, and Hive are no longer satisfied with calling themselves a framework or SDK. They are moving toward platform, harness, and production AI. harness is a meaningful word in this shift. Among the 96 projects that changed descriptions, six now contain harness, and four of those added it later, including deer-flow, LobeHub, and Hive.
At the tool and gateway layer, the shift is Tool Integration → Agent Control Plane. Projects such as Composio, LiteLLM, and OpenSandbox push tool use beyond “function calling” or “API wrapper” toward something closer to a control plane. ComposioHQ/composio used to emphasize integrations and function calling. Now it says it powers 1,000+ toolkits, tool search, context management, authentication, and sandbox. It puts several key words of Agent Infra into one sentence.
At the model infrastructure layer, we see RL / Inference / Training → Agent Workload Infra. AReaL, verl, SGLang, and GPustack show that Agentic AI is rewriting not only the application layer, but also Model Infra. areal-project/AReaL used to call itself a Distributed RL System for LLM Reasoning. Now it is “The RL Bridge for LLM-based Agent Applications.” Post-training is being redefined by agent tasks. It is not enough for a model to “answer correctly.” It also needs to get things done across tool use, long tasks, environment feedback, and multi-step decisions.

This kind of change can be misread as “everyone is just chasing the agent trend.” Some projects are doing that, of course. Every fast-growing ecosystem has this noise. But when a mature project changes its description, it often means the project has felt a real change in user demand.

Agentic AI is pulling together projects that used to sit separately in applications, data, developer tools, MLOps, cloud native, and models. RAG projects talk about context. Data governance projects talk about semantic layers that agents can use. Development environments talk about developers and their agents. Gateway projects talk about model routing and cost control. Every project is asking: if my users are not only humans, but also agents, what should I provide?

README Evidence: Projects Say What They Are Not

Negative statements in READMEs give another kind of evidence. When a project keeps saying “not a...”, it is often not just explaining features. It is trying to escape the labels of the previous generation.

OpenFang calls itself an Agent Operating System. It also clearly rejects labels like chatbot framework, Python wrapper around an LLM, and multi-agent orchestrator. The message is direct: chat windows, thin SDKs, and simple orchestration are no longer enough for the position it wants. It wants to sit at the OS or runtime layer.

Paperclip is similar. It rejects chatbot, agent framework, workflow builder, prompt manager, single-agent tool, and code review tool. Instead, it says it wants to run a zero-person company made of agents.

Behind these negative statements is a set of labels that are losing energy: chatbot framework, LLM wrapper, workflow builder, prompt manager.

Users are no longer satisfied with a nice chat page, a model API relay, or a demo workflow. They are asking more concrete questions: How does an agent connect to real software? How are permissions managed? How are context and memory maintained over time? How are failures observed? How does code execution enter a sandbox? How is model cost controlled?

This is the line between Agentic AI as a toy and Agentic AI in production.

Developers and AI Tools

The stronger AI tools become, the heavier human responsibility becomes.

Who Is Taking Part in the Agentic AI Ecosystem?

Before asking whether models will swallow software and open source, we should first look at how developers themselves are changing.

We built a developer profile from 226 Agentic AI projects. We counted actors who participated in these projects from January 1 to before May 1, 2026, keeping bot and app accounts. This gave us 563,973 developers or automated accounts. We then ranked the Top 10,000 by their April 2026 community_openrank contribution in these projects and added GitHub profile, company, and location. About 2.0% are likely bot or app actors, or 198 automated accounts.

Among these 10,000 high-contribution participants, 2,920 have identifiable company fields and 3,575 have standardized country fields. In self-reported companies, NVIDIA, Microsoft / GitHub, Intel, Google / DeepMind, and Red Hat rank high. In standardized countries, the United States has 1,113, China has 726, and India has 229. This is not a single group of “open-source hobbyists.” It is a network made of model companies, cloud vendors, chip companies, startups, university labs, independent developers, and automated accounts.

Figure 11: The participant structure includes model companies, cloud vendors, chip companies, startups, open-source maintainers, independent developers, and automated accounts.

There are three interesting signals here.

First, the contribution center of Agentic AI does not sit only in “application startups.” Agent or model-native projects like openclaw are visible, of course. But apache and pytorch also appear near the top. The production network of Agentic AI has crossed the application layer. Some people build coding agents. Some build models and inference. Some work on data, workflows, and engineering infrastructure.

Second, self-reported company fields show that chip vendors, cloud and model companies, and open-source infrastructure companies are all present. Low-level compute and inference stacks are being pulled forward by agent demand. Large companies are entering toolchains and developer workflows through open-source projects. Infrastructure companies such as Red Hat and Databricks show that enterprise engineering and data platforms are joining the Agentic story.

Third, the geography has not collapsed into a single Silicon Valley story. The United States is still the largest node, but China, India, Germany, and Canada together form a production network across time zones. Agentic AI looks like a global engineering site. A model may be released in one place, inference optimized in another, and coding-agent entry points built in a third. Then maintainers, bots, and CI systems from different countries push changes into the main branch.

Automation is climbing up the software production chain step by step: first running tests, sending reports, and updating dependencies; then reading code, making suggestions, generating patches, and taking part in collaboration.

There are two kinds of bots here. The first is traditional automation: github-actions[bot], dependabot[bot], codecov[bot], copybara-service[bot], and pytorchmergebot. They keep large engineering projects moving. The second is a new generation of AI tools: greptile-apps[bot], coderabbitai[bot], gemini-code-assist[bot], and chatgpt-codex-connector[bot]. They have moved beyond fixed scripts. They read code, understand context, comment on changes, and join reviews.

Figure 12: Bots are no longer just CI noise. Some already handle review, code understanding, automated fixes, and agent connector work.

Carbon-Based Definers and Silicon-Based Executors

The profile of real human developers is much richer than “programmers using AI to write code.” Top developers include independent tool builders, open-source maintainers, AI startup founders, engineers from cloud and model companies, researchers, tool authors, and community project maintainers. Some build agents. Some maintain inference and scheduling infrastructure. Some connect models to workflows. Some write rules, maintain communities, and handle issues and reviews.

Among the 226 Agentic AI projects we track, 78 are related to coding agents. They have 3.86 million stars in total, and 14,019 participants in April 2026. The CLI-first path, represented by Claude Code and Codex, enters local repos, shells, tests, and git directly. The IDE-first path, represented by Cursor, keeps the developer’s mental model and lets people step in at any time. Devin, OpenHands, and Multica represent cowork / cloud worker systems that try to move tasks from issue to PR in the background. Harness tools around memory and team orchestration try to give agents a long-term work environment.

Leading projects are also using coding agents heavily. We scanned the file trees of the OpenRank Top 100 Agentic AI projects and found that 92 projects had at least one coding-agent-related configuration. On average, each project used 2.8 kinds. Claude Code had the highest coverage, at 81%. OpenAI Codex reached 69%.

There is also a small but telling detail. In google/adk-python, the project with the most agent markers, the only agent config directory that remains is .gemini. But .gitignore still contains traces of Codex, Claude, Cursor, Windsurf, Aider, Cline, Continue, and other tools. Files like AGENTS.md, CLAUDE.md, and .cursor/rules are like cheat sheets for AI. In the past, much project knowledge lived in maintainers’ heads: why this module should not be touched, which test is slowest, what to check before release, which dependency versions are tricky, and which changes require talking to someone first. In the agent era, if this hidden knowledge is not written down, agents cannot execute it reliably.

This is symbolic. Open-source projects used to write CONTRIBUTING.md for human contributors. Now projects are starting to write onboarding documents for agent contributors. Open-source collaboration is no longer only an agreement among people. It is becoming a work protocol shared by humans and AI.

Software development is becoming “carbon-based people define tasks, silicon-based systems execute them.” When people say developers are being replaced, what often happens is that developers are moving from the execution layer to the definition layer. The old core skill was translating requirements into code. The new core skill is translating fuzzy goals into task systems that agents can execute, verify, and roll back. This changes the daily feel of development. Writing code used to feel like laying bricks. More and more, the work feels like guiding an unstable but fast-learning colleague: explain the task, give enough context, mark the areas it must not touch, let it propose a plan, and then review the diff. Good developers will care more about things beyond prompts: whether the repo structure is clear, whether tests are reliable, whether error messages are readable, whether docs tell agents how to run things, and whether review standards can be understood by machines.

This creates a change that sounds contradictory but makes sense: the stronger AI tools become, the heavier human responsibility becomes. When a tool only completes one line of code, a person only needs to judge that line. When a tool can change dozens of files, run tests, open PRs, and respond to review, humans must design boundaries, write acceptance criteria, manage permissions, inspect results, handle failure, and take responsibility for the final merge. Developer value has not disappeared. Its position has changed: from doing every action by hand to defining goals, constraining actions, verifying results, and owning responsibility.

So the more common AI tools become, the more engineering organizations need to make rules, responsibility, and knowledge explicit. If a team has no tests, no docs, and no clear boundaries, agents will only amplify the mess. If a team has good modularity, runnable validation paths, and clear contribution rules, agents can become productivity multipliers.

The same is true for individual developers. The scarce skill may no longer be “can you use an AI tool?” It will be whether you can turn fuzzy goals into executable tasks, turn a successful interaction into reusable rules, and see real risks in AI-generated plans. AI lowers the barrier to writing code. It does not lower the value of judgment. It lets more people enter software production, and it makes experienced developers’ knowledge look more like a system design capability.

Figure 14: Developers do not disappear in the agentic era. They move from doing actions by hand to defining goals, constraining actions, and verifying results.

Software Will Keep Being Rewritten, But Open Source Remains Irreplaceable

In 2011, Marc Andreessen wrote “Software is eating the world.” Later, some people said open source was eating software, because open-source infrastructure became the default supply chain of modern software. By 2026, a sharper question appeared: will models swallow software and open source?

The answer looks more like a new division of labor.

Models will take over some actions that used to be carried by software interfaces: search, fill, organize, generate, jump, and call. But they cannot eat the order behind software. The closer models get to real work, the more software is needed to define boundaries, save state, connect systems, control permissions, record process, and handle failure.

Software will not disappear. It may become more abundant, but it will not look exactly like before. The real change is that many software-use behaviors once done by humans will become model-driven action chains. Software companies may first become “agent-usable companies.” In the past, SaaS was built around UI, accounts, business data, and workflows. In the agent era, SaaS also needs stable APIs and machine-readable docs. Interfaces still matter, but UI is no longer the only entry point. Whoever can let agents safely work for users may become the new platform.

Coding is only the first stop. The real world is messier and softer than code. Payments, finance, healthcare, government services, education, life services, and embodied intelligence all involve identity, responsibility, risk, regulation, trust, and human situations. For agents to enter these scenes, model capability is only the ticket. Institutions, products, and system design are the long race.

In 2025, the LLM developer ecosystem looked like a hackathon. Projects appeared quickly, became popular quickly, and disappeared quickly. By 2026, the contest field is slowly turning into a construction site.

Software will continue to be rewritten. Agents will become users of software. Tokens will become the energy of software. Developers will move from people who write every line by hand to people who define goals, design constraints, verify results, and take responsibility. Silicon executes. Carbon defines what is worth executing. This may be the most important division of labor in the Agentic AI era.

In this process, open source will not win automatically. But it still has an irreplaceable role. The meaning of an open ecosystem is not only to provide code. It is also to let more people understand, use, change, and share intelligent infrastructure.

Model companies can release stronger models. Cloud vendors can build larger AI factories. Application companies can make smoother closed-source products. But what keeps an ecosystem healthy over the long term is still whether developers can participate, whether projects can be audited, whether standards can be built together, whether tools can be deployed locally, and whether knowledge can be shared.

Inclusive AGI: Intelligence Should Not Be a Privilege for the Few

Ant’s path over the past twenty years has been answering similar questions. In online transactions, why did people not trust one another? Why was it hard for small merchants to get services? Could complex and expensive capabilities, once available only to a few, become easier, cheaper, and more accessible? Twenty years ago, we believed financial services should not be a privilege for the few. Today, facing AGI, we believe intelligence should not be a privilege for the few either.

This may sound like a gentle slogan, but it is a hard judgment. The real world is much more complex than abstract problems. Math problems have standard answers. Code has tests. Games have rules. But services involve cost, trust, responsibility, emotion, compliance, regional differences, and people’s actual situations. A hospital appointment, an insurance claim, a small cross-border payment, or business advice for a small merchant cannot be solved simply by a higher benchmark.

This is why Inclusion AI’s open AGI practice covers Model, Model Infra, Agent Infra, and AI Service at the same time. Models define the capability boundary. Post-training, inference, and training systems turn capability into infrastructure that can be supplied at scale. Agent Infra lets developers and domain experts connect models to real workflows. AI Service brings these capabilities into concrete fields such as finance, healthcare, and life services. Without systems, models are only demos of capability. Without tools, models have trouble entering industries. Without an open ecosystem, inclusion becomes only a rental right from a few platforms.

Towards inclusive AGI is a simple but important hope: AI should not become a black box for a few people. It should not be only a productivity machine for large companies. It should be a public technology that more people can understand, use, change, and share.

This can be reduced to three words: Available, Affordable, Inclusive.

Available means models and tools should be as open as possible, so developers, researchers, domain experts, and small or mid-sized organizations can access them, understand them, and adapt them. Open weights, data tools, inference engines, and agent protocols all lower the barrier. If intelligence is to become infrastructure, more people must be able to inspect it, adapt it, and improve it together.
Affordable means AI must really enter vertical scenarios: healthcare, finance, government services, education, public good, rural areas, and small businesses, not only premium subscriptions. The hard part is not making AI solve beautiful benchmark problems. The hard part is helping an older person book a hospital appointment, helping a small merchant run a business, or helping an ordinary family handle daily services at low enough cost.
Inclusive means the value of AI should not be captured only by large token consumers or a few platforms. Developers, open-source maintainers, data contributors, domain experts, and ordinary users should all have a place in the ecosystem. We need to respect real workflows and human experience, and let that experience compound through open collaboration into reusable tools and systems, instead of one-way releases.

Towards inclusive AGI does not mean everyone must become a model company. It does not mean everyone must write low-level frameworks. It is a simple but important hope: AI should not become a black box for the few. It should not be only a productivity machine for large companies. It should be a public technology that more people can understand, use, change, and share.

This may be the most important job for open source in the AGI era.

Notes on Data Scope

The data mainly comes from the Agentic AI landscape repository, OpenDigger, GitHub API, Hugging Face Hub, OpenRouter public leaderboards, and searches across releases, issues, PRs, and GitHub Search API results for projects such as vLLM, SGLang, TensorRT-LLM, llama.cpp, Dynamo, and LiteLLM.

Agentic AI project trend data uses the April 2026 OpenRank scope. The developer profile covers activity from January 1, 2026 to before May 1, 2026, and keeps bot and app actors.

The model routing section refers to RouteLLM (ICLR 2025) and IPR (EMNLP Industry 2025). For RouteLLM, we use the conservative wording that routing can achieve more than 2x cost savings while staying close to strong-model quality. For IPR, we use the paper’s “100% strongest-model quality” operating point and the reported 43.9% cost reduction.

Taking the Pulse of Agentic AI from the Developer Community at the End of Q1 2026

April 1, 2026 · 14 min read

Xia Xiaoya

Senior Researcher

Today, I want to share some observations on the Agentic AI ecosystem from the vantage point of 2026's first quarter—technical trends read from popular projects, portraits of AI developers, and the subtle relationship between developers and AI tools. This is not meant to be comprehensive; we welcome the community to share more observations and reflections.

Agentic Ecosystem in 2026

This year, everyone seems to be in a state where FOMO and excitement intertwine. There's a sense that AI application deployment has reached an unprecedented acceleration point—perhaps even a tipping point. But is this tipping point real or emotionally amplified? Let's calibrate our intuition with two metrics.

This chart shows the top 20 projects by OpenRank last month and the top 20 by Star growth this year—the most active and most-watched projects. I've highlighted LLM-related projects, and unsurprisingly, OpenClaw occupies the #1 and #2 spots on both lists.

Developer attention has completely flowed toward the Agent ecosystem, although the Star count list includes many awesome-collection type projects (which naturally attract more attention). Just looking at the project names, you can feel they're permutations of a few words: OpenClaw, Skills, Claude, Claude Skills, OpenClaw Skills. The actual developer effort reflected in activity metrics is somewhat more honest, but even so, LLM-related projects account for about 40%.

Expanding the scope to the top 1000 most-watched repositories, after rough labeling, we can see 81% are Agent-related. The most frequently tagged keywords in project Topics are: Agent, Claude, LLM, Code, Skill.

Looking back over the past few years, you can feel the rotation of technological ecosystem dominance from the naming of popular projects emerging at different stages. Popular projects created around 2023-2024 were mostly related to GPT and Llama, such as AutoGPT, MetaGPT, Ollama, llama.cpp. As time turns, there are always technologies that serve as unavoidable coordinates. In 2025, that coordinate was called Claude Code, and thus projects like Clawdbot (later OpenClaw) and Claude-Mem emerged.

Based on the currently most popular and active projects, we've compiled the latest map of the Agentic AI ecosystem, covering about 50+ projects. Many should look familiar, while some are new faces. Let's follow a few specific projects to examine current technical trends.

Technical Trends from Popular Projects

From Context Management to Complexity Harness

The optimizations we made under the capability constraints of the foundation models were essentially about managing information in the model's attention window: feeding more effective prompts to the model, invoking tools like browsers, connecting external background knowledge the model needs (RAG), and maintaining memory across multi-turn conversations. This path accumulated into a practice called "Context Engineering."

Claude-Mem and Context7 are two open-source tools created around mid-last-year, each now with tens of thousands of Stars. They each found interesting entry points, but essentially solve the same thing: telling the model more effective background knowledge—and making sure it doesn't forget.

Claude-Mem is a Claude Code plugin that compresses all conversation outputs during Claude Code's task execution using a model, providing them as context for future conversations to ensure the Coding Agent has longer conversation memory.

Context7 provides both MCP service and Skill loading modes. Every time a task is executed, it fetches the latest documentation of involved dependency libraries to ensure the Coding Agent doesn't execute outdated code.

But "Context Engineering" as a term is starting to feel insufficient this year, because the problem is no longer just "is there enough information," but "will the Agent lose control?" Developers have likely experienced this: during autonomous task execution, the Agent either crashes the entire system or stops halfway without saying anything.

Oh-My-OpenAgent (formerly oh-my-opencode, a plugin for OpenCode) calls itself the "strongest Agent Harness" in its project description. It built a continuous execution Enforcer called "Sisyphus": as long as TODO tasks aren't complete, it forces the Agent to keep restarting or finding new paths until 100% achievement—like Sisyphus endlessly pushing the stone up the mountain.

So I understand Harness as providing background knowledge while further constraining the Agent's behavioral boundaries—not just letting the Agent know "what is," but making clear "what it can touch" and "what it can't," and knowing what to do when stuck. Context Engineering manages input quality; Harness Engineering manages execution discipline.

Software Development Shifts from Human-Centric to Agent-Centric

This trend can already be felt from the projects above: these newly emerging tools are designed not to serve developers, but with the Agent as the execution subject. Interestingly, what humans have accumulated in software development practices is now migrating to Agents. Developers need to consult the latest documentation—so do Agents; developers need to collaborate in teams—Agents are starting to need that too.

Vibe-Kanban brings traditional task boards to the Agent team collaboration scenario, turning it into the Agent's command center. Each task creates an entry with clear acceptance criteria (AC) on the board. Agents execute against AC, while human engineers do task preview and Diff Review through an integrated UI. This is essentially a Harness too—just constraining not individual Agent execution behavior, but the entire development process.

A fitting analogy: model-driven code generation is a powerful but directionless horse; Harness is the equipment composed of constraints, guardrails, and feedback mechanisms; humans are riders, responsible for giving direction, not running themselves.

The Agent "Evolution" Proposition—Lobsters, Cats, and Bees

Agents are clearly no longer satisfied with fixed process orchestration—self-evolution is the new proposition. OpenClaw started the "raising lobsters" trend first, and soon a new batch of cats and lobsters appeared. These projects, inspired by OpenClaw, each made tradeoffs in different dimensions.

nanoclaw was launched in late January 2026 by indie developer Cohen, built entirely on Anthropic Claude Agent SDK with a core engine of about 4000 lines of code. Its design philosophy is security-first—all Agents run in isolated containers, using Apple Container on macOS and Docker on Linux, with Bash commands running in containers rather than on the host machine. Andrej Karpathy specifically mentioned it on social media: "The codebase is small enough that both I and AI can understand it, so it feels manageable, auditable, and flexible." This sentence precisely captures what this batch of lightweight frameworks is betting on: understandability itself is a security guarantee.

nanobot goes even more extreme. From HKU's Data Intelligence Lab (HKUDS), about 4000 lines of Python code—99% less than OpenClaw. It strips away all non-core modules, keeping only the ReAct reasoning loop, tool calling, and message queue. It even removed the litellm external dependency in subsequent versions, switching to native SDK for direct model connection—the shorter the supply chain, the smaller the risk.

CoPaw takes the opposite approach. Open-sourced by Alibaba Cloud's AgentScope team, it takes the feature-complete route. Built-in active heartbeat mechanism—not just passively responding to user messages, but proactively triggering tasks at set times. Memory is stored locally, with user preferences and historical tasks continuously accumulating. Supports DingTalk, Feishu, Discord, iMessage, and other channels, with a continuously expanding Skills ecosystem. If nanoclaw and nanobot are doing subtraction, CoPaw is seriously answering "what a complete personal AI assistant should look like."

Early this year, another open-source framework named Aden Hive appeared, answering a deeper question: Can the orchestration framework itself self-evolve?

The fundamental difference from traditional frameworks like LangChain and AutoGPT isn't in functionality, but in that it doesn't require developers to predefine agent execution flows. Its approach: describe goals in natural language, have a Coding Agent (Queen Bee) generate the Agent execution graph and connection code; once running, if failures occur, the framework captures failure data and calls the Coding Agent again to analyze causes, modify structure, and redeploy. This closed loop requires no human intervention. This is a serious bet on generative orchestration. It bets that task complexity often can't be predefined—rather than exhaustively enumerating all cases at design time, let the system continuously grow from feedback during real execution.

Whether Agents as personal assistants or Agent orchestration frameworks themselves, self-evolution is transitioning from a bonus feature to a design starting point.

Model "Big Three" Each Build Complete Ecosystem Tools

The top model companies are each laying out their open-source ecosystem tools and standards. MCP, Skills, Agents.md—one after another they land, and third-party tools can barely keep up digesting them.

An interesting phenomenon is the blurring boundary between Coding Agent and General Agent. After ChatGPT appeared, people searched for a long time before finding viable landing scenarios beyond Chatbot—Coding was among the first to be validated. But when tools like Claude Code reach a certain level, they naturally expand outward, not wanting to just be code-writing tools. OpenClaw was born under this expectation—using the IM window people are most familiar with as a carrier, attempting to carry more general Agent capabilities.

Project Story: One-Person Company? Zero-Person Company!

Just as the OPC (One Person Company) concept was being hotly discussed, a project called Paperclip that appeared in early March has pushed this further. The concept it's hyping: Zero-Person Company. In just over 20 days, Stars grew from 0 to 40,000.

Paperclip's positioning is very direct:

"If OpenClaw is an employee, Paperclip is the company."

Its usage logic has three steps: set goals, recruit a team, press start.

The goal could be "grow this AI note-taking app to $1M monthly revenue"; the team could be Claude as CEO, Cursor as CTO, Codex for engineering, OpenClaw for marketing; once started, this company begins running itself.

Even more interesting is its governance design. Agents can't hire new Agents themselves—this needs your approval; CEO can't unilaterally execute strategies—needs your confirmation. Paperclip positions you as the board—you can pause, override, reassign, or terminate any Agent at any time. Autonomy is a privilege you grant, not an Agent's default power.

In the OPC era, one person can do many things. But the question Paperclip is asking: if even that "one person's" execution work can be outsourced to Agents, what role remains for you? Probably just one word: Board.

The AI Era's "Developers and AI"

Having covered projects, let's look at the other side: the people behind these projects.

Developers: Concentrated in Head Projects, But from Diverse Backgrounds

In February 2026, across the top 50+ Agentic projects, there were approximately 21,000 independently active developers. But the “21,000” figure is somewhat misleading, because they are not evenly distributed across these projects: active developers in OpenClaw and Claude Code alone account for nearly half of the total.

Activity distribution is similarly highly concentrated. This is the familiar power law phenomenon in open-source communities, but it's particularly extreme in this ecosystem: top developer activity scores reach 81, while 95% of developers have activity under 1—a minority driving most substantive progress.

There are several noteworthy numbers in these developers' background composition. Among the 4,232 developers who filled in company information, those from big companies like FAANG and BAT account for less than 10%. More are independent developers and startup people—this ecosystem is not currently dominated by big company engineers.

Geographically, among the 6,295 developers who filled in country information, US developers account for 30%, and Chinese developers account for 10%.

Developers: Young and Cross-Disciplinary, "Builders," "Founders," and "Digital Nomads"

We focused on the top 100 most active developers. They're significantly younger, or at least arrived at the developer community later—the median account creation time is January 2018. If you include long-tail developers, the median becomes December 2013. These two numbers together tell us one thing: a significant portion of top active contributors entered the developer community after the Kubernetes era, and their technical intuition and infrastructure cognition differ noticeably from cloud-native veterans.

Even more extreme: among the 100, one-quarter (25 developers) registered GitHub after 2023, meaning they started coding only after LLMs truly went mainstream. ComfyUI author comfyanonymous and Aden Hive author RichardTang-Aden are among them. They're not developers "changed" by the AI wave—they're developers "summoned" by it. Before this, they might not have considered themselves developers at all.

Here are several representative developers. In their self-descriptions, they are designers, musicians, self-taught developers, prompt engineers, hackers, and digital nomads. Their commonality isn't technical background—it's that verb: build.

Developers and AI: Replacement or Symbiosis? Let's Look at the Numbers

This question is hard to answer directly, but numbers can provide clues. Searching GitHub for Claude-attributed Commits yields over 20 million results. Using the same search method: Cursor about 1 million, Copilot 700K, Gemini 450K, Codex even lower. The difference between Claude and others is a full order of magnitude.

Of course, this data has obvious limitations—this is fuzzy search, and many AI-participated Commits won't be attributed at all, and attribution habits vary by tool and team culture. But even discounting, this order-of-magnitude difference still tells us one thing: Claude-series tools are embedded quite deeply in actual code submission pipelines.

Beyond code generation, Review is another link being taken over by Agents. Copilot and CodeRabbit have completed hundreds of thousands of code Reviews in less than three months this year. The significance of this number isn't just scale, but that Review was previously considered highly dependent on human judgment—it requires understanding context, intent, and team norms. How well Agents can do this is still hard to determine, but they're already doing it.

Among all Agent landing scenarios, Coding is one of the few that has truly completed commercial validation. Other scenarios are still telling stories; Coding Agents are already collecting money.

2026 Coding Agent Landscape: Prompting, Generation, Review, to Requirements Management

We've compiled a landscape of currently popular Coding Agents. The code completion stage is basically past tense, but Copilot is still holding on. While it can't match Claude at writing code, as GitHub's native AI collaboration tool, it's still leading in code review.

Due to time constraints, we didn't do deeper research this time. There's an interesting question: do PRs using Review Agents get merged significantly faster than those without? Intuitively yes, but "significantly" to what extent, and in what types of projects is it most obvious—this deserves serious data analysis.

The more interesting part of the landscape is that some projects are already exploring earlier stages of the software development lifecycle—requirements management. Besides the aforementioned Vibe Kanban, Dane in the Mastra project is another fascinating bot. It can connect to various community channels—Slack, Discord, or mailing lists—extract or abstract project requirements from discussions, and directly file Issues in repositories.

👆This sentence is a personal feeling written at the end.

Peter Steinberger is a tireless open-source builder and creator in the AI era. Before OpenClaw, he had already open-sourced 50+ projects. OpenClaw rekindled everyone's enthusiasm in this exhausted era, largely because it's an open-source project—not just spiritually, but because open-source means it can run locally, means data has some degree of privacy, means you can optimize or fork the project.

Under the AI FOMO wave, models iterate, products iterate, funding iterates. But openness, sharing, and collaboration have never truly gone out of style in the developer community. This is perhaps one of the few things in this ecosystem that doesn't need to wait for "the next version."

The Community Stories of vLLM and SGLang

December 17, 2025 · 7 min read

inclusionAI

Ant Group

Originally published on Medium by Ant Open Source.

First, what is LLM inference?

Training large language models (LLMs) attracts attention for its massive compute demands and headline-making breakthroughs; however, what ultimately determines their real-world practicality and broad adoption is the efficiency, cost, and latency of the inference stage. Inference is the process by which a trained AI model applies what it has learned to new, unseen data to make predictions or generate outputs. For LLMs, this means accepting a user prompt, computing through the model's vast network of parameters, and ultimately producing a coherent text response.

The core challenge in LLM inference is deploying models with tens to hundreds of billions of parameters under tight constraints on latency, throughput, and cost. It is a complex, cross-stack problem spanning algorithms, software, and hardware. Among open-source inference engines, vLLM and SGLang are two of the most closely watched projects.

From academic innovation to a community-driven open-source standard-bearer

vLLM and SGLang history

vLLM traces its roots to a 2023 paper centered on the PagedAttention algorithm, "Efficient Memory Management for Large Language Model Serving with PagedAttention." vLLM's breakthrough wasn't a brand-new AI algorithm; instead, it borrowed paging and cache management ideas from operating systems to fine-grain memory management, laying the groundwork for high-throughput request handling via its PagedAttention mechanism. vLLM also embraced and advanced several industry techniques, such as Continuous Batching first described in the paper "Orca: A Distributed Serving System for Transformer-Based Generative Models."

vLLM star growth

(Source: star-history)

vLLM delivered striking gains: compared with a Hugging Face Transformers–based backend, vLLM handled up to 5× the traffic and boosted throughput by as much as 30×. Within less than half a year it amassed tens of thousands of stars; today, over ten thousand contributors have engaged in issue/PR discussions, nearly 2,000 have submitted PRs, and on average at least 10 new issues are filed daily.

SGLang originated from the paper "SGLang: Efficient Execution of Structured Language Model Programs" and opened new ground with a highly optimized backend runtime centered on RadixAttention and an efficient CPU scheduling design. Rather than discarding PagedAttention, RadixAttention extends it: it preserves as much prompt and generation KV cache as possible and attempts to reuse KV cache across requests; when prefixes match, it slashes prefill computation.

Community metrics for vLLM and SGLang

(Current community metrics for both projects, data as of August 22, 2025)

Community-wise, SGLang is a fast-rising newcomer with a leaner footprint — its total contributor count is less than half of vLLM's. Most issues in vLLM receive responses within 12 hours to 3 days, whereas in SGLang it typically takes 3 to 5 days.

Origins: a continuous current of innovation

As a leading U.S. public research university, UC Berkeley has produced a remarkable roster of open-source projects: Postgres in databases, RISC-V in hardware, Spark in big-data processing, and Ray in machine learning. Early core initiators of the two projects — Woosuk Kwon (vLLM) and Lianmin Zheng (SGLang) — both hail from Berkeley and studied under Ion Stoica, the luminary who led students to create Spark and Ray.

vLLM led the way with an open-source release in June 2023; SGLang debuted roughly six months later. In 2023, Lianmin, Stanford's Ying Sheng, and several scholars founded the open research group LMSYS.org, launching popular projects such as FastChat, Chatbot Arena, and Vicuna.

Today, core initiators Woosuk and Lianmin remain actively involved. Recent six-month contributor data show that early-career academic researchers remain a major force. Beyond academia, vLLM's contribution backbone includes Red Hat, while SGLang's core contributors come from xAI, Skywork, Oracle, and LinkedIn.

Top contributors by organization

As many as 194 developers have contributed code to both vLLM and SGLang — about 30% of SGLang's total code contributors to date. Notable cross-contributors include:

comaniac (OpenAI): 17 early PRs to SGLang + 77 PRs total to vLLM
ShangmingCai (Alibaba Cloud Feitian Lab): 18 PRs to vLLM, then shifted focus to SGLang with 52 PRs
CatherineSue (Oracle): 4 bug-fix PRs to vLLM, then 76 PRs to SGLang as a core contributor

Development, refactors, and fierce competition

Key milestones from an OpenRank perspective:

June 2023: vLLM officially launches, introduces PagedAttention, and grows quickly.
January 2024: SGLang ships its first release and gains industry attention thanks to RadixAttention.
July 2024: SGLang releases v0.2, entering its first acceleration phase.
September 2024: vLLM ships v0.6.0, cutting latency ~5× and improving performance ~2.7× via CPU-scheduling and other optimizations; SGLang releases v0.3 the day before.
December 2024–January 2025: vLLM unveils the V1 refactor. With DeepSeek V3/R1 bursting onto the scene, both begin a second wave of explosive growth.

OpenRank comparison

In 2024, as features and hardware support expanded rapidly, vLLM hit classic software-engineering headwinds. A third-party performance study published in September showed that in some scenarios vLLM's CPU scheduling overhead could exceed half of total inference time. The official blog acknowledged the need for a foundational refactor: V1 arrived in early 2025, after which growth re-accelerated.

CPU scheduling overhead comparison

CPU scheduling overhead: vLLM (left) vs. SGLang (right)

In 2025, the performance race among inference engines heated up. Recognizing the limits of "number wars," both gradually shifted to reproducible methods and end-to-end metrics. A recent third-party comparison from Alibaba Cloud benchmarking vLLM vs. SGLang on the Qwen family showed overall single-GPU/dual-GPU results favoring SGLang, though outcomes vary across hardware/models/configurations.

Trend-wise, model architectures are showing signs of convergence. Leaders vLLM and SGLang now both support Continuous Batching, PagedAttention, RadixAttention, Chunked Prefill, Speculative Decoding, Disaggregated Serving, and CUDA Graphs; operator libraries like FlashInfer, FlashAttention, and DeepGEMM.

Other inference engines to watch:

TensorRT-LLM: launched by NVIDIA in late 2023, deeply tuned for its own hardware
OpenVINO: developed by Intel, focused on efficient deployment across Intel CPUs/GPUs
Llama.cpp: written in C++ by Georgi Gerganov in 2023, targets low-barrier edge inference
LMDeploy: co-developed by the MMDeploy and MMRazor teams, with dual backends — TurboMind for high performance and PyTorch for broad hardware coverage

Moving Forward in the Ecosystem

During their rapid-growth phase, both vLLM and SGLang drew attention from investors and open-source foundations:

In August 2023, a16z launched the Open Source AI Grant, funding vLLM core developers Woosuk Kwon and Zhuohan Li. In a later cohort, SGLang core developers Ying Sheng and Lianmin Zheng were also funded.
In July 2024, ZhenFund donated to vLLM, and LF AI & Data announced vLLM's entry into incubation; this year vLLM was moved under the PyTorch Foundation.
In March 2025, PyTorch published a blog post welcoming SGLang to "the PyTorch ecosystem."

Both projects have become go-to inference solutions globally, with active participation from engineers at Google, Meta, Microsoft, ByteDance, Alibaba, Tencent, and other companies.

Developer company backgrounds (issues)

(Data Source: ossinsight) Company Backgrounds of Developers Submitting Issues

Today, roughly 33% of vLLM's contributors are based in China, and about 52% for SGLang. Both communities host regular in-person meetups in Beijing, Shanghai, Shenzhen, and other cities worldwide.

Open Source LLM Development Landscape 2.0: 2025 Revisited

October 11, 2025 · 9 min read

inclusionAI

Ant Group

Originally published on Medium by Ant Open Source.

Just over three months ago, Ant Open Source and inclusionAI jointly released the very first Open Source LLM Development Landscape, along with a trend insights report. Our goal was simple: to highlight which projects in this fast-moving ecosystem are most worth tracking, using, and contributing to.

That's why we're excited to unveil the 2.0 release of our Landscape — a refreshed view of the ecosystem, built with even more insights and context. With the 2.0 release, we also refreshed our methodology for mapping the ecosystem, surfacing a wave of previously overlooked projects while removing others that didn't make the cut.

Open Source LLM Development Landscape 2.0

Open Source LLM Development Landscape 2.0: https://antoss-landscape.my.canva.site/

The updated landscape is organized into two major directions: AI Infra and AI Agents. Drawing on community data, we identified and included 114 of the most prominent open source projects, spanning 22 distinct technical domains.

For 2.0, we shifted to using the global GitHub OpenRank rankings directly. From the top down, we filtered projects by their descriptions and tags to identify those belonging to the LLM ecosystem, and gradually refined the scope. Only projects with an OpenRank score of 50 or higher are included.

Note: By installing the HyperCRX browser extension, you can view an open-source project's OpenRank trend in the bottom-right corner of its GitHub repository page.

Compared with the 1.0 release, this new 2.0 Landscape brings in 39 fresh projects — about 35% of the total list. On the other hand, 60 projects from the first version have been dropped, mostly because they fell below the new bar. Even if we include the dropped projects, the median "age" of all projects is just 30 months — barely two and a half years old. 62% of these projects were launched after the "GPT moment" (October 2022).

These projects have drawn participation from 366,521 developers worldwide. Among those with identifiable locations, about 24% are based in the United States, 18% in China, followed by India (8%), Germany (6%), and the United Kingdom (5%).

Global Developer Contribution

Across the 170+ open source projects covered in both Landscape versions, we observed over 360K GitHub accounts engaging through issues or pull requests. Among these, we identified 124,351 developers with parseable location data.

Overall, U.S. accounts for 37.4% of contributions, with China at 18.7%, putting their combined share above 55%. Germany drops sharply to 6.5% in third place.

Top 10 Countries by Contribution

Top 10 Countries by Contribution in Open-Source LLM Ecosystem

Looking across technical fields:

In AI Infra, U.S. and China account for over 60% of contributions
In AI Data, participation is more globally distributed, with several European countries ranking in the global top 10
In AI Agents, U.S. and Chinese developers contribute 24.6% and 21.5% respectively

Large Models Landscape 2025

Outside of the open source development ecosystem, large models themselves are being released at a rapid pace. A few interesting observations:

MoE Takes Center Stage: Flagship models like DeepSeek, Qwen, and Kimi have all adopted Mixture of Experts (MoE) architecture — sparse activation enabling trillion-parameter giants like K2, Claude Opus, and o3.
Reinforcement Learning Boosts Reasoning: DeepSeek R1 combines large-scale pretraining with RL-based post-training, making reasoning the signature feature for flagship model releases in 2025. Series like Qwen, Claude, and Gemini have begun integrating "hybrid reasoning" modes.
Multimodality Goes Mainstream: Most 2025 releases focus on language, image, and speech interaction, though specialized vision-only and speech-only models have also emerged.

From Landscape to Tech Trends

Large Models Development Keywords

We extracted keywords from the GitHub descriptions and topics of every open-source project in the Landscape. The most frequent keywords are: AI (126), LLM (98), Agent (81), Data (79), Learning (44), Search (36), Model (36), OpenAI (35), Framework (32), Python (30), MCP (29).

Keyword cloud

Top 10 Open Source Projects

The top 10 projects by OpenRank span nearly the entire chain: from foundational compute and frameworks like PyTorch and Ray, to training data pipelines such as Airflow, to high-performance serving engines like vLLM, SGLang, and TensorRT-LLM. On the application side: Dify, n8n, Gemini CLI, and Cherry Studio.

Top 10 by OpenRank

Note: All data is as of August 1, 2025

Looking at the forces behind these projects:

Academia: Projects like vLLM, SGLang, and Ray emerged from UC Berkeley's labs under Ion Stoica
Tech giants: Meta, Google, NVIDIA hold or shape critical positions in the stack
Indie teams: Smaller teams like Dify and Cherry Studio are innovating rapidly near the application layer

Redefining Open Source in the LLM Era

Veterans familiar with open source licensing might feel alarm when looking at licenses adopted by today's top projects. While most projects still rely on permissive licenses like Apache 2.0 or MIT, several high-profile cases stand out:

Dify's "Open Source License": Based on Apache 2.0 but restricts unauthorized multi-tenant operation and prohibits removing logos/copyright notices.
n8n's "Sustainable Use License": Allows free use and modification but restricts commercial redistribution.
Cherry Studio's "User-Segmented Dual Licensing": AGPLv3 for ≤10-person orgs; commercial license required for larger orgs.

At the same time, GitHub has evolved into a stage for product operations. Many products with closed-source codebases — like Cursor and Claude Code — still maintain GitHub presences primarily for collecting user feedback, often accumulating huge numbers of stars despite providing little or no actual source code.

Shifting Trends Across Technical Domains

AI Coding, Model Serving, and LLMOps are all on an upward trajectory. AI Coding stands out with a steep growth curve — once again confirming that boosting R&D efficiency with AI is the application scenario truly taking root in 2025.

On the other hand, Agent Frameworks and AI Data have shown noticeable declines. The drop in Agent Frameworks is closely tied to reduced community investment from once-dominant projects like LangChain, LlamaIndex, and AutoGen.

Projects on The Brink List

Some projects didn't make it into this version but still show strong potential. Among the projects that dropped out, many appear to be heading toward the "AI graveyard":

Manus briefly exploded in popularity, inspiring open-source forks like OpenManus and OWL, but the hype proved short-lived.
NextChat, one of the earliest popular LLM client apps, lost ground to newer entrants like Cherry Studio and LobeChat.
Bolt.new, once a trendy full-stack web dev tool, was open-sourced as template repos with little external contribution.
MLC-LLM and GPT4All were once widely used for on-device deployment, but Ollama emerged as the clear winner in this niche.
FastChat evolved into the more successful SGLang and LMArena platforms.
Text Generation Inference (TGI) was gradually abandoned by Hugging Face as performance fell behind vLLM and SGLang.

100 Days of Change and Continuity

Beyond project reshuffling, the jump from 1.0 to 2.0 brought refinements to how we define and describe the ecosystem. The broad categories of "Infrastructure" and "Application" were restructured into three clearer domains: AI Infra, AI Agent, and AI Data.

New Fields and Projects Entering the Spotlight

The most notable shifts are happening in the Agent layer, with high-profile projects emerging across AI Coding, chatbots, and development frameworks. Two projects stand out for their connection to embodied intelligence: AI XiaoZhi (ESP32-based AI voice interaction device) and Genesis (robotics and embodied simulation platform).

On the Infra side, the biggest change is the integration of "model operations" into a more holistic concept: LLMOps — spanning Observability (Langfuse, Phoenix), Evaluation & Benchmarking (Promptfoo), and Agentic Workflow Runtime Management (1Panel, Dagger).

Top 10 Active Newcomers: Notably, Gemini CLI ranked 3rd and Cherry Studio ranked 7th across all projects in the Landscape — a remarkable showing for first-time entrants.

Top 10 new projects

Note: All data is as of August 1, 2025

What Hasn't Changed: Rise, Fall, and the Cycle of Momentum

Among the new wave, OpenCode was positioned from day one as a 100% open-source alternative to Claude Code. Other newcomers highlight how major players are laying out strategies across model serving, agent toolchains, and AI coding:

Dynamo supports vLLM, SGLang, and TensorRT-LLM while being optimized for NVIDIA GPUs
adk-python and openai-agents-python are agent builders packaged for Gemini and OpenAI models
Gemini CLI and Codex CLI bring autonomous code understanding directly into the command line

The projects showing the most noticeable growth include TensorRT-LLM, verl (RL framework from ByteDance), OpenCode, and Mastra (TypeScript/JavaScript Agent framework). In contrast, the sharpest declines include Eliza, LangChain, LlamaIndex, and AutoGen.

Core Tech Trends: Serving, Coding, and Agents

Serving: Making Models Truly Usable

Model serving is about running a trained model in a way that applications can reliably call — not just "can it run?" but "can it run efficiently, controllably, and at scale?" Since 2023, rapid progress has made serving the critical middleware layer connecting AI infrastructure with applications.

Coding: The New Developer Vibe

AI Coding has evolved far beyond basic code completion, now encompassing multimodal support, contextual awareness, and collaborative workflows. CLI tools like Gemini CLI and OpenCode leverage large models to transform developer intent into faster coding. Plugin-based tools such as Cline and Continue integrate into existing development platforms.

Agent: Building Toward AGI

2025 is widely considered the year AI applications truly land. The open-source ecosystem has expanded with projects specializing in different components: Mem0 (memory), Browser-Use (tool use), Dify (workflow execution), and LobeChat (interaction interface) — together shaping a more complete foundation for building autonomous AI systems.

More on GitHub: https://github.com/antgroup/llm-oss-landscape

Ming-UniVision: Joint Image Understanding and Generation via a Unified Continuous Tokenizer

October 1, 2025 · 7 min read

inclusionAI

Ant Group

GITHUB 🤗 Hugging Face｜ 🤖 ModelScope

🚀 Technical Highlights

First Continuous Unified Tokenizer for Vision: MingTok seamlessly supports both image understanding and generation within a single continuous latent space—eliminating quantization and bridging modalities.
First NTP-style Autoregressive MLLM with Unified Continuous Visual Tokens: By building on MingTok, Ming-UniVision unifies vision and language under a shared next-token prediction framework, enabling end-to-end autoregressive modeling of diverse vision tasks.
Reduced Representational Competition → 3.5× Faster Convergence: The unified continuous representation aligns semantic understanding and generative dynamics, significantly accelerating joint training without performance trade-offs.
Multi-Round In-Context Learning in a Single Feature Space: All operations—understanding, generation, and editing—occur in the same continuous space, eliminating costly cross-space conversions and enabling simpler, more efficient training and inference.

The Challenge: The Inverse Nature of Seeing and Drawing

Autoregression—the powerful paradigm of modeling the world by “predicting the next token”—has already unified diverse modalities like language and audio. The next frontier is to bring visual understanding (seeing) and visual generation (drawing) into this unified sequence‑to‑sequence framework.

However, this ambition encounters a deep challenge: in many respects, understanding and generation are inverse tasks.

Understanding: Pixels → high‑dimensional, abstract semantic concepts
Generation: Concepts → fine‑grained, high‑fidelity pixels

These tasks have drastically different—and often competing—preferences for their underlying visual representation.

Why Previous Approaches Fell Short

Existing models attempt unification via two limited strategies:

Asymmetric Designs: Use different, heterogeneous feature spaces for each task. During multi‑turn interactions, this forces inefficient “round‑trips” between spaces, causing latency and complexity.
Shared Discrete Tokens: Unify the token space but introduce quantization errors. This hurts image fidelity and degrades understanding capability.

Our Solution: Ming-UniVision and MingTok

To break this impasse, we introduce Ming-UniVision, a new generation of autoregressive vision‑language model built on a foundational innovation: MingTok.

MingTok is the first visual tokenizer based on a continuous latent space. It delivers a truly unified and efficient representation that serves as the bedrock for Ming‑UniVision’s unified NTP (Next‑Token Prediction) framework—harmonizing image understanding, generation, and editing in one in‑context multimodal loop.

The Core Design: A Three-Stage Architecture to Reconcile Competition

At the heart of Ming-UniVision is the MingTok tokenizer, a three-stage sequential architecture elegantly designed to reconcile the competing representational demands of understanding and generation within a single framework.

Figure 1: Architecture Comparison Figure 1: (a) Existing models use separate visual representations. (b) MingTok, the engine of Ming-UniVision, uses a unified scheme for both semantic and generative representations. (c) This unified approach leads to over 3.5x faster training convergence.

Low-level Encoder: Maps an input image into a sequence of compact, continuous latent codes, optimized for high-quality and efficient autoregressive generation.
Semantic Decoder: Autoregressively "refines" the compact latent codes into high-dimensional, rich semantic features aligned with top-tier understanding models like CLIP.
Pixel Decoder: Serves as a quality-assurance module, ensuring the original image can be reconstructed with high fidelity, guaranteeing a high-fidelity representation process.

The Key Innovation: MingTok creates a unified, differentiable interface. The high-level features for understanding can be directly fed as conditional input for the next round of generation or editing. This completely eliminates the costly detour through pixel space.

The Breakthrough: A Fundamental Leap in Efficiency

By integrating MingTok, Ming-UniVision achieves competitive results on both understanding and generation tasks. The shared continuous latent space unlocks two fundamental layers of efficiency, resolving bottlenecks that have plagued previous architectures.

Figure 2: Benchmark Results Figure 2: On general recognition tasks, our method approaches the performance of models with separated representations and significantly outperforms other unified representation models. For generation, our model shows a clear advantage on fine-grained tasks.

1. A Revolution in Training: >3.5x Faster Convergence

Traditional approaches expend massive resources aligning heterogeneous representations, creating an intrinsic "task competition" that slows learning. MingTok solves this at its root.

Synergistic Enhancement: Our ablation studies show that using MingTok for both tasks fosters a synergy where understanding and generation capabilities enhance each other, rather than competing.
>3.5x Speedup: By avoiding inefficient alignment, the model focuses its energy on learning, reaching the same performance level in a fraction of the time compared to traditional schemes.

Figure 3: Pre-training Performance Figure 3: The performance drop between generation-only training and joint training is minimal with MingTok, proving the advantage of our unified approach.

2. A Revolution in Interaction: Goodbye to the "Pixel Round-Trip"

The efficiency of multi-turn interactions (e.g., generate → edit → re-generate) depends on the "understanding-generation" loop. This is precisely where traditional architectures falter.

Architecture Type	Multi-turn Capability	Core Bottleneck	Interaction Path	Efficiency & Fidelity
DiT-based Models	❌ Not Natively Supported	Non-autoregressive, stateless	N/A (Full process restart)	Low
Hybrid Architectures	⚠️ Supported, but Inefficient	Dual-branch, un-unified spaces	`Latent → Pixel → Feature`	Low, complex, lossy
Unified AR	⚠️ Supported, but Inefficient	Heterogeneous spaces	`Latent → Pixel → Feature`	Low, lossy
Ming-UniVision	✅ Native & Highly Efficient	Unified Continuous Space	`Feature → Feature`	High & High-Fidelity

As the table shows, any architecture with separated spaces is doomed to the inefficient Latent → Pixel → Feature round-trip. This "pixel detour" introduces massive latency and causes contextual information to decay.

Ming-UniVision achieves a direct Feature → Feature closed loop. High-level features from an understanding task can be directly consumed by the next generation task, unlocking truly coherent multimodal sequence modeling. This enables tasks that once required multiple specialized models to emerge naturally within a single, unified framework:

Iterative Image Enhancement: Perform super-resolution, then directly continue with colorization or denoising.
Generative Chain-of-Thought: Perform an understanding task (e.g., "segment the car"), then directly apply an editing command to that region.

Figure 4: Multi-turn Interaction Demo Figure 4: Multi-turn tasks like "Super-resolution → Colorization" and "Segmentation → Editing" are now part of a seamless flow.

Understanding, generation, and editing are no longer isolated pipelines but are woven into a continuous visual conversation.

Conclusion and The Road Ahead

We believe that a unified and continuous visual representation like MingTok opens up new possibilities for building more flexible and intuitive multimodal interactive systems.

We know this is just one step in a long journey. We have open-sourced our code and initial model weights, hoping to provide a useful foundation for the community and to inspire more discussion around unified representations. We look forward to collaborating with our peers to collectively advance the future of multimodal AI.

Get Involved

GitHub: (https://github.com/inclusionAI/Ming-UniVision)
Technical Report: (https://arxiv.org/pdf/2510.06590)
Model link: Hugging Face｜ 🤖 ModelScope

Try out our open-source model Ming-UniVision and MingTok-Vision on our GitHub Page / Demo Page. Please star our repo if you like it!

Segmentation-as-Editing for Unified Multimodal AI

September 13, 2025 · 8 min read

inclusionAI

Ant Group

GITHUB 🤗 Hugging Face｜ 🤖 ModelScope

The Hype and the Hidden Question

The multimodal AI world has been thriving.

From the debut of Qwen-Image to the interactive editing hype sparked by Nano Banana, image editing has rapidly become the next battlefield for generative AI.

Editing fundamentally requires two distinct skill sets:

Know where, what, and how to change (understanding the image)
Produce the change with high visual quality (generating the image)

Its rich gameplay and strong interactivity have pulled in users, developers, and creators alike.

But behind the noise, few are asking:

Beneath this prosperity, how close are we to a truly unified “understanding + generation” AI?

Understanding and Generation: Two Hands, Often Out of Sync

For years, we’ve chased an ambitious goal:

Build a unified multimodal model that understands the world like a scientist (e.g., image segmentation) while creating it like an artist (e.g., image editing).

In theory, these abilities should be mutually reinforcing:

“The deeper the understanding, the better the creation; the more the creation, the deeper the understanding.”

Reality is messier.

In AI today:

Understanding = the left hand: precise abstractions, semantic reasoning, boundaries.
Generation = the right hand: coherent pixels, style, aesthetics.

But training a model to recognize 10,000 cat photos doesn’t magically make it capable of painting cats, and painting cats repeatedly doesn’t make it understand cats better.

Worse, in multitask training, the two often compete for resources — optimizations for understanding can hurt generation, and vice versa.

We’re missing a catalyst: a task that forces the left and right hands to evolve together.

The Struggle: 16% Segmentation and Out-of-Control Generation

Before finding our solution, our unified model was struggling with generative segmentation:

Given an instruction like “segment the banana in the upper-right corner”, we wanted the model to output a segmentation mask directly.

The results were painful.

Struggling with Segmentation

On RefCOCO-val, our cIoU plateaued at ~16%.

The root cause is the distribution gap.

Generative models thrive on natural, continuous image distributions. Segmentation masks, however, are synthetic, abstract, binary maps — as unnatural as it gets for an image generator.

It was like asking a painter to draw an X-ray: doable, but far from their artistic instincts.

Here, generation wasn’t helping segmentation — it was tripping it up.

We needed a new task that:

Met the precision demands of understanding.
Played to the strengths of generation.

The “Aha” Moment: Dressing Segmentation in Color

Here’s the analogy that unlocked it for us:

If you want a child to mark an object, is it easier to have them draw a tight outline with a pencil, or fill it in with bright colors?

Obviously, the latter.

Instead of forcing our model to output abstract black-and-white masks, we turned the segmentation task into a color-editing task.

Example:

Instruction: “segment the banana in the upper-right”
Old way: Output a mask ❌
New way: Directly edit the image: “paint the banana purple”, “make the banana red”, etc. ✅

Segmentation as Editing

This brought the task’s data distribution back to the realm of natural images — where generative models shine.

Why This Works: The Hidden Catalyst

That small twist turned out to be exactly the catalyst we’d been searching for.

Boosting Understanding: To color the banana without bleeding outside the boundary, the model must internally nail pixel-perfect segmentation. The segmentation step became an implicit prerequisite to editing.
Unleashing Generation: No more awkward synthetic masks — the model is doing what it knows best: image-to-image editing. All its strengths in shading, texture, and edge blending go into making the change look natural.

For the first time, the left hand and right hand weren’t fighting — they were helping each other.

The Numbers: From 16% to 72.4% — and Beyond

1. SOTA-level Segmentation

The cIoU score didn’t just improve — it soared from 16% to 72.4% on RefCOCO-val, a relative gain of over 350%.

Qualitatively, the model outperformed competitors in pinpointing and segmenting targets, even in reasoning-heavy cases.

Against Qwen-Image and Nano Banana, our model:

Located small or occluded targets more reliably.
Produced boundaries that were visually and semantically aligned with instructions.

Segmentation Comparison 1 Our model (right) accurately locates and segments the target subject. Qwen-Image (second from left) fails to locate the correct target, while Nano-banana (third from left) fails to accurately segment the man's head and has loose boundary lines.

Segmentation Comparison 2 For the prompt "please segment the girl with red mask," our model (right) is precise. Qwen-Image (second from left) misses the feet, and Nano-banana (third from left) alters the subject's proportions.

During evaluation, thanks to the high consistency of non-edited regions in our model, we can directly derive the segmentation mask by calculating the difference between the edited result and the original image.

Calculating difference on Ming-Lite-Omni1.5, Qwen-Image-Edit, Nano-banana

The results show that our model's performance on segmentation is now on par with specialized vision models.

Model Category	Model Name	RefCOCO (val)	RefCOCO+ (val)	RefCOCOg (val)
Vision Specialist Models	VLT	67.5	56.3	55.0
	CRIS	70.5	62.3	59.9
	LAVT	72.7	62.1	61.2
	PolyFormer-B	74.8	67.6	67.8
MLLM + Specialist (SAM)	LISA-7B	74.1	62.4	66.4
	PixelLM-7B	73.0	66.3	69.3
Generative Models	Nano-banana*	15.7	13.9	14.9
	Qwen-Image-Edit*	30.3	28.8	34.0
	Ming-Lite-Omni1.5	72.4	62.8	64.3

For each test set, Nano-banana and Qwen-Image-Edit was evaluated on a randomly sampled subset of 500 images, to reduce computational cost while preserving the key statistical trends. We observed that Nano-banana frequently fails to accurately grasp the image segmentation intent during inference, leading to its comparatively lower evaluation metrics. This may be attributed to differences in training objectives and data emphasis.

2. Sharper, More Controllable Editing

The beauty of this method is that it not only fixed the segmentation weakness but also dramatically enhanced the model's general editing capabilities.

Because the model has learned an unprecedented "respect for boundaries" through thousands of "precise coloring" exercises, this "muscle memory" for fine-grained control has transferred to all editing tasks. Our edit controllability score saw a significant jump from 7.69 to 8.12 across sub-tasks like background, color, and material changes.

Editing Controllability Comparison Prompt: "remove the bow tie of the man on the far right." Our model (right) precisely removes only the target bow tie while maintaining background consistency. Qwen (second from left) incorrectly removes multiple bow ties and introduces inconsistencies. Nano-banana (third from left) also struggles with consistency.

3. Stronger ID Consistency

A core challenge in portrait editing is maintaining identity. Our model excels here as well. Whether changing a hairstyle or adjusting an expression, the model skillfully preserves the person's core features.

ID Consistency Comparison Top Row (Turn head): Our model (right) maintains ID and background consistency, unlike competitors. Middle Row (Smile): Our model (right) correctly follows the prompt while preserving ID, avoiding distortions seen in others. Bottom Row (Change background): Our model (right) excels at preserving the subject's ID and appearance during a background swap.

See More Editing Consistency in Action:

An Honest Look: Where We Can Still Improve

Despite the leap forward, challenges remain:

Large pose changes (e.g., standing → running) need more reliability.
Multi-step or compound instructions require better parsing and execution.
Instruction diversity support needs expansion.

These are our next milestones.

Takeaway: The Next Catalysts Are Out There

From 16% to 72.4% — this wasn’t driven by a massive architecture overhaul or billion-image datasets.

It came from one change in task design.

The lesson: Instead of gluing capabilities together after the fact, find naturally cooperative tasks — where solving the problem requires multiple abilities to mesh seamlessly.

“Segmentation-as-editing” is just the first example.

We suspect 3D understanding, video generation, and other domains have their own hidden catalysts, waiting to be discovered.

At last, AI’s left and right hands have learned to high-five.

And this is only the overture.

Try out our open-source model Ming-lite-omni 1.5 on our GitHub Page / Demo Page. Please star our repo if you like it!

Introducing Ring-lite-2507

August 5, 2025 · 7 min read

inclusionAI

Ant Group

📖 Technical Report | 🤗 Hugging Face｜ 🤖 ModelScope

Overview

We present Ring-lite-2507, an upgraded version of our previously released lightweight reasoning model, Ring-lite (2506). Built upon a 16.8B Mixture-of-Experts (MoE) large language model with 2.75B activated parameters, Ring-lite-2507 further advances its reasoning capabilities while demonstrating superior performance across a comprehensive range of LLM benchmarks, including general text understanding, alignment, coding, logical, and agentic tasks. Thanks to our innovative and robust reinforcement learning training pipeline, Ring-lite-2507 distinguishes itself from the latest public dense models under 10B parameters by offering competitive performance across various tasks, despite activating only 1/3 of their parameter size.

To address the optimization instability of MoE RL training, we propose a novel approach, Constrained Contextual Computation Policy Optimization(C3PO), which enhances training stability and improves computational throughput via algorithm-system co-design. Additionally, we systematically investigate the dynamic relationship between long chain-of-thought SFT and RL training. Rather than relying solely on validation metrics, we explore optimal strategies for selecting the suitable fine-tuned model for RL scaling, yielding superior performance-efficiency trade-offs in our RL training pipeline. Last, we develop a two-stage training paradigm to harmonize multi-domain data integration, enhancing reasoning ability while effectively improving performance across various downstream general tasks.

Highlights

🚀 Superior performance across tasks: Ring-lite-2507 demonstrates outstanding performance across both reasoning and general tasks;
🔥 Only 2.75B activated parameters: Ring-lite-2507 is built upon a Mixture-of-Experts (MoE)-based large language model with only 2.75 billion activated parameters;
⛓️‍💥 Algorithm-system co-design: We proposed novel C3PO approach and employ token efficiency to improve training stability and effectiveness;
🔍 Publicly available: We fully release our training recipe and model weights.

Evaluation

We conduct a comprehensive evaluation of our models across two main domains: reasoning and general. We utilize a diverse set of public benchmarks, organized according to the specific aspects they measure.

Knowledge Understanding

Benchmark	Ring-lite-2507	Ring-lite-2506	Qwen3-8B-Thinking
MMLU-Pro (EM)	72.50	63.44	72.56
GPQA-Diamond (Pass@1)	69.35	63.51	62.00
SuperGPQA (EM)	40.05	13.97	40.36
Phybench (Pass@1)	28.51	29.19	22.14

Math

Benchmark	Ring-lite-2507	Ring-lite-2506	Qwen3-8B-Thinking
MATH-500 (Pass@1)	97.95	96.80	97.30
CNMO 2024 (Pass@1)	75.09	77.26	74.57
AIME 2024 (Pass@1)	79.79	79.00	74.90
AIME 2025 (Pass@1)	72.92	69.50	67.19
LiveMathBench (Pass@1)	83.37	85.08	81.90
TheoremQA (Pass@1)	70.00	70.19	68.81
OlympiadBench (math) (Pass@1)	80.64	82.86	80.20

Coding

Benchmark	Ring-lite-2507	Ring-lite-2506	Qwen3-8B-Thinking
LiveCodeBench(2408-2505) (Pass@1)	60.35	59.53	55.12
Codeforces(Percentile) (Pass@1)	1830	1673	1580
Codeforces(Rating)	92.16	88.00	79.44

Reasoning & Agentic

Benchmark	Ring-lite-2507	Ring-lite-2506	Qwen3-8B-Thinking
DROP (zero-shot F1)	89.27	60.21	87.13
BBH (EM)	88.65	50.84	87.30
ARCPrize (Pass@1)	19.00	3.12	3.88
MuSR (EM)	77.19	66.77	76.92
BFCL_Live (Pass@1)	74.81	66.76	75.99

Alignment

Benchmark	Ring-lite-2507	Ring-lite-2506	Qwen3-8B-Thinking
IFEval (Prompt Strict)	84.66	54.34	85.40
AlignBench v1.1(gpt-4.1)	80.90	69.60	74.70
FoFo (gpt-4-turbo)	85.02	67.81	81.93
ArenaHard (gpt-4.1)	88.85	56.12	86.14

Constrained Contextual Computation Policy Optimization(C3PO)

We introduce Constrained Contextual Computation Policy Optimization(C3PO), an innovative token-level optimization framework designed to mitigate training instability while enhancing throughput consistency. Different from sampling-level filtering, C3PO operates at the token level by sampling tokens to form a token-level global batch, each training step maintains consistent token input to optimizer, which results in reduced gradient variance and consequently achieving stable optimization.

C3PO

Balancing Token efficiency between Distillation and RL

While distillation is effective, we find that it requires more training tokens to achieve comparable performance than RL training. Furthermore, we observe that varying the number of training epochs for the distilled model significantly influences the trend of entropy loss, thereby affecting the exploration scope for RL. Our experiments show that increasing the number of SFT training epochs leads to a rapid collapse in entropy, whereas insufficient SFT training inevitably results in inferior performance. To systematically quantify the choice of optimal SFT epoch, we employ token efficiency to determine the suitable checkpoint for RL scaling.

Training Data

To ensure a high-quality training dataset for reinforcement learning, we established a comprehensive and meticulous data curation pipeline. This pipeline encompasses several key stages, such as data cleansing, answer verification, and data annotation, all designed to thoroughly decontaminate the data and ensure it is both suitable and informative for RL training.

Data Pipeline

Training Pipeline

Reasoning RL

Compared to our previously released Ring-lite-2506, we expanded our reasoning dataset by incorporating more challenging math, coding, and STEM data. Specifically, we adopted 67K math problems, 32K coding problems, and 9.9K scientific problems for reasoning RL training. In addition, we amplified our reasoning dataset by including more than 19K logical games, such as ARC-AGI, Countdown, Sudoku, AlphaMaze, etc. For each type of problem, we specifically designed suitable reward functions to ensure our training examples are verifiable.

General RL

Apart from reasoning tasks, our Ring-lite-2507 has significantly expanded the collection of general datasets for RL training. Our general RL training does not compromise performance on reasoning tasks; instead, it enhances overall text understanding across a broad range of general benchmarks.

Our general RL training incorporates a variety of tasks, including instruction following, question answering, text summarization, and more. For open-ended questions, we employ a robust reward model to assign appropriate scores. Additionally, we have integrated a rule-based verifier to handle problems that can be easily validated, such as instruction-following tasks.

Citation

@misc{lingteam2025ringlitescalablereasoningc3postabilized,
      title={Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs},
      author={Ling Team and Bin Hu and Cai Chen and Deng Zhao and Ding Liu and Dingnan Jin and Feng Zhu and Hao Dai and Hongzhi Luan and Jia Guo and Jiaming Liu and Jiewei Wu and Jun Mei and Jun Zhou and Junbo Zhao and Junwu Xiong and Kaihong Zhang and Kuan Xu and Lei Liang and Liang Jiang and Liangcheng Fu and Longfei Zheng and Qiang Gao and Qing Cui and Quan Wan and Shaomian Zheng and Shuaicheng Li and Tongkai Yang and Wang Ren and Xiaodong Yan and Xiaopei Wan and Xiaoyun Feng and Xin Zhao and Xinxing Yang and Xinyu Kong and Xuemin Yang and Yang Li and Yingting Wu and Yongkang Liu and Zhankai Xu and Zhenduo Zhang and Zhenglei Zhou and Zhenyu Huang and Zhiqiang Zhang and Zihao Wang and Zujie Wen},
      year={2025},
      eprint={2506.14731},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.14731},
}

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

July 11, 2025 · 6 min read

inclusionAI

Ant Group

📖 Technical Report | 🤗 Hugging Face｜ 🤖 ModelScope

Introduction

We introduce M2-Reasoning-7B, a model designed to excel in both general and spatial reasoning. Our approach integrates two key innovations: (1) a novel data pipeline that generates 294.2K high-quality data samples (168K for cold-start fine-tuning and 126.2K for RLVR), which feature logically coherent reasoning trajectories and have undergone comprehensive assessment; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, and task-specific rewards for delivering tailored incentive signals. This combination of curated data and advanced training allows M2-Reasoning-7B to set a new state-of-the-art (SOTA) across 8 benchmarks, showcasing superior performance in both general and spatial reasoning domains.

📌 Updates

[2025.07.14] 🔥 Our Technical Report is in public on arxiv.
[2025.07.11] 🔥 We release M2-Reasoning on 🤗 Hugging Face and 🤖 ModelScope.

Key Features

A High-quality Data Construction Pipeline: We design and implement a multi-stage data synthesis and curation pipeline that generates vast amounts of reasoning data.
A Dynamic Multi-Task Training Strategy: We propose a sophisticated training strategy that effectively handles data heterogeneity. It features step-wise dynamic optimization to mitigate conflicts between different data sources and a task-specific reward formulation to provide tailored incentive signals.
Unified General and Spatial Reasoning Model: We propose M2-Reasoning-7B, an MLLM uniquely engineered for both abstract and spatial reasoning. Extensive evaluations on 8 distinctbenchmarks demonstrate that, by leveraging our custom data and training pipelines, M2-Reasoning establishes new state-of-the-art (SOTA) results across both general and spatial reasoning domains.

Evaluation

We conduct a comprehensive evaluation of our models across two key domains: general and spatial reasoning. Our evaluation utilizes a diverse set of public benchmarks, grouped by the primary capability they measure:

General Reasoning (Mathematical & Logical): To evaluate this capability, we employ six benchmarks: MathVista, MathVision, MathVerse, DynaMath, WeMath, and LogicVista.

Models	MathVista	MathVision	MathVerse	DynaMath	WeMath	LogicVista	Avg. (Δ)
Base-Scale General Models
InternVL3-8B	70.5	30.0	38.5	25.7	39.5	44.5	41.4
InternVL3-9B	69.0	29.3	37.9	25.1	34.8	49.0	40.8
Qwen2.5-VL-7B	68.1	25.4	41.1	21.8	36.2	47.9	40.1
MUG-U-7B	74.8	26.1	35.4	17.2	26.5	39.8	36.6
SAIL-VL-1.6-8B	74.2	23.2	33.4	14.0	29.6	41.4	36.0
Base-Scale Reasoning Models
WeThink-VL-7B	71.6	26.0	44.2	24.8	48.0	51.2	44.3 (+4.2)
Taichu-VLR-7B	72.3	27.1	46.7	23.0	44.0	48.3	43.6
VLAA-Thinker-7B	68.0	26.4	48.2	22.4	41.5	48.5	42.5 (+2.4)
URSA-8B-PS-GRPO	67.8	31.8	41.5	22.4	38.3	44.7	41.1 (+8.2)
Ovis2-8B	71.8	25.9	42.3	20.4	27.2	39.4	37.8
Our Models
Base Model	70.2	25.9	30.5	20.2	27.2	37.8	35.5
M2-Reasoning-CI-7B	71.7	29.2	42.1	25.0	42.8	46.8	42.9 (+7.4)
M2-Reasoning-7B	75.0	31.5	44.7	26.8	41.8	50.0	45.0 (+9.5)

Spatial Reasoning: We assess this skill using 2 benchmarks: CV-Bench and VSI-Bench

CV-Bench:

Models	Count	Relation	Depth	Distance	Avg.
Large-Scale Models
GPT-4O	65.9	85.7	87.8	78.2	78.9
Gemini-1.5-pro	70.4	85.2	82.4	72.8	77.4
Base-Scale Models
InternVL3-8B	74.0	90.6	84.3	81.0	82.0
Qwen2.5-VL-7B-Instruct	65.2	86.6	70.6	79.8	75.0
LLava-NEXT-Video-7B	59.3	77.0	71.3	54.7	65.2
Our Models
M2-Reasoning-7B	66.6	92.8	89.3	84.3	82.3

VSI-Bench:

	OC	AD	OS	RS	RDs	RDr	RP	AO	Avg.
Large-Scale Models
Gemini-1.5-pro	56.2	30.9	64.1	43.6	51.3	46.3	36.0	34.6	45.4
GPT-4O	46.2	5.3	43.8	38.2	37.0	41.3	31.5	28.5	34.0
Base-Scale Models
InternVL3-8B	68.1	39.0	48.4	33.6	48.3	36.4	27.3	35.4	42.1
Video-R1-7B	-	-	-	-	-	-	-	-	37.1
Qwen2.5-VL-7B-Instruct	37.7	20.1	49.7	37.4	38.5	40.4	31.4	32.0	35.9
LLava-NeXT-Video-7B	48.5	14.0	47.8	24.2	43.5	42.4	34.0	30.6	35.6
Our Models
M2-Reasoning-7B	41.0	34.0	60.9	55.4	40.7	47.3	29.9	28.8	42.3

Model Downloads

You can download the model from both Hugging Face and ModelScope.

If you're in mainland China, we strongly recommend you to download our model from ModelScope.

Example Usage

The basic environment is python=3.10, torch=2.6.0+cu124, transformers=4.49.0

We provide a small example on the usage of this repo.

import os
import torch

from transformers import (
    AutoProcessor,
    AutoTokenizer,
)

import warnings
import argparse
from modeling_bailing_qwen2_5 import Bailing_qwen2_5NativeForConditionalGeneration
from processing_bailing_qwen2_5 import Bailing_qwen2_5Processor

warnings.filterwarnings("ignore")

class BailingMMInfer:
    def __init__(self,
        model_name_or_path,
        device="cuda",
        max_pixels=None,
        min_pixels=None,
        video_max_pixels=768 * 28 * 28,
        video_min_pixels=128 * 28 * 28,
        generation_config=None
    ):
        super().__init__()
        self.model_name_or_path = model_name_or_path

        self.device = device

        self.device_map = device

        self.video_max_pixels = video_max_pixels if video_max_pixels is not None else 768 * 28 * 28
        self.video_min_pixels = video_min_pixels if video_min_pixels is not None else 128 * 28 * 28

        self.model, self.tokenizer, self.processor = self.load_model_processor()
        if max_pixels is not None:
            self.processor.max_pixels = max_pixels
        if min_pixels is not None:
            self.processor.min_pixels = min_pixels
        if generation_config is None:
            generation_config = {
                "num_beams": 1,
                "do_sample": True,
                "temperature": 0.9
            }

        self.generation_config = generation_config


    def load_model_processor(self):

        model = Bailing_qwen2_5NativeForConditionalGeneration.from_pretrained(
            self.model_name_or_path,
            torch_dtype=torch.bfloat16,
            device_map=self.device_map,
            _attn_implementation="flash_attention_2"
        ).eval()

        tokenizer = AutoTokenizer.from_pretrained(self.model_name_or_path, add_bos_token=True, trust_remote_code=True)
        processor = Bailing_qwen2_5Processor.from_pretrained(self.model_name_or_path, trust_remote_code=True)

        return model, tokenizer, processor

    def generate(self, messages, max_new_tokens=512):
        text = self.processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True, use_system=True
        )

        image_inputs, video_inputs = self.processor.process_vision_info(messages)


        inputs = self.processor(
            text=[text],
            images=image_inputs,
            videos=video_inputs,
            return_tensors="pt",
        )
        # print(inputs)
        print(self.tokenizer.decode(inputs['input_ids'][0]))

        inputs = inputs.to(self.device)

        for k in inputs.keys():
            if k == "pixel_values" or k == "pixel_values_videos":
                inputs[k] = inputs[k].to(dtype=torch.bfloat16)

        with torch.no_grad():
            generated_ids = self.model.generate(
                inputs,
                max_new_tokens=max_new_tokens,
                eos_token_id=self.processor.tokenizer.eos_token_id,
                **self.generation_config,
            )

        generated_ids_trimmed = [
            out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
        ]

        output_text = self.processor.batch_decode(
            generated_ids_trimmed, skip_special_tokens=False, clean_up_tokenization_spaces=False
        )[0]

        return output_text

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name_or_path', type=str, default="inclusionAI/M2-Reasoning")
    parser.add_argument('--max_pixels', type=int, default=401408)
    parser.add_argument('--min_pixels', type=int, default=401408)
    parser.add_argument('--max_new_tokens', type=int, default=4096)

    args = parser.parse_args()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # model_name_or_path = os.path.join(args.input_dir, args.model_name_or_path)
    bailing2 = BailingMMInfer(
        args.model_name_or_path,
        device=device,
        max_pixels=args.max_pixels,
        min_pixels=args.min_pixels
    )

    messages = [
        {
            "role": "system",
            "content": [
                {"type": "text", "text": "You are a helpful assistant. When the user asks a question, your response must include two parts: first, the reasoning process enclosed in <think>...</think> tags, then the final answer enclosed in <answer>...</answer> tags. The critical answer or key result should be placed within \\boxed{}."}]},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "./assets/example1.png"},
                {"type": "text", "text": "\nQuestion:\n\nRhombus $QRST$ has an area of 137.9 square meters. If $RT$ is 12.2 meters, find $QS$.\nA. 11.3\nB. 22.4\nC. 22.6\nD. 25.6"},
            ],
        },
    ]
    output_text = bailing2.generate(messages, max_new_tokens=args.max_new_tokens)
    print(output_text)



'''
[Output]:

<think>
To find the length of \( QS \) in the rhombus \( QRST \), we can use the formula for the area of a rhombus, which is given by:

\[
\text{Area} = \frac{1}{2} \times d_1 \times d_2
\]

where \( d_1 \) and \( d_2 \) are the lengths of the diagonals. In this problem, we are given:
- The area of the rhombus is 137.9 square meters.
- One of the diagonals,

ABench: An Evolving Open-Source Benchmark

July 8, 2025 · 2 min read

inclusionAI

Ant Group

GITHUB

🌟 Overview

ABench is an evolving open-source benchmark suite designed to rigorously evaluate and enhance Large Language Models (LLMs) on complex cross-domain tasks. By targeting current model weaknesses, ABench provides systematic challenges in high-difficulty specialized domains, including physics, actuarial science, logical reasoning, law, and psychology.

🎯 Core Objectives

Address Evaluation Gaps: Design high-differentiation assessment tasks targeting underperforming question types
Establish Unified Standards: Create reliable, comparable benchmarks for multi-domain LLM evaluation
Expand Capability Boundaries: Drive continuous optimization of knowledge systems and reasoning mechanisms through challenging innovative problems

📊 Dataset Release Status

Domain	Description	Status
Physics	500 university/competition-level physics problems (400 static + 100 dynamic parametric variants) covering 10+ fields from classical mechanics to modern physics	✅ Released
Actuary	Curated actuarial exam problems covering core topics: probability statistics, financial mathematics, life/non-life insurance, actuarial models, and risk management	✅ Released
Logic	High-differentiation logical reasoning problems from authoritative tests (LSAT/GMAT/GRE/SBI/Chinese Civil Service Exam)	🔄 In Preparation
Psychology	Psychological case studies and research questions (objective/subjective) evaluating understanding of human behavior and theories	🔄 In Preparation
Law	Authoritative judicial exam materials covering core legal domains: criminal/civil/administrative/procedural/international law	🔄 In Preparation

Open Source LLM Development 2025: Landscape, Trends and Insights

June 11, 2025 · 10 min read

inclusionAI

Ant Group

Originally published on Medium by Ant Open Source.

「AI Surpasses Cloud Native as the Most Influential Tech Domain」

According to OpenRank data from OpenDigger, AI surpassed Cloud Native in 2023 to become the most influential technology domain in terms of community collaboration on GitHub. AI's total influence score overtook Frontend technologies in 2017, accelerated post-2022, and surpassed the declining Cloud Native in 2023 to claim the top spot.

AI surpasses Cloud Native

The LLM Development Ecosystem: A Snapshot

LLM Development Landscape

https://antoss-landscape.my.canva.site

In February 2025, DeepSeek sparked a surge in the LLM development ecosystem. GitHub's Weekly Trending List reached a peak where 94% of the listed repositories were AI-related. This ecosystem is incredibly new and evolving fast — over the past three months, 60% of LLM-related projects that appeared on GitHub Trending were emerged after 2024, and nearly 21% were created in just the last six months.

We build the landscape by first selecting well-known AI projects (e.g., PyTorch, LangChain, vLLM) as seed nodes. By analyzing developer collaboration relationships across "related" GitHub projects, we explored multiple facets of the ecosystem. We rely on the OpenRank influence metric developed by X-lab at East China Normal University — only projects with an average monthly OpenRank score exceeding 10 in year 2025 are included.

As of May 2025, the Open Source LLM Development Landscape 2025 includes 135 projects across 19 technical domains, spanning both Agent application layers and model infrastructure layers.

Below are the details of projects ranked in the Top 20 of OpenRank:

Top 20 by OpenRank

By stack ranking the year-over-year absolute changes in OpenRank between 2024 and 2025, we converged on 3 key observations:

Model Training Frameworks: PyTorch remains the undisputed leader. Baidu's PaddlePaddle saw a 41% drop in OpenRank compared to the previous year.
Efficient Inference Engines: The high-performance inference engines vLLM and SGLang have undergone rapid iterations, ranking first and third in OpenRank value growth. Their superior GPU inference performance made them the most popular choices for enterprise-level LLM deployment.
Low-Code Application (Agent) Development Frameworks: Agent platforms like Dify and RAGFlow, which integrate RAG-based knowledge retrieval, are experiencing rapid growth as they meet the red-hot demand for quickly building AI applications. Notably, both platforms are strong projects emerging from China's developer community.

After observing over 100 open-source projects, we've reached a pivotal point to make a bold claim: the LLM development ecosystem operates like a real-world Hackathon — developers, empowered by AI, now operate as "super individuals" to rapidly build open-source projects around trending topics, with cycles of rapid creation and dissolution driven by speed and iteration.

Key hackathon observations:

1. Developers keep building OSS clones for rapid adoption

When closed-source projects like Devin, Perplexity, and Manus brought shockwaves to the industry, developers quickly replicated open-source versions:

Devin & OpenDevin: In March 2024, Xingyao Wang (PhD candidate at UIUC) launched OpenDevin. Within a month, its OpenRank skyrocketed to 190. The project was rebranded as OpenHands and evolved into All Hands AI.
Perplexity & Perplexica: Independent developer ItzCrazyKns created Perplexica in 2024 as an open-source alternative. It amassed 22K GitHub stars but OpenRank plateaued around 25.
Manus & OpenManus: In March 2025, as Manus went viral, DeepWisdom pulled off a "3-hour replication" with OpenManus, garnering 8K stars on its first day.

2. Ephemeral technical experiments often end up in the AI graveyard

Out of 5,079 AI tools recorded by Dang AI, 1,232 have been archived/abandoned. Dang AI even created an "AI Graveyard." We've curated an "Open-Source AI Graveyard" for projects that gained massive attention upon launch but became inactive — including BabyAGI (April 2023) and Swarm (OpenAI, formally discontinued March 2025).

3. Model capabilities are reshaping application scenarios

The decline of AI Search projects: The generalization of model capabilities (GPT-4, Gemini 2.0) is squeezing the market for specialized search tools like Morphic.sh and Scira.
The rise of AI Coding projects: Claude 3.7 Sonnet's prowess in coding ushered in "Vibe Coding." IDE plugins like Continue and Cline are thriving open-source options, each with over 3,000 community contributors and steadily rising OpenRank scores.

4. Dynamic competition across ecosystem niches

Divergent trajectories of Agent Frameworks: Application platforms like Dify diverged sharply from development frameworks like LangChain. Special mention: DB-GPT, an open-source project initiated by Ant Group, integrates AI application development into big data application scopes.
The rise of Reinforcement Learning: DeepSeek-R1's "Aha Moment" demonstrated RL's effectiveness as a post-training approach. Frameworks like Verl and OpenRLHF have seen remarkable growth. In February, inclusionAI fully open-sourced their RL framework AReaL, designed to train large inference models that anyone can reproduce.
The blurring of Technical Boundaries: Vector databases, once standalone, now compete with traditional big data systems (e.g., OceanBase adding vector storage support) while maintaining a delicate ecological equilibrium.

Now, the Technical Trends in LLM Open-Source Development Ecosystems

We observed and summarized 7 relatively clear technical trends including emerging paradigms such as Agent Frameworks, AI-native communication protocols like MCP, and Coding Agents at the application layer.

7 Technical Trends

1. The Agent Frameworks Boom Diverged in 2025

From 2023 to 2024, "all-in-one" frameworks like LangChain dominated with their pioneering task orchestration capabilities. A huge number of new Agent development frameworks emerged, many focusing on specific features such as tool calling, RAG integration, long-context memory, or ReAct planning.

By the second half of 2024, only a few new frameworks entered the ecosystem. As the initial hype faded, early market leaders like LangChain were gradually declining due to steep learning curves.

Entering 2025, the market showed signs of divergence: platforms like Dify and RAGFlow became extremely popular by offering low-code workflows and enterprise-grade service deployments. In contrast, development frameworks like LangChain and LlamaIndex have been steadily losing ground.

Dify has accurately captured enterprise user needs — offering intuitive visual workflow orchestration, comprehensive enterprise-grade security, and significantly lowering the technical barrier for amateur users.

2. Standard Protocol Layer: The Strategic Battleground

2022: Wild West Era — ad-hoc prompt engineering for tool interaction.
2023: OpenAI's GPT4-0613 introduced Function Calling with standardized API.
2024: Anthropic's Model Context Protocol (MCP), open-sourced November 2024, standardized agent-tool communication. By Q1 2025, MCP became the de facto standard.
2025: Protocol "War" begins:
- April: Google open-sourced the Agent2Agent (A2A) protocol for communication between multiple agents.
- May: CopilotKit launched the Agent-User Interaction (AG-UI) protocol with 2.2K GitHub stars in its first week.

The emergence of MCP, A2A, and AG-UI signals LLM applications evolving toward a microservices architecture. The open-source ecosystem will become the battlefield of both standards and their reference designs.

3. The Irresistible Vibe Coding Software Development Paradigm

When Andrej Karpathy introduced the term "vibe coding," it seemed to capture "The Trend" in the upcoming productivity domain. Our research reveals a market pattern:

Major tech companies have rapidly entered AI coding primarily with closed-source offerings: GitHub Copilot, Amazon Q Developer, Huawei's CodeArts Snap, Alibaba's Tongyi Lingma, ByteDance's Trae, and Ant Group's CodeFuse.

Startup ventures and small teams have demonstrated remarkable agility. A prime example is Continue's "continuedev," which gained substantial attention through lean operations and flexible innovation. The sector's potential was endorsed by OpenAI's reported $3 billion acquisition offer for Windsurf.

AI coding tools are advancing beyond basic snippet generation to tackle full-scale development workflows, though substantial challenges remain in semantic validation, multi-language coordination, and security-sensitive code generation.

4. The Shifting Boundaries of Vector Indexing and Storage

The evolution of vector databases can be described as a journey "from explosive hype to rational consolidation." Around February 2023, projects like Qdrant and Chroma saw an unprecedented surge, amassing over 5,000 GitHub stars. However, this initial frenzy failed to sustain long-term momentum.

Several factors contributed to equilibrium:

Closed-source commercial competitors like Pinecone demonstrated strong product capabilities.
Traditional databases (PostgreSQL, MongoDB Atlas, ElasticSearch) introduced vectorization via plugins like pgvector.
The OpenCore model prioritizes ecosystem expansion over community metrics.

Despite these pressures, large-scale enterprise demands for cloud-native scalability and compliance still favor specialized vector databases. MilVus, under neutral LF AI & Data stewardship, has consistently maintained a stable leading position.

5. The Evolution of Multimodal Data Governance in the Age of LLMs

In data lake table formats, Apache Iceberg, Apache Hudi, Apache Paimon, and Delta Lake have formed a "quadropoly." Iceberg has solidified its position as the universal framework, while Hudi and Paimon excel in real-time incremental processing.

The metadata governance and data catalog space sees OpenMetadata and DataHub maintaining leadership, with newcomers like Apache Gravitino and Unity Catalog emerging as potential disruptors. These tools are expanding to include unstructured data and AI assets.

6. The Ongoing Horse Racing in Model Serving and Inference

Three critical factors have emerged as core deal-makers or deal-breakers: inference efficiency, resource utilization, and deployment flexibility. The Top 10 ranking list reshuffles constantly, with contenders like Tsinghua University's KTransformers and NVIDIA's Dynamo continually challenging the status quo.

A potential duopoly is forming: vLLM and SGLang, currently the two most prominent inference engines in the LLM space. In Q1 2025, vLLM's OpenRank grew at 17%, while SGLang surged to 31%.

This duel carries notable academic pedigree: UC Berkeley, birthplace of Spark and Ray, again demonstrates its open-source alchemy. vLLM originated from Berkeley's SkyLab; SGLang from LMSYS, the multi-university research consortium that created Chatbot Arena.

Other notable engines:

Ollama & llama.cpp: The lightweight powerhouses for edge inference and on-premise deployment
KTransformers: Enabled running full 671B parameter models (DeepSeek-R1/V3) on consumer hardware with 3–28x speedups, triggering a 34x OpenRank spike

7. The PyTorch-Centric Training Ecosystem

PyTorch has undeniably become the dominant force and de facto standard in LLM development. Its modular, lightweight design propelled it past TensorFlow in 2020, while TensorFlow, MXNet, and Caffe faded into obsolescence.

In September 2022, Meta transferred PyTorch's governance to the Linux Foundation, establishing the PyTorch Foundation. Through PyTorch's nearly overwhelming ecosystem gravitational pull, this sub-foundation has grown into a powerful umbrella organization:

March 2025: Inference engine SGLang joined the PyTorch ecosystem
May 2025: vLLM and distributed training platform DeepSpeed joined the PyTorch Foundation

Community data still reveals Meta's substantial behind-the-scenes influence: the repository's top contributors are all identifiable Meta staff, and over 9,000 pull requests (9% of all PRs) carry the "fb-exported" label.

Conclusion

Ant Group's Open Source team initiated this landscape project to understand the full picture of the LLM development ecosystem, including emerging trends and cutting-edge popular projects. One of our missions is to leverage insights from the open-source community to guide Ant Group's architectural and technological decisions.

This report reflects Ant Group's perspective as a technology enterprise, utilizing X-lab's OpenRank evaluation metrics alongside extensive consultations with technical experts and open-source community developers.

Full Author List: Xiaoya Xia, Sikang Bian, Chao Dong, Xu Wang (AntOSS) Shengyu Zhao, Fanyu Han, Jiaheng Peng, Zhen Zhang, Wei Wang (X-lab)

More on GitHub: https://github.com/antgroup/llm-oss-landscape

Opening​

Signals From Platforms​

The Agentic AI Ecosystem Architecture​

Agent Infra: Redefining How Software Is Used​

Model Infra: The Main Job Is Supplying Tokens at Scale​

Model: There Is No Single Winner​

Trends and Project Stories​

Description Signals: Projects Are Rewriting Their Self-Introductions​

README Evidence: Projects Say What They Are Not​

Developers and AI Tools​

Who Is Taking Part in the Agentic AI Ecosystem?​

Carbon-Based Definers and Silicon-Based Executors​

Software Will Keep Being Rewritten, But Open Source Remains Irreplaceable​

Inclusive AGI: Intelligence Should Not Be a Privilege for the Few​

Notes on Data Scope​

Agentic Ecosystem in 2026​

Technical Trends from Popular Projects​

From Context Management to Complexity Harness​

Software Development Shifts from Human-Centric to Agent-Centric​

The Agent "Evolution" Proposition—Lobsters, Cats, and Bees​

Model "Big Three" Each Build Complete Ecosystem Tools​

Project Story: One-Person Company? Zero-Person Company!​

The AI Era's "Developers and AI"​

Developers: Concentrated in Head Projects, But from Diverse Backgrounds​

Developers: Young and Cross-Disciplinary, "Builders," "Founders," and "Digital Nomads"​

Developers and AI: Replacement or Symbiosis? Let's Look at the Numbers​

2026 Coding Agent Landscape: Prompting, Generation, Review, to Requirements Management​

Finally: Amidst AI FOMO, Openness, Sharing, and Collaboration Remain Developers' Spiritual Home​

First, what is LLM inference?​

From academic innovation to a community-driven open-source standard-bearer​

Origins: a continuous current of innovation​

Development, refactors, and fierce competition​

Moving Forward in the Ecosystem​

Global Developer Contribution​

Large Models Landscape 2025​

From Landscape to Tech Trends​

Large Models Development Keywords​

Top 10 Open Source Projects​

Redefining Open Source in the LLM Era​

Shifting Trends Across Technical Domains​

Projects on The Brink List​

100 Days of Change and Continuity​

New Fields and Projects Entering the Spotlight​

What Hasn't Changed: Rise, Fall, and the Cycle of Momentum​

Core Tech Trends: Serving, Coding, and Agents​

Serving: Making Models Truly Usable​

Coding: The New Developer Vibe​

Agent: Building Toward AGI​

🚀 Technical Highlights​

The Challenge: The Inverse Nature of Seeing and Drawing​

Why Previous Approaches Fell Short​

Our Solution: Ming-UniVision and MingTok​

The Core Design: A Three-Stage Architecture to Reconcile Competition​

The Breakthrough: A Fundamental Leap in Efficiency​

1. A Revolution in Training: >3.5x Faster Convergence​

2. A Revolution in Interaction: Goodbye to the "Pixel Round-Trip"​

Conclusion and The Road Ahead​

Get Involved​

The Hype and the Hidden Question​

Understanding and Generation: Two Hands, Often Out of Sync​

The Struggle: 16% Segmentation and Out-of-Control Generation​

The “Aha” Moment: Dressing Segmentation in Color​

Why This Works: The Hidden Catalyst​

The Numbers: From 16% to 72.4% — and Beyond​

1. SOTA-level Segmentation​

2. Sharper, More Controllable Editing​

3. Stronger ID Consistency​

An Honest Look: Where We Can Still Improve​

Takeaway: The Next Catalysts Are Out There​

Overview​

Evaluation​

Knowledge Understanding​

Math​

Coding​

Reasoning & Agentic​

Alignment​

Constrained Contextual Computation Policy Optimization(C3PO)​

Balancing Token efficiency between Distillation and RL​

Training Data​

Training Pipeline​

Opening

Signals From Platforms

The Agentic AI Ecosystem Architecture

Agent Infra: Redefining How Software Is Used

Model Infra: The Main Job Is Supplying Tokens at Scale

Model: There Is No Single Winner

Trends and Project Stories

Description Signals: Projects Are Rewriting Their Self-Introductions

README Evidence: Projects Say What They Are Not

Developers and AI Tools

Who Is Taking Part in the Agentic AI Ecosystem?

Carbon-Based Definers and Silicon-Based Executors

Software Will Keep Being Rewritten, But Open Source Remains Irreplaceable

Inclusive AGI: Intelligence Should Not Be a Privilege for the Few

Notes on Data Scope

Agentic Ecosystem in 2026

Technical Trends from Popular Projects

From Context Management to Complexity Harness

Software Development Shifts from Human-Centric to Agent-Centric

The Agent "Evolution" Proposition—Lobsters, Cats, and Bees

Model "Big Three" Each Build Complete Ecosystem Tools

Project Story: One-Person Company? Zero-Person Company!

The AI Era's "Developers and AI"

Developers: Concentrated in Head Projects, But from Diverse Backgrounds

Developers: Young and Cross-Disciplinary, "Builders," "Founders," and "Digital Nomads"

Developers and AI: Replacement or Symbiosis? Let's Look at the Numbers

2026 Coding Agent Landscape: Prompting, Generation, Review, to Requirements Management

Finally: Amidst AI FOMO, Openness, Sharing, and Collaboration Remain Developers' Spiritual Home

First, what is LLM inference?

From academic innovation to a community-driven open-source standard-bearer

Origins: a continuous current of innovation

Development, refactors, and fierce competition

Moving Forward in the Ecosystem

Global Developer Contribution

Large Models Landscape 2025

From Landscape to Tech Trends

Large Models Development Keywords

Top 10 Open Source Projects

Redefining Open Source in the LLM Era

Shifting Trends Across Technical Domains

Projects on The Brink List

100 Days of Change and Continuity

New Fields and Projects Entering the Spotlight

What Hasn't Changed: Rise, Fall, and the Cycle of Momentum

Core Tech Trends: Serving, Coding, and Agents

Serving: Making Models Truly Usable

Coding: The New Developer Vibe

Agent: Building Toward AGI

🚀 Technical Highlights

The Challenge: The Inverse Nature of Seeing and Drawing

Why Previous Approaches Fell Short

Our Solution: Ming-UniVision and MingTok

The Core Design: A Three-Stage Architecture to Reconcile Competition

The Breakthrough: A Fundamental Leap in Efficiency

1. A Revolution in Training: >3.5x Faster Convergence

2. A Revolution in Interaction: Goodbye to the "Pixel Round-Trip"

Conclusion and The Road Ahead

Get Involved

The Hype and the Hidden Question

Understanding and Generation: Two Hands, Often Out of Sync

The Struggle: 16% Segmentation and Out-of-Control Generation

The “Aha” Moment: Dressing Segmentation in Color

Why This Works: The Hidden Catalyst

The Numbers: From 16% to 72.4% — and Beyond

1. SOTA-level Segmentation

2. Sharper, More Controllable Editing

3. Stronger ID Consistency

An Honest Look: Where We Can Still Improve

Takeaway: The Next Catalysts Are Out There

Overview

Evaluation

Knowledge Understanding

Math

Coding

Reasoning & Agentic

Alignment

Constrained Contextual Computation Policy Optimization(C3PO)

Balancing Token efficiency between Distillation and RL

Training Data

Training Pipeline