SwarmX is a scheduler agent framework for large-scale agentic workflow clusters. Submitted to OSDI 2026 and deployed in Tencent WeChat's production environment, it addresses the critical challenge of efficiently scheduling complex AI agent workflows across hundreds of heterogeneous GPUs and millions of CPU cores.
Key Features
Scheduler Agent Framework — Purpose-built for agentic workflow clusters, supporting intelligent routing and resource allocation for complex workflows.
Heterogeneous GPU Scheduling — Unified scheduling across heterogeneous GPU clusters, adapting to different hardware configurations.
Drift-Robust Stability — Maintains scheduling stability under severe drift conditions in production environments.
Production-Scale Deployment — Validated from 128-GPU benchmarks up to Tencent WeChat production clusters with roughly a million CPU cores and nearly a thousand GPUs.
Results
- Tail latency reduced by 10–60% relative to state-of-the-art (SOTA) methods on the 128-GPU benchmark
- P99 latency reduced by up to 50% in production
- Throughput doubled under the same SLO constraints
- Deployed for Tencent WeChat Hunyuan model serving and WeChat OCR scheduling
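For readers unfamiliar with the metrics above, the following is a minimal sketch of how a P99 latency and its relative reduction are commonly computed. It is not taken from SwarmX: the function names, the nearest-rank percentile definition, and the sample latencies are all assumptions made for illustration, and the numbers are not measured results.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers p percent of the samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical per-request latencies in milliseconds (illustrative only).
baseline_ms = [100] * 98 + [400, 800]
candidate_ms = [100] * 98 + [200, 800]

p99_baseline = percentile_nearest_rank(baseline_ms, 99)
p99_candidate = percentile_nearest_rank(candidate_ms, 99)
reduction = relative_reduction(p99_baseline, p99_candidate)

print(f"P99 baseline:  {p99_baseline} ms")
print(f"P99 candidate: {p99_candidate} ms")
print(f"P99 reduction: {reduction:.0%}")
```

In this toy data the median is identical for both runs and only the tail differs, which is why tail percentiles such as P99 are reported separately from average latency.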
Collaborators
Tencent — WeChat Hunyuan model deployment and OCR scheduling
