Agent Performance
Pass rates on EvoAgentBench · 5 domains · multiple self-evolution methods
Partial results. More agents, domains, and methods coming soon.
Best With Skills: 72.4% · Avg. Improvement: +10.9% · Configurations: 20
| # | Agent | Base Model | Domain | Self-Evolving Methods | Without | With Skills | Δ | Cost | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenClaw | Qwen3.5-397B | Knowledge Work | Human Design | 50.0% | 72.4% | +22.4 | โ 91.8% turns | |
| 2 | OpenClaw | Qwen3.5-27B | Knowledge Work | Human Design | 51.7% | 65.5% | +13.8 | โ 43.8% turns | |
| 3 | OpenClaw | Qwen3.5-397B | Code Implementation | Human Design | 56.4% | 64.1% | +7.7 | โ 6.7% turns | |
| 4 | OpenClaw | Qwen3.5-27B | Code Implementation | Human Design | 53.8% | 64.1% | +10.3 | โ 0% turns | |
| 5 | OpenClaw | Qwen3.5-397B | Software Engineering | Human Design | 26.9% | 61.5% | +34.6 | โ 0.5% turns | |
| 6 | OpenClaw | Qwen3.5-397B | Reasoning & Problem Decomposition | Human Design | 45.0% | 60.0% | +15.0 | โ 2.7% chars | |
| 7 | OpenClaw | Qwen3.5-397B | Knowledge Work | EverOS | 50.0% | 56.9% | +6.9 | โ 24.7% turns | |
| 8 | OpenClaw | Qwen3.5-397B | Code Implementation | EverOS | 56.4% | 56.4% | +0.0 | โ 3.3% turns | |
| 9 | OpenClaw | Qwen3.5-397B | Information Retrieval | Human Design | 32.3% | 55.4% | +23.1 | โ 21.8% turns | |
| 10 | OpenClaw | Qwen3.5-27B | Knowledge Work | EverOS | 51.7% | 55.2% | +3.5 | โ 5.4% turns | |
| 11 | OpenClaw | Qwen3.5-27B | Code Implementation | EverOS | 53.8% | 51.3% | -2.5 | โ 4.8% turns | |
| 12 | OpenClaw | Qwen3.5-397B | Reasoning & Problem Decomposition | EverOS | 45.0% | 49.0% | +4.0 | โ 32.1% chars | |
| 13 | OpenClaw | Qwen3.5-397B | Information Retrieval | EverOS | 32.3% | 43.1% | +10.8 | โ 33.1% turns | |
| 14 | OpenClaw | Qwen3.5-27B | Reasoning & Problem Decomposition | EverOS | 37.0% | 42.0% | +5.0 | โ 6.2% chars | |
| 15 | OpenClaw | Qwen3.5-397B | Software Engineering | EverOS | 26.9% | 38.5% | +11.6 | โ 11.4% turns | |
| 16 | OpenClaw | Qwen3.5-27B | Software Engineering | EverOS | 11.5% | 38.5% | +27.0 | โ 41.2% turns | |
| 17 | OpenClaw | Qwen3.5-27B | Software Engineering | Human Design | 11.5% | 38.5% | +27.0 | โ 62.5% turns | |
| 18 | OpenClaw | Qwen3.5-27B | Information Retrieval | Human Design | 30.8% | 35.4% | +4.6 | โ 14.2% turns | |
| 19 | OpenClaw | Qwen3.5-27B | Information Retrieval | EverOS | 30.8% | 32.3% | +1.5 | โ 4.7% turns | |
| 20 | OpenClaw | Qwen3.5-27B | Reasoning & Problem Decomposition | Human Design | 37.0% | 29.0% | -8.0 | โ 13.0% chars |
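
The summary figures at the top can be reproduced from the table. The sketch below is a minimal check, assuming the average improvement is the unweighted mean of the Δ column in percentage points; the page does not state how it is computed, and the variable names are illustrative only.

```python
# (without, with_skills) pass rates in percent, copied from the table rows in order.
rows = [
    (50.0, 72.4), (51.7, 65.5), (56.4, 64.1), (53.8, 64.1), (26.9, 61.5),
    (45.0, 60.0), (50.0, 56.9), (56.4, 56.4), (32.3, 55.4), (51.7, 55.2),
    (53.8, 51.3), (45.0, 49.0), (32.3, 43.1), (37.0, 42.0), (26.9, 38.5),
    (11.5, 38.5), (11.5, 38.5), (30.8, 35.4), (30.8, 32.3), (37.0, 29.0),
]

# Per-row improvement in percentage points, rounded as in the table.
deltas = [round(with_skills - without, 1) for without, with_skills in rows]

# Assumption: "Avg. Improvement" is the unweighted mean of the per-row deltas.
avg_improvement = round(sum(deltas) / len(deltas), 1)
best_with_skills = max(with_skills for _, with_skills in rows)

print(f"Best With Skills: {best_with_skills}%")
print(f"Avg. Improvement: {avg_improvement:+.1f}")
print(f"Configurations: {len(rows)}")
```

Under that assumption the script gives 72.4%, +10.9, and 20, matching the figures shown above the table.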