Agent Performance
Pass rates on EvoAgentBench · 5 domains · multiple self-evolution methods
Partial results. More agents, domains, and methods coming soon.
Best With Skills: 65.5%
Avg. Improvement: +2.3%
Configurations: 80
| # | Agent | Base Model | Domain | Self-Evolving Methods | Without | With Skills | Δ | Cost | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Nanobot | Qwen3.5-397B | Knowledge Work | ReasoningBank | 54.9% | 65.5% | +10.6 | โ 17.6% turns | |
| 2 | OpenClaw | Qwen3.5-397B | Software Engineering | ReasoningBank | 25.0% | 65.4% | +40.4 | โ 89.0% turns | |
| 3 | OpenClaw | Qwen3.5-397B | Code Implementation | GEPA | 46.2% | 64.1% | +17.9 | โ 90.0% turns | |
| 4 | Nanobot | Qwen3.5-397B | Knowledge Work | EverOS | 54.9% | 63.8% | +8.9 | โ 1.2% turns | |
| 5 | Nanobot | Qwen3.5-397B | Knowledge Work | Memento | 54.9% | 62.7% | +7.8 | โ 14.5% turns | |
| 6 | Nanobot | Qwen3.5-397B | Code Implementation | GEPA | 51.3% | 61.5% | +10.2 | โ 65.2% turns | |
| 7 | Nanobot | Qwen3.5-397B | Knowledge Work | GEPA | 54.9% | 60.8% | +5.9 | โ 39.6% turns | |
| 8 | Nanobot | Qwen3.5-27B | Knowledge Work | EverOS | 43.1% | 60.3% | +17.2 | โ 10.9% turns | |
| 9 | Nanobot | Qwen3.5-27B | Software Engineering | EverOS | 38.5% | 57.7% | +19.2 | โ 4.2% turns | |
| 10 | Nanobot | Qwen3.5-397B | Software Engineering | ReasoningBank | 46.2% | 57.7% | +11.5 | โ 12.0% turns | |
| 11 | Nanobot | Qwen3.5-397B | Reasoning & Problem Decomposition | Memento | 53.0% | 55.0% | +2.0 | โ 1484.9% chars | |
| 12 | Nanobot | Qwen3.5-397B | Reasoning & Problem Decomposition | GEPA | 53.0% | 55.0% | +2.0 | โ 1.9% chars | |
| 13 | OpenClaw | Qwen3.5-397B | Knowledge Work | EverOS | 45.1% | 53.4% | +8.3 | โ 4.4% turns | |
| 14 | OpenClaw | Qwen3.5-27B | Code Implementation | GEPA | 46.2% | 51.3% | +5.1 | โ 20.4% turns | |
| 15 | OpenClaw | Qwen3.5-397B | Code Implementation | Memento | 46.2% | 51.3% | +5.1 | โ 8.0% turns | |
| 16 | Nanobot | Qwen3.5-397B | Code Implementation | ReasoningBank | 51.3% | 51.3% | +0.0 | โ 4.3% turns | |
| 17 | Nanobot | Qwen3.5-397B | Reasoning & Problem Decomposition | EverOS | 53.0% | 51.0% | -2.0 | โ 2.8% chars | |
| 18 | OpenClaw | Qwen3.5-397B | Information Retrieval | EverOS | 30.8% | 50.8% | +20.0 | โ 1.5% turns | |
| 19 | OpenClaw | Qwen3.5-27B | Knowledge Work | EverOS | 37.3% | 50.0% | +12.7 | โ 5.5% turns | |
| 20 | OpenClaw | Qwen3.5-27B | Software Engineering | ReasoningBank | 38.5% | 50.0% | +11.5 | โ 5.1% turns | |
| 21 | OpenClaw | Qwen3.5-397B | Reasoning & Problem Decomposition | EverOS | 48.0% | 50.0% | +2.0 | โ 18.0% chars | |
| 22 | OpenClaw | Qwen3.5-397B | Software Engineering | GEPA | 25.0% | 50.0% | +25.0 | โ 59.8% turns | |
| 23 | Nanobot | Qwen3.5-27B | Software Engineering | Memento | 38.5% | 50.0% | +11.5 | โ 24.9% turns | |
| 24 | Nanobot | Qwen3.5-27B | Software Engineering | GEPA | 38.5% | 50.0% | +11.5 | โ 29.2% turns | |
| 25 | Nanobot | Qwen3.5-397B | Software Engineering | Memento | 46.2% | 50.0% | +3.8 | โ 16.2% turns | |
| 26 | OpenClaw | Qwen3.5-397B | Knowledge Work | GEPA | 45.1% | 49.0% | +3.9 | โ 5.9% turns | |
| 27 | Nanobot | Qwen3.5-397B | Reasoning & Problem Decomposition | ReasoningBank | 53.0% | 49.0% | -4.0 | โ 30.0% chars | |
| 28 | OpenClaw | Qwen3.5-27B | Software Engineering | EverOS | 38.5% | 46.2% | +7.7 | โ 0.5% turns | |
| 29 | OpenClaw | Qwen3.5-397B | Software Engineering | EverOS | 25.0% | 46.2% | +21.2 | โ 32.0% turns | |
| 30 | Nanobot | Qwen3.5-27B | Reasoning & Problem Decomposition | GEPA | 47.0% | 44.0% | -3.0 | โ 0.2% chars | |
| 31 | OpenClaw | Qwen3.5-27B | Code Implementation | EverOS | 46.2% | 43.6% | -2.6 | โ 32.7% turns | |
| 32 | Nanobot | Qwen3.5-397B | Code Implementation | Memento | 51.3% | 43.6% | -7.7 | โ 17.4% turns | |
| 33 | OpenClaw | Qwen3.5-397B | Knowledge Work | Memento | 45.1% | 43.1% | -2.0 | โ 8.1% turns | |
| 34 | OpenClaw | Qwen3.5-397B | Knowledge Work | ReasoningBank | 45.1% | 43.1% | -2.0 | โ 5.9% turns | |
| 35 | Nanobot | Qwen3.5-27B | Knowledge Work | ReasoningBank | 43.1% | 43.1% | +0.0 | โ 64.4% turns | |
| 36 | OpenClaw | Qwen3.5-397B | Reasoning & Problem Decomposition | ReasoningBank | 48.0% | 43.0% | -5.0 | โ 13.8% chars | |
| 37 | Nanobot | Qwen3.5-27B | Reasoning & Problem Decomposition | EverOS | 47.0% | 43.0% | -4.0 | โ 2.8% chars | |
| 38 | OpenClaw | Qwen3.5-397B | Software Engineering | Memento | 25.0% | 42.3% | +17.3 | โ 28.9% turns | |
| 39 | Nanobot | Qwen3.5-397B | Software Engineering | EverOS | 46.2% | 42.3% | -3.9 | โ 15.9% turns | |
| 40 | Nanobot | Qwen3.5-397B | Software Engineering | GEPA | 46.2% | 42.3% | -3.9 | โ 8.3% turns | |
| 41 | OpenClaw | Qwen3.5-397B | Reasoning & Problem Decomposition | GEPA | 48.0% | 42.0% | -6.0 | โ 18.3% chars | |
| 42 | Nanobot | Qwen3.5-27B | Reasoning & Problem Decomposition | Memento | 47.0% | 42.0% | -5.0 | โ 1843.1% chars | |
| 43 | Nanobot | Qwen3.5-27B | Reasoning & Problem Decomposition | ReasoningBank | 47.0% | 42.0% | -5.0 | โ 42.4% chars | |
| 44 | OpenClaw | Qwen3.5-397B | Information Retrieval | ReasoningBank | 30.8% | 41.5% | +10.7 | โ 10.3% turns | |
| 45 | OpenClaw | Qwen3.5-27B | Knowledge Work | GEPA | 37.3% | 41.2% | +3.9 | โ 41.9% turns | |
| 46 | Nanobot | Qwen3.5-27B | Knowledge Work | Memento | 43.1% | 39.2% | -3.9 | โ 66.5% turns | |
| 47 | OpenClaw | Qwen3.5-27B | Reasoning & Problem Decomposition | EverOS | 44.0% | 39.0% | -5.0 | โ 31.8% chars | |
| 48 | OpenClaw | Qwen3.5-397B | Code Implementation | EverOS | 46.2% | 38.5% | -7.7 | โ 42.0% turns | |
| 49 | OpenClaw | Qwen3.5-397B | Code Implementation | ReasoningBank | 46.2% | 38.5% | -7.7 | โ 2.0% turns | |
| 50 | Nanobot | Qwen3.5-27B | Software Engineering | ReasoningBank | 38.5% | 38.5% | +0.0 | โ 6.5% turns | |
| 51 | OpenClaw | Qwen3.5-397B | Reasoning & Problem Decomposition | Memento | 48.0% | 38.0% | -10.0 | โ 68.5% chars | |
| 52 | Nanobot | Qwen3.5-27B | Code Implementation | ReasoningBank | 25.6% | 35.9% | +10.3 | โ 95.5% turns | |
| 53 | Nanobot | Qwen3.5-27B | Code Implementation | GEPA | 25.6% | 35.9% | +10.3 | โ 68.2% turns | |
| 54 | Nanobot | Qwen3.5-397B | Code Implementation | EverOS | 51.3% | 35.9% | -15.4 | โ 17.4% turns | |
| 55 | OpenClaw | Qwen3.5-27B | Knowledge Work | Memento | 37.3% | 35.3% | -2.0 | โ 100.0% turns | |
| 56 | Nanobot | Qwen3.5-27B | Knowledge Work | GEPA | 43.1% | 35.3% | -7.8 | โ 72.3% turns | |
| 57 | OpenClaw | Qwen3.5-27B | Knowledge Work | ReasoningBank | 37.3% | 34.5% | -2.8 | โ 29.1% turns | |
| 58 | OpenClaw | Qwen3.5-27B | Code Implementation | Memento | 46.2% | 33.3% | -12.9 | โ 6.1% turns | |
| 59 | OpenClaw | Qwen3.5-27B | Reasoning & Problem Decomposition | Memento | 44.0% | 32.0% | -12.0 | โ 21.1% chars | |
| 60 | OpenClaw | Qwen3.5-27B | Reasoning & Problem Decomposition | GEPA | 44.0% | 32.0% | -12.0 | โ 37.9% chars | |
| 61 | OpenClaw | Qwen3.5-27B | Software Engineering | Memento | 38.5% | 30.8% | -7.7 | โ 41.7% turns | |
| 62 | Nanobot | Qwen3.5-27B | Code Implementation | Memento | 25.6% | 30.8% | +5.2 | โ 27.3% turns | |
| 63 | OpenClaw | Qwen3.5-397B | Information Retrieval | Memento | 30.8% | 29.2% | -1.6 | โ 8.6% turns | |
| 64 | OpenClaw | Qwen3.5-27B | Code Implementation | ReasoningBank | 46.2% | 28.2% | -18.0 | โ 8.2% turns | |
| 65 | OpenClaw | Qwen3.5-397B | Information Retrieval | GEPA | 30.8% | 26.2% | -4.6 | โ 14.1% turns | |
| 66 | Nanobot | Qwen3.5-397B | Information Retrieval | Memento | 10.8% | 26.2% | +15.4 | โ 33.6% turns | |
| 67 | OpenClaw | Qwen3.5-27B | Information Retrieval | GEPA | 10.8% | 24.6% | +13.8 | โ 0.6% turns | |
| 68 | OpenClaw | Qwen3.5-27B | Reasoning & Problem Decomposition | ReasoningBank | 44.0% | 21.0% | -23.0 | โ 28.7% chars | |
| 69 | Nanobot | Qwen3.5-397B | Information Retrieval | EverOS | 10.8% | 20.0% | +9.2 | โ 44.0% turns | |
| 70 | Nanobot | Qwen3.5-27B | Code Implementation | EverOS | 25.6% | 17.9% | -7.7 | โ 131.8% turns | |
| 71 | OpenClaw | Qwen3.5-27B | Information Retrieval | Memento | 10.8% | 16.9% | +6.1 | โ 19.2% turns | |
| 72 | OpenClaw | Qwen3.5-27B | Information Retrieval | EverOS | 10.8% | 15.4% | +4.6 | โ 76.5% turns | |
| 73 | OpenClaw | Qwen3.5-27B | Software Engineering | GEPA | 38.5% | 15.4% | -23.1 | โ 39.9% turns | |
| 74 | Nanobot | Qwen3.5-27B | Information Retrieval | EverOS | 6.2% | 13.8% | +7.6 | โ 34.2% turns | |
| 75 | Nanobot | Qwen3.5-397B | Information Retrieval | ReasoningBank | 10.8% | 13.8% | +3.0 | โ 3.6% turns | |
| 76 | OpenClaw | Qwen3.5-27B | Information Retrieval | ReasoningBank | 10.8% | 12.3% | +1.5 | โ 17.8% turns | |
| 77 | Nanobot | Qwen3.5-397B | Information Retrieval | GEPA | 10.8% | 12.3% | +1.5 | โ 47.6% turns | |
| 78 | Nanobot | Qwen3.5-27B | Information Retrieval | ReasoningBank | 6.2% | 9.2% | +3.0 | โ 40.6% turns | |
| 79 | Nanobot | Qwen3.5-27B | Information Retrieval | GEPA | 6.2% | 4.6% | -1.6 | โ 47.4% turns | |
| 80 | Nanobot | Qwen3.5-27B | Information Retrieval | Memento | 6.2% | 3.1% | -3.1 | โ 18.9% turns |
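
The Δ values in the table are consistent with a plain percentage-point difference (With Skills minus Without), and the headline figures look like a maximum and a mean taken over all configurations. The page does not state its formulas, so the following is only a minimal sketch under that assumption. The function name `delta_pp` and the three-row sample (rows 1, 16, and 80) are illustrative choices, not part of the benchmark.

```python
from statistics import mean

# (Without %, With Skills %) pairs copied from rows 1, 16, and 80 of the table.
# A three-row sample, so the summary values differ from the page's headline figures.
rows = [(54.9, 65.5), (51.3, 51.3), (6.2, 3.1)]


def delta_pp(without: float, with_skills: float) -> float:
    """Assumed definition of the delta column: percentage-point change, one decimal."""
    return round(with_skills - without, 1)


deltas = [delta_pp(w, s) for w, s in rows]

# Assumed definitions of the headline cards, applied to the sample only.
best_with_skills = max(s for _, s in rows)
avg_improvement = round(mean(deltas), 1)

print(deltas)            # [10.6, 0.0, -3.1]
print(best_with_skills)  # 65.5
print(avg_improvement)
```

Applied to all 80 rows, the same mean of the Δ column comes to about +2.3, which matches the Avg. Improvement figure reported at the top of the page.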