EvoAgentBench

Agent Performance

Pass rates on EvoAgentBench · 5 domains · multiple self-evolution methods

Partial results. More agents, domains, and methods coming soon.

Best With Skills: 72.4%

Avg. Improvement: +10.9%

Configurations: 20

| # | Agent | Base Model | Domain | Self-Evolving Methods | Without | With Skills | Δ (pp) | Cost |
|---|---|---|---|---|---|---|---|---|
| 1 | OpenClaw | Qwen3.5-397B | Knowledge Work | Human Design | 50.0% | 72.4% | +22.4 | ↑ 91.8% turns |
| 2 | OpenClaw | Qwen3.5-27B | Knowledge Work | Human Design | 51.7% | 65.5% | +13.8 | ↑ 43.8% turns |
| 3 | OpenClaw | Qwen3.5-397B | Code Implementation | Human Design | 56.4% | 64.1% | +7.7 | ↓ 6.7% turns |
| 4 | OpenClaw | Qwen3.5-27B | Code Implementation | Human Design | 53.8% | 64.1% | +10.3 | 0% turns |
| 5 | OpenClaw | Qwen3.5-397B | Software Engineering | Human Design | 26.9% | 61.5% | +34.6 | ↓ 0.5% turns |
| 6 | OpenClaw | Qwen3.5-397B | Reasoning & Problem Decomposition | Human Design | 45.0% | 60.0% | +15.0 | ↑ 2.7% chars |
| 7 | OpenClaw | Qwen3.5-397B | Knowledge Work | EverOS | 50.0% | 56.9% | +6.9 | ↑ 24.7% turns |
| 8 | OpenClaw | Qwen3.5-397B | Code Implementation | EverOS | 56.4% | 56.4% | +0.0 | ↓ 3.3% turns |
| 9 | OpenClaw | Qwen3.5-397B | Information Retrieval | Human Design | 32.3% | 55.4% | +23.1 | ↓ 21.8% turns |
| 10 | OpenClaw | Qwen3.5-27B | Knowledge Work | EverOS | 51.7% | 55.2% | +3.5 | ↑ 5.4% turns |
| 11 | OpenClaw | Qwen3.5-27B | Code Implementation | EverOS | 53.8% | 51.3% | -2.5 | ↓ 4.8% turns |
| 12 | OpenClaw | Qwen3.5-397B | Reasoning & Problem Decomposition | EverOS | 45.0% | 49.0% | +4.0 | ↓ 32.1% chars |
| 13 | OpenClaw | Qwen3.5-397B | Information Retrieval | EverOS | 32.3% | 43.1% | +10.8 | ↓ 33.1% turns |
| 14 | OpenClaw | Qwen3.5-27B | Reasoning & Problem Decomposition | EverOS | 37.0% | 42.0% | +5.0 | ↓ 6.2% chars |
| 15 | OpenClaw | Qwen3.5-397B | Software Engineering | EverOS | 26.9% | 38.5% | +11.6 | ↓ 11.4% turns |
| 16 | OpenClaw | Qwen3.5-27B | Software Engineering | EverOS | 11.5% | 38.5% | +27.0 | ↑ 41.2% turns |
| 17 | OpenClaw | Qwen3.5-27B | Software Engineering | Human Design | 11.5% | 38.5% | +27.0 | ↑ 62.5% turns |
| 18 | OpenClaw | Qwen3.5-27B | Information Retrieval | Human Design | 30.8% | 35.4% | +4.6 | ↑ 14.2% turns |
| 19 | OpenClaw | Qwen3.5-27B | Information Retrieval | EverOS | 30.8% | 32.3% | +1.5 | ↑ 4.7% turns |
| 20 | OpenClaw | Qwen3.5-27B | Reasoning & Problem Decomposition | Human Design | 37.0% | 29.0% | -8.0 | ↓ 13.0% chars |
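
The headline figures (best pass rate with skills, average improvement, configuration count) can be reproduced from the 20 rows above. The sketch below assumes that "Avg. Improvement" is the unweighted mean of the per-row difference between the "With Skills" and "Without" pass rates, in percentage points; the page does not state this, but the arithmetic matches. The `ROWS` list and the `summarize` helper are illustrative names, not part of EvoAgentBench.

```python
from collections import defaultdict

# (self-evolving method, pass rate without skills, pass rate with skills),
# transcribed from rows 1-20 of the leaderboard in the order shown.
ROWS = [
    ("Human Design", 50.0, 72.4), ("Human Design", 51.7, 65.5),
    ("Human Design", 56.4, 64.1), ("Human Design", 53.8, 64.1),
    ("Human Design", 26.9, 61.5), ("Human Design", 45.0, 60.0),
    ("EverOS", 50.0, 56.9), ("EverOS", 56.4, 56.4),
    ("Human Design", 32.3, 55.4), ("EverOS", 51.7, 55.2),
    ("EverOS", 53.8, 51.3), ("EverOS", 45.0, 49.0),
    ("EverOS", 32.3, 43.1), ("EverOS", 37.0, 42.0),
    ("EverOS", 26.9, 38.5), ("EverOS", 11.5, 38.5),
    ("Human Design", 11.5, 38.5), ("Human Design", 30.8, 35.4),
    ("EverOS", 30.8, 32.3), ("Human Design", 37.0, 29.0),
]


def summarize(rows):
    """Recompute the summary cards from (method, without, with_skills) rows."""
    deltas = []
    by_method = defaultdict(list)
    for method, without, with_skills in rows:
        delta = with_skills - without  # improvement in percentage points
        deltas.append(delta)
        by_method[method].append(delta)
    return {
        "configurations": len(rows),
        "best_with_skills": max(with_skills for _, _, with_skills in rows),
        # Assumption: unweighted mean over all configurations.
        "avg_improvement": sum(deltas) / len(deltas),
        "avg_by_method": {m: sum(d) / len(d) for m, d in by_method.items()},
    }


summary = summarize(ROWS)
print(f"Configurations:   {summary['configurations']}")
print(f"Best With Skills: {summary['best_with_skills']:.1f}%")
print(f"Avg. Improvement: {summary['avg_improvement']:+.1f} pp")
for method, avg in summary["avg_by_method"].items():
    print(f"  {method}: {avg:+.2f} pp")
```

The per-method averages are a derived breakdown that the page itself does not report; they are included only to show how the same rows can be grouped by self-evolving method.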