ABC-Bench evaluates agentic coding on 224 tasks across real OSS backends using containerized dependencies and external end-to-end API tests