<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Benchmarks | David Vázquez</title><link>https://david-vazquez.com/tags/benchmarks/</link><atom:link href="https://david-vazquez.com/tags/benchmarks/index.xml" rel="self" type="application/rss+xml"/><description>Benchmarks</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 01 Apr 2025 00:00:00 +0000</lastBuildDate><image><url>https://david-vazquez.com/media/icon_hu_a3642885bc94ba2d.png</url><title>Benchmarks</title><link>https://david-vazquez.com/tags/benchmarks/</link></image><item><title>EnterpriseOps-Gym</title><link>https://david-vazquez.com/project/enterpriseops-gym/</link><pubDate>Tue, 01 Apr 2025 00:00:00 +0000</pubDate><guid>https://david-vazquez.com/project/enterpriseops-gym/</guid><description>&lt;p&gt;EnterpriseOps-Gym features 1,150 expert-designed tasks across 8 interconnected enterprise domains, with persistent state, strict verification logic, and policy-aware execution requirements. It tests whether AI agents can handle domain expertise, not just general reasoning.&lt;/p&gt;</description></item><item><title>WorkArena and BrowserGym</title><link>https://david-vazquez.com/project/workarena/</link><pubDate>Mon, 01 Jul 2024 00:00:00 +0000</pubDate><guid>https://david-vazquez.com/project/workarena/</guid><description>&lt;p&gt;WorkArena is a benchmark of tasks based on the ServiceNow platform that measures how well web agents can perform common knowledge work. BrowserGym provides a rich environment for designing and evaluating such agents with multimodal observations and a comprehensive action set. Published at ICML 2024.&lt;/p&gt;</description></item></channel></rss>