via “multi-provider VLM integration with native and composed model support”
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
Unique: Implements a provider abstraction layer with explicit support for three model categories: native computer-use models (Claude with native tool use), composed models (standard VLMs paired with grounding adapters that add action generation), and local model adapters (Ollama, vLLM). A unified message format (Responses API) normalizes outputs across all three categories, so one model can be swapped for another behind the same interface.
vs others: Broader model coverage than single-provider solutions. Explicit local model support enables on-premise deployment, unlike cloud-only alternatives, and composed model support allows any VLM (not just native computer-use models) to be used through adapter-based action generation.
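
The three model categories and the shared output format described above can be pictured with a minimal Python sketch. This is an illustration of the pattern only, not the project's actual API: `Action`, `Provider`, `NativeProvider`, `ComposedProvider`, `LocalProvider`, and `run_step` are hypothetical names, the raw model outputs are made-up canned values, and no model or network call is made.

```python
import re
from dataclasses import dataclass
from typing import Callable, Protocol, Tuple


@dataclass(frozen=True)
class Action:
    """Hypothetical normalized action that every provider category emits."""
    kind: str
    x: int
    y: int


class Provider(Protocol):
    """Common interface; callers depend only on this, not on a model family."""
    def predict(self, screenshot: bytes, instruction: str) -> Action: ...


class NativeProvider:
    """Stands in for a model that emits computer-use tool calls directly."""
    def predict(self, screenshot: bytes, instruction: str) -> Action:
        # Canned tool-call payload (assumed shape, for illustration only).
        raw = {"action": "left_click", "coordinate": [120, 48]}
        x, y = raw["coordinate"]
        return Action("click", x, y)


class ComposedProvider:
    """A standard VLM plans the step; a grounding adapter supplies coordinates."""
    def __init__(
        self,
        planner: Callable[[bytes, str], str],
        grounder: Callable[[bytes, str], Tuple[int, int]],
    ) -> None:
        self.planner = planner
        self.grounder = grounder

    def predict(self, screenshot: bytes, instruction: str) -> Action:
        target = self.planner(screenshot, instruction)   # what to act on
        x, y = self.grounder(screenshot, target)         # where it is on screen
        return Action("click", x, y)


class LocalProvider:
    """Stands in for a locally served model that returns plain text."""
    def predict(self, screenshot: bytes, instruction: str) -> Action:
        raw = "click(300, 200)"  # canned text output (assumed format)
        match = re.fullmatch(r"(\w+)\((\d+),\s*(\d+)\)", raw)
        if match is None:
            raise ValueError(f"unparseable model output: {raw!r}")
        kind, x, y = match.groups()
        return Action(kind, int(x), int(y))


def run_step(provider: Provider, screenshot: bytes, instruction: str) -> Action:
    """Caller code is identical for all three categories."""
    return provider.predict(screenshot, instruction)


providers = {
    "native": NativeProvider(),
    "composed": ComposedProvider(
        planner=lambda shot, text: "the Submit button",
        grounder=lambda shot, target: (64, 512),
    ),
    "local": LocalProvider(),
}

results = {
    name: run_step(provider, b"", "Submit the form")
    for name, provider in providers.items()
}
```

Because `run_step` only sees the normalized `Action`, swapping the entry used from `providers` changes the model category without touching the caller, which is the property the listing attributes to the unified message format.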