Capability
2 artifacts provide this capability.
Want a personalized recommendation?
Find the best match →via “metric composition and custom criteria evaluation”
RAG evaluation framework — faithfulness, relevancy, context precision/recall metrics.
Unique: Metric system uses inheritance hierarchy (Metric → SingleTurnMetric → specific implementations) with PromptMixin for dynamic prompt management and Instructor adapter for structured output. Supports metric training/alignment workflows to calibrate custom metrics against human judgments.
vs others: More flexible than fixed metric suites because metrics are composable Python objects with pluggable LLM backends, enabling domain-specific evaluation without forking the framework.
ML/LLM monitoring — data drift, model quality, 100+ metrics, dashboards, test suites.
Unique: Provides a minimal base class interface (Metric, TestCondition) that integrates directly into the PythonEngine execution model, enabling custom metrics to compose seamlessly with built-in metrics without adapter code. The architecture separates metric definition from execution, allowing custom metrics to benefit from framework features (batching, caching, result serialization) automatically.
vs others: More extensible than closed-source monitoring tools because the plugin system is code-first and version-controlled; more integrated than standalone metric libraries because custom metrics inherit framework features (dashboard integration, test suite composition) without duplication.
Building an AI tool with “Custom Metric And Test Composition With Python Plugin Architecture”?
Submit your artifact →curl unfragile.ai/agents.md | sh© 2026 Unfragile. The platform for software for agents.