{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"promptbench","slug":"promptbench","name":"PromptBench","type":"benchmark","url":"https://github.com/microsoft/promptbench","page_url":"https://unfragile.ai/promptbench","categories":["testing-quality"],"tags":[],"pricing":{"model":"free","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"promptbench__cap_0","uri":"capability://tool.use.integration.unified.multi.model.llm.interface.with.factory.pattern.abstraction","name":"unified multi-model llm interface with factory pattern abstraction","description":"Provides a factory-pattern-based Model System that abstracts heterogeneous LLM APIs (OpenAI, Anthropic, local models, etc.) behind a single LLMModel interface, enabling consistent model instantiation and inference regardless of underlying provider. Uses a registry-based approach where model names map to concrete implementations, eliminating boilerplate for API-specific authentication and request formatting.","intents":["I want to benchmark the same prompt across 10 different LLM providers without rewriting integration code for each","I need to swap between GPT-4, Claude, and local Llama models in my evaluation pipeline with a single parameter change","I'm building a framework that should support adding new model providers without modifying core evaluation logic"],"best_for":["LLM researchers comparing model behavior across providers","teams building multi-model evaluation frameworks","developers prototyping model-agnostic applications"],"limitations":["Factory pattern adds abstraction layer that may obscure provider-specific capabilities or rate-limiting behavior","Unified interface cannot expose all provider-specific parameters without breaking abstraction","Requires explicit API keys or credentials for each provider in environment or config"],"requires":["Python 3.8+","PyTorch (for framework integration)","API keys for target providers (OpenAI, Anthropic, etc.) or local model weights"],"input_types":["model name (string identifier)","prompt text","optional model parameters (temperature, max_tokens, etc.)"],"output_types":["model response text","structured metadata (tokens used, latency, provider info)"],"categories":["tool-use-integration","model-abstraction"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_1","uri":"capability://image.visual.vision.language.model.evaluation.with.unified.vlm.interface","name":"vision-language model evaluation with unified vlm interface","description":"Extends the Model System to support Vision-Language Models (VLMs) through a dedicated VLMModel factory class that handles image input preprocessing, multimodal tokenization, and provider-specific vision APIs (CLIP, GPT-4V, LLaVA, etc.). Abstracts away image encoding, resolution handling, and vision-specific parameters behind the same unified interface as text-only models.","intents":["I need to evaluate how different VLMs (GPT-4V, Claude Vision, open-source LLaVA) perform on the same image-based reasoning tasks","I want to benchmark VLM robustness when images are adversarially perturbed or compressed","I'm testing whether VLMs maintain consistent behavior across different image formats and resolutions"],"best_for":["multimodal AI researchers evaluating vision-language alignment","teams building image understanding benchmarks","researchers studying adversarial robustness in vision-language models"],"limitations":["Image preprocessing (resizing, encoding) may introduce artifacts that affect robustness evaluation","VLM APIs have different image size limits and encoding requirements that the abstraction must normalize","Vision-specific parameters (image quality, aspect ratio handling) are not fully exposed through unified interface"],"requires":["Python 3.8+","PyTorch with vision libraries (torchvision, PIL)","API keys for vision-capable providers or local VLM weights","Image files in supported formats (PNG, JPEG, WebP)"],"input_types":["image file paths or PIL Image objects","text prompts describing image content or tasks","optional vision-specific parameters (image quality, resolution)"],"output_types":["VLM response text","structured analysis of image understanding","metadata about image processing (resolution used, encoding time)"],"categories":["image-visual","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_10","uri":"capability://data.processing.analysis.visualization.and.analysis.tools.for.evaluation.results","name":"visualization and analysis tools for evaluation results","description":"Provides visualization utilities that generate charts, heatmaps, and interactive plots showing model performance across datasets, techniques, and perturbation levels. Includes analysis tools for understanding robustness degradation patterns, identifying failure modes, and comparing prompt engineering technique effectiveness. Visualizations support both static (matplotlib) and interactive (plotly) output formats.","intents":["I want to visualize how my model's performance degrades under different adversarial attacks","I need to see which prompt engineering techniques are most effective across different datasets","I'm analyzing failure modes and want to visualize error patterns by task type and complexity"],"best_for":["researchers analyzing model behavior and robustness patterns","teams presenting evaluation results to stakeholders","developers debugging model failures and understanding error distributions"],"limitations":["Visualization quality depends on data dimensionality — high-dimensional results may be hard to visualize","Static visualizations may not capture complex relationships — interactive plots required for exploration","Visualization choices (axes, scales, colors) can emphasize or obscure patterns"],"requires":["Python 3.8+","matplotlib or plotly (for visualization)","evaluation results in structured format"],"input_types":["evaluation results (metrics, models, datasets, techniques)","optional visualization parameters (chart type, axes, filters)"],"output_types":["static visualizations (PNG, PDF)","interactive plots (HTML)","summary statistics and analysis"],"categories":["data-processing-analysis","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_11","uri":"capability://tool.use.integration.extensible.framework.architecture.for.custom.evaluations","name":"extensible framework architecture for custom evaluations","description":"Provides extension points and base classes that enable users to add custom models, datasets, attack methods, and evaluation metrics without modifying core framework code. Uses inheritance-based extension pattern where custom implementations extend base classes (LLMModel, Dataset, AttackMethod, Metric) and register themselves with the framework. Includes documentation and examples for implementing custom components.","intents":["I want to add support for my custom local model to PromptBench without forking the repository","I need to implement a domain-specific adversarial attack method that isn't in the standard library","I'm building a custom evaluation metric for my specific use case and want to integrate it with PromptBench"],"best_for":["researchers implementing novel evaluation methods","teams integrating PromptBench with proprietary models or datasets","developers building domain-specific evaluation frameworks on top of PromptBench"],"limitations":["Extension API stability is not guaranteed — framework updates may break custom implementations","Custom implementations must follow framework conventions and patterns","Limited documentation for advanced extension scenarios","Debugging custom extensions requires understanding framework internals"],"requires":["Python 3.8+","understanding of PromptBench architecture and base classes","familiarity with inheritance and factory patterns"],"input_types":["custom component implementation (Python class extending base class)","component registration metadata"],"output_types":["integrated custom component available in framework","custom component results in evaluation pipeline"],"categories":["tool-use-integration","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_2","uri":"capability://safety.moderation.multi.level.adversarial.prompt.attack.generation","name":"multi-level adversarial prompt attack generation","description":"Implements a hierarchical attack system that generates adversarial prompts at four granularity levels (character, word, sentence, semantic) using attack methods like DeepWordBug, TextFooler, BertAttack, CheckList, and StressTest. Each attack level uses different perturbation strategies: character-level attacks modify individual characters or introduce typos, word-level attacks substitute semantically similar words, sentence-level attacks restructure syntax, and semantic-level attacks use human-crafted adversarial examples. The system maintains semantic equivalence while degrading model performance to measure robustness.","intents":["I want to systematically test whether my LLM's performance degrades when prompts contain typos, misspellings, or character-level noise","I need to evaluate if my model is robust to word substitutions and paraphrasing that preserve semantic meaning","I'm measuring how much my model's accuracy drops when prompts are syntactically restructured or use adversarial sentence constructions"],"best_for":["LLM safety researchers evaluating adversarial robustness","teams building production systems that need to handle noisy or adversarial user inputs","researchers studying prompt injection vulnerabilities"],"limitations":["Character-level attacks may produce non-English text that violates model training assumptions","Word-level attacks using BERT embeddings require downloading large pretrained models (~500MB)","Semantic-level attacks rely on human-crafted examples that may not cover all adversarial patterns","Attack success depends on model's training data — attacks effective on one model may not transfer to another"],"requires":["Python 3.8+","transformers library (for BERT-based word attacks)","nltk or spacy (for sentence-level parsing)","original prompts and expected outputs for evaluation"],"input_types":["original prompt text","attack level specification (character/word/sentence/semantic)","attack method name (DeepWordBug, TextFooler, etc.)","optional perturbation intensity parameter"],"output_types":["adversarially perturbed prompt text","attack metadata (perturbation type, positions modified, semantic similarity score)","model response to adversarial prompt","robustness metrics (accuracy drop, semantic preservation)"],"categories":["safety-moderation","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_3","uri":"capability://data.processing.analysis.dynamic.validation.with.on.the.fly.evaluation.sample.generation","name":"dynamic validation with on-the-fly evaluation sample generation","description":"Implements DyVal, a dynamic evaluation framework that generates evaluation samples on-the-fly with controlled complexity levels to mitigate test data contamination. Rather than using static benchmark datasets, DyVal generates samples for four reasoning types (Arithmetic, Boolean Logic, Deduction Logic, Reachability) with parameterized difficulty, ensuring models cannot memorize evaluation data. The system controls complexity through parameters like number of operations, variable counts, or graph sizes, enabling systematic evaluation of reasoning capabilities across difficulty ranges.","intents":["I want to evaluate my LLM on reasoning tasks without worrying that it has memorized benchmark datasets during pretraining","I need to test whether my model's reasoning performance degrades gracefully as problem complexity increases","I'm measuring if my model can generalize to arithmetic or logic problems of varying difficulty that weren't in its training data"],"best_for":["researchers studying LLM reasoning capabilities and generalization","teams evaluating models on tasks where data contamination is a concern","developers building reasoning-intensive applications who need uncontaminated evaluation"],"limitations":["Generated samples may not capture all edge cases or failure modes present in real-world reasoning tasks","Complexity parameterization is task-specific — difficulty scaling differs between arithmetic and graph reachability","Generated samples lack the linguistic diversity and natural phrasing of human-written benchmarks","Evaluation metrics for generated samples may not align with metrics used in published benchmarks"],"requires":["Python 3.8+","numpy (for random sample generation)","specification of reasoning task type and complexity parameters"],"input_types":["task type (Arithmetic, Boolean Logic, Deduction, Reachability)","complexity parameters (operation count, variable count, graph size, etc.)","number of samples to generate","optional random seed for reproducibility"],"output_types":["generated evaluation samples (problem text)","ground truth answers","complexity metadata (difficulty level, operation count)","evaluation results (accuracy, reasoning correctness)"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_4","uri":"capability://data.processing.analysis.efficient.multi.prompt.evaluation.with.performance.prediction","name":"efficient multi-prompt evaluation with performance prediction","description":"Implements PromptEval, an efficient evaluation method that predicts performance on large datasets using performance data from a small sample, reducing computational cost of evaluating multiple prompt variations. The system uses statistical inference from a small sample (e.g., 100 examples) to estimate performance on the full dataset (e.g., 10,000 examples), enabling rapid iteration over prompt engineering techniques without evaluating every prompt on every example. Maintains statistical validity through confidence intervals and sample size recommendations.","intents":["I want to quickly compare 50 different prompt variations without running full evaluation on all 10,000 test examples for each","I need to estimate which prompt engineering technique (Chain-of-Thought, Few-shot, etc.) will perform best before committing to full evaluation","I'm optimizing prompts for a large dataset but want to reduce evaluation latency from hours to minutes"],"best_for":["prompt engineers iterating rapidly on prompt variations","teams with large evaluation datasets who need faster feedback loops","researchers studying prompt engineering techniques at scale"],"limitations":["Performance predictions have statistical error — small sample may not represent full dataset distribution","Prediction accuracy depends on sample representativeness; biased samples lead to inaccurate estimates","Confidence intervals widen with smaller sample sizes, reducing prediction reliability","Method assumes performance is relatively stable across dataset — may fail on highly skewed or multi-modal distributions"],"requires":["Python 3.8+","numpy or scipy (for statistical inference)","at least 50-100 labeled examples for reliable prediction","full dataset or representative sample for validation"],"input_types":["small sample of evaluation examples (50-500 examples)","model responses on small sample","ground truth labels for small sample","optional full dataset for validation"],"output_types":["predicted performance metrics (accuracy, F1, etc.)","confidence intervals around predictions","sample size recommendations for target confidence level","comparison of predicted vs actual performance on full dataset"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_5","uri":"capability://text.generation.language.chain.of.thought.and.advanced.prompt.engineering.technique.library","name":"chain-of-thought and advanced prompt engineering technique library","description":"Implements a library of prompt engineering methods including Chain-of-Thought (CoT), Emotion Prompt, Expert Prompting, and other advanced techniques that modify prompts to improve model reasoning and performance. Each technique is implemented as a prompt transformation that injects reasoning patterns, emotional context, or role-based framing into the original prompt. The system allows composition of multiple techniques and systematic evaluation of their individual and combined effects on model performance.","intents":["I want to test whether adding chain-of-thought reasoning steps improves my model's accuracy on complex reasoning tasks","I need to evaluate whether emotion prompts or expert role-playing improve model performance on specific domains","I'm comparing the effectiveness of different prompt engineering techniques to find the best approach for my use case"],"best_for":["prompt engineers optimizing model performance through prompt design","researchers studying how prompt structure affects model reasoning","teams building production systems that need reliable model outputs"],"limitations":["Technique effectiveness varies significantly across models — CoT helps GPT-3.5 but may not help smaller models","Some techniques (Emotion Prompt) may not transfer across domains or model architectures","Composing multiple techniques can lead to prompt bloat and increased token usage","Techniques are heuristic-based without theoretical guarantees of improvement"],"requires":["Python 3.8+","original prompts and evaluation dataset","target model with inference capability"],"input_types":["original prompt text","technique name (CoT, Emotion, Expert, etc.)","optional technique parameters (reasoning steps, role description)","evaluation examples and labels"],"output_types":["transformed prompt with technique applied","model responses to transformed prompt","performance metrics comparing original vs transformed","technique effectiveness analysis"],"categories":["text-generation-language","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_6","uri":"capability://planning.reasoning.meta.probing.agents.for.model.capability.discovery","name":"meta-probing agents for model capability discovery","description":"Implements Meta Probing Agents (MPA), an automated system that discovers and characterizes model capabilities through systematic probing. The MPA framework uses agents to generate targeted probes (test cases) that explore model behavior boundaries, identify capability gaps, and characterize performance patterns across different input types and complexity levels. Agents iteratively refine probes based on model responses to discover what the model can and cannot do.","intents":["I want to automatically discover what reasoning capabilities my LLM has and where it fails","I need to characterize my model's performance boundaries across different task types without manually writing test cases","I'm building a capability map of my model to understand what applications it's suitable for"],"best_for":["model developers understanding newly trained model capabilities","researchers studying emergent capabilities in large models","teams assessing model suitability for specific applications"],"limitations":["MPA discovery is heuristic-based and may miss capabilities not covered by generated probes","Probe generation quality depends on agent design — poorly designed agents may generate uninformative tests","Capability characterization is relative to probe distribution — different probes may reveal different capabilities","Computational cost scales with number of probes and model inference latency"],"requires":["Python 3.8+","target model with inference capability","agent framework (LLM-based or rule-based) for probe generation"],"input_types":["target model specification","task domain or capability area to probe","optional seed probes or examples","probe generation parameters"],"output_types":["generated probes (test cases)","model responses to probes","capability characterization (what model can/cannot do)","performance boundaries and failure modes","capability map visualization"],"categories":["planning-reasoning","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_7","uri":"capability://data.processing.analysis.dataset.loader.with.multi.source.integration.and.preprocessing","name":"dataset loader with multi-source integration and preprocessing","description":"Implements a DatasetLoader class that provides unified access to diverse evaluation datasets (GLUE, MMLU, BIG-Bench Hard, etc.) with automatic downloading, caching, and preprocessing. The loader abstracts away dataset-specific formats, splits, and preprocessing requirements, enabling consistent dataset handling across different benchmarks. Supports both language datasets and vision-language datasets with automatic format normalization.","intents":["I want to load GLUE, MMLU, and BIG-Bench datasets without writing custom download and parsing code for each","I need to quickly switch between different datasets for evaluation without changing my evaluation pipeline","I'm building a benchmark that should work with multiple datasets and need consistent data loading"],"best_for":["researchers benchmarking models across multiple datasets","teams building evaluation frameworks that support multiple benchmarks","developers who want to avoid dataset-specific preprocessing code"],"limitations":["Dataset loader caching may consume significant disk space for large datasets (MMLU, BIG-Bench)","Some datasets have licensing restrictions that require manual download or authentication","Dataset preprocessing is standardized but may not match original benchmark's exact preprocessing","Updates to datasets may not be reflected in cached versions without manual cache clearing"],"requires":["Python 3.8+","disk space for dataset caching (varies by dataset, 1GB-50GB+)","internet connection for initial dataset download","optional: authentication credentials for restricted datasets"],"input_types":["dataset name (GLUE, MMLU, BIG-Bench, etc.)","optional dataset split (train/val/test)","optional preprocessing parameters"],"output_types":["loaded dataset as structured format (list of dicts, pandas DataFrame, etc.)","dataset metadata (size, splits, task type)","preprocessed examples ready for model evaluation"],"categories":["data-processing-analysis","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_8","uri":"capability://data.processing.analysis.evaluation.metrics.computation.with.task.specific.scoring","name":"evaluation metrics computation with task-specific scoring","description":"Implements a comprehensive metrics system (eval.py) that computes task-specific evaluation metrics including accuracy, F1, BLEU, ROUGE, and custom metrics for different task types (classification, generation, reasoning). The system automatically selects appropriate metrics based on task type and dataset, handles edge cases (empty predictions, mismatched lengths), and provides detailed metric breakdowns by example and category. Supports both exact-match and fuzzy matching for generated text.","intents":["I want to evaluate my model's performance using standard metrics (accuracy, F1, BLEU) without implementing metric computation myself","I need task-specific metrics for different evaluation datasets (classification metrics for GLUE, generation metrics for summarization)","I'm analyzing model performance and need detailed metric breakdowns by example and error category"],"best_for":["researchers evaluating models across diverse task types","teams building evaluation pipelines that need standard metrics","developers who want reliable, well-tested metric implementations"],"limitations":["Metric selection is automatic but may not match researcher's preferred metric variant","Some metrics (BLEU, ROUGE) have known limitations for evaluating semantic similarity","Custom metrics require manual implementation — framework provides standard metrics only","Metric computation assumes well-formed predictions; malformed outputs may cause errors"],"requires":["Python 3.8+","numpy (for metric computation)","optional: nltk or rouge library (for BLEU/ROUGE metrics)"],"input_types":["predictions (model outputs)","ground truth labels","task type specification (classification, generation, etc.)","optional metric parameters"],"output_types":["computed metrics (accuracy, F1, BLEU, ROUGE, etc.)","per-example metric scores","metric breakdowns by category or class","detailed error analysis"],"categories":["data-processing-analysis","safety-moderation"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__cap_9","uri":"capability://data.processing.analysis.benchmark.leaderboard.and.results.aggregation","name":"benchmark leaderboard and results aggregation","description":"Provides a leaderboard system that aggregates evaluation results across multiple models, datasets, and prompt engineering techniques, enabling comparative analysis and ranking. The leaderboard tracks model performance over time, supports filtering by dataset/technique/model, and generates visualizations of performance trends. Results are stored in a structured format that enables querying and statistical comparison across runs.","intents":["I want to see how different models rank on a specific benchmark and compare their performance","I need to track how my model's performance changes as I apply different prompt engineering techniques","I'm publishing a benchmark and need a leaderboard to show community results and enable comparison"],"best_for":["benchmark creators publishing evaluation results","researchers comparing model performance across multiple dimensions","teams tracking model improvement over time"],"limitations":["Leaderboard results depend on evaluation setup (model versions, hyperparameters) which may differ across submissions","Ranking may not be statistically significant if performance differences are small","Leaderboard does not account for computational cost or inference latency differences","Results are only comparable if using identical datasets and evaluation protocols"],"requires":["Python 3.8+","structured results storage (JSON, database, etc.)","evaluation results from multiple models and datasets"],"input_types":["evaluation results (model name, dataset, metrics, technique)","optional metadata (model version, date, hyperparameters)"],"output_types":["ranked leaderboard (models sorted by performance)","performance comparison tables","trend visualizations (performance over time)","filtered views (by dataset, technique, model family)"],"categories":["data-processing-analysis","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"promptbench__headline","uri":"capability://testing.quality.benchmarking.framework.for.evaluating.large.language.models","name":"benchmarking framework for evaluating large language models","description":"PromptBench is a comprehensive framework designed to benchmark and evaluate the performance and robustness of large language models through various adversarial prompts and datasets, making it essential for researchers in AI.","intents":["best benchmarking framework for LLMs","evaluation tools for language models","how to assess model robustness","top tools for prompt evaluation","comprehensive LLM evaluation framework"],"best_for":["AI researchers","developers testing LLMs"],"limitations":["requires knowledge of model architectures"],"requires":["Python","PyTorch"],"input_types":["text prompts","datasets"],"output_types":["evaluation metrics","robustness analysis"],"categories":["testing-quality"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":63,"verified":false,"data_access_risk":"high","permissions":["Python 3.8+","PyTorch (for framework integration)","API keys for target providers (OpenAI, Anthropic, etc.) or local model weights","PyTorch with vision libraries (torchvision, PIL)","API keys for vision-capable providers or local VLM weights","Image files in supported formats (PNG, JPEG, WebP)","matplotlib or plotly (for visualization)","evaluation results in structured format","understanding of PromptBench architecture and base classes","familiarity with inheritance and factory patterns"],"failure_modes":["Factory pattern adds abstraction layer that may obscure provider-specific capabilities or rate-limiting behavior","Unified interface cannot expose all provider-specific parameters without breaking abstraction","Requires explicit API keys or credentials for each provider in environment or config","Image preprocessing (resizing, encoding) may introduce artifacts that affect robustness evaluation","VLM APIs have different image size limits and encoding requirements that the abstraction must normalize","Vision-specific parameters (image quality, aspect ratio handling) are not fully exposed through unified interface","Visualization quality depends on data dimensionality — high-dimensional results may be hard to visualize","Static visualizations may not capture complex relationships — interactive plots required for exploration","Visualization choices (axes, scales, colors) can emphasize or obscure patterns","Extension API stability is not guaranteed — framework updates may break custom implementations","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.7,"quality":0.9,"ecosystem":0.39999999999999997,"match_graph":0.25,"freshness":0.52,"weights":{"adoption":0.25,"quality":0.35,"ecosystem":0.15,"match_graph":0.2,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-06-17T09:51:05.295Z","last_scraped_at":null,"last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=promptbench","compare_url":"https://unfragile.ai/compare?artifact=promptbench"}},"signature":"8BalyVUebtSThjtfWsXZ1wSarbAHLYghYYVkXtHdNqWqOrYGJTPeIMUNfcwkcLJcNh3VTjKykFGscaH22bvVCQ==","signedAt":"2026-06-21T07:47:45.447Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/promptbench","artifact":"https://unfragile.ai/promptbench","verify":"https://unfragile.ai/api/v1/verify?slug=promptbench","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}