{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"llama-guard","slug":"llama-guard","name":"Llama Guard","type":"model","url":"https://github.com/meta-llama/PurpleLlama","page_url":"https://unfragile.ai/llama-guard","categories":["code-review-security"],"tags":[],"pricing":{"model":"free","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"llama-guard__cap_0","uri":"capability://safety.moderation.multi.category.content.classification.with.customizable.safety.policies","name":"multi-category content classification with customizable safety policies","description":"Llama Guard uses a fine-tuned Llama backbone to classify user prompts and model responses against a taxonomy of unsafe content categories (violence, sexual content, criminal planning, self-harm, etc.). The model operates as a sequence classifier that tokenizes input text and produces category-level safety judgments, allowing deployment teams to define custom policy thresholds per category rather than enforcing a single binary safe/unsafe boundary. This enables nuanced safety enforcement where some categories may be blocked entirely while others permit higher risk tolerance.","intents":["I need to filter user inputs before they reach my LLM to prevent jailbreak attempts and harmful requests","I want to scan model outputs before returning them to users to catch unsafe generations","I need to apply different safety policies for different use cases — stricter for children's content, looser for security research"],"best_for":["Teams deploying open-source LLMs who need guardrails without relying on proprietary APIs","Organizations with custom safety requirements that don't fit OpenAI/Anthropic's policies","Developers building multi-tenant systems where different customers need different safety thresholds"],"limitations":["Classification latency adds ~50-200ms per inference depending on model size and hardware","Requires GPU or sufficient CPU resources for real-time inference; CPU-only deployment is slow","Training data reflects Meta's safety taxonomy; may not align perfectly with domain-specific harms (e.g., financial fraud, medical misinformation)","No built-in support for context-aware safety — treats each prompt/response independently without conversation history"],"requires":["Llama Guard model weights (7B or 13B variant)","Python 3.8+","PyTorch 1.13+","Transformers library 4.30+","GPU with 8GB+ VRAM for 7B model, 16GB+ for 13B (CPU inference possible but slow)"],"input_types":["text (user prompts, model responses, arbitrary strings)"],"output_types":["structured JSON with per-category risk scores and overall safety judgment","category labels with confidence scores"],"categories":["safety-moderation","content-filtering"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_1","uri":"capability://safety.moderation.prompt.injection.vulnerability.detection","name":"prompt injection vulnerability detection","description":"Llama Guard identifies attempts to manipulate LLM behavior through prompt injection attacks by classifying prompts that contain adversarial instructions designed to override system prompts or elicit unsafe behavior. The model learns patterns of injection techniques (e.g., 'ignore previous instructions', role-play scenarios, hypothetical framing) from training data that includes both benign and adversarial prompt variants. This capability integrates with the broader CyberSecEval benchmark framework which includes prompt injection test datasets.","intents":["I need to detect when users are trying to jailbreak my LLM with prompt injection attacks before the request reaches the model","I want to identify and log suspicious prompts that attempt to override my system instructions","I need to measure how vulnerable my LLM deployment is to prompt injection attacks"],"best_for":["Teams deploying LLMs in adversarial environments (customer-facing chatbots, public APIs)","Security researchers evaluating LLM robustness","Organizations required to audit and log attempted attacks"],"limitations":["Adversarial attacks evolve faster than model training cycles; zero-day injection techniques may bypass detection","No defense against visual prompt injection (images containing text instructions) — requires separate CodeShield model","Cannot distinguish between legitimate complex instructions and malicious injections in all cases","Performance degrades on prompts in languages not well-represented in training data"],"requires":["Llama Guard model weights","Python 3.8+","Prompt injection test datasets (available in CyberSecEval benchmark)"],"input_types":["text (user prompts, potentially containing injection attempts)"],"output_types":["binary or multi-class classification (injection detected / not detected)","confidence scores per injection technique"],"categories":["safety-moderation","code-generation-editing"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_10","uri":"capability://safety.moderation.visual.prompt.injection.attack.detection.and.evaluation","name":"visual prompt injection attack detection and evaluation","description":"CyberSecEval v3 extends safety evaluation to visual prompt injection attacks where adversaries embed malicious instructions in images to manipulate multimodal LLMs. PurpleLlama provides benchmarks and evaluation methodology for assessing LLM robustness to visual injection attacks, enabling safety assessment of vision-capable models before deployment.","intents":["I want to test whether my multimodal LLM is vulnerable to visual prompt injection attacks","I need to evaluate safety of vision-capable models across image-based attack vectors","I want to understand the intersection of visual content safety and prompt injection vulnerabilities"],"best_for":["Teams deploying multimodal LLMs (vision + language) in production","Organizations evaluating emerging attack vectors on vision-capable models","Researchers studying adversarial robustness of multimodal systems"],"limitations":["Visual injection attack evaluation is nascent; benchmark coverage is limited compared to text-based attacks","Requires image generation or curation for attack dataset; more resource-intensive than text-only benchmarks","Effectiveness varies significantly across vision models and architectures; results may not generalize","No standardized metrics for visual injection severity; evaluation methodology is still evolving"],"requires":["Python 3.8+","Multimodal LLM supporting vision inputs","Visual prompt injection benchmark dataset (included in CyberSecEval v3+)","Image processing libraries (PIL, OpenCV)"],"input_types":["images with embedded malicious text or instructions","multimodal prompts combining text and images"],"output_types":["injection success rate (percentage of attacks that manipulated model behavior)","per-attack-type success metrics","vulnerability classification"],"categories":["safety-moderation","image-visual"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_11","uri":"capability://safety.moderation.autonomous.offensive.cyber.operations.capability.evaluation","name":"autonomous offensive cyber operations capability evaluation","description":"CyberSecEval v3 includes benchmarks for evaluating LLM capability to function as autonomous cyber attack agents, testing whether models can plan and execute multi-step offensive operations (reconnaissance, exploitation, lateral movement). This evaluation measures the risk of LLM misuse for cybercriminal purposes and informs safety policies around autonomous agent capabilities.","intents":["I want to measure whether my LLM could be misused as an autonomous cyber attack tool","I need to understand the offensive cyber capabilities of models I'm deploying","I want to evaluate whether my safety policies adequately constrain autonomous offensive behavior"],"best_for":["Security teams assessing LLM misuse risks in adversarial threat models","Organizations with national security or critical infrastructure concerns","Researchers studying LLM capabilities for offensive cybersecurity"],"limitations":["Evaluation is inherently sensitive; benchmark details may be restricted or redacted for security reasons","Measuring autonomous offensive capability is subjective; requires expert judgment to assess attack feasibility","Results are highly model-specific; transferability across models is limited","Evaluation may inadvertently provide attack templates; requires careful handling and restricted distribution"],"requires":["Python 3.8+","Access to restricted CyberSecEval v3 autonomous operations benchmarks (may require approval)","Security expertise to interpret results responsibly","Isolated evaluation environment to prevent accidental capability leakage"],"input_types":["multi-step cyber attack scenarios","simulated network environments and target systems"],"output_types":["attack success rate (percentage of scenarios where model executed successful multi-step attacks)","attack sophistication metrics (number of steps, lateral movement capability)","capability classification (e.g., 'reconnaissance-only', 'exploitation-capable', 'autonomous-agent')"],"categories":["safety-moderation","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_2","uri":"capability://safety.moderation.multilingual.safety.classification.with.machine.translated.benchmarks","name":"multilingual safety classification with machine-translated benchmarks","description":"Llama Guard extends safety classification across multiple languages by leveraging machine-translated versions of safety evaluation datasets (e.g., MITRE prompts translated to 10+ languages). The model is evaluated and can be fine-tuned on these multilingual variants to detect unsafe content regardless of input language. This capability is integrated into CyberSecEval's benchmark suite which includes multilingual prompt injection and MITRE compliance test sets.","intents":["I need to moderate user content in multiple languages without deploying separate models per language","I want to ensure my safety policies apply consistently across global users regardless of language","I need to evaluate whether my LLM is equally robust to safety attacks in non-English languages"],"best_for":["Global platforms serving users in 10+ languages","Organizations with regulatory requirements across multiple jurisdictions","Teams evaluating LLM safety across language boundaries"],"limitations":["Machine translation introduces noise; some safety-critical nuances may be lost in translation","Performance varies significantly by language — high-resource languages (Spanish, French) perform better than low-resource languages","Requires evaluation on language-specific test sets to validate performance; generic multilingual models may not generalize","No support for code-mixed prompts (e.g., Hinglish, Spanglish) which are common in real-world deployments"],"requires":["Llama Guard model weights","CyberSecEval multilingual benchmark datasets (mitre_prompts_multilingual_machine_translated.json)","Python 3.8+"],"input_types":["text in any language (English, Spanish, French, German, Chinese, Japanese, etc.)"],"output_types":["safety classification with language-specific confidence scores","per-language category judgments"],"categories":["safety-moderation","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_3","uri":"capability://safety.moderation.integration.with.llamafirewall.security.orchestration.framework","name":"integration with llamafirewall security orchestration framework","description":"Llama Guard integrates as a core component within the LlamaFirewall security framework, which orchestrates multiple scanner components (Llama Guard, Prompt Guard, CodeShield) into a unified input/output filtering pipeline. LlamaFirewall provides the orchestration layer that chains Llama Guard's classification results with other security scanners, applies policy decisions, and manages the flow of requests through the security stack. This enables teams to compose multi-stage security workflows where Llama Guard handles general content safety while specialized scanners handle code security or prompt injection.","intents":["I need to build a multi-stage security pipeline that combines general content filtering with specialized code and prompt security checks","I want to apply different security scanners to different parts of my LLM pipeline (input vs output, code vs text)","I need to orchestrate security decisions across multiple models and make policy enforcement decisions based on combined signals"],"best_for":["Teams building production LLM systems with complex security requirements","Organizations needing to compose multiple specialized safety models","Developers who want modular, composable security rather than monolithic solutions"],"limitations":["LlamaFirewall adds orchestration overhead (~50-100ms per request for multi-stage pipelines)","Requires managing multiple models in production; increases operational complexity and resource requirements","Policy composition logic must be defined explicitly; no automatic conflict resolution between scanner outputs","Limited to scanners within the PurpleLlama ecosystem; integrating third-party safety tools requires custom adapters"],"requires":["LlamaFirewall framework (part of PurpleLlama)","Llama Guard model weights","Python 3.8+","Additional scanner models (Prompt Guard, CodeShield) for full pipeline"],"input_types":["text (user prompts, model responses)","code (for CodeShield integration)"],"output_types":["orchestrated security decision (allow/block/flag)","combined risk assessment from multiple scanners","audit logs with per-scanner decisions"],"categories":["safety-moderation","tool-use-integration","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_4","uri":"capability://safety.moderation.cybersecurity.benchmark.evaluation.and.red.teaming.integration","name":"cybersecurity benchmark evaluation and red-teaming integration","description":"Llama Guard serves as both a subject of evaluation within CyberSecEval's comprehensive cybersecurity benchmark suite and as a tool for evaluating other LLMs. The framework includes structured benchmarks for prompt injection, MITRE compliance, code interpreter abuse, and autonomous offensive cyber operations. Teams can use Llama Guard to classify LLM responses in these benchmarks, measuring how well their models resist adversarial attacks. The integration with CyberSecEval v1/v2/v3 provides standardized evaluation protocols and datasets for red-teaming LLM deployments.","intents":["I need to evaluate how vulnerable my LLM is to cybersecurity attacks using industry-standard benchmarks","I want to measure my LLM's false refusal rate (FRR) — how often it incorrectly refuses legitimate requests","I need to red-team my LLM against prompt injection, code interpreter abuse, and autonomous attack scenarios"],"best_for":["Security researchers evaluating LLM robustness","Teams conducting pre-deployment security audits","Organizations required to demonstrate security compliance (e.g., financial services, healthcare)","Developers building red-teaming infrastructure"],"limitations":["Benchmarks are static; adversarial techniques evolve faster than benchmark updates","Evaluation results are model-specific; performance on CyberSecEval doesn't guarantee safety in production","Requires significant computational resources to run full benchmark suites (hours to days on GPU clusters)","Some benchmarks (e.g., autonomous offensive cyber operations) may be sensitive; not suitable for public disclosure"],"requires":["CyberSecEval benchmark datasets and evaluation framework","Python 3.8+","LLM provider API keys (OpenAI, Anthropic, Together, Google GenAI) or local model weights","GPU cluster for large-scale evaluations (optional but recommended)"],"input_types":["LLM responses to benchmark prompts","structured benchmark datasets (MITRE, prompt injection, code interpreter, etc.)"],"output_types":["per-benchmark safety scores","false refusal rate (FRR) metrics","vulnerability exploitation success rates","compliance reports against MITRE frameworks"],"categories":["safety-moderation","planning-reasoning","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_5","uri":"capability://safety.moderation.per.category.risk.scoring.and.policy.threshold.customization","name":"per-category risk scoring and policy threshold customization","description":"Llama Guard produces granular per-category risk scores (e.g., violence: 0.8, sexual content: 0.2, criminal planning: 0.1) rather than a single binary safe/unsafe judgment. Teams can define custom policy thresholds per category, allowing fine-grained enforcement where some categories are blocked at high confidence while others permit lower thresholds. This is implemented through the model's output layer which produces logits for each safety category, enabling downstream policy engines to apply category-specific rules.","intents":["I need different safety thresholds for different content types — block violence strictly but allow some sexual content for adult users","I want to log and monitor which safety categories are most frequently triggered in my user base","I need to apply different policies for different user segments or use cases without retraining the model"],"best_for":["Teams with nuanced safety requirements that don't fit binary safe/unsafe","Platforms serving diverse user demographics with different content tolerances","Organizations needing to adjust safety policies without model retraining"],"limitations":["Category definitions are fixed to Llama Guard's training taxonomy; cannot add custom categories without retraining","Per-category thresholds must be tuned empirically; no principled method for setting optimal thresholds","Threshold tuning requires labeled validation data; teams must manually review false positives/negatives per category","No built-in support for category interactions (e.g., violence + sexual content may require stricter combined threshold)"],"requires":["Llama Guard model weights","Policy definition framework (custom code or LlamaFirewall integration)","Validation dataset for threshold tuning"],"input_types":["text (user prompts, model responses)"],"output_types":["per-category risk scores (0.0-1.0 range)","category-level judgments (safe/unsafe per category)","policy enforcement decisions based on custom thresholds"],"categories":["safety-moderation","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_6","uri":"capability://safety.moderation.local.inference.with.no.external.api.dependencies","name":"local inference with no external api dependencies","description":"Llama Guard runs entirely locally on customer infrastructure without requiring external API calls or data transmission to Meta or third-party services. The model weights are open-source and can be downloaded and deployed on private servers, VPCs, or air-gapped environments. This architecture eliminates latency from network round-trips and provides full data privacy — safety classifications never leave the customer's infrastructure.","intents":["I need to classify content without sending user data to external APIs for privacy or compliance reasons","I want to deploy safety filtering in air-gapped or offline environments","I need sub-100ms latency for real-time safety classification in my LLM pipeline"],"best_for":["Organizations with strict data privacy requirements (HIPAA, GDPR, financial services)","Teams deploying in air-gapped or offline environments","Developers optimizing for latency-sensitive applications"],"limitations":["Requires GPU or substantial CPU resources; cannot run efficiently on edge devices or mobile","Teams must manage model updates and security patches independently","No built-in monitoring or analytics dashboard; requires custom logging infrastructure","Scaling to high throughput requires managing multiple model replicas and load balancing"],"requires":["Llama Guard model weights (7B: ~14GB, 13B: ~26GB disk space)","GPU with 8GB+ VRAM (7B) or 16GB+ (13B), or CPU with 32GB+ RAM","Python 3.8+, PyTorch 1.13+, Transformers 4.30+","Network access to download model weights (one-time)"],"input_types":["text (user prompts, model responses)"],"output_types":["safety classification (local JSON response)"],"categories":["safety-moderation","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_7","uri":"capability://safety.moderation.code.security.evaluation.via.codeshield.integration","name":"code security evaluation via codeshield integration","description":"Llama Guard integrates with CodeShield, a specialized safety model for evaluating code security risks in LLM-generated code. While Llama Guard handles general content safety, CodeShield specifically detects insecure code patterns, vulnerable dependencies, and code interpreter abuse. The integration within LlamaFirewall allows teams to apply CodeShield to code outputs while using Llama Guard for text outputs, creating a unified security pipeline that handles both modalities.","intents":["I need to detect when my LLM generates insecure code (SQL injection, hardcoded credentials, unsafe deserialization)","I want to prevent code interpreter abuse where users trick my LLM into executing malicious code","I need to evaluate my LLM's code generation safety across different programming languages"],"best_for":["Teams deploying code-generation LLMs (GitHub Copilot-like systems)","Organizations with code interpreter or notebook environments","Developers building secure coding assistants"],"limitations":["CodeShield is a separate model; requires additional infrastructure and inference overhead","Code security evaluation is language-specific; performance varies by programming language","Cannot detect all security vulnerabilities; relies on patterns learned during training","No integration with static analysis tools (SAST) or dependency vulnerability databases"],"requires":["CodeShield model weights (part of PurpleLlama)","Llama Guard model weights","LlamaFirewall orchestration framework","Python 3.8+"],"input_types":["code (Python, JavaScript, Java, C++, etc.)"],"output_types":["code security classification (safe/unsafe)","per-vulnerability-type risk scores","specific insecure patterns detected"],"categories":["safety-moderation","code-generation-editing"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_8","uri":"capability://safety.moderation.false.refusal.rate.frr.measurement.and.mitre.compliance.evaluation","name":"false refusal rate (frr) measurement and mitre compliance evaluation","description":"Llama Guard integrates with CyberSecEval's MITRE compliance benchmarks to measure false refusal rates (FRR) — the percentage of legitimate, safe requests that are incorrectly blocked. The framework includes MITRE-mapped test cases that represent legitimate use cases within security domains (e.g., educational content about vulnerabilities, authorized penetration testing). Teams can evaluate their LLM's FRR to ensure safety policies don't over-block legitimate requests, balancing safety with usability.","intents":["I need to measure how often my safety policies incorrectly block legitimate requests (false refusal rate)","I want to ensure my LLM can still answer legitimate security questions (e.g., how to patch a vulnerability)","I need to demonstrate MITRE compliance for security-related content in my LLM"],"best_for":["Security researchers and red-teamers","Teams building security-focused LLMs (threat intelligence, vulnerability research)","Organizations required to demonstrate balanced safety/usability trade-offs"],"limitations":["FRR measurement requires manual annotation of legitimate vs illegitimate requests; labor-intensive","MITRE mappings are subjective; different teams may disagree on whether a request is legitimate","FRR varies significantly by domain; benchmarks may not cover your specific use case","No automated method to reduce FRR without retraining; requires policy threshold adjustment or model fine-tuning"],"requires":["CyberSecEval MITRE compliance benchmark datasets","Llama Guard model weights","Python 3.8+","Labeled validation data for your specific domain (optional)"],"input_types":["MITRE-mapped test prompts (legitimate security-related requests)"],"output_types":["false refusal rate (FRR) percentage","per-MITRE-category FRR scores","confusion matrix (true positives, false positives, true negatives, false negatives)"],"categories":["safety-moderation","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__cap_9","uri":"capability://safety.moderation.visual.prompt.injection.detection.via.prompt.guard.integration","name":"visual prompt injection detection via prompt guard integration","description":"Llama Guard integrates with Prompt Guard, a specialized model for detecting visual prompt injection attacks where adversaries embed text instructions in images to manipulate LLM behavior. While Llama Guard handles text-based attacks, Prompt Guard processes image inputs to detect embedded instructions. The integration within LlamaFirewall allows teams to apply Prompt Guard to multimodal inputs (text + images) alongside Llama Guard's text classification.","intents":["I need to detect when users embed malicious text instructions in images to bypass my safety filters","I want to protect my multimodal LLM from visual prompt injection attacks","I need to evaluate my LLM's robustness to adversarial images containing hidden instructions"],"best_for":["Teams deploying multimodal LLMs (vision + language models)","Organizations accepting user-uploaded images","Developers building robust vision-language systems"],"limitations":["Prompt Guard is a separate model; requires additional inference overhead for image processing","Visual prompt injection is an emerging attack; detection techniques are still evolving","Cannot detect all image-based attacks; adversaries may find new evasion techniques","Requires image preprocessing and OCR-like capabilities; adds computational cost"],"requires":["Prompt Guard model weights (part of PurpleLlama)","Llama Guard model weights","LlamaFirewall orchestration framework","Python 3.8+","Image processing libraries (PIL, OpenCV)"],"input_types":["images (PNG, JPEG, etc.)","text (extracted from images or user-provided)"],"output_types":["visual injection detection (safe/unsafe)","extracted text from images","confidence scores for detected injections"],"categories":["safety-moderation","image-visual"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"llama-guard__headline","uri":"capability://safety.moderation.ai.safety.classifier.model","name":"ai safety classifier model","description":"Llama Guard is a customizable AI safety classifier model that evaluates user prompts and AI responses against safety policies, focusing on multi-category content classification such as violence and self-harm.","intents":["best AI safety classifier","AI safety model for content moderation","customizable safety classifier for AI","AI model for evaluating user prompts","best model for content classification"],"best_for":["developers needing content moderation tools","teams focusing on AI safety"],"limitations":[],"requires":[],"input_types":["user prompts","AI responses"],"output_types":["safety evaluations","classification results"],"categories":["safety-moderation"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":57,"verified":false,"data_access_risk":"high","permissions":["Llama Guard model weights (7B or 13B variant)","Python 3.8+","PyTorch 1.13+","Transformers library 4.30+","GPU with 8GB+ VRAM for 7B model, 16GB+ for 13B (CPU inference possible but slow)","Llama Guard model weights","Prompt injection test datasets (available in CyberSecEval benchmark)","Multimodal LLM supporting vision inputs","Visual prompt injection benchmark dataset (included in CyberSecEval v3+)","Image processing libraries (PIL, OpenCV)"],"failure_modes":["Classification latency adds ~50-200ms per inference depending on model size and hardware","Requires GPU or sufficient CPU resources for real-time inference; CPU-only deployment is slow","Training data reflects Meta's safety taxonomy; may not align perfectly with domain-specific harms (e.g., financial fraud, medical misinformation)","No built-in support for context-aware safety — treats each prompt/response independently without conversation history","Adversarial attacks evolve faster than model training cycles; zero-day injection techniques may bypass detection","No defense against visual prompt injection (images containing text instructions) — requires separate CodeShield model","Cannot distinguish between legitimate complex instructions and malicious injections in all cases","Performance degrades on prompts in languages not well-represented in training data","Visual injection attack evaluation is nascent; benchmark coverage is limited compared to text-based attacks","Requires image generation or curation for attack dataset; more resource-intensive than text-only benchmarks","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.7,"quality":0.9,"ecosystem":0.39999999999999997,"match_graph":0.25,"freshness":0.52,"weights":{"adoption":0.35,"quality":0.2,"ecosystem":0.1,"match_graph":0.3,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-06-17T09:51:04.692Z","last_scraped_at":null,"last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=llama-guard","compare_url":"https://unfragile.ai/compare?artifact=llama-guard"}},"signature":"BKvHUaez9gm0O5mFt/sTLiE27mtVcF7uZ1RTHYIhDgVbLOPq952wFJdLzKLV5ZkiTacrjgZcnbHTj48VMkRsDw==","signedAt":"2026-06-21T16:01:59.454Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/llama-guard","artifact":"https://unfragile.ai/llama-guard","verify":"https://unfragile.ai/api/v1/verify?slug=llama-guard","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}