{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"awesome-mistral-inference","slug":"mistral-inference","name":"mistral-inference","type":"repo","url":"https://github.com/mistralai/mistral-inference","page_url":"https://unfragile.ai/mistral-inference","categories":["model-training"],"tags":[],"pricing":{"model":"open_source","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"awesome-mistral-inference__cap_0","uri":"capability://text.generation.language.multi.architecture.language.model.inference.with.transformer.and.state.space.model.support","name":"multi-architecture language model inference with transformer and state-space model support","description":"Executes inference across multiple model architectures (Transformer-based and Mamba state-space models) through a unified inference pipeline that handles tokenization, KV caching, and generation. The system abstracts architecture differences behind a common interface, allowing seamless switching between Mistral 7B, Mixtral 8x7B/8x22B (mixture-of-experts), Mamba 7B, and other variants without code changes. KV cache management optimizes memory usage during autoregressive generation by storing computed key-value pairs rather than recomputing them at each step.","intents":["Run different Mistral model variants locally without rewriting inference code","Switch between transformer and state-space architectures for latency/memory tradeoffs","Optimize inference memory footprint using KV caching for long-context generation"],"best_for":["ML engineers deploying Mistral models in resource-constrained environments","Researchers comparing transformer vs state-space model performance","Teams building multi-model applications requiring architecture flexibility"],"limitations":["KV cache memory grows linearly with sequence length — no built-in cache eviction or quantization for very long contexts (>32K tokens)","Mamba models lack attention mechanism, limiting interpretability and some downstream task performance vs transformers","Single-GPU inference for models >7B requires manual distributed setup with torchrun; no automatic sharding"],"requires":["Python 3.9+","PyTorch 2.0+","CUDA 11.8+ (for GPU inference) or CPU fallback with significant latency","Model weights from Hugging Face Hub or local filesystem"],"input_types":["text prompts (string)","tokenized sequences (torch.Tensor)","multimodal inputs (text + image for Pixtral models)"],"output_types":["generated text tokens (torch.Tensor)","decoded text strings","logits for downstream processing"],"categories":["text-generation-language","model-inference"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_1","uri":"capability://image.visual.multimodal.inference.with.vision.encoder.integration.for.text.image.understanding","name":"multimodal inference with vision encoder integration for text-image understanding","description":"Processes multimodal inputs (text + images) by routing images through a dedicated vision encoder that extracts visual embeddings, then concatenates them with text token embeddings before passing through the language model decoder. The vision encoder (used in Pixtral 12B and Pixtral Large) converts image pixels to a sequence of visual tokens that the LLM can attend to, enabling tasks like image captioning, visual question answering, and image-based reasoning. The system handles image preprocessing (resizing, normalization) and token alignment automatically.","intents":["Build image understanding applications without separate vision-language model orchestration","Process mixed text-image prompts in a single forward pass","Perform visual reasoning tasks (VQA, captioning) with Mistral's language understanding"],"best_for":["Teams building document understanding or visual search applications","Developers prototyping multimodal chatbots with local inference","Researchers studying vision-language model scaling with open weights"],"limitations":["Vision encoder is fixed (not trainable in base inference) — fine-tuning vision components requires separate LoRA setup","Image resolution limited by model architecture (typically 336x336 or 672x672) — high-resolution images are downsampled, losing fine detail","Multimodal inference adds ~500ms-1s latency per image due to vision encoder forward pass; no batching across multiple images in single request"],"requires":["Python 3.9+","PyTorch 2.0+","Pixtral 12B or Pixtral Large model weights","PIL/Pillow for image preprocessing"],"input_types":["text prompts (string)","images (PIL.Image, numpy array, or file path)","mixed text-image sequences"],"output_types":["generated text with visual grounding","structured responses (JSON) from image analysis"],"categories":["image-visual","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_10","uri":"capability://automation.workflow.docker.containerization.and.vllm.integration.for.production.deployment","name":"docker containerization and vllm integration for production deployment","description":"Provides Docker container templates and integration with vLLM (a high-performance inference engine) for production-grade deployment. The system includes Dockerfile configurations for packaging Mistral models with all dependencies, enabling reproducible deployment across environments. vLLM integration enables batching, request queuing, and optimized KV cache management for serving multiple concurrent requests with higher throughput than single-request inference. The deployment setup handles model weight downloading, GPU resource allocation, and port exposure for API access.","intents":["Deploy Mistral models as containerized services with reproducible environments","Serve multiple concurrent inference requests with batching and request queuing","Scale inference across multiple containers or GPUs using vLLM"],"best_for":["DevOps teams deploying Mistral models to Kubernetes or Docker Swarm","Organizations needing production-grade inference with SLAs","Teams building inference APIs with high concurrency requirements"],"limitations":["vLLM integration requires separate vLLM installation — adds complexity and potential version conflicts","Docker images are large (~10-20GB with model weights) — slow to build and push to registries","No built-in load balancing across containers — requires external orchestration (Kubernetes, Docker Swarm)","GPU resource allocation must be specified manually — no automatic GPU detection in containers"],"requires":["Docker 20.10+","Docker Compose (optional, for multi-container setups)","GPU support (nvidia-docker or similar)","vLLM library (for high-performance serving)","Model weights (downloaded during container build or mounted as volume)"],"input_types":["Dockerfile configuration","model weights","environment variables (GPU allocation, port, etc.)"],"output_types":["Docker image","running container with inference API","metrics and logs"],"categories":["automation-workflow","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_11","uri":"capability://text.generation.language.generation.parameter.control.with.temperature.top.p.and.max.tokens.sampling","name":"generation parameter control with temperature, top-p, and max-tokens sampling","description":"Provides fine-grained control over text generation behavior through sampling parameters: temperature (controls randomness), top-p (nucleus sampling for diversity), top-k (restricts to top-k tokens), and max_tokens (limits output length). These parameters are applied during the decoding phase to shape the probability distribution over next tokens, enabling control over output creativity vs determinism. The system supports both greedy decoding (argmax) and stochastic sampling, with proper handling of edge cases (temperature=0, top-p=1.0).","intents":["Control output creativity and determinism via temperature and sampling parameters","Limit output length to prevent runaway generation","Implement different generation strategies (greedy, diverse, constrained) for different use cases"],"best_for":["Developers fine-tuning model behavior for specific applications (chatbots, code generation, creative writing)","Researchers studying sampling strategies and their effects on output quality","Teams implementing multi-strategy generation for A/B testing"],"limitations":["Parameter tuning is empirical — no principled guidance on optimal values for different tasks","Temperature scaling is global — cannot vary temperature per token or per generation step","top-p and top-k are applied independently — no interaction modeling between them","No adaptive stopping criteria — max_tokens is fixed, may truncate meaningful output"],"requires":["Python 3.9+","Understanding of sampling strategies and their effects","Model weights and tokenizer"],"input_types":["text prompt (string)","generation parameters (temperature, top_p, top_k, max_tokens)"],"output_types":["generated text (string)","token sequences with probabilities (optional)"],"categories":["text-generation-language","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_12","uri":"capability://text.generation.language.streaming.text.generation.with.token.by.token.output","name":"streaming text generation with token-by-token output","description":"Generates text incrementally, yielding tokens one at a time as they are produced rather than waiting for the entire sequence to complete. This enables real-time output display in chat interfaces and reduces perceived latency by showing partial results immediately. The streaming implementation maintains generation state (KV cache, attention masks) across token yields, enabling efficient incremental generation without recomputation. Streaming is compatible with all generation parameters (temperature, top-p, etc.) and works with both text-only and multimodal inputs.","intents":["Display model output in real-time as tokens are generated (chat interfaces, web UIs)","Reduce perceived latency by showing partial results immediately","Enable user interruption of generation mid-stream"],"best_for":["Developers building interactive chat applications and web UIs","Teams implementing real-time inference APIs with streaming responses","Researchers studying streaming generation behavior and latency"],"limitations":["Streaming adds minimal overhead but prevents batching optimizations — single-request streaming is slower than batched non-streaming","Token-by-token output may have variable latency per token — no guaranteed latency bounds","Streaming state must be maintained across yields — interrupting generation mid-stream requires careful cleanup","No built-in buffering — very fast generation may produce tokens faster than client can consume"],"requires":["Python 3.9+","Model weights and tokenizer","Client capable of consuming streaming output (HTTP chunked encoding, WebSocket, etc.)"],"input_types":["text prompt (string)","generation parameters"],"output_types":["token iterator (yields strings or token IDs)","streaming HTTP response (for API usage)"],"categories":["text-generation-language","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_2","uri":"capability://tool.use.integration.function.calling.with.schema.based.tool.invocation.and.structured.output.generation","name":"function calling with schema-based tool invocation and structured output generation","description":"Enables models to generate structured function calls by defining tool schemas (name, description, parameters) that the model learns to invoke during generation. The system constrains the model's output to valid function call syntax, allowing it to request external tool execution (API calls, database queries, code execution). The model generates function names and arguments as structured JSON, which the application parses and executes, then feeds results back to the model for continued reasoning. This creates an agentic loop where the model can decompose tasks into tool-assisted steps.","intents":["Build AI agents that can call external APIs, databases, or code execution environments","Enable models to perform multi-step reasoning by invoking tools and using results","Create structured output from language models without post-hoc parsing"],"best_for":["Developers building autonomous agents with local Mistral models","Teams creating chatbots that need to access real-time data or perform actions","Researchers studying tool-use in language models with open-weight models"],"limitations":["Function calling requires explicit schema definition — no automatic schema inference from Python functions","Model may hallucinate function calls not in the schema or generate malformed JSON; no built-in validation or retry logic","Smaller models (7B) have lower accuracy in complex multi-step function calling vs larger models; no fine-tuning guidance provided","No streaming support for function calls — entire call must be generated before parsing"],"requires":["Python 3.9+","Model weights for any Mistral variant (all support function calling)","JSON schema definition for tools","Application-level tool execution framework (not provided)"],"input_types":["text prompts with tool descriptions","tool schemas (JSON schema format)","previous tool execution results (for multi-turn reasoning)"],"output_types":["function call JSON (name + arguments)","text responses interspersed with function calls"],"categories":["tool-use-integration","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_3","uri":"capability://code.generation.editing.fill.in.the.middle.code.completion.with.bidirectional.context","name":"fill-in-the-middle code completion with bidirectional context","description":"Generates code snippets in the middle of a file by conditioning on both prefix (code before the cursor) and suffix (code after the cursor) context. Unlike standard left-to-right generation, FIM uses a special token structure where the model learns to generate the missing middle section given both directions of context. This is particularly useful for code editors and IDEs where developers want completions that respect existing code structure. The model uses a FIM-specific prompt format that signals to generate the middle portion rather than continuing from the end.","intents":["Provide IDE-integrated code completions that respect code structure on both sides of cursor","Generate function bodies given function signatures and usage context","Complete code in the middle of files without disrupting existing code"],"best_for":["IDE plugin developers integrating Mistral with code editors (VS Code, JetBrains)","Teams building internal code completion tools for specific codebases","Developers using Codestral (code-specialized Mistral variant) for code-heavy workflows"],"limitations":["FIM performance degrades with very long suffix context (>2K tokens) — model may ignore suffix and generate left-to-right","Requires explicit FIM prompt format; standard chat/instruction prompts don't trigger FIM behavior","No streaming support for FIM — entire completion must be generated before returning to editor","Limited to code-specialized models (Codestral) for best results; general models have lower FIM accuracy"],"requires":["Python 3.9+","Codestral 22B or Mamba 7B model weights (optimized for FIM)","FIM-aware prompt formatting (special tokens: <|fim_prefix|>, <|fim_middle|>, <|fim_suffix|>)"],"input_types":["code prefix (string)","code suffix (string)","file context (optional)"],"output_types":["generated code snippet for middle section","confidence scores (optional)"],"categories":["code-generation-editing","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_4","uri":"capability://code.generation.editing.low.rank.adaptation.fine.tuning.with.lora.parameter.efficient.training","name":"low-rank adaptation fine-tuning with lora parameter-efficient training","description":"Enables efficient model fine-tuning by training only low-rank adapter matrices (LoRA) instead of full model weights, reducing trainable parameters by 99%+ while maintaining performance. The system freezes the base model weights and adds small trainable matrices (rank typically 8-64) that are applied via matrix multiplication during forward passes. LoRA adapters can be saved separately (~10-100MB per adapter) and composed with the base model at inference time, enabling multiple task-specific adapters without duplicating model weights. The implementation integrates with PyTorch's distributed training for multi-GPU fine-tuning.","intents":["Fine-tune Mistral models on custom datasets without GPU memory constraints of full fine-tuning","Create multiple task-specific adapters that share a base model","Adapt models to domain-specific language (medical, legal, code) with minimal compute"],"best_for":["Teams with limited GPU memory (single GPU fine-tuning possible with LoRA vs requiring 8+ GPUs for full fine-tuning)","Organizations needing multiple specialized model variants without storage overhead","Researchers studying parameter efficiency in large language models"],"limitations":["LoRA rank selection requires manual tuning — no automated rank selection; typical ranks (8-64) may be suboptimal for some tasks","Adapter composition at inference adds ~5-10% latency per adapter due to matrix multiplication overhead","No built-in support for adapter merging into base weights — requires external tools or custom code","Training stability requires careful learning rate tuning; no adaptive learning rate scheduling provided"],"requires":["Python 3.9+","PyTorch 2.0+ with CUDA support","peft library (Parameter-Efficient Fine-Tuning) for LoRA implementation","Training dataset in text or instruction-following format","GPU with minimum 8GB VRAM (16GB+ recommended for larger models)"],"input_types":["training dataset (text files, JSONL, or HuggingFace datasets)","LoRA configuration (rank, alpha, target modules)","base model weights"],"output_types":["LoRA adapter weights (safetensors format)","training metrics (loss, perplexity)","merged model (optional)"],"categories":["code-generation-editing","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_5","uri":"capability://automation.workflow.command.line.interface.for.interactive.chat.and.model.testing","name":"command-line interface for interactive chat and model testing","description":"Provides two CLI tools (mistral-chat and mistral-demo) for running models without writing code. mistral-chat enables interactive multi-turn conversations with streaming output, while mistral-demo is optimized for quick testing of model capabilities. Both tools handle model loading, tokenization, and generation automatically, with support for specifying model variants, temperature, max tokens, and other generation parameters via command-line flags. The CLI abstracts GPU/CPU device selection and distributed inference setup (torchrun) for multi-GPU scenarios.","intents":["Test Mistral models quickly without writing Python code","Run interactive chat sessions with streaming responses","Benchmark model performance and latency from command line"],"best_for":["ML engineers and researchers prototyping with Mistral models","Non-technical users wanting to interact with models locally","DevOps teams testing models before containerization"],"limitations":["CLI lacks advanced features like batch processing, multi-turn context management beyond simple conversation history","No built-in logging or metrics collection — requires manual output redirection for benchmarking","Multi-GPU setup requires manual torchrun invocation; no automatic device detection or load balancing","Streaming output may have buffering delays on some terminals"],"requires":["Python 3.9+","mistral-inference package installed (pip install mistral-inference)","Model weights downloaded to local filesystem or accessible via Hugging Face Hub","CUDA 11.8+ for GPU inference (optional, CPU fallback available)"],"input_types":["text prompts (stdin or command-line arguments)","model name (string identifier)"],"output_types":["generated text (stdout with streaming)","timing metrics (stderr)"],"categories":["automation-workflow","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_6","uri":"capability://text.generation.language.python.api.for.programmatic.model.instantiation.and.inference.control","name":"python api for programmatic model instantiation and inference control","description":"Exposes a Python API for direct model instantiation, configuration, and inference without CLI overhead. Developers can load models, configure generation parameters (temperature, top-p, max tokens), and run inference in a single Python process with full control over input/output handling. The API supports both synchronous generation and streaming output, enabling integration into applications, notebooks, and frameworks. Model configuration is handled through dataclass-based config objects that map to model architecture parameters, enabling fine-grained control over model behavior.","intents":["Integrate Mistral models into Python applications and frameworks","Build custom inference loops with full control over generation parameters","Run models in Jupyter notebooks for research and prototyping"],"best_for":["Python developers building LLM applications with Mistral models","Researchers implementing custom inference algorithms","Teams integrating Mistral into existing Python codebases"],"limitations":["API is synchronous — no built-in async/await support for concurrent requests","No request batching across multiple prompts — each inference call is sequential","Model loading is slow (~10-30s for 7B models) — no model caching across API calls in same process","Limited error handling — model failures (OOM, CUDA errors) propagate as raw exceptions"],"requires":["Python 3.9+","PyTorch 2.0+","mistral-inference package","Model weights accessible locally or via Hugging Face Hub"],"input_types":["text prompts (string)","generation parameters (temperature, top_p, max_tokens, etc.)","model configuration (ModelArgs dataclass)"],"output_types":["generated text (string)","token sequences (torch.Tensor)","streaming iterators (for streaming output)"],"categories":["text-generation-language","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_7","uri":"capability://automation.workflow.distributed.inference.across.multiple.gpus.with.torchrun.orchestration","name":"distributed inference across multiple gpus with torchrun orchestration","description":"Enables inference on models larger than single-GPU memory by distributing computation across multiple GPUs using PyTorch's distributed data parallel (DDP) or tensor parallel approaches. The system integrates with torchrun to handle process spawning, rank assignment, and communication backend setup automatically. Developers specify the number of GPUs via torchrun flags, and the inference pipeline automatically partitions model layers or attention heads across devices, with inter-GPU communication handled transparently via NCCL.","intents":["Run large models (Mixtral 8x22B, Mistral Large) that exceed single-GPU memory","Distribute inference load across multiple GPUs for lower latency","Deploy models on multi-GPU servers without custom distributed code"],"best_for":["Teams with multi-GPU infrastructure (2+ GPUs) deploying large Mistral models","Researchers benchmarking model performance across different GPU counts","Production deployments requiring high throughput on large models"],"limitations":["Inter-GPU communication overhead (NCCL) adds 10-30% latency vs single-GPU inference for small batches","Requires manual torchrun invocation — no automatic GPU detection or load balancing","Tensor parallelism requires careful layer partitioning — no automatic optimal partition strategy","Debugging distributed inference is complex — NCCL errors are cryptic and hard to diagnose"],"requires":["Python 3.9+","PyTorch 2.0+ with NCCL support","Multiple GPUs (2+) with NVLink or high-bandwidth interconnect (PCIe 4.0+)","torchrun command-line tool (included with PyTorch)","CUDA 11.8+ with proper driver support for multi-GPU communication"],"input_types":["text prompts (string)","model weights (distributed across GPUs)"],"output_types":["generated text (string)","aggregated from all GPUs"],"categories":["automation-workflow","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_8","uri":"capability://automation.workflow.model.configuration.and.architecture.parameter.management","name":"model configuration and architecture parameter management","description":"Manages model architecture parameters (hidden size, number of layers, attention heads, vocabulary size, etc.) through dataclass-based configuration objects (ModelArgs) that define the complete model structure. Configuration is loaded from model-specific JSON files or defined programmatically, enabling support for different model variants (7B, 22B, MoE, etc.) without code changes. The system validates configuration consistency and maps parameters to the appropriate model architecture (Transformer vs Mamba) during instantiation.","intents":["Load and configure different Mistral model variants without code changes","Define custom model architectures by modifying configuration parameters","Validate model configuration before instantiation to catch errors early"],"best_for":["Researchers experimenting with different model architectures and sizes","Teams managing multiple model variants in production","Developers building custom models based on Mistral architecture"],"limitations":["Configuration validation is minimal — invalid parameter combinations may only fail during model instantiation","No configuration versioning or migration support — changes to ModelArgs may break existing configs","Limited documentation on parameter interactions — some parameters (e.g., rope_theta, moe_intermediate_size) lack clear guidance"],"requires":["Python 3.9+","Model configuration JSON or ModelArgs dataclass definition","Understanding of transformer/Mamba architecture parameters"],"input_types":["JSON configuration files","ModelArgs dataclass instances","command-line parameter overrides"],"output_types":["validated ModelArgs objects","model instantiation parameters"],"categories":["automation-workflow","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-mistral-inference__cap_9","uri":"capability://data.processing.analysis.tokenization.and.encoding.with.model.specific.vocabulary.handling","name":"tokenization and encoding with model-specific vocabulary handling","description":"Handles text-to-token conversion using model-specific tokenizers (typically Tiktoken or Sentencepiece-based) that map text to integer token IDs. The system manages vocabulary loading, special token handling (BOS, EOS, padding), and encoding/decoding with proper handling of edge cases (unknown tokens, multi-byte characters). Tokenization is integrated into the inference pipeline to ensure consistency between training and inference token boundaries.","intents":["Convert text prompts to token sequences for model input","Decode model output tokens back to readable text","Manage special tokens and vocabulary boundaries correctly"],"best_for":["Developers building inference pipelines that need token-level control","Researchers analyzing model tokenization behavior","Teams implementing custom generation algorithms"],"limitations":["Tokenizer is model-specific — cannot mix tokenizers across different model families","No streaming tokenization — entire input must be tokenized before inference","Special token handling is implicit — no explicit control over BOS/EOS insertion","Vocabulary size is fixed at model training time — no dynamic vocabulary expansion"],"requires":["Python 3.9+","Tokenizer files (typically included with model weights)","Text input in UTF-8 encoding"],"input_types":["text strings (UTF-8)","token sequences (for decoding)"],"output_types":["token IDs (list of integers)","decoded text strings"],"categories":["data-processing-analysis","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":28,"verified":false,"data_access_risk":"high","permissions":["Python 3.9+","PyTorch 2.0+","CUDA 11.8+ (for GPU inference) or CPU fallback with significant latency","Model weights from Hugging Face Hub or local filesystem","Pixtral 12B or Pixtral Large model weights","PIL/Pillow for image preprocessing","Docker 20.10+","Docker Compose (optional, for multi-container setups)","GPU support (nvidia-docker or similar)","vLLM library (for high-performance serving)"],"failure_modes":["KV cache memory grows linearly with sequence length — no built-in cache eviction or quantization for very long contexts (>32K tokens)","Mamba models lack attention mechanism, limiting interpretability and some downstream task performance vs transformers","Single-GPU inference for models >7B requires manual distributed setup with torchrun; no automatic sharding","Vision encoder is fixed (not trainable in base inference) — fine-tuning vision components requires separate LoRA setup","Image resolution limited by model architecture (typically 336x336 or 672x672) — high-resolution images are downsampled, losing fine detail","Multimodal inference adds ~500ms-1s latency per image due to vision encoder forward pass; no batching across multiple images in single request","vLLM integration requires separate vLLM installation — adds complexity and potential version conflicts","Docker images are large (~10-20GB with model weights) — slow to build and push to registries","No built-in load balancing across containers — requires external orchestration (Kubernetes, Docker Swarm)","GPU resource allocation must be specified manually — no automatic GPU detection in containers","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.05,"quality":0.5,"ecosystem":0.39999999999999997,"match_graph":0.25,"freshness":0.52,"weights":{"adoption":0.3,"quality":0.2,"ecosystem":0.15,"match_graph":0.3,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-06-17T09:51:03.578Z","last_scraped_at":"2026-05-03T14:00:25.471Z","last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=mistral-inference","compare_url":"https://unfragile.ai/compare?artifact=mistral-inference"}},"signature":"ryz3t8PaqnJYuyd8qRymJ71FPkgz0O7YcmbdNqQxND+X80zSk10kLpBDSLKmiP0EoRH7Ges4vNlm6W5apOcjBw==","signedAt":"2026-06-20T07:34:48.717Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/mistral-inference","artifact":"https://unfragile.ai/mistral-inference","verify":"https://unfragile.ai/api/v1/verify?slug=mistral-inference","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}