{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"internlm","slug":"internlm","name":"InternLM","type":"model","url":"https://github.com/InternLM/InternLM","page_url":"https://unfragile.ai/internlm","categories":["model-training"],"tags":[],"pricing":{"model":"free","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"internlm__cap_0","uri":"capability://text.generation.language.multilingual.instruction.following.chat.with.200k.context.window","name":"multilingual instruction-following chat with 200k context window","description":"InternLM2.5 and InternLM2 chat models support conversational interactions across multiple languages with a 200K token context window, enabling long-form document analysis and multi-turn dialogue. The models are fine-tuned via supervised fine-tuning (SFT) on instruction-following datasets, allowing them to follow complex user directives while maintaining coherence across extended conversations. This is implemented through standard transformer decoder architecture with rotary position embeddings (RoPE) scaled for long-context handling.","intents":["Build multilingual chatbots that can process entire documents or long conversation histories","Deploy conversational AI that understands nuanced instructions across Chinese, English, and other languages","Create applications requiring sustained context over 50K+ token interactions"],"best_for":["Teams building multilingual customer support systems","Researchers working on long-context language understanding","Developers creating document-aware chatbots for enterprise use"],"limitations":["200K context window is supported but inference latency scales linearly with context length; practical throughput degrades significantly beyond 100K tokens","Chat models are instruction-tuned but may hallucinate on factual queries without retrieval augmentation","Multilingual performance varies by language; English and Chinese are primary, other languages have reduced quality"],"requires":["InternLM2.5-7B-Chat or InternLM2-20B-Chat model weights (7B or 20B parameter versions)","Inference framework: Hugging Face Transformers 4.30+, LMDeploy, vLLM, or Ollama","GPU with 16GB+ VRAM for 7B model, 40GB+ for 20B model (or quantization support)"],"input_types":["text (natural language instructions, documents, conversation history)"],"output_types":["text (natural language responses, structured outputs via prompting)"],"categories":["text-generation-language","multilingual-support"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_1","uri":"capability://planning.reasoning.deep.thinking.mode.for.complex.mathematical.and.logical.reasoning","name":"deep thinking mode for complex mathematical and logical reasoning","description":"InternLM3 introduces a specialized 'deep thinking mode' that enables the model to perform extended chain-of-thought reasoning for complex mathematical problems, logic puzzles, and multi-step reasoning tasks. This mode works by allowing the model to generate internal reasoning traces before producing final answers, implemented through a two-stage generation process: first generating hidden reasoning tokens (not shown to users), then producing the final response. The architecture uses a modified attention mechanism that allows the model to 'think' without token budget constraints on visible output.","intents":["Solve complex mathematical problems requiring step-by-step derivations","Debug logical errors in code or algorithms through systematic reasoning","Generate detailed explanations for technical concepts with rigorous justification"],"best_for":["Educational platforms teaching mathematics and logic","Research teams requiring rigorous reasoning for scientific problems","Developers building AI tutoring systems or code analysis tools"],"limitations":["Deep thinking mode increases latency by 3-5x compared to normal mode; not suitable for real-time applications","Reasoning quality depends on problem complexity; very simple queries may not benefit from extended thinking","Only available in InternLM3-8B-Instruct; not available in smaller 1.8B or larger 20B variants"],"requires":["InternLM3-8B-Instruct model weights","Inference framework supporting extended generation (LMDeploy with deep-thinking support, or custom implementation)","GPU with 16GB+ VRAM for inference"],"input_types":["text (mathematical problems, logic puzzles, reasoning-heavy queries)"],"output_types":["text (reasoning trace + final answer, or final answer only depending on mode)"],"categories":["planning-reasoning","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_10","uri":"capability://image.visual.multi.modal.capability.through.vision.language.integration.emerging","name":"multi-modal capability through vision-language integration (emerging)","description":"InternLM is expanding into multi-modal capabilities through integration with vision encoders, enabling models to process images alongside text. This is implemented by combining a vision encoder (e.g., CLIP-based) with the language model backbone, where images are encoded to visual tokens and concatenated with text tokens in the input sequence. The model learns to reason about both visual and textual information through instruction-tuning on image-text datasets. This enables applications like image captioning, visual question answering, and document understanding from scanned PDFs.","intents":["Build applications that understand both text and images (e.g., document analysis, visual QA)","Process scanned documents or screenshots with OCR and semantic understanding","Create multi-modal chatbots that can discuss images and text together"],"best_for":["Teams building document processing systems with visual understanding","Developers creating multi-modal chatbots or assistants","Researchers studying vision-language models"],"limitations":["Vision-language variants are still in development; fewer model sizes and variants available compared to text-only models","Image resolution is limited by token budget; high-resolution images must be downsampled or tiled","Vision encoder adds computational overhead; inference is slower than text-only models","Multi-modal instruction-tuning data is limited; performance on specialized visual tasks may be weaker than specialized vision models"],"requires":["InternLM vision-language model variant (e.g., InternLM-XComposer or similar)","Vision encoder weights (typically CLIP or similar)","Inference framework supporting multi-modal input (LMDeploy, Transformers with vision support)","GPU with 16GB+ VRAM for inference"],"input_types":["text (prompts, questions)","image (JPEG, PNG, or other formats)"],"output_types":["text (descriptions, answers, analysis)"],"categories":["image-visual","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_11","uri":"capability://automation.workflow.npu.neural.processing.unit.support.for.edge.deployment","name":"npu (neural processing unit) support for edge deployment","description":"InternLM provides support for deployment on NPUs (Neural Processing Units) such as Huawei Ascend, enabling efficient inference on edge devices and specialized hardware. This is implemented through model quantization (int8, int4) and NPU-specific optimization passes that convert standard transformer operations to NPU-native operations. The framework handles model compilation, memory management, and operator fusion for NPU targets. This enables deployment of InternLM models on edge devices with significantly reduced latency and power consumption compared to GPU inference.","intents":["Deploy InternLM models on edge devices or specialized NPU hardware for low-latency inference","Build applications requiring on-device AI without cloud connectivity","Optimize inference costs through specialized hardware acceleration"],"best_for":["Teams building edge AI applications (mobile, IoT, embedded systems)","Organizations with Huawei Ascend or other NPU hardware","Developers optimizing for latency-critical applications"],"limitations":["NPU support is limited to specific hardware (primarily Huawei Ascend); not universally available","Model quantization required for NPU deployment introduces quality degradation (2-5% on benchmarks)","NPU ecosystem is less mature than GPU; fewer optimization tools and debugging capabilities","Porting models to NPU requires specialized knowledge; not as straightforward as GPU deployment"],"requires":["NPU hardware (Huawei Ascend or compatible device)","NPU SDK and compiler toolchain (Huawei Ascend toolkit)","Quantized InternLM model weights (int8 or int4)","LMDeploy or custom inference framework with NPU backend support"],"input_types":["text (prompts for inference)"],"output_types":["text (model responses)"],"categories":["automation-workflow","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_12","uri":"capability://automation.workflow.model.conversion.and.format.transformation.tools","name":"model conversion and format transformation tools","description":"InternLM provides tools for converting models between different formats and frameworks, including conversion to ONNX, TensorRT, and other inference-optimized formats. The conversion pipeline handles weight transformation, operator mapping, and format-specific optimizations. This enables deployment of InternLM models in diverse inference environments (ONNX Runtime, TensorRT, TVM, etc.) without retraining. The tools also support quantization during conversion, enabling efficient deployment on resource-constrained devices.","intents":["Convert InternLM models to optimized formats for specific inference engines","Deploy models across heterogeneous hardware (GPUs, CPUs, NPUs, mobile devices)","Reduce model size and latency through quantization during conversion"],"best_for":["Teams deploying models across multiple hardware platforms","Developers optimizing for specific inference engines (ONNX, TensorRT, etc.)","Organizations requiring model portability across different deployment environments"],"limitations":["Conversion can introduce numerical precision loss; converted models may have 1-3% quality degradation","Not all InternLM features are supported in all target formats; some capabilities may be lost during conversion","Conversion process is complex and error-prone; requires careful validation of converted models","Converted models may not benefit from future InternLM optimizations without reconversion"],"requires":["InternLM model weights in HuggingFace or SafeTensors format","Conversion tools (included in InternLM repository or via LMDeploy)","Target framework installed (ONNX Runtime, TensorRT, TVM, etc.)","Python 3.8+ with PyTorch or other deep learning framework"],"input_types":["model weights (HuggingFace, SafeTensors format)"],"output_types":["model weights (ONNX, TensorRT, TVM, or other target format)"],"categories":["automation-workflow","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_2","uri":"capability://tool.use.integration.function.calling.and.tool.use.with.schema.based.dispatch","name":"function calling and tool use with schema-based dispatch","description":"InternLM2.5 and InternLM2 models support structured function calling through a schema-based approach where tools are defined as JSON schemas and the model learns to emit properly formatted tool calls within its generation. The implementation uses a special token vocabulary for tool invocation and integrates with frameworks like LMDeploy and SGLang that parse model outputs and route calls to registered functions. This enables agentic workflows where the model can autonomously decide when and how to use external tools (APIs, calculators, databases) based on user intent.","intents":["Build AI agents that can call APIs, execute code, or query databases autonomously","Create applications where LLMs interact with external tools without manual prompt engineering","Implement multi-step workflows where tool outputs feed into subsequent reasoning"],"best_for":["Teams building autonomous agents or AI assistants","Developers creating integrations between LLMs and existing APIs/services","Researchers prototyping agentic systems with tool-use capabilities"],"limitations":["Tool calling accuracy degrades with complex nested schemas (>10 parameters per tool); simpler schemas perform better","No built-in error recovery; if a tool call fails, the model doesn't automatically retry or adjust strategy","Requires explicit tool schema definition; no automatic schema inference from function signatures"],"requires":["InternLM2.5-7B-Chat, InternLM2.5-20B-Chat, or InternLM2-20B-Chat model","Inference framework with tool-calling support: LMDeploy 0.2.0+, SGLang, or custom implementation","Tool definitions as JSON schemas; function implementations in Python or other languages"],"input_types":["text (user queries with implicit or explicit tool requirements)","structured data (JSON tool schemas)"],"output_types":["text (natural language responses)","structured data (JSON tool calls with parameters)"],"categories":["tool-use-integration","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_3","uri":"capability://code.generation.editing.code.generation.and.understanding.with.syntax.aware.completion","name":"code generation and understanding with syntax-aware completion","description":"InternLM models are trained on large code corpora and support code generation, completion, and understanding tasks across 40+ programming languages. The models learn to generate syntactically correct code through exposure to high-quality open-source repositories during pretraining. Code understanding is enhanced through instruction-tuning on code-related tasks (debugging, explanation, optimization). The architecture uses standard transformer attention but benefits from code-specific tokenization that preserves syntax structure, enabling better handling of indentation and bracket matching.","intents":["Generate code snippets or complete functions from natural language descriptions","Analyze and explain existing code, identify bugs, or suggest optimizations","Support developers with IDE-like code completion and refactoring suggestions"],"best_for":["Developers building AI-assisted coding tools or IDE plugins","Teams automating code generation for boilerplate or repetitive patterns","Researchers studying code-LLM capabilities and limitations"],"limitations":["Code generation quality varies significantly by language; Python and JavaScript are strongest, niche languages (Rust, Go) are weaker","Models may generate syntactically correct but semantically incorrect code; always requires human review","No built-in ability to execute generated code or validate against test suites; requires external tooling"],"requires":["InternLM2.5-7B-Chat, InternLM2-20B-Chat, or base models fine-tuned for code tasks","Inference framework: Hugging Face Transformers, LMDeploy, or vLLM","Optional: IDE integration layer for real-time completion (e.g., VS Code extension)"],"input_types":["text (natural language descriptions, code snippets, questions about code)"],"output_types":["text (generated code, explanations, refactoring suggestions)"],"categories":["code-generation-editing","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_4","uri":"capability://memory.knowledge.long.context.processing.with.1m.token.support.internlm2.5","name":"long-context processing with 1m token support (internlm2.5)","description":"InternLM2.5 extends context handling to 1 million tokens through continued pretraining with specialized position interpolation techniques and efficient attention mechanisms. The implementation uses a combination of RoPE scaling, grouped-query attention (GQA) for memory efficiency, and training on synthetic long-context data to enable processing of entire books, codebases, or document collections in a single context window. This is achieved without catastrophic forgetting of the base 200K capability through careful curriculum learning during continued pretraining.","intents":["Process entire books, research papers, or codebases in a single query","Build document-aware systems that can reference any part of a large corpus without chunking","Create applications requiring cross-document reasoning over massive amounts of text"],"best_for":["Legal tech companies processing large contracts and precedent documents","Research teams analyzing entire codebases or scientific literature","Enterprise systems requiring whole-document understanding without retrieval"],"limitations":["1M token context requires 80GB+ VRAM for inference; practical deployment requires quantization (4-bit or 8-bit) reducing quality","Inference latency at 1M tokens is prohibitive for real-time applications (minutes per query); suitable only for batch processing","Attention complexity is O(n²); even with optimizations, memory usage scales quadratically with context length","Quality degrades in the middle of very long contexts ('lost in the middle' problem); model performs better on information at beginning/end"],"requires":["InternLM2.5-7B or InternLM2.5-20B model weights","Inference framework with long-context support: LMDeploy with long-context optimization, vLLM with paged attention, or SGLang","GPU with 80GB+ VRAM (A100/H100) or quantization support (4-bit or 8-bit) for practical deployment","Batch processing infrastructure for handling multi-minute inference times"],"input_types":["text (very long documents, concatenated files, or streaming input up to 1M tokens)"],"output_types":["text (responses with citations to specific positions in long context)"],"categories":["memory-knowledge","text-generation-language"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_5","uri":"capability://code.generation.editing.instruction.tuned.base.model.fine.tuning.with.xtuner","name":"instruction-tuned base model fine-tuning with xtuner","description":"InternLM provides XTuner, a specialized fine-tuning framework that enables efficient adaptation of base models to specific domains or tasks through low-rank adaptation (LoRA), quantized LoRA (QLoRA), and full fine-tuning. XTuner handles the complete fine-tuning pipeline including data loading, training loop management, gradient accumulation, and model checkpointing. The framework integrates with InternLM's training infrastructure and supports both single-GPU and distributed training, making it accessible for teams without massive compute budgets.","intents":["Adapt InternLM base models to domain-specific tasks (medical, legal, financial) with limited labeled data","Create specialized chat models for particular use cases without training from scratch","Fine-tune models on consumer GPUs using QLoRA for memory-efficient adaptation"],"best_for":["Teams with domain-specific data wanting to customize InternLM models","Researchers experimenting with fine-tuning approaches and hyperparameters","Developers building specialized AI applications with limited compute resources"],"limitations":["LoRA fine-tuning adds ~10-20% inference latency due to rank-r matrix multiplications; full fine-tuning eliminates this but requires more compute","QLoRA (4-bit quantization) introduces quantization noise that can degrade model quality by 2-5% on downstream tasks","Fine-tuning on small datasets (<1K examples) risks overfitting; requires careful validation and regularization","XTuner is optimized for InternLM models; compatibility with other model families is limited"],"requires":["XTuner framework (pip install xtuner or from source)","InternLM base model weights (InternLM2-7B, InternLM2-20B, etc.)","Training data in supported formats (JSON, CSV, or custom loaders)","GPU with 8GB+ VRAM for QLoRA, 24GB+ for full fine-tuning (single GPU); distributed training for larger models"],"input_types":["text (training examples as instruction-response pairs or conversational data)","structured data (JSON/CSV with fields for instruction, input, output)"],"output_types":["model weights (LoRA adapters or full fine-tuned model checkpoints)"],"categories":["code-generation-editing","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_6","uri":"capability://automation.workflow.inference.optimization.and.deployment.via.lmdeploy","name":"inference optimization and deployment via lmdeploy","description":"InternLM integrates with LMDeploy, a specialized inference toolkit that optimizes model serving through techniques including key-value (KV) cache quantization, continuous batching, and dynamic shape inference. LMDeploy compiles models to an optimized intermediate representation (IR) and uses a custom CUDA kernel library for efficient attention computation. The toolkit supports multiple deployment backends (local, Docker, Kubernetes) and provides REST/gRPC APIs for model serving, enabling production-grade inference with 2-3x throughput improvement over naive implementations.","intents":["Deploy InternLM models in production with high throughput and low latency","Serve multiple concurrent requests efficiently through continuous batching","Reduce inference costs through KV cache quantization and memory optimization"],"best_for":["Teams deploying InternLM models in production services","Developers building inference APIs or chatbot backends","Organizations optimizing inference costs for high-traffic applications"],"limitations":["LMDeploy optimization is specific to NVIDIA GPUs; AMD/Intel GPU support is limited","KV cache quantization (int8) introduces ~1-2% quality degradation on some tasks","Continuous batching adds complexity to request scheduling; debugging latency issues requires understanding of batching dynamics","Custom CUDA kernels may have compatibility issues with newer GPU architectures or CUDA versions"],"requires":["LMDeploy 0.2.0+ (pip install lmdeploy)","InternLM model weights in supported format (HuggingFace, SafeTensors)","NVIDIA GPU with compute capability 7.0+ (V100, A100, H100, etc.)","CUDA 11.8+ and cuDNN 8.0+"],"input_types":["text (prompts via REST API, gRPC, or Python SDK)"],"output_types":["text (streaming or batch responses via API)"],"categories":["automation-workflow","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_7","uri":"capability://planning.reasoning.agent.system.with.multi.tool.orchestration.and.planning","name":"agent system with multi-tool orchestration and planning","description":"InternLM provides an agent framework that enables models to autonomously plan and execute multi-step workflows using multiple tools. The agent system uses a planning-execution loop where the model generates action plans (decomposing user intent into tool calls), executes tools, observes results, and refines plans based on feedback. This is implemented through a specialized prompt template that guides the model through reasoning, tool selection, and result interpretation. The framework supports both sequential and parallel tool execution, with built-in error handling and retry logic.","intents":["Build autonomous agents that can solve complex tasks requiring multiple tool calls","Create systems where AI can plan multi-step workflows and adapt based on intermediate results","Develop applications combining reasoning, tool use, and knowledge retrieval"],"best_for":["Teams building autonomous AI assistants or task-completion systems","Developers creating complex workflow automation with AI decision-making","Researchers studying agent architectures and multi-step reasoning"],"limitations":["Agent planning quality degrades with task complexity; tasks requiring >5 sequential steps often fail due to context accumulation","No built-in mechanism for long-term memory or learning from past agent executions","Tool calling errors can cascade; if one tool fails, subsequent steps may be invalid without explicit error recovery","Agent framework is relatively new; production stability and edge case handling are still being refined"],"requires":["InternLM2.5 or InternLM2 chat model with tool-calling support","Agent framework (included in InternLM repository or via LMDeploy)","Tool definitions and implementations (Python functions or API endpoints)","Inference framework supporting function calling (LMDeploy, SGLang)"],"input_types":["text (high-level user goals or tasks)","structured data (tool definitions, available actions)"],"output_types":["text (final results, execution traces)","structured data (action sequences, intermediate results)"],"categories":["planning-reasoning","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_8","uri":"capability://planning.reasoning.reward.model.training.for.reinforcement.learning.from.human.feedback.rlhf","name":"reward model training for reinforcement learning from human feedback (rlhf)","description":"InternLM provides reward models (separate from base models) trained to score model outputs on quality dimensions (helpfulness, harmlessness, honesty). These reward models are used in RLHF pipelines to fine-tune base models based on human preference data. The reward models are trained via supervised learning on preference pairs (chosen vs rejected responses) and learn to assign scalar scores that correlate with human judgments. This enables iterative improvement of chat models through preference optimization algorithms like DPO (Direct Preference Optimization) or PPO (Proximal Policy Optimization).","intents":["Train reward models to evaluate and score model outputs for quality assessment","Use reward signals to fine-tune base models via RLHF or preference optimization","Measure and improve model alignment with human preferences on custom domains"],"best_for":["Teams implementing RLHF pipelines for model improvement","Researchers studying preference learning and alignment","Organizations wanting to customize model behavior based on domain-specific preferences"],"limitations":["Reward model training requires large preference datasets (10K+ pairs); small datasets lead to poor generalization","Reward models can exhibit reward hacking where models learn to game the reward signal rather than improve actual quality","Training is computationally expensive; requires multiple GPUs and careful hyperparameter tuning","Reward models are domain-specific; a reward model trained on customer support data won't transfer to technical writing"],"requires":["InternLM base model (7B or 20B)","Preference dataset with chosen/rejected response pairs (JSON format)","Training framework: XTuner or Hugging Face Trainer with custom reward modeling code","GPU with 24GB+ VRAM for training; distributed training recommended for large datasets"],"input_types":["text (prompts and model responses for preference pairs)","structured data (JSON with prompt, chosen_response, rejected_response fields)"],"output_types":["model weights (trained reward model)","structured data (reward scores for evaluation)"],"categories":["planning-reasoning","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__cap_9","uri":"capability://text.generation.language.web.demo.and.interactive.interface.for.model.exploration","name":"web demo and interactive interface for model exploration","description":"InternLM provides a web-based demo interface built with Gradio or Streamlit that enables interactive exploration of model capabilities without coding. The interface supports real-time chat, parameter adjustment (temperature, top-p, max tokens), and visualization of model behavior. The demo can be deployed locally or on cloud platforms, making it accessible for non-technical users to test model outputs. The interface integrates with LMDeploy for efficient inference, enabling responsive interactions even with large models.","intents":["Evaluate model capabilities interactively before integration into applications","Share model demos with stakeholders or users for feedback and testing","Prototype conversational interfaces and gather user feedback on model behavior"],"best_for":["Product teams evaluating models for deployment","Researchers demonstrating model capabilities to collaborators","Non-technical stakeholders wanting to interact with models directly"],"limitations":["Web demo is single-user or limited concurrent users; not suitable for production serving","No built-in authentication or access control; requires additional security layers for sensitive deployments","Parameter tuning in UI doesn't persist across sessions; no history or logging of interactions"],"requires":["InternLM model weights and inference framework (LMDeploy, Transformers, or vLLM)","Gradio or Streamlit (pip install gradio or streamlit)","Python 3.8+","GPU or CPU for inference (GPU strongly recommended for reasonable latency)"],"input_types":["text (user prompts via web interface)"],"output_types":["text (model responses rendered in web UI)"],"categories":["text-generation-language","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"internlm__headline","uri":"capability://model.training.multilingual.foundation.model.for.reasoning.and.code.generation","name":"multilingual foundation model for reasoning and code generation","description":"InternLM is a powerful multilingual foundation model series designed for advanced reasoning, mathematical problem-solving, and code generation, offering extensive tool-use capabilities and support for long context windows.","intents":["best multilingual AI model for reasoning","foundation model for code generation","top LLM for math and reasoning tasks","AI model with extensive tool-use capabilities","best model for long-context processing"],"best_for":["developers needing multilingual support","researchers in AI and machine learning"],"limitations":[],"requires":[],"input_types":[],"output_types":[],"categories":["model-training"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":57,"verified":false,"data_access_risk":"high","permissions":["InternLM2.5-7B-Chat or InternLM2-20B-Chat model weights (7B or 20B parameter versions)","Inference framework: Hugging Face Transformers 4.30+, LMDeploy, vLLM, or Ollama","GPU with 16GB+ VRAM for 7B model, 40GB+ for 20B model (or quantization support)","InternLM3-8B-Instruct model weights","Inference framework supporting extended generation (LMDeploy with deep-thinking support, or custom implementation)","GPU with 16GB+ VRAM for inference","InternLM vision-language model variant (e.g., InternLM-XComposer or similar)","Vision encoder weights (typically CLIP or similar)","Inference framework supporting multi-modal input (LMDeploy, Transformers with vision support)","NPU hardware (Huawei Ascend or compatible device)"],"failure_modes":["200K context window is supported but inference latency scales linearly with context length; practical throughput degrades significantly beyond 100K tokens","Chat models are instruction-tuned but may hallucinate on factual queries without retrieval augmentation","Multilingual performance varies by language; English and Chinese are primary, other languages have reduced quality","Deep thinking mode increases latency by 3-5x compared to normal mode; not suitable for real-time applications","Reasoning quality depends on problem complexity; very simple queries may not benefit from extended thinking","Only available in InternLM3-8B-Instruct; not available in smaller 1.8B or larger 20B variants","Vision-language variants are still in development; fewer model sizes and variants available compared to text-only models","Image resolution is limited by token budget; high-resolution images must be downsampled or tiled","Vision encoder adds computational overhead; inference is slower than text-only models","Multi-modal instruction-tuning data is limited; performance on specialized visual tasks may be weaker than specialized vision models","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.7,"quality":0.9,"ecosystem":0.39999999999999997,"match_graph":0.25,"freshness":0.52,"weights":{"adoption":0.35,"quality":0.2,"ecosystem":0.1,"match_graph":0.3,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-06-17T09:51:04.692Z","last_scraped_at":null,"last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=internlm","compare_url":"https://unfragile.ai/compare?artifact=internlm"}},"signature":"PA5EfikX2N/4XLNrRo6Lfbl/XtO6BSZw3wzJjwISPVzvuwaImv7wr/uwIsGp/fK59JE2yEF+rEWWmaqJSjXSDA==","signedAt":"2026-06-20T03:57:39.688Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/internlm","artifact":"https://unfragile.ai/internlm","verify":"https://unfragile.ai/api/v1/verify?slug=internlm","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}