{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"onnx-runtime-mobile","slug":"onnx-runtime-mobile","name":"ONNX Runtime Mobile","type":"framework","url":"https://onnxruntime.ai/docs/tutorials/mobile","page_url":"https://unfragile.ai/onnx-runtime-mobile","categories":["deployment-infra"],"tags":[],"pricing":{"model":"free","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"onnx-runtime-mobile__cap_0","uri":"capability://automation.workflow.arm.optimized.onnx.model.inference.on.mobile.devices","name":"arm-optimized onnx model inference on mobile devices","description":"Executes pre-trained ONNX models directly on ARM-based mobile processors (iOS/Android) with native ARM SIMD optimizations and memory-efficient execution patterns. The runtime loads the serialized ONNX model into device memory, parses the computation graph, and executes operations sequentially on the ARM CPU with minimal overhead, supporting both 32-bit and quantized 8-bit weight formats for reduced memory footprint.","intents":["Deploy a trained PyTorch or TensorFlow model to iOS/Android without cloud dependencies","Run inference on-device while keeping user data private and avoiding network latency","Execute computer vision or NLP models on resource-constrained mobile hardware"],"best_for":["Mobile app developers building privacy-first AI features","Teams deploying edge AI without cloud infrastructure","Developers targeting iOS and Android simultaneously with shared model logic"],"limitations":["Model must fit entirely in device RAM and storage — no streaming or chunked loading","ARM CPU inference is slower than GPU acceleration; typical latency depends on model size and device generation","No automatic operator optimization — unsupported ONNX operators cause graph fragmentation and fallback to CPU","Cold start latency for model loading and graph initialization not documented but likely 100-500ms depending on model size"],"requires":["ONNX model file (converted from PyTorch, TensorFlow, TFLite, or scikit-learn)","Android API 21+ (for Android) or iOS 11.0+ (for iOS)","Model size must be <device available storage; typical constraint is 50-500MB for practical mobile apps"],"input_types":["ONNX model file (.onnx)","Tensor data (float32, int32, int64, uint8 depending on model quantization)"],"output_types":["Tensor output (float32 or quantized int8)","Structured predictions (class labels, bounding boxes, embeddings)"],"categories":["automation-workflow","edge-inference"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_1","uri":"capability://automation.workflow.hardware.accelerator.delegation.via.execution.providers","name":"hardware accelerator delegation via execution providers","description":"Routes inference operations to specialized hardware accelerators (CoreML on iOS, NNAPI on Android, XNNPACK on both) through a pluggable execution provider architecture. The runtime inspects the model graph at load time, identifies operators supported by the target accelerator, and delegates compatible subgraphs to the accelerator while keeping unsupported operations on CPU. Configuration happens via SessionOptions before model loading, allowing per-session tuning without code changes.","intents":["Accelerate inference 2-10x by offloading to iOS CoreML Neural Engine or Android NNAPI hardware","Automatically fall back to CPU if accelerator is unavailable or unsupported on a device","Benchmark different execution providers (CPU vs CoreML vs NNAPI) to find optimal performance for a specific model"],"best_for":["Developers optimizing latency-sensitive features (real-time video processing, live translation)","Teams supporting diverse Android devices with varying NNAPI versions and capabilities","iOS developers targeting iPhone 11+ with Neural Engine hardware"],"limitations":["Accelerator support is device and model specific — no guarantee of speedup; some models may be slower on accelerators due to data transfer overhead","NNAPI performance degrades on older Android versions (API <28) due to limited operator coverage","CoreML conversion may lose precision or unsupported operators, requiring manual model adjustment","Execution provider initialization adds 50-200ms overhead at session creation time","No built-in profiling to identify which operators are actually accelerated vs CPU fallback"],"requires":["iOS 11.0+ (CoreML) or Android API 27+ (NNAPI) for hardware acceleration","Model operators must be compatible with target accelerator (CoreML supports ~100 ops, NNAPI ~80 ops)","SessionOptions API available in language binding (Java/C++ for Android, C/Objective-C for iOS)"],"input_types":["ONNX model file","SessionOptions configuration object specifying execution provider priority"],"output_types":["Inference results (same tensor format regardless of execution provider)","Execution provider selection metadata (for debugging which provider was used)"],"categories":["automation-workflow","performance-optimization"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_10","uri":"capability://automation.workflow.batch.inference.and.multi.model.orchestration","name":"batch inference and multi-model orchestration","description":"Enables processing multiple inference requests in a single batch to improve throughput and hardware utilization, and supports loading and executing multiple models sequentially or in parallel within a single application. Batch inference is implemented by stacking inputs into a single tensor with batch dimension and running inference once, reducing per-request overhead. Multi-model orchestration is managed by the application — ONNX Runtime provides session management APIs to load and execute multiple models independently.","intents":["Process multiple images in a single inference call to improve throughput by 2-5x","Run multiple models in sequence (e.g., object detection → classification) without reloading models between steps","Implement ensemble inference by running multiple models on the same input and combining results"],"best_for":["Batch processing applications (e.g., processing a batch of images from a photo library)","Multi-stage inference pipelines (e.g., detection → classification → tracking)","Ensemble models combining multiple architectures for improved accuracy"],"limitations":["Batch inference requires variable batch dimension in the model — not all models support this","Batch size is limited by available device memory — larger batches may cause out-of-memory errors","Multi-model orchestration is manual — no built-in pipeline or DAG execution framework","No automatic load balancing or scheduling across multiple models — developers must manage execution order","Batch inference latency is not linear — overhead per request decreases with batch size, but absolute latency increases"],"requires":["ONNX model with variable batch dimension (marked as -1 in input shape)","Sufficient device memory for batch size × model size"],"input_types":["Batched tensor inputs (multiple samples stacked along batch dimension)"],"output_types":["Batched tensor outputs (results for all samples in batch)"],"categories":["automation-workflow","throughput-optimization"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_11","uri":"capability://safety.moderation.security.validation.and.malicious.model.detection","name":"security validation and malicious model detection","description":"Provides guidance and best practices for validating ONNX models before deployment to detect potential security threats (e.g., models designed to consume excessive memory or compute). The runtime does not include built-in malicious model detection, but documentation recommends inspecting model structure, operator counts, and tensor sizes before production deployment. This is a responsibility shared between the runtime and the application developer.","intents":["Validate that a model from an untrusted source won't cause denial-of-service by consuming excessive resources","Inspect model structure to understand what operations it performs before deploying to production","Implement security checks in the application to reject models that exceed resource budgets"],"best_for":["Applications accepting user-provided models (e.g., model marketplace, federated learning)","Security-conscious teams deploying models from external sources","Developers building model validation pipelines"],"limitations":["No built-in malicious model detection — validation is manual and requires developer expertise","ONNX format does not include cryptographic signatures or integrity checks — models can be modified without detection","No sandboxing or resource limits — a malicious model can consume all available memory or CPU","Validation must be done before model loading — no runtime protection against resource exhaustion","No standardized model security format or certification — each application must implement its own validation"],"requires":["Manual model inspection (e.g., using ONNX visualization tools)","Application-level validation logic (e.g., checking operator counts, tensor sizes)"],"input_types":["ONNX model file"],"output_types":["Validation result (pass/fail)","Model metadata (operator counts, tensor sizes, memory requirements)"],"categories":["safety-moderation","security"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_12","uri":"capability://safety.moderation.error.handling.and.model.validation","name":"error handling and model validation","description":"Validates ONNX model format, operator compatibility, and tensor shapes at session creation and inference time. The runtime returns error codes and messages for invalid models, unsupported operators, and shape mismatches. Error handling is language-specific (exceptions in Java/C#, error codes in C++).","intents":["I want to validate that my ONNX model is compatible with the target device before deploying","I need clear error messages when model loading fails or inference produces unexpected results","I want to handle errors gracefully and provide fallback behavior if inference fails"],"best_for":["Developers debugging model compatibility issues","Teams building robust inference pipelines with error recovery","QA engineers validating models before production deployment"],"limitations":["Model validation is performed at session creation time; invalid models are not detected until runtime","Error messages are generic and do not provide actionable debugging information (e.g., 'unsupported operator' without specifying which operator)","No automatic error recovery; developers must implement fallback logic manually","Shape validation is performed at inference time, not at model load time; shape errors cause inference failures","Error handling is language-specific; error codes and exceptions vary by SDK"],"requires":["ONNX model file","Error handling code in application (try-catch for exceptions, error code checks for C++)"],"input_types":["ONNX model file","Input tensors for inference"],"output_types":["Error codes or exceptions","Error messages describing validation failures"],"categories":["safety-moderation","error-handling"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_2","uri":"capability://data.processing.analysis.model.quantization.and.size.optimization","name":"model quantization and size optimization","description":"Reduces model size by 75-80% through 8-bit integer quantization (converting 32-bit float weights to 8-bit integers) while maintaining inference accuracy within 1-2% of the original model. The quantization process is applied post-training via external tools (referenced in documentation but not built-in), and the runtime natively executes quantized models with optimized integer arithmetic kernels. Quantized models consume less device storage and RAM, enabling deployment of larger models on memory-constrained devices.","intents":["Reduce a 100MB model to 25MB to fit within app size constraints or device storage limits","Lower memory footprint to enable inference on budget Android devices with <2GB RAM","Decrease power consumption by using integer arithmetic instead of floating-point operations"],"best_for":["Mobile developers targeting low-end Android devices (Snapdragon 400 series, MediaTek Helio)","Teams with strict app size budgets (e.g., <50MB total app size)","Developers deploying multiple models on a single device"],"limitations":["Quantization is post-training only — no built-in quantization-aware training (QAT) in ONNX Runtime Mobile","Accuracy loss of 1-5% is typical; some models (especially transformers) may degrade more significantly","Quantization tools are external (e.g., ONNX quantization scripts, TensorFlow Lite quantizer) — not integrated into ONNX Runtime","Quantized models may not be compatible with all execution providers (e.g., some CoreML versions have limited int8 support)","No dynamic quantization at runtime — quantization is static and applied before deployment"],"requires":["Original model in ONNX format","Quantization tool (external, e.g., ONNX Model Quantization script or TensorFlow Lite converter)","Calibration dataset to determine quantization parameters (representative input samples)"],"input_types":["ONNX model file (float32)","Calibration dataset (representative inputs for quantization parameter calculation)"],"output_types":["Quantized ONNX model file (int8 weights)","Quantization metadata (scale/zero-point parameters)"],"categories":["data-processing-analysis","model-optimization"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_3","uri":"capability://tool.use.integration.cross.platform.model.deployment.with.language.bindings","name":"cross-platform model deployment with language bindings","description":"Provides unified ONNX model inference API across iOS (C/C++, Objective-C), Android (Java, C/C++), and .NET (C#/MAUI) through language-specific bindings that wrap the native C++ runtime. Each binding exposes a consistent SessionOptions-based API: create session, configure execution provider, load model, run inference. The bindings handle memory management, tensor marshalling, and error propagation, abstracting platform differences while maintaining performance.","intents":["Deploy the same ONNX model to iOS and Android with minimal code duplication","Use C# and MAUI to build cross-platform mobile apps with shared inference logic","Integrate ONNX inference into existing native iOS/Android codebases without rewriting in a different language"],"best_for":["Teams building iOS and Android apps simultaneously (e.g., using React Native, Flutter, or MAUI)","Native iOS developers (Objective-C/Swift) and Android developers (Java/Kotlin) working on the same product","C# developers using MAUI or Xamarin for cross-platform mobile development"],"limitations":["Java binding for Android is not feature-complete — some advanced SessionOptions (e.g., custom operators) are only available in C++","C# binding requires .NET 6+ or Xamarin, limiting compatibility with older projects","Objective-C binding is less actively maintained than C++ — some new features may lag","No Python binding for mobile (Python is not practical on mobile devices)","Language bindings add ~5-10% overhead compared to direct C++ usage due to marshalling"],"requires":["Android: Java 8+ or C++ (NDK 21+)","iOS: Xcode 12+ with C/C++ or Objective-C support",".NET: .NET 6+ or Xamarin.iOS/Xamarin.Android","ONNX Runtime package for target platform (e.g., onnxruntime-android, onnxruntime-objc)"],"input_types":["ONNX model file","Tensor data in language-native format (Java arrays, C++ std::vector, C# arrays)"],"output_types":["Inference results in language-native format","Session metadata (model input/output shapes, data types)"],"categories":["tool-use-integration","cross-platform-development"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_4","uri":"capability://tool.use.integration.custom.operator.registration.and.extension","name":"custom operator registration and extension","description":"Allows developers to register custom C/C++ operators that extend the ONNX operator set, enabling inference of models with proprietary or experimental operations not in the standard ONNX specification. Custom operators are registered via the SessionOptions API before model loading, and the runtime dispatches matching operations in the model graph to the custom implementation. This enables deployment of cutting-edge models (e.g., with novel activation functions or attention mechanisms) without waiting for ONNX standardization.","intents":["Deploy a model with custom operators (e.g., proprietary activation functions, research-stage attention mechanisms) to mobile","Integrate domain-specific operations (e.g., signal processing, cryptography) into inference pipelines","Optimize performance by implementing custom operators with platform-specific SIMD or hardware acceleration"],"best_for":["Research teams deploying novel model architectures with non-standard operators","Teams with proprietary model architectures requiring custom operations","Performance-critical applications where custom operators enable 2-5x speedup over generic implementations"],"limitations":["Custom operators must be implemented in C/C++ — no Python or higher-level language support","Custom operators are not portable across platforms — separate implementations needed for iOS, Android, and .NET","No built-in testing or validation framework for custom operators — developers must verify correctness and performance","Custom operators bypass ONNX standardization, creating vendor lock-in and compatibility issues","Java binding does not support custom operator registration — only C++ binding supports this feature","Debugging custom operators is difficult — limited visibility into operator execution and error messages"],"requires":["C/C++ development environment (Xcode for iOS, Android NDK for Android)","Understanding of ONNX operator interface and tensor memory layout","Custom operator implementation matching ONNX kernel signature (inputs, outputs, attributes)"],"input_types":["ONNX model file with custom operator nodes","Custom operator C/C++ implementation"],"output_types":["Registered custom operator available for inference","Inference results including custom operator outputs"],"categories":["tool-use-integration","extensibility"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_5","uri":"capability://data.processing.analysis.model.graph.optimization.and.operator.fusion","name":"model graph optimization and operator fusion","description":"Automatically optimizes the ONNX computation graph at load time by fusing adjacent operators into single kernels (e.g., Conv+BatchNorm+ReLU → single fused kernel), eliminating intermediate tensor allocations and memory bandwidth overhead. The optimizer also performs constant folding, dead code elimination, and layout optimization to reduce memory usage and latency. Optimization is transparent and happens before execution provider selection, improving performance across all backends.","intents":["Reduce model latency by 10-30% through operator fusion without changing the model or application code","Lower memory usage by eliminating intermediate tensors created between fused operators","Improve cache locality and reduce memory bandwidth pressure on ARM processors"],"best_for":["Developers deploying latency-sensitive models (real-time video, live translation, gesture recognition)","Teams with strict power budgets (optimization reduces memory bandwidth, lowering power consumption)","Developers targeting older/slower ARM processors where optimization impact is most significant"],"limitations":["Optimization is automatic but not configurable — developers cannot selectively disable or tune specific optimizations","Some optimizations may reduce numerical precision slightly (e.g., fusing operations can accumulate rounding errors)","Optimization adds 50-200ms overhead at model load time (one-time cost)","Optimized graphs are not portable — optimization is specific to the target execution provider and device","No visibility into which operators were fused — difficult to debug if optimization causes accuracy issues"],"requires":["ONNX model file","No explicit configuration required — optimization is automatic"],"input_types":["ONNX model file"],"output_types":["Optimized computation graph (internal representation)","Inference results (identical to non-optimized model)"],"categories":["data-processing-analysis","performance-optimization"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_6","uri":"capability://data.processing.analysis.multi.input.output.model.inference.with.dynamic.shapes","name":"multi-input/output model inference with dynamic shapes","description":"Executes ONNX models with multiple inputs and outputs, supporting dynamic tensor shapes (e.g., variable batch size, variable sequence length) that are determined at runtime rather than fixed at model export time. The runtime infers output shapes based on input shapes and model graph structure, allocating tensors dynamically without requiring pre-allocation. This enables flexible inference patterns such as processing variable-length sequences or batching multiple inputs of different sizes.","intents":["Run inference on variable-length sequences (e.g., audio of different durations, text of different lengths) without padding","Process multiple inputs of different sizes in a single inference call","Implement dynamic batching where batch size is determined at runtime based on available data"],"best_for":["NLP models with variable sequence length (e.g., transformers, RNNs)","Audio/speech models processing variable-duration inputs","Computer vision models with variable input resolution"],"limitations":["Dynamic shapes add overhead for shape inference and tensor allocation — typically 5-10% latency increase","Some execution providers (e.g., CoreML) have limited dynamic shape support — may require fixed shapes","Memory allocation is dynamic, which can cause fragmentation on long-running inference sessions","No built-in batching optimization — developers must manually batch inputs if desired","Output shape must be deterministic based on input shapes — models with data-dependent output shapes are not supported"],"requires":["ONNX model with dynamic shape dimensions (marked with -1 or symbolic names)","Runtime must support dynamic shapes (all platforms support this, but some execution providers may not)"],"input_types":["Multiple tensors with variable shapes","Tensor data in any supported format (float32, int32, int64, uint8, etc.)"],"output_types":["Multiple output tensors with shapes inferred from inputs","Output shape metadata"],"categories":["data-processing-analysis","inference-flexibility"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_7","uri":"capability://automation.workflow.model.loading.and.session.management.with.memory.efficiency","name":"model loading and session management with memory efficiency","description":"Loads ONNX models from disk into device memory and creates inference sessions with configurable memory allocation strategies. The runtime supports memory mapping for large models (loading only required pages into RAM rather than the entire model), memory pooling to reduce allocation overhead, and session reuse to amortize model loading costs across multiple inferences. SessionOptions API allows fine-grained control over memory behavior, enabling developers to optimize for latency or memory usage depending on device constraints.","intents":["Load a large model (100-500MB) on a device with limited RAM without running out of memory","Reuse a single inference session across multiple inference calls to avoid repeated model loading overhead","Optimize memory allocation by pre-allocating pools or using memory mapping for large models"],"best_for":["Developers deploying large models on budget Android devices with <2GB RAM","Applications with strict latency requirements where model loading overhead must be minimized","Long-running inference services (e.g., background processing) where session reuse is critical"],"limitations":["Memory mapping is not supported on all platforms — Android supports it, iOS support is limited","Model loading latency is 100-500ms depending on model size and device I/O speed — cannot be eliminated entirely","Memory pooling adds complexity and may not be beneficial for single-inference applications","Session creation is not thread-safe — developers must synchronize access or create per-thread sessions","No built-in session caching or lifecycle management — developers must manage session lifetime manually"],"requires":["ONNX model file on device storage (internal or external)","Sufficient RAM for model weights plus activation tensors (typically 1.5-2x model size)"],"input_types":["ONNX model file path","SessionOptions configuration (memory allocation strategy, execution provider)"],"output_types":["Inference session object","Model metadata (input/output shapes, data types)"],"categories":["automation-workflow","resource-management"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_8","uri":"capability://automation.workflow.performance.profiling.and.latency.measurement","name":"performance profiling and latency measurement","description":"Provides built-in profiling capabilities to measure inference latency, operator execution time, and memory usage at runtime. The profiler instruments the inference graph and collects per-operator timing data, enabling developers to identify performance bottlenecks and optimize hot paths. Profiling data is exported in standard formats (JSON, CSV) for analysis and visualization, helping developers understand where time and memory are spent during inference.","intents":["Measure end-to-end inference latency to verify model meets performance requirements","Identify which operators consume the most time and memory to guide optimization efforts","Compare performance across different execution providers (CPU vs CoreML vs NNAPI) to select the best option"],"best_for":["Developers optimizing latency-sensitive models (real-time video, live translation)","Teams benchmarking different execution providers to select the best for their hardware","Performance engineers analyzing model bottlenecks and guiding quantization/pruning efforts"],"limitations":["Profiling adds 5-15% overhead to inference latency — profiling data is not representative of production performance","Profiling is per-session — cannot profile across multiple sessions or long-running applications","No built-in visualization tools — developers must parse JSON/CSV output and use external tools","Profiling granularity is per-operator — cannot profile within operators or at the instruction level","Memory profiling is approximate — actual memory usage may vary due to allocation patterns and garbage collection"],"requires":["SessionOptions API with profiling enabled","External tools for parsing and visualizing profiling output (e.g., Python scripts, Excel)"],"input_types":["ONNX model file","Inference inputs"],"output_types":["Profiling data (JSON or CSV format)","Per-operator timing and memory usage statistics"],"categories":["automation-workflow","performance-monitoring"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__cap_9","uri":"capability://data.processing.analysis.model.conversion.and.import.from.multiple.frameworks","name":"model conversion and import from multiple frameworks","description":"Supports importing pre-trained models from PyTorch, TensorFlow, TFLite, and scikit-learn by converting them to ONNX format using external conversion tools (ONNX converters, TensorFlow ONNX exporter, PyTorch ONNX exporter). The conversion process is framework-specific and happens outside ONNX Runtime, but ONNX Runtime provides tutorials and guidance for each framework. Once converted to ONNX, models are portable across all ONNX Runtime platforms (mobile, server, cloud).","intents":["Convert a PyTorch model trained on desktop to ONNX format for mobile deployment","Export a TensorFlow model to ONNX to enable cross-platform inference (iOS, Android, server)","Convert a scikit-learn model to ONNX for deployment in a mobile app"],"best_for":["ML teams with existing PyTorch or TensorFlow pipelines who want to deploy to mobile","Developers migrating from TensorFlow Lite to ONNX Runtime for better cross-platform support","Data scientists using scikit-learn who need to deploy models to mobile"],"limitations":["Conversion is not built into ONNX Runtime — requires external tools (PyTorch ONNX exporter, TensorFlow ONNX converter, etc.)","Conversion quality varies by framework — some operators may not convert cleanly, requiring manual model adjustment","Conversion is one-way — no built-in tools to convert ONNX back to original framework format","Conversion may lose model metadata (e.g., training hyperparameters, data preprocessing) — developers must track this separately","Some framework-specific features (e.g., PyTorch hooks, TensorFlow custom layers) may not convert to ONNX"],"requires":["Original model in PyTorch, TensorFlow, TFLite, or scikit-learn format","Framework-specific conversion tool (e.g., torch.onnx.export for PyTorch, tf2onnx for TensorFlow)","Understanding of ONNX operator set to debug conversion issues"],"input_types":["Model file in framework format (e.g., .pt for PyTorch, .pb for TensorFlow, .pkl for scikit-learn)"],"output_types":["ONNX model file (.onnx)"],"categories":["data-processing-analysis","model-conversion"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"onnx-runtime-mobile__headline","uri":"capability://deployment.infra.onnx.model.inference.engine.for.mobile.and.edge.devices","name":"onnx model inference engine for mobile and edge devices","description":"A cross-platform inference engine optimized for deploying ONNX models on mobile and edge devices, enabling efficient on-device AI across iOS and Android with support for ARM processors and custom operators.","intents":["best mobile AI inference engine","ONNX model deployment for mobile","cross-platform AI framework for edge devices","efficient on-device AI solutions","AI inference on iOS and Android"],"best_for":["mobile applications","edge computing","AI model inference"],"limitations":["not for model training"],"requires":["ONNX models"],"input_types":["ONNX model files"],"output_types":["inference results"],"categories":["deployment-infra"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":60,"verified":false,"data_access_risk":"high","permissions":["ONNX model file (converted from PyTorch, TensorFlow, TFLite, or scikit-learn)","Android API 21+ (for Android) or iOS 11.0+ (for iOS)","Model size must be <device available storage; typical constraint is 50-500MB for practical mobile apps","iOS 11.0+ (CoreML) or Android API 27+ (NNAPI) for hardware acceleration","Model operators must be compatible with target accelerator (CoreML supports ~100 ops, NNAPI ~80 ops)","SessionOptions API available in language binding (Java/C++ for Android, C/Objective-C for iOS)","ONNX model with variable batch dimension (marked as -1 in input shape)","Sufficient device memory for batch size × model size","Manual model inspection (e.g., using ONNX visualization tools)","Application-level validation logic (e.g., checking operator counts, tensor sizes)"],"failure_modes":["Model must fit entirely in device RAM and storage — no streaming or chunked loading","ARM CPU inference is slower than GPU acceleration; typical latency depends on model size and device generation","No automatic operator optimization — unsupported ONNX operators cause graph fragmentation and fallback to CPU","Cold start latency for model loading and graph initialization not documented but likely 100-500ms depending on model size","Accelerator support is device and model specific — no guarantee of speedup; some models may be slower on accelerators due to data transfer overhead","NNAPI performance degrades on older Android versions (API <28) due to limited operator coverage","CoreML conversion may lose precision or unsupported operators, requiring manual model adjustment","Execution provider initialization adds 50-200ms overhead at session creation time","No built-in profiling to identify which operators are actually accelerated vs CPU fallback","Batch inference requires variable batch dimension in the model — not all models support this","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.7,"quality":0.9,"ecosystem":0.3,"match_graph":0.25,"freshness":0.9,"weights":{"adoption":0.3,"quality":0.2,"ecosystem":0.15,"match_graph":0.23,"freshness":0.12}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-05-24T12:16:24.483Z","last_scraped_at":null,"last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=onnx-runtime-mobile","compare_url":"https://unfragile.ai/compare?artifact=onnx-runtime-mobile"}},"signature":"99sKBNBAj1AfIPttlysDTQhGbgRcLGe7xMf3XYd5oZImteCvAW6agaGxWCRlkvtn2jBrG6L4m5udwsC4SGm+CA==","signedAt":"2026-06-15T06:51:16.214Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/onnx-runtime-mobile","artifact":"https://unfragile.ai/onnx-runtime-mobile","verify":"https://unfragile.ai/api/v1/verify?slug=onnx-runtime-mobile","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}