ONNX Runtime — Framework — 44/100
via “inference session management with session configuration and state isolation”
Cross-platform ML inference engine — runs ONNX models across a wide range of hardware with graph-level optimizations.
Unique: Implements session state as a first-class object (the InferenceSession class) that owns its memory allocators, execution contexts, and execution provider instances. Sessions support configurable execution provider chains (the providers argument to InferenceSession, tried in order with fallback), allowing runtime provider selection without recompilation. The async execution model (RunAsync) uses a callback-based pattern rather than futures, enabling integration with event-driven systems.
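A minimal sketch of the session pattern described above, using the ONNX Runtime Python API. The provider names are real ONNX Runtime identifiers; the `make_session` helper and the model path are illustrative assumptions, not part of the library.

```python
# Sketch: per-session configuration with an execution provider
# fallback chain (assumes the onnxruntime package is installed).

# Providers are tried in list order; an entry that is unavailable in
# the current build falls back to the next. CPUExecutionProvider is
# always available, so it serves as the final fallback.
PREFERRED_PROVIDERS = [
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

def make_session(model_path: str):
    """Create an isolated InferenceSession with its own options.

    Each session owns its allocators and provider instances, so two
    sessions created this way do not share mutable state -- the
    isolation property the entry above highlights.
    """
    import onnxruntime as ort  # assumed dependency

    opts = ort.SessionOptions()
    # Optimization level is configured per session, not globally.
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    return ort.InferenceSession(
        model_path,           # placeholder path, e.g. "model.onnx"
        sess_options=opts,
        providers=PREFERRED_PROVIDERS,
    )
```

Because each call builds a fresh SessionOptions, two models served in the same process can use different optimization levels and provider chains.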
vs others: More granular per-session configuration than TensorFlow Serving (per-session optimization levels and memory strategies) and stronger isolation than PyTorch's process-global state, making multi-model serving safer.