Capability
Inference Optimization via Mixed-Precision Computation
4 artifacts provide this capability.
Matched via: “efficient inference via model quantization and mixed-precision execution”
Image-to-text model. 1,417,263 downloads.
Unique: Integrates with bitsandbytes for seamless int8 quantization without manual calibration; supports both PyTorch and TensorFlow backends. Quantization is applied transparently via the transformers API without modifying model code.
vs others: Easier to use than manual quantization with ONNX or TensorRT; automatic calibration eliminates the need for representative datasets.
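The calibration-free int8 quantization described above can be illustrated with a minimal numpy sketch of per-row absmax quantization, the vector-wise scheme that bitsandbytes' LLM.int8() builds on: each row's scale is derived from the weights themselves, so no representative calibration dataset is needed. The function names here are illustrative, not part of the bitsandbytes API.

```python
import numpy as np

def absmax_quantize(w: np.ndarray):
    """Quantize a float32 matrix to int8 with one absmax scale per row.

    Calibration-free: the scale comes from the weights alone, not from
    activation statistics collected on a calibration dataset.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # per-row scale
    scale = np.where(scale == 0.0, 1.0, scale)            # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 matrix from int8 values and scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)

q, scale = absmax_quantize(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())
print(q.dtype, f"max abs reconstruction error {max_err:.4f}")
```

Rounding error per element is bounded by half the row's quantization step (`scale / 2`), which is why absmax schemes work well on weight matrices whose rows have no extreme outliers.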