What can en_PP-OCRv5_mobile_rec do?

mobile-optimized textline recognition from image crops, variable-length sequence decoding with attention, resnet-based feature extraction for textline images, quantized inference for mobile deployment, batch image preprocessing and normalization, character-level confidence scoring and filtering, integration with paddleocr detection pipeline

en_PP-OCRv5_mobile_rec

Model, Free

Image-to-text model by PaddlePaddle. 307,131 downloads.

Open Source


7 capabilities

Capabilities (7 decomposed)

mobile-optimized textline recognition from image crops

Medium confidence

Recognizes text within pre-cropped textline image regions using a lightweight CNN-RNN architecture optimized for mobile deployment. The model processes variable-length textline images through a ResNet backbone for feature extraction, followed by a bidirectional LSTM sequence decoder that outputs character-level predictions. The architecture uses attention mechanisms to handle variable text lengths and orientations; quantization and pruning reduce the model size from ~200MB to ~8-10MB for on-device inference.

Solves for

I need to recognize English text from individual text regions extracted by a text detection model

I want to run OCR inference on mobile/edge devices with minimal latency and memory footprint

I need to process variable-length textlines with consistent accuracy across different fonts and scales

I want to integrate a pre-trained recognition model without training from scratch

Best for

mobile app developers building on-device OCR pipelines

edge computing teams deploying document processing on IoT devices

teams using PaddleOCR's detection+recognition two-stage pipeline

Requires

PaddlePaddle 2.4+ runtime or ONNX Runtime 1.14+ for inference

Pre-processed textline image crops (typically 32×320 pixels or similar aspect ratio)

For mobile: PaddleLite inference framework (Android/iOS SDKs available)

Limitations

Requires pre-cropped textline images — does not perform text detection itself; must be paired with a detection model

Optimized for English text only; multilingual support requires separate language-specific models

Performance degrades on rotated text >45 degrees or severely skewed/curved text without preprocessing

What makes it unique

Uses PaddleOCR's proprietary lightweight architecture combining ResNet feature extraction with bidirectional LSTM decoding, specifically tuned for mobile inference via PaddleLite quantization (INT8/FP16). Unlike generic CRNN models, incorporates attention mechanisms for variable-length handling and applies knowledge distillation to reduce parameters by ~60% while maintaining accuracy parity with full models.

vs alternatives

Smaller model footprint (~8-10MB) than Tesseract or EasyOCR with faster mobile inference, and better accuracy on modern fonts than traditional Tesseract; trades off language diversity for English-specific optimization and requires detection model pairing.

variable-length sequence decoding with attention

Medium confidence

Decodes variable-length character sequences from textline feature maps using a bidirectional LSTM with attention mechanism. The decoder attends over spatial feature dimensions to predict characters sequentially, handling text of different lengths (typically 1-50 characters) without fixed-size constraints. Attention weights allow the model to focus on relevant image regions for each predicted character, improving accuracy on compressed or distorted text.
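To make the attention step concrete, here is a minimal NumPy sketch of one decoding step attending over the feature columns of a textline. It uses plain scaled dot-product attention as a stand-in; the model's actual attention formulation is not specified on this page, and `attend`, `query`, and the tensor sizes are hypothetical names and values chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, features):
    """One attention step.

    query:    (d,)   hypothetical decoder state for the current character
    features: (T, d) one feature vector per horizontal position of the textline
    Returns (weights (T,), context (d,)).
    """
    scores = features @ query / np.sqrt(features.shape[1])
    weights = softmax(scores)      # how much each image position contributes
    context = weights @ features   # weighted sum of feature columns
    return weights, context

rng = np.random.default_rng(0)
features = rng.normal(size=(40, 64))  # 40 positions, 64 channels (assumed sizes)
query = rng.normal(size=64)
weights, context = attend(query, features)
print(weights.shape, context.shape)
```

The `weights` vector is what an interpretability view would plot over the image: one non-negative value per position, summing to 1.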

Solves for

I need to recognize text of varying lengths without padding or resizing to fixed dimensions

I want to understand which image regions the model uses for each character prediction (interpretability)

I need robust handling of short IDs, long product codes, and variable-length document fields

Best for

document processing pipelines with mixed-length text fields

applications requiring interpretability of character-level predictions

teams building custom OCR systems with variable input constraints

Requires

PaddlePaddle 2.4+ with LSTM and attention operator support

Input feature maps from CNN backbone (typically 512-1024 channels, variable spatial dimensions)

Limitations

Attention computation adds ~15-20% latency overhead vs non-attentional LSTM baselines

Attention weights are not guaranteed to be human-interpretable; may not align with actual character boundaries

Performance degrades on sequences >50 characters due to attention mechanism saturation

What makes it unique

Implements 2D spatial attention over feature maps rather than 1D sequence attention, allowing the model to attend to specific image regions for each character. This differs from standard seq2seq attention by preserving spatial locality, critical for OCR where character position in the image directly correlates with output position.

vs alternatives

More accurate than fixed-length CTC decoders on variable-length text, and more interpretable than pure RNN baselines; trades computational cost for robustness on diverse text lengths.

resnet-based feature extraction for textline images

Medium confidence

Extracts spatial feature representations from textline images using a lightweight ResNet backbone (typically ResNet18 or ResNet34 variant) with depthwise separable convolutions for mobile efficiency. The backbone progressively downsamples spatial dimensions while increasing channel depth, producing feature maps that capture character-level visual patterns (strokes, curves, spacing). Intermediate feature maps are concatenated to preserve multi-scale information critical for recognizing text at different scales and resolutions.
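The parameter saving from depthwise separable convolutions can be checked with simple arithmetic. The sketch below compares one standard 3×3 convolution with its depthwise separable equivalent; the channel counts are assumed for illustration and biases are ignored. A whole-network figure depends on which layers are replaced, so the per-layer number here will not match a backbone-wide percentage.

```python
def conv_params(c_in, c_out, k=3):
    # Standard convolution: one k x k filter per (input, output) channel pair.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    # Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix.
    return k * k * c_in + c_in * c_out

c_in, c_out = 128, 256  # assumed layer width, not taken from the model
standard = conv_params(c_in, c_out)                  # 294912
separable = depthwise_separable_params(c_in, c_out)  # 33920
reduction = 1 - separable / standard                 # about 0.88

print(standard, separable, round(reduction, 3))
```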

Solves for

I need to extract robust visual features from textline images before sequence decoding

I want efficient feature extraction that runs on mobile devices without excessive memory

I need to handle text at multiple scales and resolutions within a single model

Best for

mobile OCR pipelines where model size and latency are critical

teams building custom recognition models on top of pre-extracted features

applications processing textlines at varying resolutions

Requires

Input images normalized to [0, 1] or [-1, 1] range

Image height fixed at 32 pixels (or model-specific height); width variable

PaddlePaddle 2.4+ with ResNet and depthwise separable convolution support

Limitations

Requires fixed input height (typically 32 pixels); width is variable but extreme aspect ratios (>10:1) degrade feature quality

Depthwise separable convolutions reduce model capacity; may underfit on complex fonts or handwriting

Feature extraction is not end-to-end differentiable with detection models; requires separate training pipeline

What makes it unique

Uses depthwise separable convolutions throughout the ResNet backbone to reduce parameters by ~70% compared to standard ResNet, while concatenating features from multiple scales (stride 4, 8, 16) to preserve fine-grained character details. This hybrid approach balances mobile efficiency with multi-scale robustness.

vs alternatives

More parameter-efficient than standard ResNet50 used in EasyOCR, and faster than VGG-based backbones in Tesseract; trades some capacity for mobile deployability.

quantized inference for mobile deployment

Medium confidence

Deploys the recognition model on mobile devices using INT8 quantization and PaddleLite runtime, reducing model size from ~200MB (FP32) to ~8-10MB (INT8) with minimal accuracy loss (<1%). Quantization is applied post-training using calibration data; the model is converted to PaddleLite format with operator fusion and memory layout optimization for ARM processors. Inference runs directly on mobile CPUs without GPU dependency, achieving 10-50ms latency per textline on modern mobile hardware.
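As a rough illustration of what INT8 quantization with per-channel scaling does to a weight tensor, here is a NumPy sketch of symmetric weight quantization. It is not the PaddleLite conversion flow: it covers weights only, skips the activation calibration step described above, and `quantize_per_channel` and `dequantize` are hypothetical helpers.

```python
import numpy as np

def quantize_per_channel(w):
    """Symmetric INT8 quantization with one scale per output channel.

    w: float32 array shaped (out_channels, ...).
    Returns (q, scale) with q as int8 and scale as float32 of shape (out_channels,).
    """
    flat = w.reshape(w.shape[0], -1)
    max_abs = np.abs(flat).max(axis=1)
    # Guard against all-zero channels so the division below stays finite.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(flat / scale[:, None]), -127, 127).astype(np.int8)
    return q.reshape(w.shape), scale

def dequantize(q, scale):
    # Broadcast each channel's scale back over its weights.
    shape = (-1,) + (1,) * (q.ndim - 1)
    return q.astype(np.float32) * scale.reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(16, 8, 3, 3)).astype(np.float32)  # assumed conv weight
q, scale = quantize_per_channel(w)
w_hat = dequantize(q, scale)

max_err = float(np.abs(w - w_hat).max())
size_ratio = w.nbytes / q.nbytes  # float32 -> int8 storage ratio for the weights
print(size_ratio, max_err)
```

In this sketch the weight storage shrinks by a factor of 4 (32-bit to 8-bit), and the rounding error per weight is bounded by half of that channel's scale.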

Solves for

I need to deploy OCR on Android/iOS without cloud API calls or GPU requirements

I want to reduce model size for app distribution and on-device storage constraints

I need real-time textline recognition with <50ms latency on mobile CPUs

Best for

mobile app developers building offline-first OCR features

teams with privacy requirements preventing cloud inference

resource-constrained edge devices (IoT, embedded systems)

Requires

PaddleLite 2.10+ for Android/iOS

Android NDK 21+ (for Android) or Xcode 12+ (for iOS)

Target device with ARM processor (ARMv7, ARMv8); no x86 support

Limitations

INT8 quantization introduces ~0.5-1% accuracy degradation on some font types; requires validation on target domain

PaddleLite runtime is Android/iOS specific; no support for other mobile platforms (Windows Phone, etc.)

Quantization requires calibration dataset; generic calibration may not match production text distribution

What makes it unique

Applies post-training INT8 quantization with per-channel scaling and operator fusion specifically tuned for PaddleLite's ARM backend, achieving 20x model size reduction while maintaining <1% accuracy loss. Unlike generic quantization frameworks, incorporates PaddleOCR-specific calibration strategies for text recognition workloads.

vs alternatives

Smaller deployment footprint than TensorFlow Lite quantized models, and faster inference than ONNX Runtime on mobile; requires PaddleLite ecosystem lock-in.

batch image preprocessing and normalization

Medium confidence

Preprocesses variable-width textline images into normalized batches for inference, handling resizing, padding, and channel normalization. Images are resized to fixed height (32 pixels) while preserving aspect ratio, padded to a common width within the batch, and normalized using ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]). Preprocessing is implemented in C++ for PaddleLite and Python for server inference, with SIMD optimizations for mobile platforms.
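The resize, pad, and normalize steps described above can be sketched in NumPy as follows. This is an illustrative version only: it uses nearest-neighbour resizing to stay dependency-free (a real pipeline would more likely use an OpenCV or PaddleOCR resize), and `resize_to_height` and `make_batch` are hypothetical helper names. The mean and std values are the ones quoted in the description.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def resize_to_height(img, height=32):
    """Nearest-neighbour resize of an (H, W, 3) uint8 image, keeping aspect ratio."""
    h, w = img.shape[:2]
    new_w = max(1, round(w * height / h))
    rows = np.arange(height) * h // height  # source row for each output row
    cols = np.arange(new_w) * w // new_w    # source column for each output column
    return img[rows][:, cols]

def make_batch(images, height=32):
    """Resize each crop, normalize it, and right-pad to the widest crop in the batch."""
    resized = [resize_to_height(im, height) for im in images]
    widths = [r.shape[1] for r in resized]
    batch = np.zeros((len(resized), 3, height, max(widths)), dtype=np.float32)
    for i, r in enumerate(resized):
        x = (r.astype(np.float32) / 255.0 - MEAN) / STD
        batch[i, :, :, : r.shape[1]] = x.transpose(2, 0, 1)  # HWC -> CHW
    return batch, widths

rng = np.random.default_rng(0)
crops = [
    rng.integers(0, 256, size=(20, 100, 3), dtype=np.uint8),
    rng.integers(0, 256, size=(64, 128, 3), dtype=np.uint8),
]
batch, widths = make_batch(crops)
print(batch.shape, widths)
```

Padding with zeros after normalization corresponds to padding with the mean colour; returning `widths` lets a caller ignore predictions that fall in the padded region.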

Solves for

I need to prepare variable-sized textline crops for batch inference

I want to normalize images consistently across different sources and lighting conditions

I need efficient preprocessing that doesn't bottleneck inference latency

Best for

teams building OCR pipelines with variable input image sizes

applications processing batches of textlines from document scans

mobile apps requiring fast image preprocessing on CPU

Requires

Input images in uint8 format (0-255 range)

Image height ≥16 pixels, width ≥8 pixels

OpenCV 4.0+ (for Python preprocessing) or PaddleLite C++ runtime (for mobile)

Limitations

Fixed height (32 pixels) may lose information from very small text (<8pt) or very large text (>72pt)

Aspect ratio preservation can result in padding up to 30% of image area for extreme aspect ratios; padding adds noise

ImageNet normalization statistics may not be optimal for document images with different color distributions

What makes it unique

Implements dual preprocessing pipelines: C++ SIMD-optimized path for PaddleLite mobile inference (using NEON on ARM), and Python path for server inference. Preprocessing is fused with model loading to minimize memory copies; padding strategy uses dynamic batch width calculation to minimize wasted computation.

vs alternatives

Faster preprocessing than OpenCV-only pipelines due to SIMD optimization, and more memory-efficient than pre-padding all images to maximum width; requires PaddlePaddle ecosystem integration.

character-level confidence scoring and filtering

Medium confidence

Extracts character-level confidence scores from model output logits and applies post-processing filters to remove low-confidence predictions. The model outputs logits for each character position; softmax is applied to convert to probabilities, and per-character confidence is extracted as the maximum probability. Filtering strategies include: removing characters with confidence <threshold, merging adjacent low-confidence predictions, and flagging uncertain regions for manual review. Confidence scores enable downstream applications to prioritize high-confidence text for processing.
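The softmax, max-probability, and threshold steps described above fit in a few lines of NumPy. The sketch assumes one logit row per output character, with no blank or duplicate collapsing; the character set, the threshold, and the `?` mask are illustrative choices rather than values taken from the model.

```python
import numpy as np

def char_confidences(logits):
    """logits: (sequence_length, num_classes) -> (class indices, confidences)."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)

def filter_text(logits, charset, threshold=0.5, mask="?"):
    """Replace characters whose confidence is below the threshold with a mask."""
    idx, conf = char_confidences(logits)
    chars = [charset[i] if c >= threshold else mask for i, c in zip(idx, conf)]
    return "".join(chars), conf

charset = "abc"  # toy character set for the example
logits = np.array([
    [5.0, 0.0, 0.0],  # confident 'a'
    [0.0, 0.1, 0.0],  # nearly uniform, so low confidence
    [0.0, 0.0, 4.0],  # confident 'c'
])
text, conf = filter_text(logits, charset)
print(text, np.round(conf, 3))
```

Masking rather than deleting low-confidence characters keeps positions aligned, which makes it easier to flag the uncertain region for manual review.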

Solves for

I need to identify which characters the model is uncertain about

I want to filter out low-confidence predictions before downstream processing

I need to flag uncertain textlines for manual review or re-processing

Best for

document processing pipelines with quality control requirements

teams building human-in-the-loop OCR systems

applications where false positives are costly (e.g., financial documents)

Requires

Model output logits (shape [sequence_length, num_classes])

Softmax implementation (built into PaddlePaddle or numpy)

Limitations

Confidence scores are not calibrated; a score of 0.9 does not guarantee 90% accuracy across all character types

Confidence is per-character, not per-word or per-line; no built-in word-level confidence aggregation

Filtering by confidence threshold requires manual tuning per domain; no automatic threshold selection

What makes it unique

Provides per-character confidence scores extracted from softmax probabilities, with optional filtering and flagging for manual review. Unlike end-to-end confidence estimation, this approach is model-agnostic and can be applied to any sequence prediction model; confidence calibration is left to the application layer.

vs alternatives

More granular than binary accept/reject decisions, and enables downstream quality control workflows; less reliable than ensemble-based confidence estimation but computationally cheaper.

integration with paddleocr detection pipeline

Medium confidence

Designed as the recognition stage of PaddleOCR's two-stage pipeline, consuming textline bounding boxes and cropped images from the detection model (en_PP-OCRv5_mobile_det). The recognition model expects pre-cropped textline images with minimal padding; integration requires coordinate transformation from detection output (rotated bounding boxes) to axis-aligned crops. PaddleOCR provides end-to-end orchestration via the OCRv5 inference API, handling detection→crop→recognition→post-processing in a single call.
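The coordinate transformation from a detected quadrilateral to an axis-aligned crop can be sketched as below. This takes only the enclosing rectangle of the four corner points; a strongly rotated box would more typically be rectified with a perspective warp, which is not shown. `axis_aligned_crop` is a hypothetical helper, not a PaddleOCR function, and the corner coordinates are made up.

```python
import numpy as np

def axis_aligned_crop(image, quad):
    """Crop the axis-aligned rectangle enclosing a detected text quadrilateral.

    image: (H, W, C) array
    quad:  four (x, y) corner points, in any order
    Returns (crop, (x0, y0, x1, y1)) with the box clamped to the image bounds.
    """
    quad = np.asarray(quad, dtype=np.float32)
    h, w = image.shape[:2]
    x0 = int(np.clip(np.floor(quad[:, 0].min()), 0, w - 1))
    y0 = int(np.clip(np.floor(quad[:, 1].min()), 0, h - 1))
    x1 = int(np.clip(np.ceil(quad[:, 0].max()), x0 + 1, w))
    y1 = int(np.clip(np.ceil(quad[:, 1].max()), y0 + 1, h))
    return image[y0:y1, x0:x1], (x0, y0, x1, y1)

page = np.zeros((100, 200, 3), dtype=np.uint8)  # stand-in for a document image
quad = [(20.5, 30.2), (120.7, 25.1), (122.0, 55.9), (21.0, 60.3)]
crop, box = axis_aligned_crop(page, quad)
print(crop.shape, box)
```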

Solves for

I want to use the full PaddleOCR pipeline (detection + recognition) for end-to-end document OCR

I need to integrate this recognition model with PaddleOCR's detection model

I want to process documents with minimal custom code using PaddleOCR's high-level API

Best for

teams using PaddleOCR's complete OCR pipeline

developers wanting end-to-end document processing without custom orchestration

applications processing documents with standard text layouts

Requires

PaddleOCR 2.7+ library

en_PP-OCRv5_mobile_det detection model (or compatible detection model)

Python 3.7+ with paddleocr package

Limitations

Tightly coupled to PaddleOCR's detection model; cannot easily swap with other detection models

Requires detection model output in specific format; custom detection models need coordinate transformation

End-to-end pipeline latency is dominated by detection; recognition is only ~20-30% of total time

What makes it unique

Designed as the recognition component of PaddleOCR's modular two-stage architecture, with built-in coordinate transformation and batch processing optimized for detection output. Unlike standalone recognition models, includes PaddleOCR-specific post-processing (duplicate removal, confidence filtering) and high-level API integration.

vs alternatives

Seamless integration with PaddleOCR ecosystem; requires less custom code than combining independent detection and recognition models; trades flexibility for ease of use.

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts sharing capabilities

Artifacts that share capabilities with en_PP-OCRv5_mobile_rec, ranked by overlap. Discovered automatically through the match graph.

Model 38

PP-LCNet_x1_0_textline_ori

image-to-text model. 186,085 downloads.

textline orientation classification via lightweight cnn

multi-language textline orientation detection with language-agnostic features

efficient inference on mobile and edge devices via model quantization and optimization

3 shared capabilities

Model 40

trocr-large-handwritten

image-to-text model. 215,807 downloads.

autoregressive-text-generation-from-visual-input

handwritten-text-recognition-from-images

2 shared capabilities

Model 20

Qwen: Qwen3 VL 30B A3B Instruct

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...

optical character recognition and text extraction from images

1 shared capability

Model 42

PP-OCRv5_server_det

image-to-text model. 542,474 downloads.

text-region-detection-in-images

1 shared capability

Model 20

Qwen: Qwen VL Plus

Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for...

dense text recognition and ocr from images

1 shared capability

Model 52

GLM-OCR

image-to-text model. 7,519,420 downloads.

image-to-text sequence generation with visual grounding

1 shared capability

Best For

✓mobile app developers building on-device OCR pipelines
✓edge computing teams deploying document processing on IoT devices
✓teams using PaddleOCR's detection+recognition two-stage pipeline
✓developers targeting Android/iOS with real-time text recognition
✓document processing pipelines with mixed-length text fields
✓applications requiring interpretability of character-level predictions
✓teams building custom OCR systems with variable input constraints
✓mobile OCR pipelines where model size and latency are critical

Known Limitations

⚠Requires pre-cropped textline images — does not perform text detection itself; must be paired with a detection model
⚠Optimized for English text only; multilingual support requires separate language-specific models
⚠Performance degrades on rotated text >45 degrees or severely skewed/curved text without preprocessing
⚠Batch inference not optimized; processes single textlines sequentially, adding latency for high-volume document processing
⚠No built-in confidence scoring per character — only sequence-level predictions available
⚠Attention computation adds ~15-20% latency overhead vs non-attentional LSTM baselines

Requirements

PaddlePaddle 2.4+ runtime or ONNX Runtime 1.14+ for inference
Pre-processed textline image crops (typically 32×320 pixels or similar aspect ratio)
For mobile: PaddleLite inference framework (Android/iOS SDKs available)
For server: Python 3.7+ with paddleocr or paddlepaddle packages
PaddlePaddle 2.4+ with LSTM and attention operator support
Input feature maps from CNN backbone (typically 512-1024 channels, variable spatial dimensions)
Input images normalized to [0, 1] or [-1, 1] range
Image height fixed at 32 pixels (or model-specific height); width variable

Input / Output

Accepts: image (PNG, JPG, BMP formats), numpy array (uint8, shape [H, W, 3] or [H, W, 1]), tensor (PaddlePaddle or ONNX format), feature tensor (shape [batch, channels, height, width]), image tensor (shape [batch, 3, 32, width], dtype float32), numpy array (uint8, auto-normalized internally), image tensor (shape [1, 3, 32, width], dtype uint8 or float32), numpy array (shape [H, W, 3], dtype uint8), PIL Image, raw image bytes, model logits (tensor, shape [sequence_length, num_classes]), document image (full page or region), detection bounding boxes (from detection model)

Produces: text string (recognized characters), character sequence with implicit confidence (model output logits), structured data: {text: string, confidence: float}, character sequence (string), attention weight matrices (shape [sequence_length, spatial_height, spatial_width]), feature tensor (shape [batch, 512-1024, 1, width/8], dtype float32), multi-scale feature maps (intermediate layer outputs), character predictions (shape [1, sequence_length, num_classes], dtype float32), text string (post-processed from predictions), normalized tensor (shape [batch, 3, 32, padded_width], dtype float32), metadata (original widths, padding amounts), text string with confidence annotations, structured data: {char: string, confidence: float}[], filtered text (low-confidence characters removed or masked), OCR results: {text: string, confidence: float, bbox: [x, y, w, h]}[], structured document representation (optional)

UnfragileRank

Adoption: 56% (40% weight)

Quality: 16% (20% weight)

Ecosystem: 50% (15% weight)

Match Graph: 10% (20% weight)

Freshness: 75% (5% weight)

UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
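If the rank is a plain weighted sum of the five components listed above, it can be reproduced with a few lines of arithmetic. The page does not state the aggregation formula, so the linear combination below is an assumption; the component values and weights are the ones shown in the breakdown.

```python
# (component score in percent, weight) as listed in the breakdown above
components = {
    "Adoption": (56, 0.40),
    "Quality": (16, 0.20),
    "Ecosystem": (50, 0.15),
    "Match Graph": (10, 0.20),
    "Freshness": (75, 0.05),
}

total_weight = sum(weight for _, weight in components.values())
rank = sum(score * weight for score, weight in components.values())

print(round(total_weight, 2), round(rank, 2))
```

Under that assumption the weights sum to 1 and the combined score comes to about 38.85 out of 100.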

Type: Model

7 capabilities


Model Details

Provider: huggingface

Architecture: PaddleOCR

Downloads: 307,131

Tasks: image-to-text

About

PaddlePaddle/en_PP-OCRv5_mobile_rec is an image-to-text model on Hugging Face with 307,131 downloads.

Alternatives to en_PP-OCRv5_mobile_rec

Dreambooth-Stable-Diffusion (Repository, 45)

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion

sdnext (Repository, 51)

SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing

fast-stable-diffusion (Repository, 48)

fast-stable-diffusion + DreamBooth

ai-notes (Prompt, 37)

Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.


Data Sources

huggingface
