Real Time Object Detection Model

1

MediaPipe · Framework · 58/100

via “object detection with bounding box localization”

Google's cross-platform on-device ML framework with pre-built solutions.

Unique: Provides unified object detection API across Android, iOS, Web, and Python with built-in support for multiple pre-trained models (COCO, Open Images) and custom model fine-tuning via Model Maker; uses hardware acceleration (GPU/NPU) on mobile platforms for real-time inference.

vs others: More mobile-optimized and faster than TensorFlow Object Detection API on edge devices, includes built-in model customization via Model Maker unlike many pre-trained-only alternatives, but less feature-rich than specialized object detection frameworks like YOLOv8 or Faster R-CNN.

2

Reka API · API · 58/100

via “visual object detection and localization with bounding boxes”

Multimodal-first API — vision, audio, video understanding across Core/Flash/Edge models.

Unique: Integrated into the multimodal model architecture, enabling object detection to leverage context from video, audio, and text understanding rather than operating as an isolated vision task.

vs others: Provides object detection as part of a unified multimodal system, whereas specialized detection APIs (YOLO, Faster R-CNN services) operate independently without cross-modal context.

3

OpenCV · Framework · 58/100

via “object detection with pre-trained cascade classifiers and dnn inference”

Comprehensive computer vision library with 2,500+ algorithms.

Unique: A unified DNN inference API abstracts model format differences (TensorFlow, PyTorch, Caffe, ONNX) behind a single interface with automatic quantization and GPU offload, eliminating the need for separate inference engines.

vs others: Cascade classifiers are faster than YOLO for simple face detection but less accurate; DNN inference is simpler to set up than TensorRT but 2-5x slower; it is a better fit than TensorFlow Lite for desktop applications because it supports larger models.

4

YOLOv8 · Repository · 55/100

via “real-time object detection model”

Real-time object detection, segmentation, and pose.

Unique: YOLOv8 pairs real-time speed with competitive accuracy, a simple Python API, and a wide range of export formats.

vs others: YOLOv8 targets real-time applications, where its single-stage design is generally faster than two-stage detection frameworks such as Faster R-CNN.

5

Deepseek v4 people · Model · 45/100

via “people detection and recognition”

Deepseek v4 people

Unique: Utilizes a hybrid architecture combining CNNs and transformers for enhanced accuracy in diverse conditions, unlike traditional models that rely solely on CNNs.

vs others: Offers superior accuracy in challenging environments compared to standard face recognition models, which often struggle with variations in lighting and angles.

6

detr-resnet-50 · Model · 44/100

via “end-to-end transformer-based object detection with resnet-50 backbone”

Object-detection model. 239,063 downloads.

Unique: DETR (Detection Transformer) eliminates hand-designed detection components (anchors, NMS) by formulating detection as a set prediction problem with bipartite matching, using a pure transformer encoder-decoder on top of ResNet-50 features rather than region proposal networks or anchor grids

vs others: Simpler architecture than Faster R-CNN (no RPN, no NMS) and more interpretable than YOLO, but slower inference and weaker small-object detection make it better suited for research and moderate-latency applications than production real-time systems
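
The set-prediction idea can be illustrated with a tiny matcher. This is a minimal sketch, not DETR's actual code: it assumes boxes in normalized (cx, cy, w, h) form, uses only a class-probability term and an L1 box term as the matching cost (the generalized IoU term is left out), and brute-forces the assignment with permutations, where practical implementations use the Hungarian algorithm. The function names are hypothetical.

```python
from itertools import permutations


def l1_box_cost(pred_box, gt_box):
    """Sum of absolute coordinate differences between two (cx, cy, w, h) boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match_predictions(pred_boxes, pred_probs, gt_boxes, gt_labels):
    """Find the one-to-one assignment of predictions to ground-truth objects
    with the lowest total cost.

    Returns (assignment, cost), where assignment[j] is the index of the
    prediction matched to ground-truth object j. Brute force: only suitable
    for a handful of objects.
    """
    if len(pred_boxes) < len(gt_boxes):
        raise ValueError("need at least as many predictions as ground-truth objects")
    best, best_cost = (), float("inf")
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = 0.0
        for gt_idx, pred_idx in enumerate(perm):
            # A high probability for the correct class lowers the cost.
            cost -= pred_probs[pred_idx][gt_labels[gt_idx]]
            cost += l1_box_cost(pred_boxes[pred_idx], gt_boxes[gt_idx])
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best), best_cost


# Three predictions (queries), two ground-truth objects, two classes.
example_pred_boxes = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1), (0.8, 0.8, 0.3, 0.3)]
example_pred_probs = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
example_gt_boxes = [(0.1, 0.1, 0.1, 0.1), (0.5, 0.5, 0.2, 0.2)]
example_gt_labels = [0, 1]
example_assignment, example_cost = match_predictions(
    example_pred_boxes, example_pred_probs, example_gt_boxes, example_gt_labels
)
```

Because every ground-truth object is matched to exactly one prediction, the unmatched predictions can be trained to output "no object", which is why no NMS step is needed to remove duplicates.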

7

rtdetr_r18vd_coco_o365 · Model · 42/100

via “real-time object detection with transformer-based architecture”

Object-detection model. 521,638 downloads.

Unique: Uses transformer-based detection with anchor-free, NMS-free design (RT-DETR architecture) instead of traditional Faster R-CNN/YOLO CNN pipelines; eliminates hand-crafted anchor definitions and post-processing NMS, enabling end-to-end optimization and faster convergence during training

vs others: Faster inference than DETR variants and comparable to YOLOv8 while maintaining transformer interpretability; outperforms ResNet-50 Faster R-CNN on COCO at similar latency due to efficient attention mechanisms
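
In an NMS-free detector, each query already stands for at most one object, so post-processing reduces to a score threshold and a coordinate conversion. The sketch below assumes per-query class scores and normalized (cx, cy, w, h) boxes; the function name and threshold are hypothetical, and real RT-DETR post-processing may select top-scoring query/class pairs differently.

```python
def postprocess_queries(query_scores, query_boxes, image_width, image_height,
                        score_threshold=0.5):
    """Turn per-query outputs into pixel-space detections without NMS.

    query_scores: one list of class scores per query.
    query_boxes:  one normalized (cx, cy, w, h) box per query.
    Returns a list of (class_index, score, (x1, y1, x2, y2)) tuples.
    """
    detections = []
    for class_scores, (cx, cy, w, h) in zip(query_scores, query_boxes):
        best_class = max(range(len(class_scores)), key=lambda c: class_scores[c])
        score = class_scores[best_class]
        if score < score_threshold:
            continue  # low-confidence query: treated as "no object"
        # Convert the normalized center format to pixel corner coordinates.
        x1 = (cx - w / 2) * image_width
        y1 = (cy - h / 2) * image_height
        x2 = (cx + w / 2) * image_width
        y2 = (cy + h / 2) * image_height
        detections.append((best_class, score, (x1, y1, x2, y2)))
    return detections
```

For example, `postprocess_queries([[0.1, 0.9], [0.3, 0.2]], [(0.5, 0.5, 0.5, 0.5), (0.2, 0.2, 0.1, 0.1)], 640, 480)` keeps only the first query.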

8

yolov10s · Model · 41/100

via “real-time multi-scale object detection with anchor-free architecture”

Object-detection model. 223,706 downloads.

Unique: YOLOv10 keeps the anchor-free detection head of recent YOLO versions and adds NMS-free training through consistent dual label assignments, which removes the post-processing NMS step. This reduces the hyperparameter tuning surface and improves inference speed by ~20% vs YOLOv8 while maintaining competitive accuracy on COCO.

vs others: Faster than Faster R-CNN (two-stage) for real-time use cases and simpler to deploy than EfficientDet due to anchor-free design requiring no anchor configuration; trades some precision on tiny objects vs Mask R-CNN for speed-critical applications.
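
For context, the post-processing step that NMS-free designs remove looks roughly like this. It is a minimal sketch of classic greedy non-maximum suppression on (x1, y1, x2, y2) boxes, not code from YOLOv10 or any specific library; the IoU threshold is one of the hyperparameters an NMS-free model no longer needs.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

With boxes `[(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]` and scores `[0.9, 0.8, 0.7]`, the second box overlaps the first with IoU 81/119 and is suppressed.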

9

rtdetr_r101vd_coco_o365 · Model · 39/100

via “real-time object detection with transformer-based architecture”

Object-detection model. 121,720 downloads.

Unique: Uses a transformer encoder-decoder architecture with direct set prediction (eliminating anchor boxes and NMS) on top of a ResNet-101-VD backbone. Efficient attention mechanisms and a hybrid CNN-transformer design give it real-time performance while balancing speed and accuracy across 365 object categories from the Objects365 dataset.

vs others: Faster than traditional Faster R-CNN/Mask R-CNN detectors (50-100ms vs 200-400ms) while maintaining higher accuracy than lightweight YOLO variants through transformer attention, and more practical for production than ViT-based detectors due to optimized backbone selection
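
To relate the quoted latencies to frame rates, a per-image latency converts to throughput as 1000 / latency_ms, assuming one image is processed at a time with no batching or pipelining. A small sketch with hypothetical helper names:

```python
def latency_ms_to_fps(latency_ms):
    """Frames per second for a given per-image latency in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def fps_range(min_latency_ms, max_latency_ms):
    """Return (lowest_fps, highest_fps) for a latency range."""
    return latency_ms_to_fps(max_latency_ms), latency_ms_to_fps(min_latency_ms)


transformer_fps = fps_range(50, 100)   # the 50-100ms range quoted above
two_stage_fps = fps_range(200, 400)    # the 200-400ms range quoted above
```

Under that assumption, 50-100ms corresponds to 10-20 FPS and 200-400ms to 2.5-5 FPS.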

10

Anzhcs_YOLOs · Model · 39/100

via “real-time multi-class object detection with bounding box localization”

Object-detection model. 86,897 downloads.

Unique: Fine-tuned variant of Ultralytics YOLO11 base model specialized for art-domain object detection, inheriting YOLO11's architectural improvements (anchor-free detection, decoupled head design) while maintaining single-stage detection efficiency. Uses Ultralytics' native PyTorch implementation with built-in export support for ONNX, TensorRT, and CoreML for cross-platform deployment.

vs others: Faster inference than Faster R-CNN or Mask R-CNN (single-stage vs two-stage detection) with better art-domain accuracy than generic COCO-trained YOLOv8 due to fine-tuning on specialized data; lighter than Vision Transformers while maintaining competitive accuracy.

11

paper2gui · Web App · 39/100

via “real-time object detection with yolo models”

Convert AI papers to GUI. Make it simple and convenient for everyone to use cutting-edge artificial intelligence technology.

Unique: Implements multiple YOLO model variants (v5, v6, YOLOX) through NCNN with Vulkan GPU acceleration, allowing model selection based on accuracy/speed tradeoff; includes configurable confidence thresholds and NMS parameters for detection filtering; supports JSON output for programmatic integration

vs others: Faster inference than PyTorch-based YOLO implementations (NCNN optimization); standalone executable vs Python-based tools; supports multiple model variants vs single-model tools; local processing vs cloud APIs (no latency, no privacy concerns)
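
A configurable confidence threshold combined with JSON output might look like the sketch below. The field names (`label`, `confidence`, `box`) and the function name are hypothetical and are not taken from paper2gui's actual output schema.

```python
import json


def detections_to_json(detections, conf_threshold=0.25):
    """Drop detections below conf_threshold and serialize the rest as JSON.

    detections: list of dicts with "label", "confidence", and "box" keys
    (a hypothetical schema used only for this example).
    """
    kept = [d for d in detections if d["confidence"] >= conf_threshold]
    # Highest-confidence detections first, for easier downstream use.
    kept.sort(key=lambda d: d["confidence"], reverse=True)
    return json.dumps({"count": len(kept), "detections": kept})


example_detections = [
    {"label": "person", "confidence": 0.42, "box": [12, 30, 200, 380]},
    {"label": "dog", "confidence": 0.91, "box": [220, 140, 400, 360]},
    {"label": "chair", "confidence": 0.10, "box": [5, 5, 40, 60]},
]
example_json = detections_to_json(example_detections, conf_threshold=0.25)
```

Raising the threshold trades recall for fewer false positives; a consumer can parse the result with any JSON library.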

12

rtdetr_r50vd_coco_o365 · Model · 38/100

via “real-time object detection with transformer-based architecture”

Object-detection model. 80,830 downloads.

Unique: Uses transformer encoder-decoder architecture with deformable attention mechanisms instead of traditional CNN-based region proposal networks; eliminates anchor boxes and NMS post-processing, reducing inference pipeline complexity while maintaining real-time performance through efficient attention computation

vs others: Faster inference than Faster R-CNN (no RPN overhead) and simpler than YOLO (no anchor engineering), while maintaining transformer-based reasoning for improved generalization across diverse object scales and aspect ratios

13

rtdetr_v2_r18vd · Model · 38/100

via “real-time object detection with deformable transformer attention”

Object-detection model. 106,918 downloads.

Unique: Uses deformable transformer attention (sampling only task-relevant spatial regions) combined with ResNet-18 backbone for real-time inference, whereas standard DETR processes full feature maps with quadratic attention complexity. This architectural choice reduces FLOPs by ~40% compared to vanilla transformer detectors while maintaining anchor-free detection paradigm.

vs others: Faster than YOLOv8 on edge devices due to deformable attention efficiency, and more accurate than lightweight anchor-based detectors (MobileNet-SSD) because transformer attention captures long-range spatial relationships without hand-crafted anchor priors.
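
The core of deformable attention, reading a few sampled locations instead of the whole feature map, can be sketched for a single query, single head, and single-channel feature map. This is an illustration only: in a real model the offsets and attention logits are predicted from the query by learned linear layers, whereas here they are passed in as plain arguments, and the function names are hypothetical.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Bilinearly interpolate a 2D feature map (list of rows) at pixel (x, y).

    Locations outside the map contribute zero (zero padding).
    """
    height, width = len(feature_map), len(feature_map[0])
    x0, y0 = math.floor(x), math.floor(y)
    total = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < width and 0 <= yi < height:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                total += weight * feature_map[yi][xi]
    return total


def deformable_attend(feature_map, reference_point, offsets, logits):
    """Weighted sum of features sampled at reference_point + each offset.

    Attention weights are a softmax over the logits, so the cost depends on
    the number of sampling points, not on the size of the feature map.
    """
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    norm = sum(exps)
    ref_x, ref_y = reference_point
    return sum(
        (e / norm) * bilinear_sample(feature_map, ref_x + dx, ref_y + dy)
        for e, (dx, dy) in zip(exps, offsets)
    )
```

With K sampling points per query, the work per query is proportional to K rather than to height × width, which is the source of the efficiency gain over dense attention.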

14

rtdetr_r50vd · Model · 36/100

via “real-time object detection with deformable transformer architecture”

Object-detection model. 32,868 downloads.

Unique: Uses deformable cross-attention instead of standard multi-head attention, allowing the model to dynamically sample only task-relevant spatial regions; combined with ResNet-50-VD backbone (a more efficient variant than standard ResNet-50), this achieves <100ms inference while maintaining COCO AP of 53.0+ without NMS post-processing

vs others: Faster inference than YOLOv8 on equivalent hardware (deformable attention vs dense convolution) and more accurate than EfficientDet-D0 on COCO while using fewer parameters than Faster R-CNN variants

15

detr-resnet-50-dc5 · Model · 34/100

via “end-to-end training for object detection”

Object-detection model. 38,839 downloads.

Unique: Trains end to end with a single set-based loss that combines classification and box localization terms, so no separate training stages are needed.

vs others: Simpler to train than multi-stage pipelines that require separate training for classification and localization.

16

deformable-detr · Model · 33/100

via “deformable object detection”

Object-detection model. 27,497 downloads.

Unique: Incorporates deformable attention that adjusts to the spatial distribution of objects, enhancing detection in diverse scenarios compared to static attention mechanisms.

vs others: More adaptable to varying object shapes and sizes than traditional object detection models like Faster R-CNN due to its deformable attention mechanism.

17

We’re proud to open-source LIDARLearn [R] [D] [P] · Repository · 33/100

via “3d object detection from lidar”

We’re proud to open-source LIDARLearn [R] [D] [P]

18

Qwen: Qwen3 VL 30B A3B Thinking · Model · 25/100

via “object detection and localization with semantic labels”

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...

Unique: Performs object detection through language generation rather than regression heads, enabling flexible output formats and semantic understanding of object relationships without training specialized detection layers

vs others: More flexible than traditional object detection models because it can describe object relationships and properties in natural language, but trades precision for semantic richness
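
When detection is produced as generated text, the caller has to parse boxes out of the response. The sketch below assumes the model was prompted to answer with a JSON list of objects carrying `label` and `bbox_2d` keys; that output format and the function name are assumptions made for illustration, not a documented contract of this model.

```python
import json
import re


def parse_detections(generated_text):
    """Extract (label, (x1, y1, x2, y2)) pairs from a model's text response.

    Looks for the outermost [...] span, parses it as JSON, and skips any
    entry that lacks a string label or a four-number box.
    """
    match = re.search(r"\[.*\]", generated_text, flags=re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []  # the model produced malformed JSON
    results = []
    for item in items:
        if not isinstance(item, dict):
            continue
        label, box = item.get("label"), item.get("bbox_2d")
        if (isinstance(label, str) and isinstance(box, list) and len(box) == 4
                and all(isinstance(v, (int, float)) for v in box)):
            results.append((label, tuple(float(v) for v in box)))
    return results
```

Defensive parsing matters here because, unlike a regression head, a language model gives no guarantee that every response is well formed.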

19

You Only Look Once: Unified, Real-Time Object Detection (YOLO) · Product · 22/100

via “single-pass unified object detection with spatial grid regression”

* 🏆 2017: [Attention is All you Need (Transformer)](https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html)

Unique: Pioneered the single-stage detection paradigm by formulating object detection as a direct spatial regression problem on a grid, eliminating the region proposal stage used by two-stage detectors. Uses a single sum-squared-error loss that jointly covers bounding box coordinates, objectness confidence, and class probabilities across all grid cells, computed in one forward pass through a convolutional network that ends in fully connected layers.

vs others: Runs at 45 FPS for the base model and 155 FPS for Fast YOLO (vs about 7 FPS for Faster R-CNN), at somewhat lower mAP than the two-stage detector, enabling real-time video processing on a single GPU; its architectural simplicity makes it 10x faster to train than region proposal methods while maintaining end-to-end differentiability.
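
The grid formulation assigns each object to the cell containing its center and regresses the center as an offset within that cell. A minimal sketch, assuming normalized (cx, cy, w, h) boxes and a 7×7 grid; the function names are hypothetical, and details of the original training targets (such as the square-root treatment of width and height in the loss) are not modeled.

```python
def encode_box_to_grid(cx, cy, w, h, grid_size=7):
    """Map a normalized box to its responsible grid cell.

    Returns (row, col, x_offset, y_offset, w, h), where the offsets give the
    center position relative to the cell's top-left corner, in cell units.
    """
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    x_offset = cx * grid_size - col
    y_offset = cy * grid_size - row
    return row, col, x_offset, y_offset, w, h


def decode_box_from_grid(row, col, x_offset, y_offset, w, h, grid_size=7):
    """Invert encode_box_to_grid, returning a normalized (cx, cy, w, h) box."""
    cx = (col + x_offset) / grid_size
    cy = (row + y_offset) / grid_size
    return cx, cy, w, h
```

For instance, a box centered at (0.5, 0.5) falls in row 3, column 3 of a 7×7 grid with offsets of 0.5 in both directions.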

20

Frigate NVR · Product

via “real-time object detection and classification”
