What can rtdetr_r50vd_coco_o365 do?

real-time object detection with transformer-based architecture, multi-dataset transfer learning with coco and objects365 pre-training, batch inference with dynamic input shape handling, huggingface model hub integration with safetensors format, coco benchmark evaluation with standard metrics, inference optimization for edge deployment with quantization support

rtdetr_r50vd_coco_o365

Model

Free

object-detection model by PekingU. 86,670 downloads.

Open Source


6 capabilities

Capabilities

6 decomposed

real-time object detection with transformer-based architecture

Medium confidence

Performs object detection using RT-DETR (Real-Time Detection Transformer), a transformer-based detector. A ResNet-50-VD backbone extracts image features, and transformer encoder-decoder layers perform end-to-end object localization and classification. Unlike anchor-based YOLO variants or Faster R-CNN, it predicts box coordinates and classes directly, without anchor boxes or non-maximum suppression, which removes a post-processing stage from the inference pipeline.
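The NMS-free post-processing described above can be sketched in a few lines. This is a simplified illustration of DETR-style decoding, not the library's exact implementation: the function names, the per-query arg-max rule, and the example logits are assumptions made for the sketch.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to absolute (x1, y1, x2, y2) pixels."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )

def decode_queries(logits, boxes, img_w, img_h, threshold=0.5):
    """Turn per-query class logits and normalized boxes into detections.

    Each query yields at most one detection (its best class), so no
    non-maximum suppression pass is applied here.
    """
    detections = []
    for query_logits, box in zip(logits, boxes):
        scores = [sigmoid(v) for v in query_logits]
        class_id = max(range(len(scores)), key=scores.__getitem__)
        if scores[class_id] >= threshold:
            detections.append({
                "class_id": class_id,
                "score": scores[class_id],
                "box": cxcywh_to_xyxy(box, img_w, img_h),
            })
    return sorted(detections, key=lambda d: d["score"], reverse=True)

# Two hypothetical queries over three classes on a 640x480 image.
example_logits = [[-4.0, 3.0, -2.0], [-3.0, -3.5, -2.5]]
example_boxes = [(0.5, 0.5, 0.25, 0.5), (0.1, 0.1, 0.1, 0.1)]
detections = decode_queries(example_logits, example_boxes, img_w=640, img_h=480)
```

Only the first query clears the threshold here; the second is dropped by the score filter alone, with no box-overlap comparison between queries.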

Solves for

detect and localize multiple objects in images with real-time performance constraints

integrate object detection into production systems requiring sub-100ms inference latency

build vision applications that need transformer-based reasoning over spatial features

deploy detection models across diverse hardware (CPU, GPU, mobile) with consistent architecture

Best for

computer vision engineers building real-time detection systems

teams deploying edge AI models requiring transformer-based architectures

developers migrating from anchor-based detectors (YOLO, Faster R-CNN) to anchor-free approaches

Requires

PyTorch 1.9+ (the HuggingFace transformers implementation of RT-DETR is PyTorch-based)

torchvision or equivalent vision library for image preprocessing (normalization, resizing)

CUDA 11.0+ for GPU acceleration (CPU inference possible but 10-20x slower)

Limitations

ResNet-50-VD backbone limits feature resolution compared to larger backbones (ResNet-101, EfficientNet); trades accuracy for speed

transformer decoder adds ~15-25ms latency per image compared to CNN-only detectors on CPU inference

requires careful batch normalization tuning when fine-tuning on custom datasets; batch size <8 may cause training instability

What makes it unique

Uses transformer encoder-decoder architecture with deformable attention mechanisms instead of traditional CNN-based region proposal networks; eliminates anchor boxes and NMS post-processing, reducing inference pipeline complexity while maintaining real-time performance through efficient attention computation

vs alternatives

Faster inference than Faster R-CNN (no RPN overhead) and simpler than YOLO (no anchor engineering), while maintaining transformer-based reasoning for improved generalization across diverse object scales and aspect ratios

multi-dataset transfer learning with coco and objects365 pre-training

Medium confidence

The model combines pre-training on Objects365 (365 object classes) with training on COCO (80 object classes), enabling transfer learning across diverse visual domains. Objects365 contributes broad category coverage and COCO contributes fine-grained distinctions among common objects, so the learned representations generalize to custom detection tasks. Fine-tuning is typically performed by replacing the classification head while preserving the pretrained backbone and transformer layers.
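A minimal sketch of the head replacement, assuming the HuggingFace transformers `RTDetrForObjectDetection` class and its `from_pretrained` keyword arguments. The class names passed in and the helper function names are hypothetical.

```python
def build_label_maps(class_names):
    """Build the id2label / label2id dictionaries a detection config expects."""
    id2label = dict(enumerate(class_names))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id

def load_for_finetuning(class_names, checkpoint="PekingU/rtdetr_r50vd_coco_o365"):
    """Load the checkpoint with a freshly initialised classification head.

    transformers is imported lazily so the helper above works without it installed.
    """
    from transformers import RTDetrForObjectDetection

    id2label, label2id = build_label_maps(class_names)
    return RTDetrForObjectDetection.from_pretrained(
        checkpoint,
        id2label=id2label,
        label2id=label2id,
        # The pretrained head has a different number of classes, so its weights
        # are skipped and re-initialised instead of raising a shape-mismatch error.
        ignore_mismatched_sizes=True,
    )

# Hypothetical custom classes for an industrial-inspection dataset.
id2label, label2id = build_label_maps(["scratch", "dent", "crack"])
```

Calling `load_for_finetuning(["scratch", "dent", "crack"])` downloads the weights on first use; the backbone and transformer layers keep their pretrained values while the new head starts from random initialisation.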

Solves for

fine-tune a pre-trained detector on custom datasets with minimal labeled data (few-shot detection)

leverage multi-domain pre-training to detect object categories not present in COCO or Objects365

reduce training time and computational cost by starting from converged weights rather than random initialization

evaluate model performance on standard COCO benchmarks to compare against published baselines

Best for

teams with limited labeled data for custom object detection tasks

researchers benchmarking detection architectures against COCO leaderboards

practitioners building domain-specific detectors (medical imaging, industrial inspection) with transfer learning

Requires

PyTorch 1.9+ with torchvision for COCO dataset utilities

custom dataset annotations in COCO JSON format or equivalent (image_id, category_id, bbox, area)

minimum 100-500 labeled examples per custom class for stable fine-tuning
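As an illustration of the COCO JSON annotation layout listed above, the sketch below assembles a minimal dataset dictionary. The file name, class names, and helper name are hypothetical; only the field names (`image_id`, `category_id`, `bbox`, `area`) come from the requirement itself, plus the standard `iscrowd` flag.

```python
import json

def make_coco_dataset(images, boxes_by_image, categories):
    """Assemble a minimal COCO-style detection dataset dictionary.

    images: list of (image_id, file_name, width, height)
    boxes_by_image: {image_id: [(category_id, x, y, w, h), ...]} in absolute pixels
    categories: {category_id: name}
    """
    annotations = []
    for image_id, boxes in boxes_by_image.items():
        for category_id, x, y, w, h in boxes:
            annotations.append({
                "id": len(annotations) + 1,   # annotation ids must be unique
                "image_id": image_id,
                "category_id": category_id,
                "bbox": [x, y, w, h],         # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
    return {
        "images": [
            {"id": i, "file_name": name, "width": w, "height": h}
            for i, name, w, h in images
        ],
        "annotations": annotations,
        "categories": [{"id": cid, "name": name} for cid, name in categories.items()],
    }

dataset = make_coco_dataset(
    images=[(1, "part_0001.jpg", 640, 480)],   # hypothetical file name
    boxes_by_image={1: [(1, 100, 120, 50, 40), (2, 300, 200, 80, 60)]},
    categories={1: "scratch", 2: "dent"},
)
coco_json = json.dumps(dataset, indent=2)
```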

Limitations

COCO pre-training biases model toward common object categories; performance degrades on rare or domain-specific objects without sufficient fine-tuning data

Objects365 dataset contains noisy labels and class imbalance; some object categories have <100 training examples, limiting their learned representations

fine-tuning on datasets with significantly different object distributions (e.g., aerial imagery, microscopy) may require careful learning rate scheduling and data augmentation to avoid catastrophic forgetting

What makes it unique

Combines COCO (80 classes, high-quality annotations) and Objects365 (365 classes, broader coverage) pre-training in a single model, enabling transfer learning that balances annotation quality with category diversity—a rare combination in published detection models

vs alternatives

Broader object category coverage than COCO-only models (365 vs 80 classes) while maintaining COCO's annotation quality, reducing fine-tuning data requirements compared to training from scratch on custom datasets

batch inference with dynamic input shape handling

Medium confidence

Supports batches of variable-sized images by resizing, and optionally padding, them to the model input dimensions (typically 640x640). Because the transformer layers operate on flattened feature maps rather than a fixed anchor grid, the architecture is not tied to a single input resolution. Inference can be performed on single images or batches, with predictions returned per image and rescaled to each image's original size.
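One common way to batch images of different aspect ratios without distortion is letterboxing: scale each image to fit the target square, then pad the remainder. The sketch below shows only the arithmetic. Whether the shipped image processor pads or simply resizes depends on its configuration, so treat this as an assumed preprocessing choice; the function names are hypothetical.

```python
def letterbox_params(width, height, target=640):
    """Return (scale, new_w, new_h, pad_x, pad_y) for an aspect-preserving resize.

    The image is scaled so that it fits inside target x target, then padded on
    the right and bottom up to the full square.
    """
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    return scale, new_w, new_h, target - new_w, target - new_h

def box_to_original(box, scale):
    """Map an (x1, y1, x2, y2) box from letterboxed pixels back to the source image."""
    return tuple(v / scale for v in box)

# A batch of three hypothetical image sizes (width, height).
batch_sizes = [(1280, 720), (480, 640), (640, 640)]
params = [letterbox_params(w, h) for w, h in batch_sizes]
```

Every image ends up on the same 640x640 canvas, so the batch stacks into one tensor, and `box_to_original` undoes the scaling when reporting results.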

Solves for

process multiple images in parallel for throughput optimization on GPU/TPU

handle images with varying aspect ratios without manual preprocessing or distortion

integrate batch inference into data pipelines (ETL, video frame processing) with minimal preprocessing overhead

benchmark inference latency across different batch sizes to optimize deployment configurations

Best for

data engineers building image processing pipelines requiring high throughput

ML engineers optimizing inference latency and memory utilization on cloud GPUs

developers deploying detection models in batch processing systems (video analysis, image archives)

Requires

PyTorch 1.9+ with CUDA for GPU batching, or CPU inference (significantly slower)

minimum 4GB VRAM for batch_size=4 at 640x640 resolution; scales to 16GB+ for batch_size=32

image preprocessing library (torchvision.transforms or Pillow) for resizing and normalization

Limitations

dynamic shape handling adds ~5-10% overhead per batch due to padding computation; fixed-shape batches are slightly faster

memory consumption scales quadratically with image resolution; batch_size must be reduced for high-resolution inputs (>1024x1024)

padding introduces false positives at image boundaries in some cases; requires post-processing to filter detections near padded regions
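The boundary filtering mentioned in the last limitation could look like the following sketch. The centre-in-padding rule and the clipping step are one possible policy chosen for illustration, and the detection dictionaries are hypothetical.

```python
def filter_padded_detections(detections, valid_w, valid_h):
    """Drop detections centred in the padding and clip the rest to the valid area.

    detections: list of dicts with an (x1, y1, x2, y2) "box" in padded-image pixels
    valid_w, valid_h: size of the real image content inside the padded canvas
    """
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        if cx >= valid_w or cy >= valid_h:
            continue  # centre lies in the padded region
        clipped = (max(x1, 0), max(y1, 0), min(x2, valid_w), min(y2, valid_h))
        kept.append({**det, "box": clipped})
    return kept

# Hypothetical detections on a 640x640 canvas whose bottom 280 rows are padding.
raw = [
    {"label": "car", "score": 0.91, "box": (100, 50, 300, 200)},
    {"label": "car", "score": 0.42, "box": (200, 380, 400, 600)},     # entirely in padding
    {"label": "person", "score": 0.77, "box": (500, 300, 630, 400)},  # straddles the edge
]
clean = filter_padded_detections(raw, valid_w=640, valid_h=360)
```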

What makes it unique

Transformer-based architecture enables dynamic shape handling without explicit anchor box resizing; uses deformable attention to adapt to variable input dimensions, avoiding the aspect ratio distortion common in CNN-based detectors that require fixed input sizes

vs alternatives

More efficient batch processing than anchor-based detectors (YOLO, Faster R-CNN) which require fixed input shapes; dynamic shape handling reduces preprocessing overhead and enables natural aspect ratio preservation

huggingface model hub integration with safetensors format

Medium confidence

Model is hosted on HuggingFace Model Hub in the safetensors serialization format and loads in a few lines via the transformers library. A safetensors file stores raw tensors behind a small header, so loading it does not execute arbitrary code the way pickle-based .pth files can, and deserialization is typically faster. Integration with HuggingFace's model card system provides versioning and documentation, and the Hub offers deployment paths to managed inference platforms (AWS SageMaker, Azure ML, Hugging Face Inference API).
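A minimal loading-and-inference sketch, assuming a transformers release that ships the `RTDetrForObjectDetection` and `RTDetrImageProcessor` classes, plus torch and Pillow. The helper names `detect` and `format_detection` are hypothetical.

```python
MODEL_ID = "PekingU/rtdetr_r50vd_coco_o365"

def detect(image, threshold=0.5, model_id=MODEL_ID):
    """Run the checkpoint on one PIL image and return a list of detection dicts.

    torch and transformers are imported lazily; weights are downloaded from the
    Hub on first use and read from the local cache afterwards.
    """
    import torch
    from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

    processor = RTDetrImageProcessor.from_pretrained(model_id)
    model = RTDetrForObjectDetection.from_pretrained(model_id).eval()

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # target_sizes is (height, width); PIL's image.size is (width, height).
    result = processor.post_process_object_detection(
        outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=threshold
    )[0]
    return [
        {"label": model.config.id2label[label.item()], "score": score.item(), "box": box.tolist()}
        for score, label, box in zip(result["scores"], result["labels"], result["boxes"])
    ]

def format_detection(det):
    """Render one detection dict as a single log line."""
    x1, y1, x2, y2 = det["box"]
    return f'{det["label"]}: {det["score"]:.2f} [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]'
```

Typical use would be `for det in detect(Image.open(path)): print(format_detection(det))`, with `path` pointing at any local image.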

Solves for

load pre-trained model weights with a single Python import statement without manual weight downloading

deploy model to managed inference endpoints (HuggingFace, AWS, Azure) with zero infrastructure setup

version control model checkpoints and track training metadata through HuggingFace's model versioning system

integrate model into existing HuggingFace pipelines and downstream applications

Best for

Python developers using HuggingFace transformers ecosystem

teams deploying models to managed cloud inference platforms

researchers sharing reproducible model checkpoints with built-in documentation

Requires

Python 3.7+

transformers library 4.42.0+ (the release line that introduced the RT-DETR classes)

torch 1.9+ (the transformers RT-DETR implementation is PyTorch-based)

Limitations

requires internet connectivity to download model weights on first load (~500MB-1GB); no offline mode without pre-caching

safetensors format is newer and may have compatibility issues with older PyTorch versions (<1.9)

HuggingFace Inference API has rate limits (free tier: 1 request/second); production deployments require paid tier

What makes it unique

Uses safetensors serialization format instead of pickle-based .pth, providing faster loading (2-3x speedup), deterministic deserialization, and built-in security checks; integrated with HuggingFace's managed inference endpoints for one-click deployment

vs alternatives

Faster model loading than traditional PyTorch checkpoints and simpler deployment than self-hosted inference servers; HuggingFace integration eliminates manual weight management and provides automatic scaling on managed platforms

coco benchmark evaluation with standard metrics

Medium confidence

Model is evaluated on COCO dataset using standard detection metrics (mAP@0.5, mAP@0.5:0.95, per-class precision/recall). Evaluation uses COCO's official evaluation protocol with IoU thresholds and area-based metrics (small, medium, large objects). The model card includes published benchmark results, enabling direct comparison against other detectors on the same evaluation protocol.
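The quantities behind these metrics are simple to state in code. The sketch below shows box IoU, the ten IoU thresholds that mAP@0.5:0.95 averages over, and COCO's area buckets. It is an illustration of the definitions, not a replacement for the official evaluator, and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# COCO averages AP over ten IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def size_bucket(box):
    """COCO size category by box area: small < 32^2 <= medium < 96^2 <= large."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < 32 ** 2:
        return "small"
    return "medium" if area < 96 ** 2 else "large"

def matched_thresholds(pred, gt):
    """IoU thresholds at which `pred` counts as a true positive for `gt`."""
    overlap = iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if overlap >= t]

gt_box = (0, 0, 100, 100)
pred_box = (0, 0, 100, 72)   # hypothetical prediction covering 72% of the ground truth
```

A prediction like `pred_box` counts as correct at the looser thresholds but not the stricter ones, which is why mAP@0.5:0.95 is always at or below mAP@0.5.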

Solves for

compare model performance against published baselines and other detectors on COCO

evaluate custom fine-tuned models using standard COCO metrics for reproducibility

understand model performance across object size categories (small, medium, large)

track performance improvements during model development and hyperparameter tuning

Best for

researchers benchmarking detection architectures

ML engineers validating model improvements before production deployment

teams publishing detection models with reproducible evaluation results

Requires

COCO dataset (val2017 split, ~5GB) or COCO API for metric computation

pycocotools library for official COCO evaluation

predictions in COCO JSON format (image_id, category_id, bbox, score)
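A sketch of producing the COCO results format and handing it to pycocotools, assuming detections arrive as (x1, y1, x2, y2) boxes. The label-to-category mapping and the function names are hypothetical; the pycocotools calls are imported lazily so the converter can be used on its own.

```python
import json

def to_coco_results(image_id, detections, label_to_category_id):
    """Convert (x1, y1, x2, y2) detections into COCO results entries.

    COCO results use [x, y, width, height] boxes and dataset category ids,
    which need not match the model's contiguous label indices.
    """
    results = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        results.append({
            "image_id": image_id,
            "category_id": label_to_category_id[det["label"]],
            "bbox": [round(x1, 2), round(y1, 2), round(x2 - x1, 2), round(y2 - y1, 2)],
            "score": det["score"],
        })
    return results

def evaluate(annotation_file, results_file):
    """Run the official bbox evaluation; requires pycocotools and the annotations file."""
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(annotation_file)
    coco_dt = coco_gt.loadRes(results_file)
    evaluator = COCOeval(coco_gt, coco_dt, "bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()
    return evaluator.stats  # stats[0] is AP@[0.50:0.95], stats[1] is AP@0.50

sample = to_coco_results(
    image_id=39769,
    detections=[{"label": 15, "score": 0.98, "box": (10.0, 20.0, 110.0, 220.0)}],
    label_to_category_id={15: 17},   # hypothetical mapping from model index to dataset id
)
results_json = json.dumps(sample)
```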

Limitations

COCO metrics (mAP@0.5:0.95) are computationally expensive; evaluation on full COCO val set (5000 images) requires 10-30 minutes on single GPU

mAP is computed over the full ranked list of detections, so discarding low-confidence detections before evaluation lowers the score; reported numbers are normally obtained with little or no score filtering

COCO evaluation assumes axis-aligned bounding boxes; rotated or polygon annotations are not supported

What makes it unique

Provides published COCO benchmark results on model card, enabling direct comparison against 100+ published detectors on identical evaluation protocol; includes per-class and per-area breakdowns for detailed performance analysis

vs alternatives

Standard COCO evaluation enables reproducible comparisons across detectors; published results on model card eliminate need for manual evaluation setup, unlike proprietary or custom evaluation protocols

inference optimization for edge deployment with quantization support

Medium confidence

Model supports post-training quantization (INT8, FP16) for reduced model size and faster inference on edge devices. Quantization is applied to weights and activations while preserving detection accuracy within 1-2% of full-precision baseline. The model can be exported to ONNX format for cross-platform deployment (mobile, embedded systems, browsers) with optimized inference engines (TensorRT, CoreML, ONNX Runtime).
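To make the accuracy trade-off concrete, here is a generic sketch of symmetric per-tensor INT8 quantization in plain Python. It illustrates the rounding error that quantization introduces and is not the `torch.quantization` workflow or any specific export pipeline; the weight values are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (int codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]

weights = [0.82, -1.27, 0.003, 0.45, -0.61]   # hypothetical weight values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), and small weights such as 0.003 collapse to zero. That rounding is the source of the accuracy loss quoted for INT8.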

Solves for

deploy object detection to edge devices (mobile phones, embedded systems, IoT) with <100MB model size

reduce inference latency on CPU-only devices by 3-5x through quantization and ONNX optimization

export model to mobile frameworks (CoreML for iOS, TensorFlow Lite for Android) for on-device inference

optimize inference cost on cloud platforms by reducing memory footprint and compute requirements

Best for

mobile developers building on-device vision applications

embedded systems engineers deploying models to resource-constrained hardware

teams optimizing inference cost on cloud platforms with per-GB memory pricing

Requires

PyTorch 1.9+ with quantization support (torch.quantization)

ONNX and onnx-simplifier for model export and optimization

TensorRT (NVIDIA), CoreML (Apple), or ONNX Runtime for edge inference

Limitations

INT8 quantization reduces accuracy by 1-3% on COCO; FP16 quantization has negligible accuracy loss but requires GPU support

ONNX export requires manual conversion; no built-in ONNX export in transformers library for RT-DETR

quantized models are less flexible for fine-tuning; retraining quantized models requires special techniques (quantization-aware training)

What makes it unique

Transformer-based architecture enables efficient quantization through attention mechanism sparsity; deformable attention naturally reduces computation on non-informative regions, making INT8 quantization more effective than CNN-based detectors

vs alternatives

Quantization-friendly transformer architecture achieves better accuracy retention (1-2% loss vs 3-5% for CNNs) at INT8 precision; ONNX export enables cross-platform deployment without platform-specific retraining

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts sharing capabilities

Artifacts that share capabilities with rtdetr_r50vd_coco_o365, ranked by overlap. Discovered automatically through the match graph.

Model 40

rtdetr_r18vd_coco_o365

object-detection model. 521,638 downloads.

multi-dataset transfer learning with coco and objects365 pre-training

real-time object detection with transformer-based architecture

batch inference with dynamic input resolution

3 shared capabilities

Model 36

rtdetr_v2_r18vd

object-detection model. 110,212 downloads.

real-time object detection with deformable transformer attention

coco-pretrained multi-class object classification and localization

batch inference with dynamic input resolution

3 shared capabilities

Model 39

yolos-tiny

object-detection model. 96,175 downloads.

coco-pretrained multi-class object detection with 80 object categories

vision transformer-based object detection with attention-weighted region proposals

fine-tuning on custom object detection datasets with transfer learning

3 shared capabilities

Model 43

detr-resnet-50

object-detection model. 228,520 downloads.

end-to-end transformer-based object detection with resnet-50 backbone

fine-tuning on custom datasets with transfer learning

transformer encoder-decoder with learned object queries for set prediction

3 shared capabilities

Model 36

rtdetr_r101vd_coco_o365

object-detection model. 102,666 downloads.

multi-domain object detection with coco+objects365 pretraining

real-time object detection with transformer-based architecture

2 shared capabilities

Model 34

rtdetr_r50vd

object-detection model. 36,914 downloads.

real-time object detection with deformable transformer architecture

coco-pretrained weight initialization with transfer learning support

2 shared capabilities

Best For

✓ computer vision engineers building real-time detection systems
✓ teams deploying edge AI models requiring transformer-based architectures
✓ developers migrating from anchor-based detectors (YOLO, Faster R-CNN) to anchor-free approaches
✓ teams with limited labeled data for custom object detection tasks
✓ researchers benchmarking detection architectures against COCO leaderboards
✓ practitioners building domain-specific detectors (medical imaging, industrial inspection) with transfer learning
✓ data engineers building image processing pipelines requiring high throughput
✓ ML engineers optimizing inference latency and memory utilization on cloud GPUs

Known Limitations

⚠ ResNet-50-VD backbone limits feature resolution compared to larger backbones (ResNet-101, EfficientNet); trades accuracy for speed
⚠ transformer decoder adds ~15-25ms latency per image compared to CNN-only detectors on CPU inference
⚠ requires careful batch normalization tuning when fine-tuning on custom datasets; batch size <8 may cause training instability
⚠ no built-in support for video-level temporal consistency; requires external frame-to-frame tracking for video applications
⚠ COCO pre-training biases model toward common object categories; performance degrades on rare or domain-specific objects without sufficient fine-tuning data
⚠ Objects365 dataset contains noisy labels and class imbalance; some object categories have <100 training examples, limiting their learned representations

Requirements

PyTorch 1.9+ (the HuggingFace transformers implementation of RT-DETR is PyTorch-based)

torchvision or equivalent vision library for image preprocessing (normalization, resizing)

CUDA 11.0+ for GPU acceleration (CPU inference possible but 10-20x slower)

minimum 2GB VRAM for batch inference; 4GB+ recommended for batch_size > 4

PyTorch 1.9+ with torchvision for COCO dataset utilities

custom dataset annotations in COCO JSON format or equivalent (image_id, category_id, bbox, area)

minimum 100-500 labeled examples per custom class for stable fine-tuning

8GB+ VRAM for fine-tuning with batch_size >= 4; 16GB+ recommended for batch_size >= 8

Input / Output

Accepts: image (PNG, JPEG, BMP, TIFF), image tensor (torch.Tensor or tf.Tensor with shape [batch, 3, height, width], normalized to [0,1] or ImageNet stats), video frames (processed sequentially), COCO-format JSON annotations (images, annotations, categories), image files (PNG, JPEG) with corresponding bounding box annotations, custom dataset in Pascal VOC or YOLO format (requires conversion to COCO format), batch of images as torch.Tensor [batch_size, 3, height, width], list of PIL Image objects with variable dimensions, numpy arrays [batch_size, height, width, 3] in uint8 or float32 format, model identifier string ('PekingU/rtdetr_r50vd_coco_o365'), local path to downloaded model directory, HuggingFace model card URL, COCO val2017 images and annotations, model predictions in COCO JSON format, custom dataset in COCO format, full-precision PyTorch model, calibration dataset for quantization, ONNX model for cross-platform export

Produces: bounding boxes (x1, y1, x2, y2 or cx, cy, w, h format), class labels (integer indices or string names from COCO/Objects365 vocabulary), confidence scores (float [0,1] per detection), structured JSON with detections array, fine-tuned model weights (PyTorch .pth or safetensors format), evaluation metrics (mAP@0.5, mAP@0.5:0.95, per-class precision/recall), inference predictions on test set, batch of detection tensors [batch_size, num_detections, 6] (x1, y1, x2, y2, class_id, confidence), list of dictionaries per image with 'boxes', 'labels', 'scores' keys, structured output with per-image detection counts and aggregated statistics, loaded PyTorch model object (torch.nn.Module), model configuration (AutoConfig), image processor for preprocessing (AutoImageProcessor), mAP@0.5 and mAP@0.5:0.95 scores, per-class precision and recall curves, per-area metrics (small, medium, large objects), confusion matrices and false positive analysis, quantized PyTorch model (INT8 or FP16), ONNX model file (.onnx), platform-specific optimized models (TensorRT .engine, CoreML .mlmodel), model size reduction metrics and accuracy degradation analysis

UnfragileRank

Adoption: 50% (40% weight)

Quality: 14% (20% weight)

Ecosystem: 50% (15% weight)

Match Graph: 10% (20% weight)

Freshness: 75% (5% weight)

UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
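The page does not state how the five components are combined. Assuming a plain weighted sum (an assumption, though the listed weights do total 100%), the arithmetic would look like this; the dictionary keys and function name are hypothetical.

```python
# (score in percent, weight) per component, as listed on this page.
COMPONENTS = {
    "adoption": (50, 0.40),
    "quality": (14, 0.20),
    "ecosystem": (50, 0.15),
    "match_graph": (10, 0.20),
    "freshness": (75, 0.05),
}

def weighted_rank(components):
    """Weighted sum of component scores; checks that the weights total 1.0."""
    total_weight = sum(weight for _, weight in components.values())
    assert abs(total_weight - 1.0) < 1e-9, "weights should sum to 1"
    return sum(score * weight for score, weight in components.values())

# Assumed combination rule: sum of score * weight over all components.
rank = weighted_rank(COMPONENTS)
```

Under that assumption the components work out to roughly 36 out of 100; the actual formula may differ.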

Type: Model

6 capabilities


Model Details

Provider: huggingface

Architecture: transformers

Downloads: 86,670

Tasks: object-detection

About

PekingU/rtdetr_r50vd_coco_o365 is an object-detection model on HuggingFace with 86,670 downloads.

Alternatives to rtdetr_r50vd_coco_o365

Dreambooth-Stable-Diffusion 45 Repository

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion

sdnext 51 Repository

SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing

fast-stable-diffusion 48 Repository

fast-stable-diffusion + DreamBooth

ai-notes 37 Prompt

Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.


Data Sources

huggingface



