{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512","slug":"nvidia--segformer-b4-finetuned-ade-512-512","name":"segformer-b4-finetuned-ade-512-512","type":"finetune","url":"https://huggingface.co/nvidia/segformer-b4-finetuned-ade-512-512","page_url":"https://unfragile.ai/nvidia--segformer-b4-finetuned-ade-512-512","categories":["model-training"],"tags":["transformers","pytorch","tf","segformer","vision","image-segmentation","dataset:scene_parse_150","arxiv:2105.15203","license:other","endpoints_compatible","deploy:azure","region:us"],"pricing":{"model":"open_source","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_0","uri":"capability://image.visual.semantic.scene.segmentation.with.hierarchical.transformer.backbone","name":"semantic-scene-segmentation-with-hierarchical-transformer-backbone","description":"Performs pixel-level semantic segmentation using SegFormer's hierarchical transformer architecture (B4 variant) pretrained on ImageNet-1K and fine-tuned on ADE20K dataset. The model uses a Mix Transformer encoder with progressive downsampling stages (4:1, 8:1, 16:1, 32:1) combined with a lightweight linear decoder that processes multi-scale feature maps, enabling efficient scene understanding across 150 semantic classes without convolutions. Input images are resized to 512×512 resolution and processed through transformer blocks with overlapping patch embeddings, producing dense per-pixel class predictions with spatial coherence.","intents":["Segment indoor and outdoor scenes into semantic categories (furniture, walls, sky, people, etc.) 
for scene understanding applications","Extract region-of-interest masks for specific object classes in images for downstream computer vision tasks","Generate pixel-accurate segmentation maps for autonomous navigation, robotics, or augmented reality applications","Analyze scene composition and spatial layout by identifying semantic regions in photographs or video frames"],"best_for":["Computer vision engineers building scene understanding pipelines for robotics or autonomous systems","Researchers prototyping semantic segmentation models on ADE20K benchmark","Teams deploying edge inference with moderate computational budgets (B4 is mid-tier SegFormer variant)","Developers needing pre-trained models for indoor/outdoor scene analysis without fine-tuning"],"limitations":["Fixed input resolution of 512×512 — images must be resized, potentially losing fine details or distorting aspect ratios","Trained exclusively on ADE20K (150 classes) — poor generalization to custom domains or novel object categories without fine-tuning","Transformer architecture requires full image context — cannot process streaming or partial image data efficiently","Inference latency ~200-400ms on GPU (varies by hardware) — not suitable for real-time applications requiring <30ms response","No built-in uncertainty quantification or confidence scores per pixel — difficult to identify low-confidence predictions"],"requires":["PyTorch 1.9+ or TensorFlow 2.6+ (model available in both frameworks)","CUDA 11.0+ for GPU inference (CPU inference possible but 5-10x slower)","Transformers library 4.5.0+","Minimum 4GB VRAM for batch size 1 on GPU; 8GB+ recommended for batch processing","PIL/Pillow for image loading and preprocessing"],"input_types":["image (RGB, 3-channel, any resolution — internally resized to 512×512)","batch of images (supported via batching in inference frameworks)"],"output_types":["segmentation map (2D tensor, shape [512, 512], values 0-149 representing class indices)","logits tensor (shape 
[512, 512, 150], raw model outputs before argmax)","probability map (shape [512, 512, 150], softmax-normalized class probabilities per pixel)"],"categories":["image-visual","semantic-segmentation"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_1","uri":"capability://image.visual.multi.scale.feature.aggregation.with.linear.decoder","name":"multi-scale-feature-aggregation-with-linear-decoder","description":"Aggregates hierarchical feature maps from four transformer encoder stages (operating at 4×, 8×, 16×, and 32× downsampling) into a unified feature representation using a lightweight linear projection decoder. Each stage's output is upsampled to 1/4 resolution, concatenated, and processed through a single linear layer to produce 150-class logits. This design avoids expensive upsampling operations and learned deconvolutions, instead leveraging the transformer's inherent multi-scale understanding to maintain spatial detail while reducing computational overhead.","intents":["Efficiently combine multi-scale contextual information from transformer stages without expensive decoder networks","Maintain spatial resolution and fine boundary details while processing through deep transformer layers","Reduce model size and inference latency by replacing convolutional decoders with linear projections","Enable flexible feature fusion strategies for downstream task adaptation or transfer learning"],"best_for":["Developers optimizing segmentation models for edge devices or mobile deployment","Researchers studying efficient decoder designs for vision transformers","Teams requiring fast inference without sacrificing segmentation quality"],"limitations":["Linear decoder cannot learn complex spatial transformations — relies entirely on encoder quality","Upsampling from 32× downsampling stage may lose fine spatial details in small objects","No learnable skip connections or feature recalibration — fixed aggregation 
strategy","Sensitive to encoder feature quality — poor encoder performance directly impacts decoder output"],"requires":["Transformer encoder with 4-stage hierarchical output","Feature maps at 4×, 8×, 16×, 32× downsampling ratios","PyTorch or TensorFlow with tensor concatenation and linear layer support"],"input_types":["multi-scale feature tensors from encoder stages"],"output_types":["logits tensor (shape [H/4, W/4, num_classes])","segmentation map after argmax (shape [H/4, W/4])"],"categories":["image-visual","feature-fusion"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_2","uri":"capability://image.visual.ade20k.scene.parsing.with.150.semantic.classes","name":"ade20k-scene-parsing-with-150-semantic-classes","description":"Provides semantic segmentation across 150 distinct scene categories from the ADE20K dataset, including architectural elements (walls, doors, windows), furniture (chairs, tables, beds), natural objects (trees, sky, grass), and people. The model recognizes both common and rare object classes through fine-tuning on ~20K training images with dense pixel-level annotations. 
Predictions are returned as class indices (0-149) that map to standardized ADE20K class names, enabling direct integration with scene understanding pipelines.","intents":["Identify and localize specific semantic objects in indoor/outdoor scenes (e.g., 'find all windows', 'segment the sky')","Generate scene composition analysis by counting or measuring areas of different semantic classes","Create scene-aware masks for selective image processing (e.g., apply effects only to sky regions)","Support scene understanding for robotics applications (e.g., navigation around furniture, obstacle avoidance)"],"best_for":["Computer vision teams working with indoor scene datasets (offices, homes, public spaces)","Robotics engineers building scene-aware navigation systems","Researchers benchmarking on ADE20K or similar scene parsing tasks","Developers needing broad semantic coverage without domain-specific fine-tuning"],"limitations":["Trained on ADE20K only — poor performance on out-of-distribution domains (e.g., medical imaging, satellite imagery, synthetic scenes)","Class imbalance in training data — rare classes (e.g., specific furniture types) have lower accuracy than common classes (sky, wall)","150-class taxonomy is fixed — cannot add custom classes without retraining","Struggles with small objects and fine boundaries due to 512×512 resolution constraint","No temporal consistency — frame-to-frame flickering in video segmentation without post-processing"],"requires":["ADE20K class mapping (available in Hugging Face model card or transformers library)","Knowledge of ADE20K taxonomy for interpreting class indices","Post-processing for video applications (temporal smoothing, CRF refinement)"],"input_types":["RGB images from indoor/outdoor scenes"],"output_types":["class index map (0-149 per pixel)","class name strings (via mapping 
lookup)"],"categories":["image-visual","semantic-segmentation"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_3","uri":"capability://image.visual.efficient.inference.with.b4.model.variant","name":"efficient-inference-with-b4-model-variant","description":"Implements the SegFormer B4 variant, a mid-tier model in the SegFormer family (B0-B5 spectrum) that balances accuracy and computational efficiency. B4 uses 64M parameters with 4 transformer encoder stages (depths: 3, 8, 27, 3) and embedding dimensions (64, 128, 320, 512), achieving ~200-400ms inference latency on GPU and ~2-3s on CPU. This variant is positioned between B3 (faster, lower accuracy) and B5 (slower, higher accuracy), making it suitable for applications requiring near-real-time processing on standard hardware.","intents":["Deploy semantic segmentation on GPU servers with <400ms latency for batch processing","Run inference on commonly available GPUs (RTX 3060, A100) without memory constraints","Balance model accuracy (50.3% mIoU) against inference speed for production systems","Enable near-real-time video processing at 2-5 FPS on standard hardware"],"best_for":["Teams deploying on cloud GPU instances (AWS, Azure, GCP) with cost-per-inference constraints","Developers building video analysis pipelines requiring 2-5 FPS throughput","Researchers comparing efficiency-accuracy tradeoffs in transformer architectures","Production systems where B5 is too slow but B3 is insufficiently accurate"],"limitations":["Slower than B0-B3 variants — not suitable for real-time applications requiring <100ms latency","Faster than B5 but with 2-3% lower accuracy — tradeoff may be unacceptable for high-precision applications","Requires GPU for practical deployment — CPU inference (2-3s per image) is impractical for production","Memory footprint ~250MB (model weights) — requires 4GB+ VRAM for batch processing","No quantization or pruning variants available — 
cannot further optimize without retraining"],"requires":["GPU with 4GB+ VRAM (RTX 3060, A100, V100, or equivalent)","PyTorch 1.9+ or TensorFlow 2.6+","CUDA 11.0+ for GPU acceleration","Transformers library 4.5.0+"],"input_types":["RGB images (512×512 or resizable to 512×512)"],"output_types":["segmentation map (512×512, class indices 0-149)","inference latency metrics (ms per image)"],"categories":["image-visual","model-efficiency"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_4","uri":"capability://tool.use.integration.huggingface.model.hub.integration.with.transformers.api","name":"huggingface-model-hub-integration-with-transformers-api","description":"Provides seamless integration with the Hugging Face Transformers library through standardized model loading, preprocessing, and inference APIs. The model is accessible via `transformers.AutoModelForSemanticSegmentation.from_pretrained('nvidia/segformer-b4-finetuned-ade-512-512')`, with automatic weight downloading, caching, and device management. Preprocessing is handled by `SegformerImageProcessor`, which resizes images to 512×512 and normalizes them with ImageNet statistics. 
Post-processing utilities convert logits to segmentation maps and optionally upsample to original image resolution.","intents":["Load and run the model with minimal boilerplate code in Python environments","Integrate segmentation into existing Hugging Face-based pipelines and workflows","Leverage automatic model caching and version management for reproducibility","Access standardized preprocessing and postprocessing without manual implementation"],"best_for":["Python developers using Hugging Face Transformers as primary ML framework","Teams building multi-model pipelines combining NLP and vision tasks","Researchers prototyping quickly without custom preprocessing code","Developers deploying on Hugging Face Inference API or Spaces"],"limitations":["Requires Transformers library dependency — adds ~500MB to project size","Automatic device management may not optimize for specific hardware (e.g., multi-GPU setups)","Limited control over preprocessing — fixed normalization and resizing strategy","No built-in batching optimization — requires manual batch construction for efficiency","Model weights (~250MB) downloaded on first use — slow initial load on bandwidth-constrained environments"],"requires":["Python 3.7+","transformers 4.5.0+","torch 1.9+ or tensorflow 2.6+","PIL/Pillow for image handling","Internet connection for initial model download"],"input_types":["PIL Image objects","numpy arrays (H×W×3)","file paths to images"],"output_types":["SegformerForSemanticSegmentation model object","SegformerImageProcessor preprocessor","segmentation logits and class predictions"],"categories":["tool-use-integration","model-loading"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_5","uri":"capability://data.processing.analysis.batch.inference.with.dynamic.batching.support","name":"batch-inference-with-dynamic-batching-support","description":"Supports efficient batch processing of multiple images through Transformers' 
native batching mechanisms, accepting lists of PIL Images or numpy arrays and processing them in parallel on GPU. The model automatically pads images to uniform size (512×512) and stacks them into batches, reducing per-image overhead. Inference returns batched logits (batch_size, 512, 512, 150) that can be processed in parallel, enabling throughput of 10-50 images/second on standard GPUs depending on batch size and hardware.","intents":["Process multiple images efficiently for batch segmentation tasks (e.g., dataset annotation, video frame processing)","Maximize GPU utilization by processing multiple images simultaneously","Reduce per-image inference latency through amortized overhead","Enable high-throughput segmentation pipelines for large-scale image processing"],"best_for":["Data processing teams annotating large image datasets","Video analysis pipelines processing frames in batches","Cloud services handling multiple concurrent segmentation requests","Researchers evaluating model performance on benchmark datasets"],"limitations":["Batch size limited by GPU VRAM — batch size 16 requires ~8GB VRAM, batch size 32 requires ~16GB","All images in batch must be resized to 512×512 — no support for variable resolution batching","Padding to uniform size may distort aspect ratios — requires post-processing to restore original dimensions","Memory overhead increases linearly with batch size — diminishing returns beyond batch size 32","No built-in dynamic batching — batch size must be predetermined"],"requires":["GPU with sufficient VRAM (4GB minimum for batch size 1, 8GB+ for batch size 8+)","PyTorch or TensorFlow with batching support","Transformers library with batch processing utilities"],"input_types":["list of PIL Images","list of numpy arrays (H×W×3)","list of file paths"],"output_types":["batched logits tensor (batch_size, 512, 512, 150)","batched segmentation maps (batch_size, 512, 
512)"],"categories":["data-processing-analysis","batch-processing"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_6","uri":"capability://image.visual.image.upsampling.to.original.resolution.with.bilinear.interpolation","name":"image-upsampling-to-original-resolution-with-bilinear-interpolation","description":"Provides post-processing capability to upsample segmentation maps from 512×512 output resolution back to original input image dimensions using bilinear interpolation. The model outputs predictions at 1/4 resolution (128×128 logits upsampled to 512×512), and this capability restores full-resolution segmentation by interpolating class predictions or logits to match input image size. This enables pixel-accurate segmentation aligned with original image coordinates, critical for downstream applications like region extraction or visualization.","intents":["Restore segmentation maps to original image resolution for pixel-accurate region extraction","Enable visualization of segmentation overlays at native image resolution","Align segmentation predictions with original image coordinates for downstream processing","Support variable-resolution input images while maintaining spatial accuracy"],"best_for":["Applications requiring pixel-accurate segmentation (e.g., medical imaging, precision agriculture)","Visualization pipelines overlaying segmentation on original images","Downstream tasks that depend on precise spatial alignment (e.g., object extraction, region-based processing)","Systems handling variable-resolution input images"],"limitations":["Bilinear interpolation introduces artifacts at class boundaries — may blur fine details","Upsampling from 512×512 to high-resolution images (e.g., 4K) amplifies interpolation artifacts","No learned upsampling — cannot recover fine spatial details lost during downsampling","Computational cost scales with output resolution — 4K upsampling adds ~50-100ms 
latency","Aspect ratio distortion if original image is not square — requires careful handling of non-square inputs"],"requires":["Original input image resolution (height, width)","Segmentation map at 512×512 resolution","Interpolation library (PIL, OpenCV, or PyTorch)"],"input_types":["segmentation map (512, 512) with class indices or logits","target resolution (H, W)"],"output_types":["upsampled segmentation map (H, W) at original resolution","upsampled logits (H, W, 150) if using logits-based upsampling"],"categories":["image-visual","post-processing"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_7","uri":"capability://tool.use.integration.pytorch.and.tensorflow.dual.framework.support","name":"pytorch-and-tensorflow-dual-framework-support","description":"Model is available in both PyTorch and TensorFlow formats, enabling deployment across different ML ecosystems. PyTorch version uses native `torch.nn.Module` architecture with `.pt` weights, while TensorFlow version provides `tf.keras.Model` compatibility with `.h5` or SavedModel format. 
Transformers library automatically selects the appropriate framework based on installed dependencies, and users can explicitly specify framework preference via `from_pt=True/False` parameter during model loading.","intents":["Deploy model in existing PyTorch or TensorFlow production systems without framework conversion","Leverage framework-specific optimizations (e.g., TensorFlow Lite for mobile, PyTorch JIT for edge)","Enable team flexibility in framework choice without model reimplementation","Support both research (PyTorch) and production (TensorFlow) workflows"],"best_for":["Teams with mixed PyTorch and TensorFlow codebases","Organizations standardizing on TensorFlow for production deployment","Researchers using PyTorch for experimentation and prototyping","Developers requiring framework-specific optimizations (quantization, pruning, compilation)"],"limitations":["Dual maintenance burden — framework-specific bugs or performance issues may affect only one version","Numerical differences between frameworks due to implementation details — may cause slight accuracy variance","TensorFlow version may lag behind PyTorch in updates or bug fixes","Framework conversion adds complexity — potential for weight loading errors or shape mismatches","No guarantee of identical inference results across frameworks — requires validation"],"requires":["PyTorch 1.9+ OR TensorFlow 2.6+ (not both required, but one must be installed)","Transformers library with framework detection"],"input_types":["model weights in PyTorch (.pt) or TensorFlow (.h5, SavedModel) format"],"output_types":["PyTorch model (torch.nn.Module) or TensorFlow model (tf.keras.Model)","framework-specific inference 
outputs"],"categories":["tool-use-integration","framework-compatibility"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_8","uri":"capability://tool.use.integration.azure.endpoints.deployment.compatibility","name":"azure-endpoints-deployment-compatibility","description":"Model is compatible with Azure Machine Learning Endpoints for serverless inference deployment, enabling one-click deployment to Azure's managed inference infrastructure. The model can be registered in Azure ML Model Registry and deployed via Azure Endpoints with automatic scaling, monitoring, and API exposure. Azure integration handles model versioning, A/B testing, and traffic routing, with support for both real-time (synchronous) and batch inference endpoints.","intents":["Deploy segmentation model to Azure cloud without managing infrastructure","Enable auto-scaling inference endpoints that handle variable traffic","Integrate segmentation into Azure ML pipelines and workflows","Expose model via REST API for downstream applications"],"best_for":["Organizations standardized on Azure cloud platform","Teams requiring managed inference without DevOps overhead","Applications needing auto-scaling and high availability","Enterprises with Azure ML governance and monitoring requirements"],"limitations":["Azure-specific deployment — requires Azure subscription and account setup","Cold start latency for serverless endpoints — first request may incur 5-30s delay","Pricing based on compute hours and API calls — can be expensive for high-volume inference","Limited customization of inference environment — constrained to Azure-supported runtimes","Vendor lock-in — migrating to other cloud platforms requires re-deployment"],"requires":["Azure subscription with Machine Learning workspace","Azure CLI or Python SDK for deployment","Model registered in Azure ML Model Registry","Compute resources (CPU or GPU) allocated for endpoint"],"input_types":["image 
data (base64-encoded or URL-referenced)","batch of images for batch endpoints"],"output_types":["REST API response with segmentation predictions","JSON-formatted class indices and confidence scores"],"categories":["tool-use-integration","cloud-deployment"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"hf-model-nvidia--segformer-b4-finetuned-ade-512-512__cap_9","uri":"capability://memory.knowledge.arxiv.paper.reference.with.segformer.architecture.details","name":"arxiv-paper-reference-with-segformer-architecture-details","description":"Model is based on the SegFormer architecture published in arXiv paper 2105.15203 ('SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers'). The paper provides architectural specifications, training procedures, and benchmark results that enable reproducibility and understanding of design choices. Reference to the paper enables users to understand the hierarchical transformer encoder design, linear decoder rationale, and efficiency-accuracy tradeoffs that differentiate SegFormer from prior CNN-based segmentation approaches.","intents":["Understand the architectural design and rationale behind SegFormer's efficiency","Reproduce training procedures and hyperparameters from the original paper","Compare SegFormer's design choices against alternative segmentation architectures","Access detailed ablation studies and performance analysis from the paper"],"best_for":["Researchers studying transformer-based segmentation architectures","Teams implementing custom SegFormer variants or fine-tuning procedures","Developers evaluating architectural tradeoffs for their applications","Academic projects requiring reproducibility and paper citations"],"limitations":["Paper describes general SegFormer architecture — specific B4 fine-tuning details may not be fully documented","Benchmark results in paper may differ from fine-tuned model performance on ADE20K","Paper does not cover Azure deployment or Hugging Face 
integration specifics","Requires access to arXiv or academic paper repositories"],"requires":["Access to arXiv paper 2105.15203","Understanding of transformer architectures and semantic segmentation"],"input_types":["arXiv paper reference"],"output_types":["architectural specifications and design rationale","training procedures and hyperparameters","benchmark results and ablation studies"],"categories":["memory-knowledge","research-reference"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":42,"verified":false,"data_access_risk":"high","permissions":["PyTorch 1.9+ or TensorFlow 2.6+ (model available in both frameworks)","CUDA 11.0+ for GPU inference (CPU inference possible but 5-10x slower)","Transformers library 4.5.0+","Minimum 4GB VRAM for batch size 1 on GPU; 8GB+ recommended for batch processing","PIL/Pillow for image loading and preprocessing","Transformer encoder with 4-stage hierarchical output","Feature maps at 4×, 8×, 16×, 32× downsampling ratios","PyTorch or TensorFlow with tensor concatenation and linear layer support","ADE20K class mapping (available in Hugging Face model card or transformers library)","Knowledge of ADE20K taxonomy for interpreting class indices"],"failure_modes":["Fixed input resolution of 512×512 — images must be resized, potentially losing fine details or distorting aspect ratios","Trained exclusively on ADE20K (150 classes) — poor generalization to custom domains or novel object categories without fine-tuning","Transformer architecture requires full image context — cannot process streaming or partial image data efficiently","Inference latency ~200-400ms on GPU (varies by hardware) — not suitable for real-time applications requiring <30ms response","No built-in uncertainty quantification or confidence scores per pixel — difficult to identify low-confidence predictions","Linear decoder cannot learn complex spatial transformations — relies entirely on encoder quality","Upsampling from 32× downsampling stage may lose 
fine spatial details in small objects","No learnable skip connections or feature recalibration — fixed aggregation strategy","Sensitive to encoder feature quality — poor encoder performance directly impacts decoder output","Trained on ADE20K only — poor performance on out-of-distribution domains (e.g., medical imaging, satellite imagery, synthetic scenes)","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.4830652556382852,"quality":0.45,"ecosystem":0.5000000000000001,"match_graph":0.25,"freshness":0.75,"weights":{"adoption":0.35,"quality":0.2,"ecosystem":0.1,"match_graph":0.3,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-05-24T12:16:22.765Z","last_scraped_at":"2026-05-03T14:23:00.162Z","last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":104510,"model_likes":4}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=nvidia--segformer-b4-finetuned-ade-512-512","compare_url":"https://unfragile.ai/compare?artifact=nvidia--segformer-b4-finetuned-ade-512-512"}},"signature":"S3uxIU/+qCCVgTUDUFa790qIkGi6s7vWi/qy0p8WZtM/cccB6yFPoaLdElPZArvbjX8fBTYDbF6CXE/fM0DgBA==","signedAt":"2026-06-20T07:48:02.105Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/nvidia--segformer-b4-finetuned-ade-512-512","artifact":"https://unfragile.ai/nvidia--segformer-b4-finetuned-ade-512-512","verify":"https://unfragile.ai/api/v1/verify?slug=nvidia--segformer-b4-finetuned-ade-512-512","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}