{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"awesome-github","slug":"github","name":"Github","type":"repo","url":"https://github.com/allenai/olmocr","page_url":"https://unfragile.ai/github","categories":["automation"],"tags":[],"pricing":{"model":"open_source","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"awesome-github__cap_0","uri":"capability://image.visual.distributed.pdf.to.markdown.ocr.pipeline.with.work.queue.orchestration","name":"distributed pdf-to-markdown ocr pipeline with work queue orchestration","description":"Converts PDF, PNG, and JPEG documents into clean markdown and structured text using a distributed worker architecture backed by S3 or local file-based work queues. The pipeline orchestrates page-level processing through a queue system that coordinates multiple worker processes, each invoking a fine-tuned 7B vision-language model (olmOCR-2-7B based on Qwen2.5-VL) via vLLM server instances. Workers pull tasks from the queue, process pages with rotation correction and layout analysis, and write results back to persistent storage, enabling horizontal scaling across machines.","intents":["Process millions of PDF documents at scale with sub-$200 per million page cost","Convert documents to markdown while preserving reading order, equations, and table structure","Distribute OCR workload across multiple machines using S3 or local storage coordination","Handle misoriented pages automatically with rotation detection and correction"],"best_for":["teams processing document corpora at scale (1M+ pages)","organizations needing cost-efficient OCR with structured markdown output","builders integrating OCR into data pipelines with distributed infrastructure"],"limitations":["Requires vLLM server deployment for inference — adds operational complexity vs single-process solutions","Work queue coordination on S3 has eventual consistency semantics — may cause duplicate processing under high concurrency","Model is 7B parameters — requires GPU with 16GB+ VRAM for reasonable throughput (FP8 quantization)","No built-in retry logic for failed pages — requires external orchestration for fault tolerance"],"requires":["Python 3.9+","vLLM server running with olmOCR-2-7B-1025-FP8 model loaded","S3 credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or local filesystem with shared mount","GPU with 16GB+ VRAM for inference server","PyPDF2 or equivalent for PDF parsing"],"input_types":["PDF files (single or multi-page)","PNG images","JPEG images"],"output_types":["Markdown text with LaTeX equations","Dolma format (structured JSON with metadata)","HTML tables extracted from documents"],"categories":["image-visual","automation-workflow","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_1","uri":"capability://image.visual.page.level.rotation.detection.and.correction.with.vlm.inference","name":"page-level rotation detection and correction with vlm inference","description":"Automatically detects and corrects page rotation by invoking the vision-language model on each page image to determine correct orientation before full OCR processing. The system analyzes visual cues (text direction, layout coherence) through the VLM to identify if a page is rotated 0°, 90°, 180°, or 270°, then applies geometric transformations to normalize orientation before downstream text extraction. This pre-processing step improves downstream OCR accuracy by ensuring consistent text direction.","intents":["Handle scanned documents with mixed or incorrect page orientations","Automatically fix rotated pages without manual intervention","Improve OCR accuracy by normalizing page orientation before text extraction"],"best_for":["processing legacy scanned document collections with inconsistent orientations","automated document ingestion pipelines requiring robust handling of malformed inputs"],"limitations":["Adds latency of one additional VLM inference per page (~500ms-1s per page)","May fail on pages with minimal text or highly stylized layouts where orientation is ambiguous","Requires VLM server to be running — cannot operate in offline mode"],"requires":["vLLM server with olmOCR model loaded","PIL/Pillow for image rotation operations","Page images in PNG or JPEG format"],"input_types":["PNG images","JPEG images"],"output_types":["Rotated PNG/JPEG images (0°, 90°, 180°, 270° corrected)","Rotation metadata (detected angle)"],"categories":["image-visual","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_10","uri":"capability://data.processing.analysis.data.augmentation.and.filtering.for.training.robustness","name":"data augmentation and filtering for training robustness","description":"Applies data augmentation techniques (rotation, scaling, noise injection, color jittering) to training images and filters low-quality training examples based on heuristics (image blur, text clarity, layout complexity). The augmentation pipeline increases training data diversity, improving model robustness to document variations. Filtering removes corrupted or low-quality examples that would degrade training, focusing compute on high-quality data.","intents":["Increase training data diversity without collecting more documents","Improve model robustness to document variations (rotation, scaling, noise)","Remove low-quality training examples that degrade model performance"],"best_for":["training with limited document collections (augmentation increases effective dataset size)","improving model robustness to real-world document variations","filtering noisy or corrupted training data"],"limitations":["Aggressive augmentation can introduce unrealistic variations — may hurt generalization","Filtering heuristics are dataset-dependent — thresholds may need tuning for different document types","Augmentation adds computational overhead during training — may slow training by 10-20%","No semantic-aware augmentation — cannot augment text content or layout structure"],"requires":["Python 3.9+ with torchvision or albumentations for augmentation","PIL/Pillow for image operations","Augmentation configuration (rotation range, noise level, etc.)","Filtering thresholds (blur threshold, text clarity score, etc.)"],"input_types":["Training images (PNG/JPEG)","Ground truth annotations (markdown)"],"output_types":["Augmented training images","Filtered training dataset (low-quality examples removed)","Augmentation statistics (number of variations per image)"],"categories":["data-processing-analysis","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_11","uri":"capability://automation.workflow.multi.ocr.comparison.framework.for.competitive.benchmarking","name":"multi-ocr comparison framework for competitive benchmarking","description":"Provides runners and evaluation harnesses for comparing olmOCR against competing OCR systems (Tesseract, NanoNets, Google Vision, etc.) on standardized benchmarks. The framework converts outputs from different OCR systems to a common format, applies the same evaluation metrics, and generates comparison reports. This enables fair comparison across systems with different output formats and capabilities.","intents":["Compare olmOCR performance against competing OCR solutions","Evaluate trade-offs between cost, speed, and accuracy across OCR systems","Generate competitive analysis reports for stakeholder decision-making"],"best_for":["teams evaluating OCR solutions before deployment","researchers comparing OCR approaches on standardized benchmarks","organizations making build-vs-buy decisions for OCR infrastructure"],"limitations":["Requires API keys or installations for competing systems — adds setup complexity","Output format conversion may lose information or introduce artifacts — affects fair comparison","Some systems have rate limits or costs — benchmarking large datasets may be expensive","Metrics may not be equally meaningful for all systems (e.g., LaTeX accuracy for systems that don't generate LaTeX)"],"requires":["Python 3.9+ with runner implementations for each OCR system","API keys for cloud OCR services (Google Vision, AWS Textract, etc.)","Local installations for open-source systems (Tesseract, etc.)","Output format converters for each system"],"input_types":["Document images (PNG/JPEG)","OCR system configurations (API keys, model versions, etc.)"],"output_types":["Comparison matrices (accuracy, speed, cost)","Detailed error analysis by system","Benchmark reports (PDF or JSON)"],"categories":["automation-workflow","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_12","uri":"capability://data.processing.analysis.dolma.format.output.generation.with.metadata.preservation","name":"dolma format output generation with metadata preservation","description":"Generates OCR output in Dolma format (structured JSON with document metadata, page-level information, and extracted text), enabling integration with downstream document processing pipelines and training data generation. The format preserves metadata including page numbers, source document paths, processing timestamps, and quality scores. This structured output enables filtering, sorting, and analysis of OCR results at scale.","intents":["Generate structured OCR output compatible with document processing pipelines","Preserve metadata for tracking document provenance and processing quality","Enable filtering and analysis of OCR results based on quality metrics"],"best_for":["teams building document processing pipelines that consume Dolma format","organizations tracking document processing quality and provenance","builders generating training data for downstream models"],"limitations":["Dolma format adds overhead compared to plain text output — larger file sizes","Requires downstream systems to support Dolma format — limits interoperability","Metadata preservation requires tracking throughout pipeline — adds complexity","No standardized schema for all metadata types — may require custom extensions"],"requires":["Python 3.9+ with JSON serialization","Dolma format specification and schema","Metadata collection throughout OCR pipeline"],"input_types":["OCR results (text, LaTeX, HTML)","Processing metadata (timestamps, quality scores, page numbers)"],"output_types":["Dolma format JSON files","Metadata indexes (for filtering and sorting)"],"categories":["data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_2","uri":"capability://data.processing.analysis.multi.column.layout.analysis.and.reading.order.reconstruction","name":"multi-column layout analysis and reading order reconstruction","description":"Analyzes document page layouts to identify multi-column regions and reconstructs natural reading order by processing spatial coordinates of text blocks extracted by the VLM. The system groups text elements by column position, sorts them top-to-bottom within columns, then merges columns left-to-right to produce markdown output that follows the intended document flow. This capability handles complex layouts including figures, insets, and mixed single/multi-column pages.","intents":["Convert multi-column documents (academic papers, newspapers, magazines) to single-column markdown","Preserve logical reading order when extracting text from complex layouts","Handle documents with mixed layout regions (some single-column, some multi-column)"],"best_for":["processing academic papers and research documents with two-column layouts","converting magazine and newspaper archives to readable markdown","building document understanding systems that require logical text flow"],"limitations":["Relies on VLM-provided bounding box accuracy — errors in spatial coordinates cascade to reading order errors","May struggle with documents having irregular column widths or overlapping text regions","No explicit handling of text spanning multiple columns (e.g., titles, captions) — may duplicate or misplace such text","Requires VLM to output spatial coordinates — not all VLM outputs include this metadata"],"requires":["VLM output with bounding box coordinates for text elements","Python 3.9+ with numpy for spatial analysis","Page dimensions (width, height) for coordinate normalization"],"input_types":["VLM-extracted text with bounding box coordinates","Page layout metadata"],"output_types":["Markdown text with reconstructed reading order","Column segmentation metadata"],"categories":["data-processing-analysis","image-visual"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_3","uri":"capability://image.visual.equation.and.table.extraction.with.latex.and.html.markdown.formatting","name":"equation and table extraction with latex and html/markdown formatting","description":"Extracts mathematical equations and tables from document pages and formats them as LaTeX (for equations) or HTML/Markdown (for tables) within the output markdown. The VLM recognizes equation regions and table structures, then generates appropriate markup that preserves mathematical notation and tabular relationships. Equations are rendered as inline or block LaTeX, while tables are converted to HTML or Markdown table syntax, maintaining semantic structure for downstream processing.","intents":["Preserve mathematical equations in documents as LaTeX for re-rendering or processing","Extract tables with structure intact (rows, columns, headers) rather than flattened text","Generate markdown that can be directly used in documentation or publishing workflows"],"best_for":["processing academic and scientific documents with heavy mathematical content","converting technical documentation with structured tables","building datasets for training math-aware OCR or document understanding models"],"limitations":["LaTeX generation quality depends on VLM's mathematical notation understanding — complex or handwritten equations may be misrecognized","Table extraction assumes clear grid structure — irregular tables with merged cells or complex nesting may be incorrectly parsed","No validation of generated LaTeX — malformed equations may not render correctly without post-processing","Handwritten equations are recognized but may have lower accuracy than printed text"],"requires":["VLM trained on mathematical and tabular content (olmOCR-2-7B includes this training)","Python 3.9+ with regex or parsing libraries for LaTeX/HTML generation","KaTeX or similar for equation validation (optional, for benchmarking)"],"input_types":["PDF pages containing equations and tables","PNG/JPEG images with mathematical or tabular content"],"output_types":["LaTeX strings (inline and block)","HTML table markup","Markdown table syntax","Markdown with embedded LaTeX and HTML"],"categories":["image-visual","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_4","uri":"capability://data.processing.analysis.header.and.footer.automatic.removal.with.content.classification","name":"header and footer automatic removal with content classification","description":"Automatically detects and removes headers and footers from document pages by classifying text regions as header/footer/body content using spatial position heuristics and VLM-based content analysis. The system identifies text appearing consistently at the top or bottom of pages (page numbers, running titles, repeated metadata) and excludes it from the final markdown output. This improves readability by eliminating repetitive non-content text.","intents":["Remove page numbers and running headers from extracted text","Eliminate repeated metadata (document titles, dates) that appear on every page","Generate clean markdown without boilerplate content"],"best_for":["processing multi-page documents with consistent headers/footers","building clean training datasets for document understanding models","converting scanned books and academic papers to readable markdown"],"limitations":["Heuristic-based detection may fail on documents with non-standard header/footer placement","Cannot distinguish between legitimate content and headers if they appear in header/footer regions (e.g., a section title that happens to be at page top)","Requires consistent header/footer positioning across pages — documents with variable layouts may have inconsistent removal","No semantic understanding of content importance — may remove legitimate content if positioned like headers"],"requires":["Page layout metadata (page height, margins)","Text bounding box coordinates from VLM","Heuristic thresholds for header/footer region definition (configurable)"],"input_types":["VLM-extracted text with spatial coordinates","Page layout metadata"],"output_types":["Filtered markdown without headers/footers","Metadata indicating removed regions"],"categories":["data-processing-analysis","image-visual"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_5","uri":"capability://image.visual.pdf.rendering.and.page.to.image.conversion.with.quality.control","name":"pdf rendering and page-to-image conversion with quality control","description":"Converts PDF pages to high-quality PNG or JPEG images at configurable DPI (typically 150-300 DPI) using PyPDF2 or similar libraries, with optional filtering to skip blank or low-quality pages. The system renders each page as a raster image suitable for VLM inference, applying quality checks to detect and optionally skip pages that are blank, corrupted, or contain only images without text. This preprocessing ensures only processable pages are sent to the VLM, reducing wasted inference compute.","intents":["Convert PDF pages to images for VLM processing","Skip blank or corrupted pages to reduce processing costs","Control image quality (DPI, format) for optimal VLM inference accuracy"],"best_for":["preprocessing PDF documents before VLM-based OCR","filtering document collections to remove non-text pages","optimizing inference costs by skipping unprocessable pages"],"limitations":["High DPI rendering (300+ DPI) is memory-intensive and slow — may require batching or streaming","Blank page detection is heuristic-based (e.g., pixel variance threshold) — may incorrectly classify pages with sparse content","PDF rendering quality depends on PDF structure — some PDFs with embedded fonts or complex graphics may render poorly","No OCR-based validation — cannot detect pages with only images and no text without additional processing"],"requires":["PyPDF2 or pdfplumber for PDF parsing","Pillow (PIL) for image operations","Ghostscript or similar for high-quality PDF rendering (optional, for better quality)","Sufficient disk space for intermediate images (typically 1-5MB per page at 150 DPI)"],"input_types":["PDF files"],"output_types":["PNG images (lossless)","JPEG images (lossy, smaller file size)","Image metadata (page number, dimensions, quality score)"],"categories":["image-visual","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_6","uri":"capability://data.processing.analysis.comprehensive.ocr.benchmarking.with.synthetic.test.case.generation","name":"comprehensive ocr benchmarking with synthetic test case generation","description":"Provides a benchmarking framework (olmOCR-Bench) that evaluates OCR quality across 7,000+ test cases covering 1,400 documents, with automated synthetic test case generation for equations (via KaTeX rendering), tables (via HTML rendering), and handwriting. The system compares olmOCR output against ground truth using metrics like character error rate (CER), equation accuracy, and table structure preservation. Test cases are mined from real documents and augmented with synthetic variations to ensure comprehensive coverage of edge cases.","intents":["Evaluate OCR model quality across diverse document types and content","Generate synthetic training data for equations and tables at scale","Compare olmOCR against competing OCR systems (Tesseract, NanoNets, etc.) on standardized benchmarks","Track model improvements across versions with consistent evaluation metrics"],"best_for":["researchers developing OCR models who need comprehensive evaluation","teams comparing OCR solutions before deployment","builders generating synthetic training data for document understanding models"],"limitations":["Benchmark is specific to olmOCR's output format (markdown with LaTeX/HTML) — not directly comparable to OCR systems with different output formats","Synthetic test cases may not cover all real-world document variations — gap between synthetic and real performance","Equation and table test generation requires valid KaTeX/HTML — malformed inputs are skipped","Benchmarking requires ground truth annotations — expensive to extend to new document types"],"requires":["Python 3.9+ with pytest for test execution","KaTeX for equation rendering and validation","HTML/CSS rendering engine for table generation","1,400 reference documents with ground truth annotations (provided in repo)"],"input_types":["PDF documents","OCR output (markdown with LaTeX/HTML)","Ground truth annotations (JSON format)"],"output_types":["Benchmark scores (overall, by category: equations, tables, text, handwriting)","Detailed error reports (CER, equation accuracy, table F1 score)","Comparison matrices (olmOCR vs competing systems)"],"categories":["data-processing-analysis","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_7","uri":"capability://code.generation.editing.supervised.fine.tuning.with.document.specific.training.data","name":"supervised fine-tuning with document-specific training data","description":"Implements supervised fine-tuning (SFT) of the base Qwen2.5-VL model on document OCR tasks using training data generated from the benchmarking system and augmented with synthetic variations. The training pipeline loads document images and ground truth markdown outputs, applies data augmentation (rotation, scaling, noise), and optimizes the model using standard cross-entropy loss on token prediction. Fine-tuning is performed on Beaker distributed training infrastructure, enabling multi-GPU training across multiple machines.","intents":["Adapt the base VLM to document OCR tasks with domain-specific training","Improve model accuracy on specific document types or layouts","Generate training data automatically from benchmarking results"],"best_for":["teams with large document collections wanting to fine-tune models for their specific domain","researchers improving OCR model quality through iterative training","organizations deploying custom OCR models on proprietary document types"],"limitations":["Requires significant GPU resources (multi-GPU training) — expensive to run locally","Fine-tuning on small datasets (<10K examples) may overfit — requires careful regularization","Training data quality directly impacts model quality — poor ground truth annotations degrade performance","Fine-tuned models are specific to the training distribution — may not generalize to out-of-distribution documents"],"requires":["Python 3.9+ with PyTorch and transformers library","Multi-GPU setup (8x A100 or equivalent recommended) or Beaker cluster access","Training data in Dolma format with image-markdown pairs","Qwen2.5-VL base model weights (HuggingFace model hub)"],"input_types":["Document images (PNG/JPEG)","Ground truth markdown with LaTeX/HTML","Training configuration (learning rate, batch size, epochs)"],"output_types":["Fine-tuned model weights (HuggingFace format)","Training logs and metrics (loss, validation accuracy)","Model checkpoints at regular intervals"],"categories":["code-generation-editing","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_8","uri":"capability://planning.reasoning.reinforcement.learning.optimization.with.grpo.for.ocr.quality","name":"reinforcement learning optimization with grpo for ocr quality","description":"Implements Group Relative Policy Optimization (GRPO) reinforcement learning to optimize the fine-tuned model for OCR quality metrics (character error rate, equation accuracy, table F1 score) beyond supervised fine-tuning. The system uses the benchmarking framework to generate reward signals based on OCR output quality, then applies GRPO to adjust model weights to maximize these rewards. This enables the model to learn from its own errors and improve on metrics that matter for downstream applications.","intents":["Optimize OCR model for specific quality metrics (CER, equation accuracy, table F1)","Improve model performance beyond what supervised fine-tuning alone can achieve","Align model behavior with application-specific quality requirements"],"best_for":["teams with well-defined OCR quality metrics and large document collections","researchers exploring reinforcement learning for document understanding","organizations optimizing models for specific downstream tasks (e.g., table extraction)"],"limitations":["GRPO training is computationally expensive — requires significant GPU resources and long training times","Reward signal design is critical — poorly designed rewards can lead to gaming or unintended behaviors","Training instability is common in RL — requires careful hyperparameter tuning and monitoring","Improvements from GRPO are often incremental (1-2% over SFT) — may not justify computational cost for all use cases"],"requires":["Python 3.9+ with PyTorch and custom GRPO implementation","Multi-GPU setup (8x A100 or equivalent) for reasonable training time","Benchmarking framework for reward signal generation","Fine-tuned model from supervised fine-tuning as initialization"],"input_types":["Document images (PNG/JPEG)","Reward metrics (CER, equation accuracy, table F1)","Training configuration (learning rate, reward scaling, GRPO hyperparameters)"],"output_types":["RL-optimized model weights","Training logs with reward signals and policy gradients","Evaluation metrics showing improvement over SFT baseline"],"categories":["planning-reasoning","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"awesome-github__cap_9","uri":"capability://automation.workflow.distributed.training.orchestration.on.beaker.infrastructure","name":"distributed training orchestration on beaker infrastructure","description":"Orchestrates distributed model training across Beaker clusters, managing multi-GPU training jobs, data distribution, and checkpoint synchronization. The system submits training jobs to Beaker with specified resource requirements (GPU count, memory), distributes training data across workers, and coordinates gradient synchronization using PyTorch's DistributedDataParallel. This enables efficient scaling of training from single-GPU to multi-GPU setups without code changes.","intents":["Scale model training across multiple GPUs and machines","Manage training infrastructure without manual cluster setup","Coordinate distributed training jobs with automatic resource allocation"],"best_for":["teams with access to Beaker infrastructure (Allen AI internal or partners)","organizations training large models requiring multi-GPU setups","researchers running multiple training experiments in parallel"],"limitations":["Requires Beaker cluster access — not available for local development or non-Beaker environments","Data distribution overhead can be significant for small datasets — may not be cost-effective for <100K examples","Beaker job submission has latency (minutes to hours) — not suitable for interactive development","Requires careful configuration of distributed training hyperparameters (learning rate scaling, batch size per GPU)"],"requires":["Beaker CLI and credentials configured","Docker image with training dependencies (PyTorch, transformers, etc.)","Training data in Dolma format accessible to Beaker workers","Training configuration specifying GPU count, memory, and job duration"],"input_types":["Training configuration (YAML or JSON)","Docker image specification","Training data paths (S3 or Beaker storage)"],"output_types":["Model checkpoints (saved to Beaker storage)","Training logs and metrics","Job status and resource utilization reports"],"categories":["automation-workflow","tool-use-integration"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":25,"verified":false,"data_access_risk":"high","permissions":["Python 3.9+","vLLM server running with olmOCR-2-7B-1025-FP8 model loaded","S3 credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or local filesystem with shared mount","GPU with 16GB+ VRAM for inference server","PyPDF2 or equivalent for PDF parsing","vLLM server with olmOCR model loaded","PIL/Pillow for image rotation operations","Page images in PNG or JPEG format","Python 3.9+ with torchvision or albumentations for augmentation","PIL/Pillow for image operations"],"failure_modes":["Requires vLLM server deployment for inference — adds operational complexity vs single-process solutions","Work queue coordination on S3 has eventual consistency semantics — may cause duplicate processing under high concurrency","Model is 7B parameters — requires GPU with 16GB+ VRAM for reasonable throughput (FP8 quantization)","No built-in retry logic for failed pages — requires external orchestration for fault tolerance","Adds latency of one additional VLM inference per page (~500ms-1s per page)","May fail on pages with minimal text or highly stylized layouts where orientation is ambiguous","Requires VLM server to be running — cannot operate in offline mode","Aggressive augmentation can introduce unrealistic variations — may hurt generalization","Filtering heuristics are dataset-dependent — thresholds may need tuning for different document types","Augmentation adds computational overhead during training — may slow training by 10-20%","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.05,"quality":0.35,"ecosystem":0.39999999999999997,"match_graph":0.25,"freshness":0.52,"weights":{"adoption":0.3,"quality":0.2,"ecosystem":0.15,"match_graph":0.3,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-06-17T09:51:03.040Z","last_scraped_at":"2026-05-03T14:00:25.471Z","last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=github","compare_url":"https://unfragile.ai/compare?artifact=github"}},"signature":"YPXF1Kth6OI+mPQFgMYcN7iK7Io61PUaK4IMxNeH6WmnGMaMiNb6dCHwD/NwQ5dMv98WYPpQQCJT1UDAZ8rHDA==","signedAt":"2026-06-20T22:17:22.297Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/github","artifact":"https://unfragile.ai/github","verify":"https://unfragile.ai/api/v1/verify?slug=github","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}