AI Repositories

The open-source AI ecosystem — frameworks like LangChain and CrewAI, libraries, research implementations, awesome-lists, and the building blocks developers use to create AI applications.

100 repositories

12 categories

model-training (29) · rag-knowledge (17) · data-pipelines (14) · frameworks-sdks (12) · testing-quality (11) · observability (5) · app-builders (5) · deployment-infra (5) · documentation (5) · voice-audio (5) · automation (4) · image-generation (4)

Chroma · Repository · 80/100 · Open Source

Open-source embedding database — simple API, auto-embedding, runs locally or in the cloud.

Ranked by freshness 90, quality 80

screenshot-to-code · Repository · 78/100 · Open Source

Convert screenshots and designs to code — HTML, React, Vue, Tailwind via GPT-4V or Claude.

Ranked by freshness 90, quality 78

Vanna.ai · Repository · 74/100 · Open Source

Natural language to SQL — ask your database questions in plain English. RAG-based, learns your schema.

Ranked by freshness 90, ecosystem 75

GPQA · Repository · 64/100 · Open Source

Graduate-level expert QA — unsearchable questions in biology, physics, chemistry for deep reasoning.

10 capabilities·Ranked by quality 90, freshness 90

Langfuse · Repository · 62/100 · Open Source

Open-source LLM observability — tracing, prompt management, evaluation, cost tracking, self-hosted.

15 capabilities·Ranked by quality 90, freshness 90

Dify Template Gallery · Repository · 62/100 · Open Source

Visual LLM app builder with pre-built workflow templates.

14 capabilities·Ranked by quality 90, freshness 90

MLflow · Repository · 61/100 · Open Source

Open-source ML lifecycle platform — experiment tracking, model registry, serving, LLM tracing.

14 capabilities·Ranked by quality 90, freshness 90

Meilisearch · Repository · 61/100 · Open Source

Lightning-fast search engine with vector search.

13 capabilities·Ranked by quality 90, freshness 90

Determined AI · Repository · 61/100 · Open Source

Deep learning training platform — distributed training, hyperparameter search, GPU scheduling.

14 capabilities·Ranked by quality 90, freshness 90

ClearML · Repository · 61/100 · Open Source

Open-source MLOps — experiment tracking, pipelines, data management, auto-logging, self-hosted.

14 capabilities·Ranked by quality 90, freshness 90

Argilla · Repository · 61/100 · Open Source

Open-source data curation for LLM fine-tuning and RLHF.

13 capabilities·Ranked by quality 90, freshness 90

Windmill · Repository · 59/100 · Open Source

Developer platform for internal tools.

14 capabilities·Ranked by quality 90, freshness 90

PrivateGPT · Repository · 59/100 · Open Source

Private document Q&A with local LLMs.

14 capabilities·Ranked by quality 90, freshness 90

Open WebUI · Repository · 59/100 · Open Source

Self-hosted ChatGPT-like UI — supports Ollama/OpenAI, RAG, web search, multi-user, plugins.

16 capabilities·Ranked by quality 90, freshness 90

Nomic Embed · Repository · 59/100 · Open Source

Open-source embedding models with full transparency.

14 capabilities·Ranked by quality 90, freshness 90

Label Studio · Repository · 59/100 · Open Source

Open-source multi-modal data labeling platform.

14 capabilities·Ranked by quality 90, freshness 90

Kokoro TTS · Repository · 59/100 · Open Source

Lightweight 82M parameter open-source TTS with high-quality output.

10 capabilities·Ranked by quality 90, freshness 90

Kestra · Repository · 59/100 · Open Source

Unified orchestration with declarative YAML.

15 capabilities·Ranked by quality 90, freshness 90

Hopsworks · Repository · 59/100 · Open Source

Open-source ML platform with feature store and model registry.

13 capabilities·Ranked by quality 90, freshness 90

GPT4All · Repository · 59/100 · Open Source

Privacy-first local LLM ecosystem — desktop app, document Q&A, Python SDK, runs on CPU.

14 capabilities·Ranked by quality 90, freshness 90

Evidently AI · Repository · 59/100 · Open Source

ML/LLM monitoring — data drift, model quality, 100+ metrics, dashboards, test suites.

13 capabilities·Ranked by quality 90, freshness 90

Docling · Repository · 59/100 · Open Source

IBM's document converter — PDFs, DOCX to structured markdown with OCR and table extraction.

13 capabilities·Ranked by quality 90, freshness 90

Diffusers · Repository · 59/100 · Open Source

Hugging Face's diffusion model library — Stable Diffusion, Flux, ControlNet, LoRA, schedulers.

15 capabilities·Ranked by quality 90, freshness 90

CVAT · Repository · 59/100 · Open Source

Open-source computer vision annotation tool.

15 capabilities·Ranked by quality 90, freshness 90

Cursor Rules · Repository · 59/100 · Open Source

Community .cursorrules collection — project-specific AI instructions for Cursor IDE.

13 capabilities·Ranked by quality 90, freshness 90

Composio · Repository · 59/100 · Open Source

250+ tool integrations for AI agents — GitHub, Slack, Gmail, Jira with auth handling.

13 capabilities·Ranked by quality 90, freshness 90

CLIP · Repository · 59/100 · Open Source

OpenAI's vision-language model for zero-shot classification.

11 capabilities·Ranked by quality 90, freshness 90

AutoAWQ · Repository · 59/100 · Open Source

4-bit weight quantization for LLMs on consumer GPUs.

13 capabilities·Ranked by quality 90, freshness 90

Arize Phoenix · Repository · 59/100 · Open Source

Open-source LLM observability — tracing, evaluation, OpenTelemetry, span analysis.

14 capabilities·Ranked by quality 90, freshness 90

Anthropic Cookbook · Repository · 59/100 · Open Source

Official Anthropic recipes for building with Claude.

14 capabilities·Ranked by quality 90, freshness 90

Airbyte · Repository · 59/100 · Open Source

Open-source ELT platform with 300+ connectors.

13 capabilities·Ranked by quality 90, freshness 90

Agenta · Repository · 59/100 · Open Source

Open-source LLMOps platform for prompt management and evaluation.

15 capabilities·Ranked by quality 90, freshness 90

YOLOv8 · Repository · 58/100 · Open Source

Real-time object detection, segmentation, and pose.

16 capabilities·Ranked by quality 90, freshness 90

Whisper · Repository · 58/100 · Open Source

OpenAI's open-source speech recognition — 99 languages, translation, timestamps, runs locally.

12 capabilities·Ranked by quality 90, freshness 90

Unsloth · Repository · 58/100 · Open Source

2x faster LLM fine-tuning with 80% less memory — optimized QLoRA kernels for consumer GPUs.

16 capabilities·Ranked by quality 90, freshness 90

Ultralytics · Repository · 58/100 · Open Source

Unified YOLO framework for detection and segmentation.

14 capabilities·Ranked by quality 90, freshness 90

Typesense · Repository · 58/100 · Open Source

Instant search engine with vector support.

14 capabilities·Ranked by quality 90, freshness 90

TRL · Repository · 58/100 · Open Source

Reinforcement learning from human feedback — SFT, DPO, PPO trainers for LLM alignment.

15 capabilities·Ranked by quality 90, freshness 90

Transformers · Repository · 58/100 · Open Source

Hugging Face's model library — thousands of pretrained transformers for NLP, vision, audio.

18 capabilities·Ranked by quality 90, freshness 90

torchtune · Repository · 58/100 · Open Source

PyTorch-native LLM fine-tuning library.

15 capabilities·Ranked by quality 90, freshness 90

Soda · Repository · 58/100 · Open Source

Data quality checks with human-readable SodaCL language.

13 capabilities·Ranked by quality 90, freshness 90

Smolagents · Repository · 58/100 · Open Source

Hugging Face's lightweight agent framework — code-as-action, minimal abstraction, MCP support.

18 capabilities·Ranked by quality 90, freshness 90

sentence-transformers · Repository · 58/100 · Open Source

Framework for sentence embeddings and semantic search.

14 capabilities·Ranked by quality 90, freshness 90

Semgrep · Repository · 58/100 · Open Source

Static analysis — custom rules for bugs and security, 30+ languages, AI-powered triage.

15 capabilities·Ranked by quality 90, freshness 90

Rebuff · Repository · 58/100 · Open Source

Self-hardening prompt injection detector with multi-layer defense.

13 capabilities·Ranked by quality 90, freshness 90

RAGFlow · Repository · 58/100 · Open Source

RAG engine for deep document understanding.

14 capabilities·Ranked by quality 90, freshness 90

pgvector · Repository · 58/100 · Open Source

Vector search for PostgreSQL — HNSW indexes, similarity queries in SQL, use existing Postgres.

13 capabilities·Ranked by quality 90, freshness 90

PEFT · Repository · 58/100 · Open Source

Parameter-efficient fine-tuning — LoRA, QLoRA, adapter methods for LLMs on consumer GPUs.

15 capabilities·Ranked by quality 90, freshness 90

Opik · Repository · 58/100 · Open Source

LLM evaluation and tracing platform — automated metrics, prompt management, CI/CD integration.

14 capabilities·Ranked by quality 90, freshness 90

Octo · Repository · 58/100 · Open Source

Generalist robot policy model trained on the Open X-Embodiment dataset.

13 capabilities·Ranked by quality 90, freshness 90

MMDetection · Repository · 58/100 · Open Source

OpenMMLab detection toolbox with 300+ models.

14 capabilities·Ranked by quality 90, freshness 90

Mem0 · Repository · 58/100 · Open Source

Persistent memory layer for AI agents.

14 capabilities·Ranked by quality 90, freshness 90

MAP-Neo · Repository · 58/100 · Open Source

Fully open bilingual model with transparent training.

11 capabilities·Ranked by quality 90, freshness 90

LocalAI · Repository · 58/100 · Open Source

OpenAI-compatible local AI server — LLMs, images, speech, embeddings, no GPU required.

15 capabilities·Ranked by quality 90, freshness 90

llmcompressor · Repository · 58/100 · Open Source

Toolkit for LLM quantization, pruning, and distillation.

16 capabilities·Ranked by quality 90, freshness 90

llama.cpp · Repository · 58/100 · Open Source

C/C++ LLM inference — GGUF quantization, GPU offloading, foundation for local AI tools.

15 capabilities·Ranked by quality 90, freshness 90

LibreChat · Repository · 58/100 · Open Source

Open-source ChatGPT clone — multi-provider, plugins, file upload, self-hosted.

16 capabilities·Ranked by quality 90, freshness 90

InvokeAI · Repository · 58/100 · Open Source

Professional open-source creative engine with node-based workflow editor.

13 capabilities·Ranked by quality 90, freshness 90

Granite · Repository · 58/100 · Open Source

IBM's enterprise-focused open foundation models.

12 capabilities·Ranked by quality 90, freshness 90

TrendRadar · Repository · 58/100 · Open Source

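AI-driven public opinion and trend monitor with multi-platform aggregation, RSS, and smart alerts. Say goodbye to information overload: an AI assistant for public-opinion monitoring and trending-topic filtering. Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI news filtering, AI translation, and AI-generated analysis briefings are pushed straight to your phone. It can also connect through MCP for natural-language conversational analysis, sentiment insight, and trend prediction. Supports Docker, with data kept locally or in your own cloud. Integrates push notifications through WeChat, Feishu, DingTalk, Telegram, email, ntfy, Bark, Slack, and other channels.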

12 capabilities·Ranked by freshness 90, adoption 89

ragflow · Repository · 58/100 · Open Source

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

14 capabilities·Ranked by freshness 90, adoption 88

Fooocus · Repository · 58/100 · Open Source

Simplified Midjourney-like interface for local Stable Diffusion XL.

15 capabilities·Ranked by quality 90, freshness 90

Flax · Repository · 58/100 · Open Source

Neural network library for JAX with functional patterns.

13 capabilities·Ranked by quality 90, freshness 90

FastEmbed · Repository · 58/100 · Open Source

Fast local embedding generation — ONNX Runtime, no GPU needed, text and image models.

13 capabilities·Ranked by quality 90, freshness 90

ExLlamaV2 · Repository · 58/100 · Open Source

Optimized quantized LLM inference for consumer GPUs — EXL2/GPTQ, flash attention, memory-efficient.

14 capabilities·Ranked by quality 90, freshness 90

Elementary · Repository · 58/100 · Open Source

Open-source dbt-native data observability and anomaly detection.

13 capabilities·Ranked by quality 90, freshness 90

Einops · Repository · 58/100 · Open Source

Readable tensor operations for all major frameworks.

12 capabilities·Ranked by quality 90, freshness 90

DVC · Repository · 58/100 · Open Source

Git for data and ML — version large files, experiment tracking, pipeline DAGs, remote storage.

14 capabilities·Ranked by quality 90, freshness 90

Doccano · Repository · 58/100 · Open Source

Open-source text annotation for NLP tasks.

13 capabilities·Ranked by quality 90, freshness 90

Detectron2 · Repository · 58/100 · Open Source

Meta's modular object detection platform on PyTorch.

15 capabilities·Ranked by quality 90, freshness 90

CTranslate2 · Repository · 58/100 · Open Source

Fast transformer inference engine — INT8 quantization, C++ core, Whisper/Llama support.

13 capabilities·Ranked by quality 90, freshness 90

Crawl4AI · Repository · 58/100 · Open Source

AI-optimized web crawler — clean markdown extraction, JS rendering, structured output for RAG.

20 capabilities·Ranked by quality 90, freshness 90

bitsandbytes · Repository · 58/100 · Open Source

8-bit and 4-bit quantization enabling QLoRA fine-tuning.

14 capabilities·Ranked by quality 90, freshness 90

Bark · Repository · 58/100 · Open Source

Open-source text-to-audio — speech, music, sound effects, 13+ languages, runs locally.

13 capabilities·Ranked by quality 90, freshness 90

BAML · Repository · 58/100 · Open Source

DSL for type-safe LLM functions — define schemas in .baml, get generated clients with testing.

14 capabilities·Ranked by quality 90, freshness 90

Axolotl · Repository · 58/100 · Open Source

Streamlined LLM fine-tuning — YAML config, LoRA/QLoRA, multi-GPU, data preprocessing.

14 capabilities·Ranked by quality 90, freshness 90

AutoGPTQ · Repository · 58/100 · Open Source

GPTQ-based LLM quantization with fast CUDA inference.

12 capabilities·Ranked by quality 90, freshness 90

AudioCraft · Repository · 58/100 · Open Source

Meta's library for music and audio generation.

13 capabilities·Ranked by quality 90, freshness 90

Albumentations · Repository · 58/100 · Open Source

Fast image augmentation library with 70+ transforms.

12 capabilities·Ranked by quality 90, freshness 90

AgentScope · Repository · 58/100 · Open Source

Multi-agent platform with distributed deployment.

16 capabilities·Ranked by quality 90, freshness 90

Activepieces · Repository · 58/100 · Open Source

Open-source no-code automation tool.

16 capabilities·Ranked by quality 90, freshness 90

Marker · Repository · 57/100 · Open Source

PDF to Markdown converter with deep learning.

14 capabilities·Ranked by quality 90, freshness 90

kubectl-ai · Repository · 57/100 · Open Source

Generate Kubernetes manifests with AI.

13 capabilities·Ranked by quality 90, freshness 90

k6 · Repository · 57/100 · Open Source

Developer-centric load testing tool by Grafana Labs.

15 capabilities·Ranked by quality 90, freshness 90

Chatbot UI · Repository · 57/100 · Open Source

Open-source multi-provider ChatGPT UI template.

13 capabilities·Ranked by quality 90, freshness 90

Chainlit Cookbook · Repository · 57/100 · Open Source

Chainlit conversational AI interface templates.

16 capabilities·Ranked by quality 90, freshness 90

BetterChatGPT · Repository · 57/100 · Open Source

Enhanced ChatGPT UI with folders, prompts, and cost tracking.

14 capabilities·Ranked by quality 90, freshness 90

Singer · Repository · 56/100 · Open Source

Open-source standard for data extraction taps and targets.

11 capabilities·Ranked by quality 90, freshness 90

SearXNG · Repository · 56/100 · Open Source

Privacy-respecting metasearch — 70+ engines, no tracking, self-hosted, JSON API for AI agents.

14 capabilities·Ranked by quality 90, freshness 90

Promptimize · Repository · 56/100 · Open Source

Prompt optimization library with systematic variation testing.

13 capabilities·Ranked by quality 90, freshness 90

Presidio · Repository · 56/100 · Open Source

Microsoft's PII detection and anonymization SDK.

13 capabilities·Ranked by quality 90, freshness 90

Polars · Repository · 56/100 · Open Source

Rust-powered DataFrame library 10-100x faster than pandas.

15 capabilities·Ranked by quality 90, freshness 90

Piper TTS · Repository · 56/100 · Open Source

Fast local neural TTS optimized for Raspberry Pi and edge devices.

13 capabilities·Ranked by quality 90, freshness 90

NLTK · Repository · 56/100 · Open Source

Comprehensive NLP toolkit for education and research.

13 capabilities·Ranked by quality 90, freshness 90

Meltano · Repository · 56/100 · Open Source

Open-source DataOps platform built on Singer and dbt.

13 capabilities·Ranked by quality 90, freshness 90

Mage AI · Repository · 56/100 · Open Source

Data pipeline tool with AI code generation.

14 capabilities·Ranked by quality 90, freshness 90

Ibis · Repository · 56/100 · Open Source

Portable Python dataframe API across 20+ backends.

16 capabilities·Ranked by quality 90, freshness 90

Flair · Repository · 56/100 · Open Source

PyTorch NLP framework with contextual embeddings.

14 capabilities·Ranked by quality 90, freshness 90

Feast · Repository · 56/100 · Open Source

Open-source ML feature store for training and serving.

13 capabilities·Ranked by quality 90, freshness 90

DuckDB · Repository · 56/100 · Open Source

In-process SQL analytics engine for local data processing.

15 capabilities·Ranked by quality 90, freshness 90

What are AI Repositories?

Open-source AI repositories are the building blocks of the AI ecosystem. They include frameworks (LangChain, Transformers), tools (Ollama, vLLM), research implementations, and community projects. GitHub is the primary host, with repositories ranging from production-ready libraries to cutting-edge research code.

How to Choose

Beyond star count, evaluate: maintenance activity (last commit date, PR response time), documentation quality, test coverage, and community health (Discord/issues responsiveness). For production use, check the release cadence and breaking change history. Star count indicates popularity, not quality.
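
The checks above can be turned into a rough screening script. The following is a minimal sketch, not a definitive scoring method: the `RepoSnapshot` fields, the 180-day staleness window, and the 50% open-issue threshold are all assumptions chosen for illustration, and the numbers in the two sample repositories are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RepoSnapshot:
    """Hypothetical metadata you would collect yourself for one repository."""
    stars: int
    last_commit: date
    open_issues: int
    closed_issues: int
    releases_last_year: int


def health_flags(repo: RepoSnapshot, today: date, stale_after_days: int = 180) -> list[str]:
    """Return human-readable warnings; an empty list means no red flag was found."""
    flags = []
    # Maintenance activity: how long since the last commit?
    if (today - repo.last_commit).days > stale_after_days:
        flags.append(f"stale: no commits in the last {stale_after_days} days")
    # Community health: share of issues that are still open (assumed threshold).
    total_issues = repo.open_issues + repo.closed_issues
    if total_issues and repo.open_issues / total_issues > 0.5:
        flags.append("backlog: more than half of all issues are still open")
    # Release cadence: at least one release in the past year.
    if repo.releases_last_year == 0:
        flags.append("no releases in the past year")
    return flags


today = date(2024, 10, 1)
popular_but_quiet = RepoSnapshot(5000, date(2024, 1, 10), 400, 300, 0)
small_but_active = RepoSnapshot(500, date(2024, 9, 25), 20, 180, 12)

print(health_flags(popular_but_quiet, today))  # three warnings
print(health_flags(small_but_active, today))   # []
```

Star count is stored but deliberately never used in the checks, which mirrors the point that popularity is not a quality signal.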

Key Capabilities to Evaluate

•Source code access — full transparency into how the tool works

•Self-hosting — run entirely on your infrastructure

•Customization — fork and modify for your specific needs

•Community contributions — PRs, issues, and discussions from other developers

•Integration flexibility — compose with other open-source tools

•License clarity — clear terms for commercial use

Common Patterns

Library/Package

Install as a dependency (npm, pip). The most common pattern — import and use in your code.

Framework

Provides the application structure — you write code within its patterns. More opinionated, more features.

Self-Contained Tool

Clone and run. Complete application with its own UI, API, and storage.

Research Implementation

Paper companion code. Often requires adaptation for production use.

What to Watch Out For

⚠ Abandoned repos: a high star count with no recent commits or releases usually means the project is no longer maintained

⚠ License traps: some popular repos use licenses that constrain commercial use (AGPL copyleft, non-commercial clauses)

⚠ Undocumented breaking changes: fast-moving repos may break your code between versions

⚠ Security vulnerabilities: open source doesn't mean audited; check for known CVEs

⚠ Dependency bloat: some repos pull in hundreds of transitive dependencies

Top Capabilities

code explanation and documentation generation · 10 artifacts

Analyzes selected code or entire files and generates natural language explanations of what the code does, how it works, and why certain patterns were chosen. The feature can produce documentation in multiple formats (docstrings, comments, markdown) and supports various documentation styles (JSDoc, Sphinx, etc.). Developers can request explanations at different levels of detail (high-level overview, line-by-line breakdown, architectural context) through the chat interface, with responses appearing as formatted text or code comments.

ChatGPT AI · AI Pundit Magic - Design to Code | Figma to Code · CodeGPT: write and improve code using AI

context-aware code completion · 3 artifacts

Cody utilizes a context-aware engine that analyzes the current file and project structure to provide relevant code completions. It integrates with the Visual Studio Code API to access the Abstract Syntax Tree (AST) of the code, allowing it to suggest completions that are semantically relevant to the context, rather than relying solely on keyword matching. This approach ensures that the suggestions are not only syntactically correct but also contextually appropriate, enhancing developer productivity.

Supermaven · Cline 中文版 (Chinese edition) · Cody

natural-language-to-full-stack-application-generation · 2 artifacts

Converts natural language prompts into executable full-stack web applications by invoking an AI agent that generates React/Next.js frontend code, Node.js backend logic, and database schemas. The agent runs code in-browser via WebContainers to validate syntax and functionality before deployment, iterating on the generated code based on execution feedback. Token consumption scales with project complexity (larger codebases consume more tokens per iteration), and the agent supports design system imports from Figma and GitHub to accelerate UI generation.

Lovable · Bolt.new

model size selection with speed-accuracy tradeoffs across 6 variants · 2 artifacts

Provides six model variants (tiny, base, small, medium, large, turbo) with parameter counts ranging from 39M to 1550M, so developers can trade speed against accuracy. Relative to the large model (1x speed, ~10GB VRAM), the tiny model runs at ~10x speed in ~1GB VRAM. English-only variants (tiny.en, base.en, small.en, medium.en) give higher English accuracy by removing multilingual capacity, most noticeably at the smaller sizes. The turbo model (809M params) offers ~8x speedup over large with minimal accuracy loss but is not trained for translation.

Whisper · Whisper CLI
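
The speed-accuracy choice described here can be sketched as a small lookup. This is an illustrative helper, not part of Whisper: `pick_variant` is a hypothetical function, the figures for base, small, and medium are approximate values recalled from the Whisper README and should be checked against it, and using parameter count as a stand-in for accuracy is an assumption.

```python
# (name, parameters in millions, approx. VRAM in GB, approx. speed relative to large)
VARIANTS = [
    ("tiny", 39, 1, 10),
    ("base", 74, 1, 7),
    ("small", 244, 2, 4),
    ("medium", 769, 5, 2),
    ("large", 1550, 10, 1),
    ("turbo", 809, 6, 8),
]


def pick_variant(vram_gb: float, need_translation: bool = False) -> str:
    """Pick the variant with the most parameters that fits the VRAM budget.

    Parameter count is used as a rough proxy for accuracy (an assumption);
    ties are broken in favour of the faster variant.
    """
    candidates = [
        v for v in VARIANTS
        if v[2] <= vram_gb and not (need_translation and v[0] == "turbo")
    ]
    if not candidates:
        raise ValueError(f"no variant fits in {vram_gb} GB of VRAM")
    return max(candidates, key=lambda v: (v[1], v[3]))[0]


print(pick_variant(1))                         # base
print(pick_variant(6))                         # turbo
print(pick_variant(6, need_translation=True))  # medium
print(pick_variant(10))                        # large
```

Excluding turbo when translation is needed follows the note that turbo lacks translation support.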

direct speech-to-English translation without intermediate transcription · 2 artifacts

Translates non-English speech directly to English text by using a task-specific token in the TextDecoder that signals translation mode, bypassing the need for intermediate transcription-then-translation pipelines. The AudioEncoder processes mel spectrograms identically to transcription, but the decoder generates English tokens directly from audio embeddings, reducing latency and error propagation compared to cascaded systems.

Whisper · Whisper CLI
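
To make the role of the task token concrete, here is a minimal sketch of the special-token prefix the decoder is conditioned on. The token strings follow Whisper's published special tokens; the `decoder_prompt` helper itself is hypothetical, and a real tokenizer would map these strings to integer ids.

```python
def decoder_prompt(language: str, task: str = "transcribe", timestamps: bool = True) -> list[str]:
    """Build the special-token prefix that tells the decoder what to do."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task}")
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt


# Same Japanese audio, two different tasks: only the task token changes.
print(decoder_prompt("ja", "transcribe"))
# ['<|startoftranscript|>', '<|ja|>', '<|transcribe|>']
print(decoder_prompt("ja", "translate"))
# ['<|startoftranscript|>', '<|ja|>', '<|translate|>']
```

The audio features are the same in both cases; the prefix alone selects between Japanese text and English text as the output.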

multilingual speech-to-text transcription with language-agnostic encoder · 2 artifacts

Transcribes audio in 99 languages to text in the original language using a unified Transformer sequence-to-sequence architecture with a shared AudioEncoder that processes mel spectrograms into language-agnostic embeddings, then a TextDecoder that generates tokens autoregressively. The system handles variable-length audio by padding or trimming to 30-second segments and uses task-specific tokens to signal transcription mode, enabling a single model to handle multiple languages without language-specific branches.

Whisper · Whisper CLI
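
The padding-or-trimming step can be illustrated in a few lines. This is a plain-Python sketch of the idea, assuming 16 kHz mono samples held in a list; the real library operates on arrays, and longer recordings are handled by processing successive 30-second windows rather than discarding audio.

```python
SAMPLE_RATE = 16_000                     # Whisper works on 16 kHz audio
CHUNK_SECONDS = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per segment


def pad_or_trim(samples: list[float], length: int = N_SAMPLES) -> list[float]:
    """Return exactly `length` samples: cut the tail or append silence (zeros)."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


short_clip = [0.1] * (5 * SAMPLE_RATE)   # 5 seconds of audio
long_clip = [0.1] * (45 * SAMPLE_RATE)   # 45 seconds of audio

print(len(pad_or_trim(short_clip)))  # 480000
print(len(pad_or_trim(long_clip)))   # 480000
```

A fixed segment length means the encoder always sees input of the same shape, regardless of how long the original recording was.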

automatic language identification from audio with 99-language support · 2 artifacts

Detects the spoken language in audio by processing mel spectrograms through the AudioEncoder and reading the TextDecoder's probability distribution over the language tokens for the 99 supported languages, so no separate classifier or transcription pass is needed. The model draws on 680K hours of multilingual and multitask training audio to recognize language characteristics from acoustic features alone. Language detection occurs as a preliminary step in the transcription pipeline and can also be run on its own.

Whisper Large v3 · Whisper CLI
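
One way to picture the probability distribution over languages is as a softmax restricted to language tokens. The sketch below assumes detection works on the logits for the first decoded token; the `language_probs` helper, the token set, and the logit values are all made up for illustration and do not come from a real model.

```python
import math


def language_probs(logits: dict[str, float], language_tokens: set[str]) -> dict[str, float]:
    """Softmax over language tokens only; every other token is ignored."""
    lang_logits = {tok: val for tok, val in logits.items() if tok in language_tokens}
    peak = max(lang_logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in lang_logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical first-step logits; "<|transcribe|>" is not a language token.
logits = {"<|en|>": 2.0, "<|de|>": 0.5, "<|ja|>": -1.0, "<|transcribe|>": 3.0}
probs = language_probs(logits, {"<|en|>", "<|de|>", "<|ja|>"})
best = max(probs, key=probs.get)

print(best)  # <|en|>
```

Masking out non-language tokens before normalising is what keeps a high-scoring task token from being reported as the detected language.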

self-hosted-deployment-with-docker · 2 artifacts

W&B Personal tier (free) and Enterprise tier support self-hosted deployment via Docker, enabling on-premise installation for teams with data residency or security requirements. Self-hosted instances run independently from W&B cloud, with optional integration to W&B cloud for cross-instance features. Supports custom domain configuration, HTTPS, and integration with corporate identity providers (LDAP, SAML, OAuth).

Weights & Biases · Weights & Biases API

Browse Other Types

Agents

Autonomous AI systems that act on your behalf

Models

Foundation models, fine-tunes, and specialized AI models

MCP Servers

Model Context Protocol tools and integrations

APIs

Programmatic endpoints for AI capabilities

Extensions

Browser and IDE extensions powered by AI

Workflows

Automation sequences and AI pipelines

Frequently Asked Questions

How do I evaluate an open-source AI project?

Look beyond stars: check the last commit date, the ratio of open to closed issues, release frequency, documentation quality, test coverage, and license terms. A repo with 500 stars and weekly commits is often more reliable than one with 5,000 stars and no commits in 6 months.
