STORM
Product: An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. [#opensource](https://github.com/stanford-oval/storm/)
Capabilities (12 decomposed)
multi-turn topic research with iterative query refinement
Medium confidence: STORM orchestrates sequential LLM-driven research cycles where an agent formulates search queries, retrieves relevant documents, and iteratively refines its understanding of a topic. The system maintains a research context that evolves across turns, allowing the LLM to identify knowledge gaps and generate follow-up queries that progressively deepen coverage. This differs from single-pass retrieval by implementing a planning-reasoning loop that decomposes complex topics into sub-questions and validates coverage before report generation.
Implements a multi-turn research loop where the LLM explicitly reasons about coverage gaps and generates follow-up queries, rather than treating search as a static retrieval step. The system maintains evolving research state across turns and uses LLM-driven decomposition to break topics into researchable sub-questions.
More thorough than single-pass RAG systems because it actively identifies and fills knowledge gaps through iterative query refinement, rather than retrieving a fixed set of documents once.
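The loop described above can be sketched in a few lines. This is an illustrative sketch, not STORM's actual API: `generate_queries`, `retrieve`, and `find_gaps` are hypothetical stand-ins for the LLM and search calls the system would make.

```python
# Iterative research loop: each turn turns open gaps into queries,
# retrieves documents, and re-analyzes coverage until no gaps remain.

def generate_queries(gaps):
    # Stand-in for an LLM call that turns open gaps into search queries.
    return [f"what is {g}?" for g in gaps]

def retrieve(query):
    # Stand-in for a search backend; returns fake documents.
    return [{"query": query, "text": f"doc about {query}"}]

def find_gaps(context, turn):
    # Stand-in for LLM gap analysis; converges after two turns here.
    return ["sub-topic A"] if turn == 0 else []

def research(topic, max_turns=3):
    context, gaps = [], [topic]
    for turn in range(max_turns):
        for q in generate_queries(gaps):
            context.extend(retrieve(q))
        gaps = find_gaps(context, turn)
        if not gaps:            # coverage validated: stop early
            break
    return context

docs = research("solar geoengineering")
```

The key difference from single-pass RAG is the `find_gaps` step: retrieval is driven by an evolving gap analysis rather than a fixed query set.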
perspective-aware outline generation with multi-viewpoint synthesis
Medium confidence: STORM generates structured outlines by explicitly modeling multiple perspectives on a topic, querying sources for each viewpoint, and synthesizing them into a hierarchical outline. The system uses LLM-driven perspective identification to determine relevant viewpoints (e.g., technical, business, ethical angles), retrieves information for each perspective independently, and then merges them into a unified outline structure. This approach ensures balanced coverage and explicit representation of different stakeholder views rather than a single homogenized narrative.
Explicitly decomposes topics into multiple perspectives and researches each independently before merging, rather than treating all sources as a single undifferentiated corpus. This ensures systematic coverage of different stakeholder viewpoints and makes perspective diversity a first-class concern in the outline structure.
Produces more balanced and comprehensive outlines than single-perspective systems because it actively identifies and researches distinct viewpoints, ensuring no major stakeholder perspective is overlooked.
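A minimal sketch of the research-per-perspective-then-merge pattern, with hypothetical helpers (`identify_perspectives`, `research_perspective`) standing in for the LLM and retrieval calls:

```python
# Identify perspectives, research each independently, merge into one outline.

def identify_perspectives(topic):
    # Stand-in for an LLM that proposes relevant viewpoints.
    return ["technical", "economic", "ethical"]

def research_perspective(topic, perspective):
    # Stand-in for per-perspective retrieval; returns section headings.
    return [f"{perspective} overview of {topic}",
            f"{perspective} open questions"]

def build_outline(topic):
    outline = {"topic": topic, "sections": []}
    for p in identify_perspectives(topic):
        outline["sections"].append(
            {"perspective": p,
             "headings": research_perspective(topic, p)})
    return outline

outline = build_outline("gene editing")
```

Keeping the perspective label on each section makes viewpoint diversity inspectable in the final outline rather than implicit in the prose.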
research execution with configurable LLM backends and model switching
Medium confidence: STORM abstracts over multiple LLM providers (OpenAI, Anthropic, Ollama, etc.) and enables switching between models without changing research logic. The system supports configurable model selection for different research phases (e.g., using a cheaper model for query generation and a more capable model for synthesis). Model-specific parameters (temperature, max tokens, etc.) are configurable per phase, enabling fine-tuning of research behavior.
Abstracts over multiple LLM providers with pluggable backends, enabling model switching and per-phase model selection without changing research logic. This enables cost optimization and experimentation with different models.
More flexible and cost-effective than single-provider systems because teams can optimize model selection per research phase and switch providers without code changes.
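Per-phase model routing can be sketched as a simple registry. The names and config surface here are assumptions for illustration, not STORM's real configuration API:

```python
# Route each research phase to its own model with its own parameters.

class FakeModel:
    def __init__(self, name, temperature=0.7):
        self.name, self.temperature = name, temperature
    def complete(self, prompt):
        # Stand-in for a real provider call.
        return f"[{self.name}] {prompt}"

PHASE_MODELS = {
    "query_generation": FakeModel("small-cheap", temperature=1.0),
    "synthesis": FakeModel("large-capable", temperature=0.2),
}

def run_phase(phase, prompt):
    return PHASE_MODELS[phase].complete(prompt)

out = run_phase("synthesis", "merge these notes")
```

Swapping a provider then means changing one registry entry; the research logic calling `run_phase` never changes.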
research session persistence and resumable research workflows
Medium confidence: STORM supports saving and loading research sessions, enabling resumable research workflows where a session can be paused, saved to disk, and resumed later with full context preservation. Saved sessions include research context, retrieved documents, generated outlines, and synthesis results. This enables long-running research jobs to be interrupted and resumed without losing progress, and enables sharing research state between team members.
Enables full session persistence and resumption, preserving research context, documents, and intermediate results across sessions. This enables long-running research and collaborative workflows.
More practical than stateless research systems because sessions can be paused and resumed without losing progress, enabling long-running research and team collaboration.
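A hedged sketch of save/resume using plain JSON; STORM's actual persistence format may differ, and the session fields shown are illustrative:

```python
import json
import os
import tempfile

def save_session(session, path):
    # Persist the full research state so a later process can resume it.
    with open(path, "w") as f:
        json.dump(session, f)

def load_session(path):
    with open(path) as f:
        return json.load(f)

session = {"topic": "battery recycling",
           "documents": [{"id": 1, "text": "retrieved notes"}],
           "outline": ["intro", "methods"]}

path = os.path.join(tempfile.mkdtemp(), "session.json")
save_session(session, path)
resumed = load_session(path)
```

Because the state is a plain serializable structure, the same file can also be handed to a teammate to continue the research.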
citation-aware long-form report generation with source attribution
Medium confidence: STORM generates full-length reports where each claim is grounded in retrieved sources and includes inline citations. The system maintains a mapping between generated text and source documents, enabling automatic citation insertion and generation of reference lists. The report generation uses LLM-driven synthesis to convert outline sections into prose while preserving source attribution, with fallback mechanisms to handle cases where claims cannot be directly attributed to sources.
Maintains explicit source-to-claim mappings throughout generation, enabling automatic citation insertion and reference list generation. Rather than generating text and adding citations post-hoc, the system grounds synthesis in sources from the outset, reducing hallucination risk.
More verifiable than generic LLM report generation because citations are generated alongside content and traceable to specific sources, rather than added as an afterthought or omitted entirely.
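The claim-to-source mapping can be illustrated with a toy structure (hypothetical; STORM's internal representation differs):

```python
# Each claim carries the IDs of the sources that ground it, so inline
# citation markers and the reference list are generated from one mapping.

sources = [{"id": 1, "url": "https://example.org/a"},
           {"id": 2, "url": "https://example.org/b"}]

claims = [("Lithium demand is rising.", [1]),
          ("Recycling recovers cobalt.", [1, 2])]

def render_with_citations(claims):
    body = " ".join(f"{text} " + "".join(f"[{i}]" for i in ids)
                    for text, ids in claims)
    refs = "\n".join(f"[{s['id']}] {s['url']}" for s in sources)
    return body, refs

body, refs = render_with_citations(claims)
```

Because citations are carried with each claim from synthesis onward, every sentence is traceable back to its sources instead of being matched to them after the fact.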
web-scale document retrieval with semantic and keyword hybrid search
Medium confidence: STORM integrates with web search APIs (and optionally local document corpora) to retrieve relevant sources for research queries. The system uses hybrid search combining keyword matching and semantic similarity to maximize recall across diverse source types. Retrieved documents are ranked by relevance and filtered for quality signals (domain authority, recency, etc.), with deduplication to avoid redundant sources. The retrieval layer abstracts over multiple search backends, enabling seamless switching between web search, academic databases, and custom corpora.
Implements hybrid search combining keyword and semantic matching, with pluggable backends for web search, academic databases, and custom corpora. The abstraction layer enables seamless switching between search sources without changing research logic.
More comprehensive than keyword-only search because semantic similarity captures conceptually related sources, and more flexible than single-backend systems because it supports multiple search sources with a unified interface.
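A toy hybrid ranker shows the blending idea; the scoring functions here are deliberately crude stand-ins (real systems use BM25 and embedding similarity), not STORM's retriever:

```python
# Blend a keyword-overlap score with a "semantic" score and rank documents.

def keyword_score(query, doc):
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def semantic_score(query, doc):
    # Stand-in for embedding similarity; here, shared character bigrams.
    grams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    return len(grams(query) & grams(doc)) / max(len(grams(query)), 1)

def hybrid_rank(query, docs, alpha=0.5):
    scored = [(alpha * keyword_score(query, d)
               + (1 - alpha) * semantic_score(query, d), d) for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]

ranked = hybrid_rank("solar power storage",
                     ["grid storage for solar power", "medieval history"])
```

The `alpha` weight is the tuning knob: higher favors exact-term matches, lower favors conceptually related documents.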
research context management with incremental knowledge accumulation
Medium confidence: STORM maintains a structured research context that accumulates knowledge across multiple research turns, preventing redundant queries and enabling progressive deepening of understanding. The context stores retrieved documents, generated queries, outline sections, and synthesis results, with mechanisms to detect when new queries would be redundant. The system uses this context to inform follow-up query generation and to ensure outline sections are grounded in accumulated knowledge rather than isolated retrieval results.
Explicitly models research context as a first-class artifact that accumulates across turns, enabling the system to detect redundant queries and build on previous results. Rather than treating each research turn independently, the system maintains continuity and uses context to guide future research.
More efficient than stateless research systems because it avoids re-researching the same topics and uses accumulated context to guide follow-up queries, reducing total API calls and improving research coherence.
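A sketch of the accumulating context with redundancy detection. The exact-match check below is a simplifying assumption; a real implementation would likely use semantic similarity:

```python
# Research context as a first-class object: turns are only executed if
# the query is new, and all documents accumulate in one place.

class ResearchContext:
    def __init__(self):
        self.queries, self.documents = [], []

    def is_redundant(self, query):
        norm = query.strip().lower()
        return any(norm == q.strip().lower() for q in self.queries)

    def add_turn(self, query, docs):
        if self.is_redundant(query):
            return False          # skip: already researched
        self.queries.append(query)
        self.documents.extend(docs)
        return True

ctx = ResearchContext()
first = ctx.add_turn("history of CRISPR", ["doc1"])
second = ctx.add_turn("History of CRISPR ", ["doc1-again"])
```

The second call is rejected despite the different casing and whitespace, saving the retrieval API call entirely.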
LLM-driven research task decomposition with sub-question generation
Medium confidence: STORM uses LLM reasoning to decompose a broad research topic into specific, researchable sub-questions that can be answered independently and then synthesized. The system prompts the LLM to identify key aspects of a topic, generate clarifying questions, and propose a research strategy before executing queries. This decomposition enables more targeted searches and ensures comprehensive coverage by making implicit knowledge gaps explicit as sub-questions.
Uses LLM reasoning to explicitly decompose topics into sub-questions before executing research, rather than treating the topic as a monolithic search target. This makes the research strategy explicit and enables targeted, comprehensive coverage.
More systematic than ad-hoc research because decomposition ensures comprehensive coverage and makes the research strategy explicit and reviewable, rather than relying on implicit search strategies.
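The decomposition step can be sketched with a stubbed LLM; the prompt wording and line-per-question parsing are illustrative assumptions, not STORM's actual prompts:

```python
# Ask an LLM to break a topic into sub-questions, one per line.

def fake_llm(prompt):
    # Stand-in for a real LLM call.
    return ("What is the mechanism?\n"
            "Who are the main actors?\n"
            "What are the open risks?")

def decompose(topic):
    prompt = f"Break the topic '{topic}' into researchable sub-questions."
    return [line.strip()
            for line in fake_llm(prompt).splitlines() if line.strip()]

subs = decompose("carbon capture policy")
```

Because the sub-questions are explicit data rather than implicit search behavior, the research strategy can be reviewed or edited before any retrieval runs.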
batch research execution with parallel query processing
Medium confidence: STORM executes multiple research queries in parallel or batched fashion, retrieving sources for multiple sub-questions concurrently rather than sequentially. The system manages API rate limits and batches queries to maximize throughput while respecting search API quotas. Retrieved documents are deduplicated across queries to avoid redundant processing, and results are aggregated into a unified research context.
Executes multiple research queries in parallel with intelligent deduplication and rate-limit management, rather than processing queries sequentially. This enables faster research execution while respecting API quotas.
Faster than sequential research because parallel query execution reduces total wall-clock time, and deduplication prevents redundant API calls and document processing.
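A sketch of batched parallel retrieval with cross-query deduplication. The thread pool is an assumption about the concurrency mechanism, and rate limiting is omitted for brevity:

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve(query):
    # Stand-in for a search API call returning document URLs.
    return [f"https://example.org/{query.replace(' ', '-')}",
            "https://example.org/shared"]

def batch_retrieve(queries, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(retrieve, queries))
    seen, docs = set(), []
    for batch in results:
        for url in batch:         # dedupe across queries
            if url not in seen:
                seen.add(url)
                docs.append(url)
    return docs

docs = batch_retrieve(["solar storage", "grid batteries"])
```

Both queries return the same shared URL, but it enters the research context only once, so downstream processing never pays for it twice.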
structured outline-to-prose conversion with section-level synthesis
Medium confidence: STORM converts a structured outline (with perspective labels and source mappings) into flowing prose by synthesizing information from multiple sources into coherent sections. The system uses LLM-driven synthesis to merge information from different perspectives, resolve conflicts, and generate natural language that reads as a unified narrative rather than a collection of source excerpts. Each section is generated with awareness of its position in the outline hierarchy and its relationship to adjacent sections.
Converts structured outlines into prose while preserving outline hierarchy and perspective labels, using section-level synthesis to integrate multiple sources into coherent narrative. Rather than generating prose from scratch, the system uses outline structure to guide synthesis.
More structured and controllable than free-form prose generation because outline structure constrains the output and ensures coverage of all planned sections, reducing hallucination and off-topic content.
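Section-level synthesis with hierarchy awareness can be sketched as a recursive walk; `fake_synthesize` is a hypothetical stand-in for the LLM call, which in practice would receive the section's sources and neighbors as context:

```python
# Render each outline node, passing its parent heading so synthesis
# knows where the section sits in the hierarchy.

def fake_synthesize(heading, parent, sources):
    return (f"## {heading}\n"
            f"(under '{parent}', drawing on {len(sources)} sources)")

def outline_to_prose(outline, parent="root"):
    sections = []
    for node in outline:
        sections.append(fake_synthesize(node["heading"], parent,
                                        node["sources"]))
        sections.extend(outline_to_prose(node.get("children", []),
                                         node["heading"]))
    return sections

outline = [{"heading": "Background", "sources": ["s1", "s2"],
            "children": [{"heading": "Early work", "sources": ["s1"]}]}]
prose = outline_to_prose(outline)
```

Driving generation from the outline guarantees every planned section is produced exactly once, which is what constrains off-topic drift.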
configurable report formatting and citation style support
Medium confidence: STORM generates reports in multiple formats (Markdown, HTML, PDF) and supports multiple citation styles (APA, MLA, Chicago, IEEE, etc.). The system abstracts formatting logic from content generation, enabling the same research and synthesis pipeline to produce outputs in different formats. Citation formatting is handled by pluggable formatters that convert internal citation representations into style-specific formats.
Decouples content generation from formatting, enabling the same research pipeline to produce outputs in multiple formats and citation styles through pluggable formatters. This enables flexible output without re-running expensive research.
More flexible than single-format systems because the same research can be formatted for different venues and audiences without re-running research, reducing latency and API costs.
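The pluggable-formatter pattern is a small registry; the field names and style templates below are illustrative, not STORM's real formatter interface:

```python
# Convert one internal citation representation into style-specific strings.

def apa(src):
    return f"{src['author']} ({src['year']}). {src['title']}."

def ieee(src):
    return f"{src['author']}, \"{src['title']},\" {src['year']}."

FORMATTERS = {"apa": apa, "ieee": ieee}

def format_references(sources, style):
    return [FORMATTERS[style](s) for s in sources]

refs = format_references(
    [{"author": "Doe, J.", "year": 2023, "title": "A Study"}], "apa")
```

Switching the report from APA to IEEE is a one-argument change, with no re-research and no regeneration of the prose itself.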
source quality filtering and credibility heuristics
Medium confidence: STORM applies heuristic filters to retrieved sources to prioritize high-quality, credible sources and deprioritize low-quality or unreliable ones. Filters include domain authority checks (e.g., preferring .edu or .gov domains), recency filtering (prioritizing recent sources), and exclusion of known low-quality sources. The system ranks sources by credibility signals and can optionally exclude sources below a quality threshold, though final inclusion decisions are made by the LLM during synthesis.
Applies domain-based and recency-based credibility heuristics to filter and rank sources before synthesis, rather than treating all sources equally. This prioritizes authoritative sources while allowing the LLM to override filters during synthesis.
More reliable than unfiltered retrieval because credibility heuristics reduce the likelihood of citing low-quality sources, though heuristics are imperfect and require LLM validation.
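A toy version of the heuristics makes the trade-off concrete; the signals, weights, and threshold here are assumptions for illustration, not STORM's actual filter values:

```python
# Score each source by domain authority and recency, then drop
# anything below a quality threshold.

def credibility(src, current_year=2024):
    score = 0.0
    if src["domain"].endswith((".edu", ".gov")):
        score += 0.5              # domain-authority signal
    age = current_year - src["year"]
    score += max(0.0, 0.5 - 0.1 * age)   # newer sources score higher
    return score

def filter_sources(sources, threshold=0.4):
    return [s for s in sources if credibility(s) >= threshold]

kept = filter_sources([
    {"domain": "stanford.edu", "year": 2023},
    {"domain": "randomblog.net", "year": 2010},
])
```

As the listing notes, heuristics like these are imperfect: a stale `.edu` page can outrank a strong independent source, which is why final inclusion is left to LLM judgment during synthesis.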
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with STORM, ranked by overlap. Discovered automatically through the match graph.
Perplexity: Sonar Deep Research
Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers...
Tongyi DeepResearch 30B A3B
Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activating only 3 billion per token. It's optimized for long-horizon, deep information-seeking tasks...
onyx
Open Source AI Platform - AI Chat with advanced features that works with every LLM
local-deep-research
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.
GPT Researcher
Autonomous agent for comprehensive research reports.
Best For
- ✓ researchers and analysts building knowledge bases on unfamiliar domains
- ✓ content creators needing comprehensive topic coverage with minimal manual research
- ✓ teams automating literature review and competitive analysis workflows
- ✓ policy analysts and researchers covering contentious topics
- ✓ product teams documenting features with multiple stakeholder perspectives
- ✓ journalists and writers needing balanced coverage of multifaceted issues
- ✓ teams managing research costs and wanting to optimize model selection per phase
- ✓ researchers experimenting with different LLM capabilities
Known Limitations
- ⚠ Research depth is bounded by LLM context window and token budget; very broad topics may require manual scope definition
- ⚠ Query refinement quality depends on LLM reasoning capability; weaker models may generate redundant or off-topic follow-up queries
- ⚠ No built-in conflict resolution: when sources directly contradict, the outline may reflect both without explicit flagging, and reconciliation relies entirely on LLM synthesis
- ⚠ Perspective identification is LLM-driven and may miss domain-specific viewpoints without explicit guidance
- ⚠ Merging conflicting perspectives into a coherent outline requires careful prompt engineering; naive synthesis can produce incoherent structures