cross-model adversarial review loop with external LLM verification
Implements a two-model collaboration pattern: Claude Code executes research tasks (code generation, experiment design) while a separate external LLM (GPT-4, Claude, or another configurable backend) reviews the outputs independently via the Model Context Protocol (MCP). The reviewer never sees the executor's reasoning, only the final artifacts, which forces a fresh evaluation and catches blind spots that single-model self-review misses. State persists across review cycles, with checkpoint recovery.
Unique: Uses MCP-based model isolation to counter single-model blind spots, forcing the reviewer to evaluate only final artifacts without access to the executor's reasoning. This is loosely analogous to the adversarial (as opposed to stochastic) setting in bandit theory: the reviewer actively probes weaknesses the executor didn't anticipate. Most LLM research tools use self-review (Claude reviewing Claude); ARIS enforces architectural separation.
vs alternatives: Catches methodological flaws that single-model self-review systems (like native Claude Code) tend to rationalize away; the trade-off is roughly 2x inference cost in exchange for higher-quality research artifacts suitable for publication.
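A minimal sketch of the review cycle, assuming both models are wrapped as plain Python callables (the MCP transport to the external reviewer is omitted). `review_loop`, `ReviewState`, the JSON checkpoint file, and the toy executor/reviewer are hypothetical. What the sketch is meant to show is the information boundary: the executor returns its reasoning, but only the artifacts are passed to the reviewer.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Callable

# Hypothetical types: artifacts are named text outputs (code, plans, drafts).
Artifacts = dict[str, str]
Executor = Callable[[str, list[str]], tuple[Artifacts, str]]  # -> (artifacts, reasoning)
Reviewer = Callable[[Artifacts], list[str]]  # -> issues; an empty list means accept


@dataclass
class ReviewState:
    cycle: int = 0
    history: list[dict] = field(default_factory=list)


def load_state(path: str) -> ReviewState:
    if os.path.exists(path):
        with open(path) as f:
            return ReviewState(**json.load(f))
    return ReviewState()


def save_state(state: ReviewState, path: str) -> None:
    # Write to a temp file and rename, so a crash never leaves a half-written checkpoint.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp, path)


def review_loop(execute: Executor, review: Reviewer, task: str,
                checkpoint: str, max_cycles: int = 3) -> ReviewState:
    state = load_state(checkpoint)  # resume from the last completed cycle, if any
    if state.history and not state.history[-1]["issues"]:
        return state  # a previous session already converged
    feedback = state.history[-1]["issues"] if state.history else []
    while state.cycle < max_cycles:
        artifacts, _reasoning = execute(task, feedback)
        # Isolation: only the artifacts cross to the reviewer; the reasoning is discarded.
        issues = review(artifacts)
        state.cycle += 1
        state.history.append({"cycle": state.cycle, "issues": issues})
        save_state(state, checkpoint)
        if not issues:
            break
        feedback = issues
    return state


# Toy stand-ins for the two models.
def toy_executor(task: str, feedback: list[str]) -> tuple[Artifacts, str]:
    code = "def f(x): return x + 1" if feedback else "def f(x): return x"
    return {"solution.py": code}, "private reasoning the reviewer never sees"


def toy_reviewer(artifacts: Artifacts) -> list[str]:
    return [] if "+ 1" in artifacts["solution.py"] else ["f should increment x"]
```

Calling `review_loop` again with the same checkpoint path resumes from the last saved cycle instead of starting over.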
autonomous idea discovery and novelty validation against literature
Orchestrates a multi-step workflow that generates novel ML research ideas by querying integrated literature sources (Zotero, Obsidian, arXiv, Semantic Scholar) to identify gaps, then validates novelty by cross-referencing recent papers and running lightweight pilot experiments. The system maintains a research wiki that tracks idea genealogy, related work, and experiment outcomes. Novelty scoring combines semantic similarity (embedding-based) and citation analysis.
Unique: Combines multi-source literature aggregation (Zotero + Obsidian + arXiv + Semantic Scholar) with embedding-based novelty scoring and lightweight pilot experiments in a single automated workflow. The research wiki maintains idea genealogy and tracks which ideas led to papers, enabling meta-analysis of research productivity. Most tools do literature search OR idea generation; ARIS closes the loop with novelty validation and outcome tracking.
vs alternatives: Faster than manual literature review + brainstorming because it parallelizes idea generation with novelty checking; more rigorous than pure LLM idea generation because it grounds ideas in actual recent papers and validates with experiments.
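One way the two novelty signals could be combined, as a sketch: embeddings are assumed to be precomputed by some model, and the citation signal is taken to be reference-list overlap (Jaccard similarity, a simple form of bibliographic coupling). The function names, the 0.7/0.3 weighting, and the paper-record layout are assumptions, not ARIS's documented scoring.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty_score(idea_embedding: list[float], idea_refs: set[str],
                  papers: list[dict], semantic_weight: float = 0.7) -> float:
    """Return a score in [0, 1]: 1.0 means nothing close was found, 0.0 a near-duplicate.

    papers: [{"embedding": [...], "references": {...}}, ...] gathered from the literature sources.
    """
    if not papers:
        return 1.0
    # Semantic signal: similarity to the closest existing paper (negative cosine clipped to 0).
    max_sim = max(max(cosine(idea_embedding, p["embedding"]), 0.0) for p in papers)
    # Citation signal: reference-list overlap with the most similar paper by citations.
    max_overlap = max(jaccard(idea_refs, p["references"]) for p in papers)
    return semantic_weight * (1 - max_sim) + (1 - semantic_weight) * (1 - max_overlap)
```

Taking the maximum over papers means a single close prior work is enough to pull the score down, which matches the intent of a novelty check.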
integration with external research tools and data sources
Provides adapters for popular research tools: Zotero (literature management), Obsidian (note-taking), Feishu/Lark (team notifications), arXiv/Semantic Scholar (paper discovery), and GPU infrastructure (SLURM, Kubernetes). Supports bidirectional sync and event-driven triggers (e.g., new papers in Zotero trigger idea discovery; paper acceptance triggers a Feishu notification). Abstracts tool-specific APIs behind unified interfaces.
Unique: Provides unified adapters for popular research tools (Zotero, Obsidian, Feishu, arXiv, SLURM) with bidirectional sync. Enables workflows like 'new papers in Zotero trigger idea discovery' or 'paper acceptance triggers team notification'. Most research tools are isolated; ARIS integrates them into a cohesive ecosystem.
vs alternatives: More integrated than point-to-point tool connections because it provides unified adapters and bidirectional sync; more flexible than monolithic research platforms because it works with existing tools researchers already use.
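A sketch of the unified-adapter idea, assuming each tool is wrapped in a class with `pull` (tool to ARIS) and `push` (ARIS to tool), and that triggers are dispatched through a small in-process event bus. `Adapter`, `EventBus`, `sync`, the event names, and the in-memory adapters are hypothetical; no real Zotero or Feishu API is called.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Adapter(ABC):
    """Hypothetical unified interface; one subclass per tool (Zotero, Obsidian, Feishu, ...)."""

    name: str

    @abstractmethod
    def pull(self) -> list[dict]:
        """Return items that are new in the tool since the last pull."""

    @abstractmethod
    def push(self, item: dict) -> None:
        """Write an item back into the tool."""


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict) -> None:
        for handler in self._handlers[event]:
            handler(payload)


def sync(adapter: Adapter, bus: EventBus) -> None:
    # Inbound direction: new items in the tool become events, e.g. "zotero.new_item".
    for item in adapter.pull():
        bus.emit(f"{adapter.name}.new_item", item)


# In-memory stand-ins; real adapters would call each tool's own API.
class InMemoryZotero(Adapter):
    name = "zotero"

    def __init__(self, items: list[dict]) -> None:
        self._pending = list(items)

    def pull(self) -> list[dict]:
        new, self._pending = self._pending, []
        return new

    def push(self, item: dict) -> None:
        self._pending.append(item)


class InMemoryFeishu(Adapter):
    name = "feishu"

    def __init__(self) -> None:
        self.sent: list[dict] = []

    def pull(self) -> list[dict]:
        return []

    def push(self, item: dict) -> None:
        self.sent.append(item)
```

Subscribing idea discovery to `"zotero.new_item"` and `feishu.push` to a `"paper.accepted"` event reproduces the two example workflows; the design choice is that workflows only ever see the bus and the `Adapter` interface, never a tool-specific API.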
interactive mode with human-in-the-loop checkpoints
Supports interactive execution where the system pauses at strategic checkpoints (after idea generation, after experiment results, before paper submission) and waits for human approval/feedback before proceeding. Enables researchers to review intermediate results, make manual adjustments, and guide the system toward desired outcomes. Supports both fully autonomous overnight mode and interactive mode.
Unique: Enables both fully autonomous overnight execution and interactive mode with human checkpoints at strategic points (idea approval, experiment selection, paper review). Supports flexible feedback mechanisms (approval, rejection, modifications). Most research tools are either fully autonomous or fully manual; ARIS bridges both modes.
vs alternatives: More flexible than fully autonomous systems because it enables human oversight at critical decisions; more efficient than fully manual workflows because it automates routine tasks between checkpoints.
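A sketch of how one pipeline can serve both modes, assuming each checkpoint is a gate function that returns approve, reject, or modify: overnight mode passes a gate that always approves, and interactive mode passes one that asks a human. `run_pipeline`, `Decision`, `autonomous`, and `console` are hypothetical names.

```python
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    MODIFY = "modify"


Stage = tuple[str, Callable[[dict], dict]]  # (checkpoint name, step(state) -> output)
Gate = Callable[[str, dict], tuple[Decision, Optional[dict]]]


def autonomous(stage: str, output: dict) -> tuple[Decision, Optional[dict]]:
    """Overnight mode: every checkpoint is approved automatically."""
    return Decision.APPROVE, None


def console(stage: str, output: dict) -> tuple[Decision, Optional[dict]]:
    """Interactive mode: block on a human decision at each checkpoint."""
    print(f"[checkpoint: {stage}] {output}")
    answer = input("approve / reject / modify? ").strip().lower()
    if answer == "modify":
        key, value = input("field=value: ").split("=", 1)
        return Decision.MODIFY, {key.strip(): value.strip()}
    if answer == "reject":
        return Decision.REJECT, None
    return Decision.APPROVE, None


def run_pipeline(stages: list[Stage], gate: Gate = autonomous) -> tuple[dict, str]:
    state: dict = {}
    for name, step in stages:
        output = step(state)
        decision, edits = gate(name, output)
        if decision is Decision.REJECT:
            return state, f"stopped at {name}"
        if decision is Decision.MODIFY and edits:
            output = {**output, **edits}  # human edits override the generated fields
        state[name] = output
    return state, "completed"
```

Because later stages read from `state`, a human modification at the idea checkpoint propagates into every downstream step.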
automated iterative experiment execution with ablation and result aggregation
Manages end-to-end experiment lifecycle: Claude Code generates experiment code (training loops, hyperparameter sweeps, evaluation scripts), executes them on GPU infrastructure, collects results (metrics, logs, checkpoints), aggregates findings into structured reports, and feeds results back to the reviewer for quality assessment. Supports checkpoint recovery if experiments time out or fail mid-run. Integrates with GPU resource budgeting to prevent runaway costs.
Unique: Implements a stateful experiment pipeline with checkpoint-based recovery, resource budgeting, and automatic aggregation of results into publication-ready tables. The system tracks experiment genealogy (which ablations led to which results) and enables meta-analysis of hyperparameter sensitivity. General-purpose experiment tools emphasize other concerns (Ray Tune targets scalable, distributed hyperparameter search; Weights & Biases targets experiment tracking and visualization); ARIS focuses on sequential ablation studies with human-in-the-loop review.
vs alternatives: Simpler than Ray Tune for single-GPU ablation studies because it doesn't require distributed setup; more integrated than W&B because it auto-generates paper tables and feeds results directly to the reviewer for quality assessment.
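A minimal sketch of the recovery and budgeting behavior, assuming a `train` callable that reports its metrics and GPU-hours, with results persisted to a JSON file after every run so a restarted session skips completed ablations. `run_ablations`, `to_latex_table`, and the file layout are hypothetical; actual job scheduling (SLURM, Kubernetes) is out of scope here.

```python
import json
import os
import tempfile
from typing import Callable

# train(config) -> (metrics, gpu_hours); a hypothetical signature for illustration.
TrainFn = Callable[[dict], tuple[dict, float]]


def _atomic_write(obj: dict, path: str) -> None:
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(obj, f)
    os.replace(tmp, path)


def run_ablations(configs: dict[str, dict], train: TrainFn,
                  results_path: str, gpu_hour_budget: float) -> dict[str, dict]:
    """Run each named ablation once, recording results after every run."""
    done: dict[str, dict] = {}
    if os.path.exists(results_path):
        with open(results_path) as f:
            done = json.load(f)  # recovery: finished runs are not repeated
    spent = sum(r["gpu_hours"] for r in done.values())
    for name, cfg in configs.items():
        if name in done:
            continue
        if spent >= gpu_hour_budget:
            break  # budget guard: stop launching new runs (the last run may overshoot)
        metrics, hours = train(cfg)
        spent += hours
        done[name] = {"config": cfg, "metrics": metrics, "gpu_hours": hours}
        _atomic_write(done, results_path)
    return done


def to_latex_table(results: dict[str, dict], metric: str) -> str:
    """Aggregate one metric into a booktabs-style tabular (needs \\usepackage{booktabs})."""
    def esc(s: str) -> str:
        return s.replace("_", r"\_")

    lines = [r"\begin{tabular}{lc}", r"\toprule", f"Run & {esc(metric)} \\\\", r"\midrule"]
    for name, r in results.items():
        lines.append(f"{esc(name)} & {r['metrics'][metric]:.3f} \\\\")
    lines += [r"\bottomrule", r"\end{tabular}"]
    return "\n".join(lines)
```

Rerunning with a larger budget picks up exactly where the previous session stopped, and the same results file feeds the table generator.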
end-to-end paper generation with LaTeX compilation and venue-specific formatting
Orchestrates paper writing by generating LaTeX source (sections, figures, tables, citations), compiling it to PDF, detecting and fixing compilation errors, and formatting for target venues (NeurIPS, ICML, ICCV, etc.). Integrates experiment results directly into the paper (auto-generated figure captions, embedded tables). Maintains a LaTeX template library with venue-specific styles. Handles bibliography management via BibTeX.
Unique: Closes the loop from experiments to publication by auto-generating LaTeX, detecting and fixing compilation errors, and reformatting for multiple venues using a template library. The system embeds experiment results directly (auto-generated captions, tables) and maintains venue-specific formatting rules. Most paper-writing tools focus on content generation; ARIS handles the full LaTeX pipeline including compilation and error recovery.
vs alternatives: Faster than manual LaTeX writing because it generates structure and embeds results automatically; more robust than raw Claude Code generation because it includes compilation error detection and venue-specific formatting rules.
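Compilation-error detection can be sketched by running `pdflatex` in nonstop mode and scanning its output: TeX reports each error on a line beginning with `! ` and usually names the offending source line as `l.<n>` shortly after. The parser, the retry loop, and the `fix` hook (for example, a model call that patches the reported lines) are assumptions for illustration, not ARIS's implementation.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LatexError:
    message: str
    line: Optional[int]  # source line reported by TeX as "l.<n>", if found


_LINE_RE = re.compile(r"^l\.(\d+)")


def parse_latex_log(log: str) -> list[LatexError]:
    """Extract TeX errors: each starts with "! " and is usually followed by an "l.<n>" line."""
    lines = log.splitlines()
    errors = []
    for i, text in enumerate(lines):
        if not text.startswith("! "):
            continue
        line_no = None
        for follow in lines[i + 1 : i + 10]:  # the line marker appears a few lines later
            m = _LINE_RE.match(follow)
            if m:
                line_no = int(m.group(1))
                break
        errors.append(LatexError(text[2:].strip(), line_no))
    return errors


def compile_latex(tex_file: str) -> list[LatexError]:
    # nonstopmode makes pdflatex keep going instead of waiting at an error prompt.
    proc = subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", tex_file],
        capture_output=True, text=True, errors="replace",
    )
    return parse_latex_log(proc.stdout)


def compile_with_repair(tex_file: str,
                        fix: Callable[[str, list[LatexError]], None],
                        max_fixes: int = 3) -> bool:
    """Compile, hand any errors to the hypothetical fix() hook, and retry up to max_fixes times."""
    for attempt in range(max_fixes + 1):
        errors = compile_latex(tex_file)
        if not errors:
            return True
        if attempt < max_fixes:
            fix(tex_file, errors)
    return False
```

Bounding the number of repair attempts keeps a fix that introduces new errors from looping forever.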
rebuttal generation and reviewer concern parsing
Parses reviewer comments (from PDF or text), extracts concerns and questions, maps them to experiment results or paper sections, generates targeted rebuttals, and formats responses according to venue guidelines. Uses semantic matching to link reviewer concerns to relevant experiments or citations. Maintains rebuttal templates for common objection types (novelty, experimental rigor, clarity).
Unique: Automates the rebuttal pipeline by parsing reviewer concerns, mapping them to experiments via semantic matching, and generating targeted responses. Maintains rebuttal templates for common objection types and formats for multiple venues. Most tools focus on paper writing; ARIS extends to the revision cycle with concern-to-experiment traceability.
vs alternatives: Faster than manual rebuttal writing because it auto-generates structure and links concerns to experiments; more systematic than ad-hoc responses because it ensures all concerns are addressed and mapped to evidence.
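A sketch of concern-to-evidence mapping. It assumes reviewers number or bullet their points, and it uses bag-of-words cosine similarity as a dependency-free stand-in for the embedding-based semantic matching described above. Concerns mapped to `None` have no supporting evidence, which flags where a human (or a new experiment) is needed. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Optional


def split_concerns(review: str) -> list[str]:
    """Split a review into concerns, one per numbered ("1.", "2)") or bulleted ("-", "*") item."""
    items = re.split(r"\n\s*(?:\d+[.)]|[-*])\s+", "\n" + review.strip())
    return [item.strip() for item in items if item.strip()]


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_concerns(review: str, evidence: dict[str, str],
                 threshold: float = 0.1) -> list[tuple[str, Optional[str]]]:
    """Pair each concern with its best-matching evidence id, or None if nothing clears threshold.

    evidence: {experiment or section id: short description}.
    """
    pairs = []
    for concern in split_concerns(review):
        vec = _bag_of_words(concern)
        best_id, best_score = None, 0.0
        for evidence_id, description in evidence.items():
            score = _cosine(vec, _bag_of_words(description))
            if score > best_score:
                best_id, best_score = evidence_id, score
        pairs.append((concern, best_id if best_score >= threshold else None))
    return pairs
```

Returning every concern, matched or not, is what makes the "all concerns are addressed" check possible: nothing is silently dropped.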
research wiki and meta-optimization for idea-to-paper tracking
Maintains a persistent research wiki (markdown-based) that tracks idea genealogy, related work, experiment outcomes, and paper status. Enables meta-analysis of research productivity (which ideas led to papers, which experiments were most valuable, which venues accept which paper types). Supports automated meta-optimization: analyzing past research cycles to improve future idea generation, experiment selection, and writing strategies.
Unique: Implements a persistent research wiki that tracks idea-to-paper lineage and enables meta-analysis of research productivity. The meta-optimizer analyzes past cycles to recommend improvements (e.g., 'ideas in domain X have a 60% acceptance rate; focus there'). Most research tools focus on single cycles; ARIS enables cross-cycle learning and continuous improvement.
vs alternatives: Enables long-term research optimization that single-cycle tools cannot provide; helps researchers identify high-ROI research directions based on historical data rather than intuition.
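A sketch of the meta-analysis behind a per-domain acceptance-rate recommendation, assuming each markdown wiki page carries `domain:` and `status:` lines. The page layout, the function names, and the minimum-sample threshold are assumptions for illustration.

```python
import re
from collections import defaultdict

DECIDED = ("accepted", "rejected")


def parse_wiki_page(markdown: str) -> dict[str, str]:
    """Read "key: value" lines (optionally bulleted) from one wiki page."""
    fields = {}
    for line in markdown.splitlines():
        m = re.match(r"^\s*[-*]?\s*(\w+):\s*(.+?)\s*$", line)
        if m:
            fields[m.group(1).lower()] = m.group(2)
    return fields


def _tally(records: list[dict]) -> dict[str, list[int]]:
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # domain -> [accepted, decided]
    for r in records:
        if r.get("status") in DECIDED:
            counts[r["domain"]][1] += 1
            counts[r["domain"]][0] += int(r["status"] == "accepted")
    return counts


def acceptance_by_domain(records: list[dict]) -> dict[str, float]:
    # Ideas still in progress are excluded; only decided submissions count.
    return {d: acc / n for d, (acc, n) in _tally(records).items()}


def rank_domains(records: list[dict], min_decided: int = 3) -> list[tuple[str, float]]:
    # Require a minimum sample so a single lucky acceptance does not top the ranking.
    rates = [(d, acc / n) for d, (acc, n) in _tally(records).items() if n >= min_decided]
    return sorted(rates, key=lambda x: x[1], reverse=True)
```

The `min_decided` cutoff is a crude guard; with few submissions per domain, historical rates are noisy and should inform rather than dictate direction.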