What can pilot-shell do?

spec-driven task planning with feature/bugfix auto-detection, test-driven development enforcement with pre-implementation test generation, codebase-aware context injection with selective token budgeting, automated code review and style enforcement, session state persistence and recovery, persistent session memory with semantic codebase indexing, project-specific rules and conventions extraction via /learn, team knowledge sharing via /vault with git-backed persistence, hooks-based quality enforcement pipeline, worktree-based isolated task execution, verification and regression testing agent, mcp server integration for claude code tool calling, quick mode for low-complexity tasks without planning gates

pilot-shell

MCP Server · Free

Make Claude Code production-ready — spec-driven plans, enforced quality gates, persistent knowledge

Open Source

13 capabilities

Capabilities (13 decomposed)

spec-driven task planning with feature/bugfix auto-detection

Medium confidence

Analyzes user intent via the /spec command, automatically classifies tasks as features or bugfixes, and generates structured implementation plans using a state machine dispatcher that routes to feature or bugfix workflows. The planning phase uses Claude to decompose requirements into atomic steps with estimated complexity, then presents a human-reviewable plan before implementation begins. This enforces upfront design thinking and prevents Claude Code from diverging into ad-hoc implementations.

Solves for

I want Claude to plan a feature before coding so I can review and approve the approach

I need to ensure bugfixes are scoped correctly and don't introduce regressions

I want to enforce TDD discipline by requiring test plans before implementation

Best for

teams building production codebases with Claude Code

developers who want structured planning gates before AI-driven implementation

projects requiring audit trails of design decisions

Requires

Claude Code (Sonnet 4.6 or Opus 4.6)

Pilot Shell installed globally at ~/.pilot/

Active project directory with git repository

Limitations

Requires explicit /spec invocation — does not auto-trigger on unstructured requests

Plan approval is synchronous and blocks implementation until human review

Feature vs bugfix classification relies on Claude's semantic understanding and may misclassify ambiguous tasks

What makes it unique

Uses a dispatcher-based state machine that routes feature and bugfix tasks through separate workflows (feature: plan → implement → verify; bugfix: plan → implement → regression test), with mandatory human approval gates between planning and implementation phases. This architectural pattern prevents Claude from skipping the planning phase entirely.
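The two workflows and the approval gate described above can be sketched as a small transition table. This is a minimal illustration, not Pilot Shell's actual dispatcher: the `Phase` enum, the `WORKFLOWS` table, and `next_phase` are hypothetical names, and only the phase order is taken from the text.

```python
from enum import Enum


class Phase(Enum):
    PLAN = "plan"
    AWAITING_APPROVAL = "awaiting_approval"
    IMPLEMENT = "implement"
    VERIFY = "verify"                    # last working phase for features
    REGRESSION_TEST = "regression_test"  # last working phase for bugfixes
    DONE = "done"


# Hypothetical transition tables; only the phase order comes from the description.
WORKFLOWS = {
    "feature": [Phase.PLAN, Phase.AWAITING_APPROVAL, Phase.IMPLEMENT,
                Phase.VERIFY, Phase.DONE],
    "bugfix": [Phase.PLAN, Phase.AWAITING_APPROVAL, Phase.IMPLEMENT,
               Phase.REGRESSION_TEST, Phase.DONE],
}


def next_phase(task_type: str, current: Phase, approved: bool = False) -> Phase:
    """Return the phase that follows `current` for the given task type."""
    if current is Phase.AWAITING_APPROVAL and not approved:
        return current  # gate: stay here until a human approves the plan
    order = WORKFLOWS[task_type]
    index = order.index(current)
    return order[min(index + 1, len(order) - 1)]
```

Because the gate is a state of its own, there is no transition from planning to implementation that skips it, which is the property the description emphasizes.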

vs alternatives

Unlike Claude Code alone (which implements immediately) or generic AI agents (which lack project context), Pilot Shell enforces structured planning with automatic task classification and blocks implementation until a human approves the plan.

test-driven development enforcement with pre-implementation test generation

Medium confidence

During the implementation phase of /spec workflows, generates test cases before code is written, then validates that all generated code passes those tests before marking tasks complete. The system uses a verification agent that runs test suites and blocks code merges if coverage or assertions are insufficient. This is enforced via hooks that intercept code changes and validate test presence before allowing commits.

Solves for

I want Claude to write tests first, then implementation (TDD discipline)

I need to ensure every code change has corresponding test coverage

I want automated verification that tests actually pass before code is committed

Best for

teams with strict TDD requirements or regulatory compliance needs

projects where test coverage is a non-negotiable quality metric

developers who want to prevent untested code from entering the codebase

Requires

Test framework installed and configured (Jest, pytest, Vitest, etc.)

Test runner accessible from project root

Claude Code (Sonnet 4.6 or Opus 4.6)

Limitations

Test generation quality depends on Claude's understanding of requirements — may generate incomplete or redundant tests

Requires test framework setup in the project (Jest, pytest, etc.) — does not work with projects lacking test infrastructure

Test execution adds latency to the implementation phase (typically 30-60 seconds per test suite run)

What makes it unique

Integrates test generation into the implementation phase via a hooks pipeline that intercepts code changes and validates test presence before allowing progression. A verification agent runs the test suites and blocks code merges if tests fail or coverage is insufficient, which makes TDD mandatory rather than optional.
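A "test presence" check of the kind described above could look roughly like the following. It is a sketch under stated assumptions: the file-naming conventions (`name.test.ext`, `test_name.ext`, `name_test.ext`) and the function names are illustrative, not Pilot Shell's real hook.

```python
from pathlib import PurePosixPath


def expected_test_names(source_path: str) -> set[str]:
    """Candidate test file names for a source file (assumed naming conventions)."""
    path = PurePosixPath(source_path)
    stem, suffix = path.stem, path.suffix
    return {f"{stem}.test{suffix}", f"test_{stem}{suffix}", f"{stem}_test{suffix}"}


def is_test_file(path: str) -> bool:
    """True if the path already looks like a test file under the same conventions."""
    name = PurePosixPath(path).name
    stem = PurePosixPath(path).stem
    return ".test." in name or stem.startswith("test_") or stem.endswith("_test")


def untested_changes(changed_files: list[str], all_files: list[str]) -> list[str]:
    """Return changed source files with no matching test file anywhere in the project."""
    known_names = {PurePosixPath(f).name for f in all_files}
    missing = []
    for changed in changed_files:
        if is_test_file(changed):
            continue
        if not expected_test_names(changed) & known_names:
            missing.append(changed)
    return missing
```

A hook built on this would block progression whenever `untested_changes` returns a non-empty list. Checking that the tests actually pass is a separate step for the test runner.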

vs alternatives

Standard Claude Code has no built-in test enforcement; Pilot Shell's hooks pipeline and verification agent make test-first development automatic and mandatory, preventing developers from skipping tests even if they wanted to.

codebase-aware context injection with selective token budgeting

Medium confidence

Pilot Shell injects project-specific context into Claude's system prompt at session start, including extracted conventions, relevant code patterns, and project rules from the semantic index. The context injection is selective and respects Claude's token budget — only the most relevant patterns are injected based on the current task, preventing context window overflow. The system uses a context monitor to track which files are most relevant to the current task and prioritizes injection of related patterns.

Solves for

I want Claude to understand project conventions without manually explaining them

I need Claude to have access to relevant code patterns without exceeding token limits

I want to ensure Claude uses the right patterns and conventions for the current task

Best for

large codebases with strong architectural patterns

projects with non-obvious conventions or custom tooling

teams wanting to minimize context setup overhead

Requires

Semantic index built via /sync command

Project rules and conventions extracted and stored

Limitations

Context injection is selective and may miss relevant patterns if the context monitor misidentifies task scope

Token budgeting is heuristic-based and may be overly conservative or aggressive

Injected context is static at session start — does not update if task scope changes mid-session

What makes it unique

Uses a context monitor to selectively inject the most relevant project patterns into Claude's system prompt based on task scope, respecting token budgets by prioritizing high-impact patterns. This enables codebase awareness without exceeding context window limits, making large-codebase support practical.
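One simple way to prioritize high-impact patterns within a token budget is a greedy selection by relevance. The sketch below assumes each pattern already carries a relevance score and a token estimate; the `Pattern` type and `select_patterns` are hypothetical, and the actual heuristic is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    relevance: float  # hypothetical score in [0, 1] for the current task
    tokens: int       # estimated token cost of injecting this pattern


def select_patterns(patterns: list[Pattern], budget: int) -> list[Pattern]:
    """Greedily pick the most relevant patterns that still fit in the token budget."""
    chosen, used = [], 0
    for pattern in sorted(patterns, key=lambda p: p.relevance, reverse=True):
        if used + pattern.tokens <= budget:
            chosen.append(pattern)
            used += pattern.tokens
    return chosen
```

Greedy selection is not optimal in general (it is a knapsack problem), but it is cheap and predictable, which fits the "heuristic-based" budgeting noted in the limitations.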

vs alternatives

Unlike RAG systems that inject all matching documents (risking token overflow) or manual context setup (which is tedious), Pilot Shell's selective context injection uses task-aware heuristics to inject only the most relevant patterns, balancing context richness with token efficiency.

automated code review and style enforcement

Medium confidence

The verification phase includes an automated code review agent that checks for style violations, architectural inconsistencies, and deviations from project conventions. The agent uses the extracted project rules and conventions to validate that generated code follows established patterns. Code that violates style or architectural rules is flagged and can block merges, providing automated enforcement of code quality standards without requiring manual review.

Solves for

I want to ensure generated code follows project style and architectural conventions

I need automated detection of code quality issues before human review

I want to prevent architectural drift as the codebase evolves

Best for

teams with strict code style and architectural standards

large codebases where architectural consistency is critical

projects where manual code review is a bottleneck

Requires

Project rules and conventions extracted via /sync or /learn

Code style configuration (eslint, black, etc.)

Limitations

Code review agent quality depends on the completeness of extracted project rules — incomplete rules lead to weak reviews

Architectural violations are detected heuristically and may have false positives or false negatives

Code review adds latency to the verification phase (typically 30-60 seconds)

What makes it unique

Implements an automated code review agent that validates generated code against extracted project rules and conventions, providing architectural and style enforcement without manual review. The agent uses the same rules extracted by /sync and /learn, making reviews consistent with project standards.

vs alternatives

Unlike manual code review (which is slow and subjective) or linting tools alone (which only check syntax), Pilot Shell's code review agent understands project conventions and architectural patterns, providing semantic-level code quality assurance.

session state persistence and recovery

Medium confidence

Pilot Shell persists session state (current task, implementation progress, test results, verification status) to disk, enabling recovery if a session crashes or is interrupted. The worker service maintains a session state file that tracks the current /spec task, implementation phase, and verification results. If a session is interrupted, the next session can resume from the last checkpoint, preventing loss of work and enabling recovery from failures.

Solves for

I want to resume a /spec task if my session crashes

I need to track progress across multiple sessions on the same task

I want to avoid losing work if my machine crashes or connection drops

Best for

developers working on long-running /spec tasks

teams in unreliable network environments

projects where task interruption is common

Requires

Pilot Shell worker service running

Write access to ~/.pilot/sessions/ directory

Limitations

Session recovery is best-effort — some state may be lost if the crash is severe

Persisted state is local to the machine — does not sync across team members

Session state files can accumulate and require periodic cleanup

What makes it unique

Persists session state to disk via the worker service, enabling recovery from crashes and interruptions. Session state includes current task, implementation progress, test results, and verification status, allowing seamless resumption from the last checkpoint.
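Checkpointing of this kind is commonly done with a write-to-temp-then-rename pattern, so that a crash during a write never corrupts the previous checkpoint. The following is a generic sketch of that pattern; the file layout and the state keys are assumptions, not Pilot Shell's real session format.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state atomically so a crash mid-write cannot corrupt the last checkpoint."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)  # atomic when source and target share a filesystem


def load_checkpoint(path: Path) -> Optional[dict]:
    """Return the last saved state, or None if there is nothing usable to resume."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

Returning `None` for a missing or unreadable file matches the "best-effort" recovery described in the limitations: the caller starts fresh instead of failing.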

vs alternatives

Unlike Claude Code alone (which has no session persistence) or manual checkpointing (which is error-prone), Pilot Shell's automatic session persistence enables recovery from crashes without user intervention, making long-running tasks more reliable.

persistent session memory with semantic codebase indexing

Medium confidence

The /sync command builds a semantic search index of the entire codebase using embeddings, then stores project-specific context (architecture patterns, naming conventions, dependencies, test patterns) in a persistent memory store that survives across sessions. This context is automatically injected into Claude's context window at the start of each session, enabling Claude to understand project conventions without requiring manual context setup. The context monitor continuously tracks changes to key files and updates the index incrementally.

Solves for

I want Claude to remember project conventions and architecture patterns across multiple sessions

I need Claude to understand the codebase structure without manually explaining it each time

I want to avoid repeating context setup when resuming work on a project

Best for

teams working on large, complex codebases with strong architectural patterns

projects with non-obvious conventions or custom tooling that Claude needs to learn

developers who want to minimize context-window overhead by pre-indexing the codebase

Requires

Pilot Shell installed with memory subsystem enabled

Git repository with accessible .git directory

Sufficient disk space for semantic index (~100MB per 10k files)

Limitations

Initial /sync indexing can take 2-5 minutes on large codebases (10k+ files)

Semantic index requires embedding generation, which adds ~500ms per file on first run

Index becomes stale if codebase changes significantly between sessions — requires manual /sync refresh

What makes it unique

Uses a context monitor hook that tracks file changes and incrementally updates the semantic index, combined with a memory & console system that persists extracted conventions across sessions. The index is injected into Claude's context at session start, eliminating the need for manual context setup while staying within token budgets via selective injection of relevant patterns.
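Incremental index updates usually rest on change detection: only files whose content changed since the last run need new embeddings. The sketch below shows that step with content hashes. It is an assumption about how such an update could be planned, and it leaves out the embedding call itself.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_index_update(
    indexed: dict[str, str], current_files: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare stored hashes with current file contents.

    `indexed` maps path -> hash recorded at the last sync;
    `current_files` maps path -> current text.
    Returns (paths to re-embed, paths to drop from the index).
    """
    to_embed = sorted(
        path for path, text in current_files.items()
        if indexed.get(path) != content_hash(text)
    )
    to_drop = sorted(path for path in indexed if path not in current_files)
    return to_embed, to_drop
```

With the per-file embedding cost quoted in the limitations, skipping unchanged files is what keeps a refresh much cheaper than the initial /sync.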

vs alternatives

Unlike Claude Code alone (which has no persistent memory between sessions) or generic RAG systems (which require manual indexing), Pilot Shell's /sync command automatically indexes the codebase and injects relevant context at session start, making project knowledge persistent without manual effort.

project-specific rules and conventions extraction via /learn

Medium confidence

The /learn command captures non-obvious discoveries from the current session (e.g., 'this project uses a custom logger instead of console.log', 'all async functions must have timeout handling') and converts them into reusable skill files stored in ~/.pilot/skills/. These skills are automatically loaded into Claude's context for future sessions on the same project, and can be shared across teams via the /vault command. The system uses Claude to extract generalizable patterns from session interactions and format them as structured rules.

Solves for

I want to capture lessons learned during a session so Claude remembers them next time

I need to document project-specific patterns that aren't in the codebase (e.g., error handling conventions)

I want to share discovered patterns with my team without manual documentation

Best for

teams with implicit or undocumented project conventions

projects where knowledge is scattered across team members' heads

developers who want to build up a library of reusable project patterns over time

Requires

Pilot Shell installed with skills subsystem

Active session with Claude Code

Write access to ~/.pilot/skills/ directory

Limitations

Skill extraction quality depends on Claude's ability to generalize from examples — may capture overly specific or incorrect patterns

Skills are not automatically validated; incorrect rules can persist and mislead future sessions

Requires manual /learn invocation — does not auto-capture patterns without explicit user action

What makes it unique

Converts session discoveries into structured skill files that are automatically loaded into Claude's context for future sessions, with a /vault integration for team-wide sharing. Unlike generic documentation, skills are machine-readable and directly injected into Claude's reasoning, making them immediately actionable.
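To make "machine-readable skill" concrete, here is one possible shape for a skill serialized as JSON. Every field name is hypothetical; the real skill schema is not documented on this page, so treat this purely as an illustration of a structured rule with examples and applicability conditions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Skill:
    # All field names are assumptions made for this sketch.
    name: str
    rule: str
    applies_to: list[str] = field(default_factory=list)  # e.g. glob patterns
    examples: list[str] = field(default_factory=list)


def to_skill_json(skill: Skill) -> str:
    """Serialize a skill to a stable, diff-friendly JSON string."""
    return json.dumps(asdict(skill), indent=2, sort_keys=True)


def from_skill_json(text: str) -> Skill:
    """Load a skill back from its JSON form."""
    return Skill(**json.loads(text))
```

Sorted keys and fixed indentation keep the files stable under version control, which matters if skills are later shared through a Git-backed vault.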

vs alternatives

Standard Claude Code has no mechanism to capture and reuse project-specific patterns; Pilot Shell's /learn command converts ephemeral session insights into persistent, shareable skills that improve Claude's performance on future tasks in the same project.

team knowledge sharing via /vault with git-backed persistence

Medium confidence

The /vault command shares rules, commands, skills, hooks, and agents across a team by syncing them to a private Git repository. Each team member's local ~/.pilot/ and ~/.claude/ directories can be configured to pull from a shared vault repository, enabling centralized management of project conventions, custom hooks, and reusable agents. The system uses Git as the backing store and provides conflict resolution via simple merge strategies (last-write-wins or manual resolution).

Solves for

I want to share project rules and conventions with my entire team

I need to ensure all team members use the same custom hooks and quality gates

I want to build a library of reusable agents and skills that the team can leverage

Best for

teams with 3+ developers working on the same codebase

organizations wanting to enforce consistent quality standards across projects

teams building a library of reusable Pilot Shell extensions

Requires

Private Git repository (GitHub, GitLab, Gitea, etc.)

Git credentials configured on all team machines

Write access to the vault repository for all team members

Limitations

Requires a private Git repository (GitHub, GitLab, etc.) — adds operational overhead

Vault sync is manual (via /vault command) — does not auto-sync on every session

Merge conflicts in vault files require manual resolution — no built-in conflict resolution UI

What makes it unique

Uses Git as the backing store for team knowledge, enabling decentralized sync with version history and audit trails. Rules, skills, hooks, and agents are stored as files in the vault repository and pulled into each team member's local ~/.pilot/ directory, making team knowledge portable and version-controlled.
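The description mentions a last-write-wins merge strategy. As a rough illustration of what that means for a set of vault files, the sketch below merges two snapshots keyed by path and keeps the newer copy of each. The snapshot shape (timestamp plus content) is an assumption for this example; Git itself does not resolve conflicts this way, so a real implementation would sit on top of Git operations.

```python
def merge_last_write_wins(
    local: dict[str, tuple[float, str]],
    remote: dict[str, tuple[float, str]],
) -> dict[str, tuple[float, str]]:
    """Merge two {path: (modified_timestamp, content)} maps; the newer copy wins.

    On equal timestamps the local copy is kept.
    """
    merged = dict(local)
    for path, (stamp, content) in remote.items():
        if path not in merged or stamp > merged[path][0]:
            merged[path] = (stamp, content)
    return merged
```

Last-write-wins is simple but silently discards the older edit, which is why the alternative listed alongside it is manual resolution.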

vs alternatives

Unlike centralized knowledge bases (which require a server) or manual documentation (which gets out of sync), Pilot Shell's /vault uses Git for decentralized, version-controlled sharing of project-specific rules and agents, making team knowledge portable and auditable.

hooks-based quality enforcement pipeline

Medium confidence

Pilot Shell runs a pre-commit and post-change hooks pipeline that intercepts code modifications and enforces quality standards before code can be committed or merged. The pipeline includes a file checker hook (validates syntax, linting, formatting), a context monitor hook (tracks changes to key files), and a tool redirect hook (intercepts Claude's tool calls and validates them against project rules). Hooks are defined in project-specific or team-wide configuration and are applied automatically to all code changes, so quality enforcement cannot be skipped.

Solves for

I want to prevent code that fails linting or formatting from being committed

I need to ensure Claude follows project-specific tool usage rules (e.g., only use approved APIs)

I want to track changes to critical files and trigger alerts or reviews when they change

Best for

teams with strict code quality standards

projects where certain files (config, security, database schema) require special handling

developers who want to prevent Claude from using unapproved tools or APIs

Requires

Pilot Shell installed with hooks subsystem

Hook configuration file in project root or ~/.pilot/hooks/

Linting/formatting tools installed (eslint, prettier, black, etc.)

Limitations

Hooks add latency to code changes (typically 500ms-2s per hook execution)

Hook failures block code commits — can be frustrating if hooks are overly strict

Hooks are project-specific and must be configured per project — no global defaults

What makes it unique

Implements a multi-stage hooks pipeline that runs at different points in the development workflow (file checker on every change, context monitor on key files, tool redirect on Claude's tool calls). Hooks are composable and can be extended with custom scripts, making the quality enforcement system flexible and project-specific.
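Composable hooks can be modeled as plain functions that each inspect a change and either accept it or return an error message. The sketch below shows that idea; the `Change` type and both example hooks are hypothetical stand-ins for the file checker and context monitor, not Pilot Shell's real hook interface.

```python
from typing import Callable, NamedTuple, Optional


class Change(NamedTuple):
    path: str
    content: str


# A hook returns an error message, or None when the change is acceptable.
Hook = Callable[[Change], Optional[str]]


def no_trailing_whitespace(change: Change) -> Optional[str]:
    """Tiny stand-in for a formatting check."""
    for number, line in enumerate(change.content.splitlines(), start=1):
        if line != line.rstrip():
            return f"{change.path}:{number}: trailing whitespace"
    return None


def protect_key_files(change: Change) -> Optional[str]:
    """Tiny stand-in for a key-file monitor (the file list is an assumption)."""
    if change.path.endswith(("schema.sql", ".env")):
        return f"{change.path}: protected file, needs manual review"
    return None


def run_pipeline(change: Change, hooks: list[Hook]) -> list[str]:
    """Run every hook and collect failures; an empty list means the change may proceed."""
    failures = []
    for hook in hooks:
        message = hook(change)
        if message is not None:
            failures.append(message)
    return failures
```

Running all hooks instead of stopping at the first failure gives the author the full list of problems in one pass, at the cost of the per-hook latency noted in the limitations.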

vs alternatives

Unlike pre-commit hooks alone (which only run at commit time) or linting tools (which are passive), Pilot Shell's hooks pipeline actively intercepts code changes and tool calls, enforcing quality standards at multiple points in the workflow and preventing non-compliant code from progressing.

worktree-based isolated task execution

Medium confidence

Each /spec task executes in an isolated Git worktree (a separate working directory linked to the same repository), preventing concurrent tasks from interfering with each other and enabling safe rollback if a task fails. The worktree is created at task start, code changes are made in isolation, and the worktree is merged back to the main branch only after verification passes. This architectural pattern enables safe parallel task execution and provides a natural rollback mechanism if verification fails.

Solves for

I want to run multiple /spec tasks in parallel without them interfering with each other

I need a safe way to rollback a task if verification fails

I want to prevent accidental commits of incomplete or broken code

Best for

teams running multiple Claude Code sessions in parallel

projects where task isolation is critical for safety

developers who want automatic rollback on verification failure

Requires

Git 2.7+ (worktree support)

Sufficient disk space for multiple worktree copies

Pilot Shell installed with worktree integration

Limitations

Worktree creation adds ~1-2 seconds of overhead per task

Worktree merging can fail if there are conflicts with the main branch — requires manual conflict resolution

Worktrees consume disk space (one copy of the codebase per active task)

What makes it unique

Uses Git worktrees as the isolation mechanism for /spec tasks, enabling safe parallel execution and automatic rollback on verification failure. Each task gets its own working directory linked to the same repository, preventing concurrent tasks from interfering and providing a natural merge point for verification.
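In plain Git terms, the lifecycle described above maps onto a handful of commands. The sketch below only builds the command lists, without running them, so the sequence is easy to inspect. The branch naming scheme and worktree location are assumptions, and `git worktree remove` requires a newer Git (2.17 or later) than the minimum listed in the requirements.

```python
def worktree_commands(task_id: str, base_branch: str = "main") -> dict[str, list[list[str]]]:
    """Return the git invocations for each stage of an isolated task (not executed here)."""
    branch = f"task/{task_id}"      # hypothetical branch naming scheme
    path = f".worktrees/{task_id}"  # hypothetical worktree location
    return {
        # Create a new branch from the base and check it out in its own directory.
        "start": [["git", "worktree", "add", "-b", branch, path, base_branch]],
        # After verification passes: merge from the main checkout, then clean up.
        "on_success": [
            ["git", "checkout", base_branch],
            ["git", "merge", "--no-ff", branch],
            ["git", "worktree", "remove", path],
            ["git", "branch", "-d", branch],
        ],
        # After verification fails: discard the directory and the branch.
        "on_failure": [
            ["git", "worktree", "remove", "--force", path],
            ["git", "branch", "-D", branch],
        ],
    }
```

Rollback is cheap because nothing touches the base branch until the merge step. Each list can be passed to `subprocess.run(cmd, check=True)` from the main checkout.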

vs alternatives

Unlike branching (which requires manual branch management and merging) or stashing (which is error-prone), Pilot Shell's worktree-based approach provides automatic isolation and rollback with minimal user intervention, making parallel task execution safe and predictable.

verification and regression testing agent

Medium confidence

After implementation completes, a verification agent runs the full test suite, checks for regressions, and validates that the implementation meets the original specification. For bugfixes, the agent specifically checks that the bug is fixed and no new bugs are introduced. For features, the agent validates that all acceptance criteria are met. The agent can block code merges if verification fails, providing a quality gate before code reaches the main branch.

Solves for

I want to ensure implemented code actually solves the original problem

I need to verify that bugfixes don't introduce new regressions

I want automated validation that feature implementations meet acceptance criteria

Best for

teams with strict quality requirements

projects where regressions are costly or dangerous

developers who want automated validation before code merges

Requires

Comprehensive test suite covering the feature or bugfix

Acceptance criteria explicitly defined in the /spec plan

Test runner accessible from project root

Limitations

Verification quality depends on test suite completeness — cannot catch bugs not covered by tests

Regression testing requires a comprehensive test suite — projects with poor test coverage will have weak verification

Verification adds latency to the /spec workflow (typically 1-3 minutes per task)

What makes it unique

Implements a dedicated verification agent that runs after implementation and validates against the original specification and acceptance criteria. For bugfixes, it specifically checks that the bug is fixed and no regressions are introduced; for features, it validates that all acceptance criteria are met. This provides a structured quality gate before code merges.
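The pass/fail decision described above combines test results with task-specific checks. A minimal sketch of that decision logic follows; the input shapes and the `VerificationReport` type are assumptions, and the real agent presumably gathers these inputs by running the test suite and reading the /spec plan.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationReport:
    passed: bool
    reasons: list[str] = field(default_factory=list)


def verify(
    task_type: str,
    failed_tests: list[str],
    unmet_criteria: list[str],
    bug_still_reproduces: bool = False,
) -> VerificationReport:
    """Combine test results with feature- or bugfix-specific checks."""
    # Any failing test counts as a regression for both task types.
    reasons = [f"failing test: {name}" for name in failed_tests]
    if task_type == "feature":
        reasons += [f"unmet acceptance criterion: {c}" for c in unmet_criteria]
    elif task_type == "bugfix" and bug_still_reproduces:
        reasons.append("original bug still reproduces")
    return VerificationReport(passed=not reasons, reasons=reasons)
```

Returning the reasons alongside the verdict gives the blocked merge an actionable report in place of a bare failure.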

vs alternatives

Unlike manual testing (which is slow and error-prone) or generic CI/CD pipelines (which lack context about the original specification), Pilot Shell's verification agent understands the original task and validates that the implementation actually solves the problem, providing context-aware quality assurance.

mcp server integration for claude code tool calling

Medium confidence

Pilot Shell exposes a Model Context Protocol (MCP) server that provides Claude Code with access to Pilot Shell commands (/spec, /sync, /learn, /vault) and project-specific tools via a standardized function-calling interface. The MCP server runs as a background service and handles tool schema registration, argument validation, and execution. This enables Claude Code to invoke Pilot Shell workflows programmatically rather than requiring manual slash command invocation.

Solves for

I want Claude to automatically invoke /spec workflows without manual command entry

I need Claude to call project-specific tools and APIs through a standardized interface

I want to extend Claude's capabilities with custom tools defined in my project

Best for

teams using Claude Code with Pilot Shell integration

projects with custom tools or APIs that Claude needs to call

developers who want Claude to autonomously invoke Pilot Shell workflows

Requires

Claude Code (Sonnet 4.6 or Opus 4.6) with MCP support

Pilot Shell MCP server running (started automatically by pilot binary)

Tool schemas defined in project configuration

Limitations

MCP server adds ~200ms latency per tool call (network overhead)

Tool schemas must be explicitly defined — no automatic schema generation from code

MCP server is single-threaded — concurrent tool calls are serialized

What makes it unique

Implements an MCP server that exposes Pilot Shell commands and project-specific tools through a standardized function-calling interface, enabling Claude Code to invoke workflows programmatically. The server handles schema registration, argument validation, and execution, making tool integration seamless and standardized.
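The three responsibilities named above (schema registration, argument validation, execution) can be illustrated with a small in-process registry. This sketch deliberately does not use an MCP SDK or the MCP wire protocol; `ToolRegistry`, its simplified parameter schema, and the example tool are all hypothetical.

```python
from typing import Any, Callable

# Simplified type names for this sketch; real tool schemas are richer.
TYPE_MAP = {"string": str, "boolean": bool, "integer": int}


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, tuple[dict[str, str], Callable[..., Any]]] = {}

    def register(self, name: str, params: dict[str, str], handler: Callable[..., Any]) -> None:
        """Record a tool's parameter schema ({arg name: type name}) and its handler."""
        self._tools[name] = (params, handler)

    def call(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        """Validate arguments against the schema, then execute the handler."""
        if name not in self._tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        params, handler = self._tools[name]
        for key, type_name in params.items():
            if key not in arguments:
                return {"ok": False, "error": f"missing argument: {key}"}
            if not isinstance(arguments[key], TYPE_MAP[type_name]):
                return {"ok": False, "error": f"argument {key} must be {type_name}"}
        # Pass only declared arguments so extras cannot reach the handler.
        return {"ok": True, "result": handler(**{k: arguments[k] for k in params})}
```

Validation failures come back as structured errors, not exceptions, mirroring the "error messages on tool call failure" listed among the outputs.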

vs alternatives

Unlike manual slash command invocation (which requires user interaction) or custom integrations (which are project-specific), Pilot Shell's MCP server provides a standardized, programmatic interface for Claude to invoke workflows and tools, enabling autonomous execution and better integration with Claude Code's reasoning loop.

quick mode for low-complexity tasks without planning gates

Medium confidence

For tasks classified as low-complexity, Pilot Shell automatically activates Quick Mode, which bypasses the planning phase and approval gate, allowing direct implementation with quality hooks and TDD enforcement still active. Quick Mode is triggered automatically based on task complexity heuristics (e.g., single-file changes, simple bug fixes) and can be manually invoked with /spec --quick. This provides a fast path for simple tasks while maintaining quality standards.

Solves for

I want to handle simple tasks quickly without waiting for plan approval

I need fast iteration on low-risk changes while still enforcing quality standards

I want to avoid planning overhead for trivial changes

Best for

teams with a mix of simple and complex tasks

developers who want fast iteration on low-risk changes

projects where planning overhead is significant relative to task complexity

Requires

Pilot Shell installed with Quick Mode support

Quality hooks and TDD enforcement configured

Limitations

Quick Mode complexity heuristics may misclassify tasks — simple-looking tasks can have hidden complexity

Bypassing planning can lead to missed edge cases or architectural issues

Quick Mode is not suitable for tasks affecting critical systems or security-sensitive code

What makes it unique

Automatically detects low-complexity tasks and bypasses the planning phase while maintaining quality hooks and TDD enforcement. This provides a fast path for simple tasks without sacrificing quality standards, balancing speed and safety based on task complexity.
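A complexity heuristic of the kind described could be as simple as the sketch below. The thresholds, the sensitive-path markers, and the decision to let sensitivity override a manual request are all assumptions made for illustration; the page does not document the actual rules.

```python
# Hypothetical markers for code that should always go through planning.
SENSITIVE_MARKERS = ("auth", "security", "secrets", "migrations")


def use_quick_mode(changed_paths: list[str], estimated_steps: int,
                   force_quick: bool = False) -> bool:
    """Decide whether a task may skip the planning phase and approval gate."""
    touches_sensitive = any(
        marker in path.lower() for path in changed_paths for marker in SENSITIVE_MARKERS
    )
    if touches_sensitive:
        return False  # sensitive code always goes through planning (sketch's choice)
    if force_quick:
        return True   # stands in for an explicit `/spec --quick`
    # Assumed thresholds: a single-file change with only a few steps.
    return len(changed_paths) <= 1 and estimated_steps <= 3
```

Putting the sensitivity check first is one way to address the limitation that Quick Mode is unsuitable for security-sensitive code, even when a task looks trivial.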

vs alternatives

Unlike Claude Code alone (which has no complexity-based routing) or strict planning-first approaches (which add overhead to all tasks), Pilot Shell's Quick Mode provides context-aware routing that speeds up simple tasks while maintaining quality gates for complex work.

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts (sharing capabilities)

Artifacts that share capabilities with pilot-shell, ranked by overlap. Discovered automatically through the match graph.

Model · 21

Qwen2.5 Coder 32B Instruct

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: significant improvements in code generation, code reasoning...

test case generation and test-driven development support

1 shared capability

Product · 17

Ellipsis

(Previously BitBuilder) "Automated code reviews and bug fixes"

automated bug fix generation and application

1 shared capability

Product · 19

Factory

Coding Droids for building software end-to-end

test suite generation with coverage-aware strategy

1 shared capability

Extension · 40

Codiumate (Qodo Gen)

AI test generation and code integrity analysis.

codebase-aware test suite generation from code changes

1 shared capability

Extension · 45

Claude Opus 4.7, GPT-5.4, Gemini-3.1, Cursor AI, Copilot, Codex,Cline and ChatGPT, AI Copilot, AI Agents and Debugger, Code Assistants, Code Chat, Code Generator, Code Completion, Generative AI, Autoc

Claude Opus 4.7, GPT-5.4, Gemini-3.1, AI Coding Assistant is a lightweight for helping developers automate all the boring stuff like writing code, real-time code completion, debugging, auto generating doc string and many more. Trusted by 100K+ devs from Amazon, Apple, Google, & more. Offers all the

deep planning mode with task decomposition

1 shared capability

Extension · 27

Pagetok

Your AI agent for any project. It plans, edits files, searches and learns from the Internet. Free and effective.

task decomposition and project planning with step-by-step execution

1 shared capability

Best For

✓teams building production codebases with Claude Code
✓developers who want structured planning gates before AI-driven implementation
✓projects requiring audit trails of design decisions
✓teams with strict TDD requirements or regulatory compliance needs
✓projects where test coverage is a non-negotiable quality metric
✓developers who want to prevent untested code from entering the codebase
✓large codebases with strong architectural patterns
✓projects with non-obvious conventions or custom tooling

Known Limitations

⚠Requires explicit /spec invocation — does not auto-trigger on unstructured requests
⚠Plan approval is synchronous and blocks implementation until human review
⚠Feature vs bugfix classification relies on Claude's semantic understanding and may misclassify ambiguous tasks
⚠Test generation quality depends on Claude's understanding of requirements — may generate incomplete or redundant tests
⚠Requires test framework setup in the project (Jest, pytest, etc.) — does not work with projects lacking test infrastructure
⚠Test execution adds latency to the implementation phase (typically 30-60 seconds per test suite run)

Requirements

Claude Code (Sonnet 4.6 or Opus 4.6)
Pilot Shell installed globally at ~/.pilot/
Active project directory with git repository
Test framework installed and configured (Jest, pytest, Vitest, etc.)
Test runner accessible from project root
Semantic index built via /sync command
Project rules and conventions extracted and stored
Project rules and conventions extracted via /sync or /learn

Input / Output

Accepts: natural language task description, existing codebase context (automatically loaded), feature/bugfix specification from /spec planning phase, existing test files and patterns in the codebase, semantic index from /sync, current task description, project configuration, generated code, project rules and conventions, code style configuration, session state (task, progress, results), entire codebase (files, directory structure, git history), project configuration files (package.json, pyproject.toml, etc.), natural language description of discovered pattern or convention, code examples demonstrating the pattern, local rules, skills, hooks, and agents from ~/.pilot/, vault repository URL and credentials, code changes (files, diffs), Claude's tool calls (for tool redirect hook), hook configuration (YAML or JSON), task specification from /spec command, main branch state, implementation code from the implementation phase, test suite, original specification and acceptance criteria, tool call requests from Claude Code, tool arguments (JSON), task description, codebase context

Produces: structured plan JSON with steps, complexity estimates, and test requirements, human-readable markdown plan for review, generated test files (e.g., .test.ts, .test.py), implementation code that passes all tests, test execution logs and coverage reports, context injection payload (text), injected into Claude's system prompt, code review report with violations and suggestions, pass/fail decision for merge, persisted session state file, recovered session state on resume, semantic vector index stored in ~/.pilot/memory/, extracted project rules and conventions in JSON format, context injection payload for Claude's system prompt, skill file (YAML or JSON) stored in ~/.pilot/skills/, structured rule with examples and applicability conditions, synced rules, skills, hooks, and agents in shared Git repository, updated local ~/.pilot/ directory with vault contents, pass/fail validation results, linting and formatting errors, tool call validation results, isolated worktree with task-specific changes, merged changes back to main branch (on verification success), test execution results (pass/fail), regression test results, verification report with pass/fail decision, tool execution results (JSON), error messages on tool call failure, implemented code with tests, verification results

UnfragileRank

Adoption: 26% (30% weight)

Quality: 45% (25% weight)

Ecosystem: 80% (25% weight)

Match Graph: 10% (15% weight)

Freshness: 75% (5% weight)

UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
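
The component scores and weights above can be combined in a short sketch. It assumes the rank is a plain weighted sum of the five listed components; the page does not state the exact formula, rounding, or any normalisation, so the function name `weightedRank` and the result are illustrative only.

```typescript
// Component scores and weights as listed above (scores in percent).
interface RankComponent {
  name: string;
  score: number;  // 0-100
  weight: number; // fraction of the total; the listed weights sum to 1
}

const components: RankComponent[] = [
  { name: "Adoption", score: 26, weight: 0.30 },
  { name: "Quality", score: 45, weight: 0.25 },
  { name: "Ecosystem", score: 80, weight: 0.25 },
  { name: "Match Graph", score: 10, weight: 0.15 },
  { name: "Freshness", score: 75, weight: 0.05 },
];

// Assumption: the overall rank is a simple weighted sum of the components.
function weightedRank(parts: RankComponent[]): number {
  return parts.reduce((sum, p) => sum + p.score * p.weight, 0);
}

console.log(weightedRank(components).toFixed(1)); // prints 44.3
```

Under that assumption the low Adoption and Match Graph scores pull the total down despite the high Ecosystem score, since Adoption carries the largest weight.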

Repository Details

Stars: 1,660

Forks: 138

Language: TypeScript

License: NOASSERTION

Topics: ai-agents, ai-assistant, ai-coding, ai-coding-tools, ai-engineering, ai-tools, anthropic, anthropic-claude, claude, claude-ai, claude-code, claude-context, claude-skills, claudecode, model-context-protocol, spec-driven-development

Last commit: Apr 21, 2026

Alternatives to pilot-shell

IntelliCode (50, Extension)

AI-assisted development

GitHub Copilot Chat (53, Extension)

AI chat features powered by Copilot

GitHub Copilot (52, Extension)

Your AI pair programmer

Claude Code for VS Code (52, Extension)

Claude Code for VS Code: Harness the power of Claude Code without leaving your IDE

Data Sources

github

Capabilities (13 decomposed)

spec-driven task planning with feature/bugfix auto-detection

Medium confidence

Best for

teams building production codebases with Claude Code

developers who want structured planning gates before AI-driven implementation

projects requiring audit trails of design decisions

Requires

Claude Code (Sonnet 4.6 or Opus 4.6)

Pilot Shell installed globally at ~/.pilot/

Active project directory with git repository

Limitations

Requires explicit /spec invocation — does not auto-trigger on unstructured requests

Plan approval is synchronous and blocks implementation until human review

Feature vs bugfix classification relies on Claude's semantic understanding and may misclassify ambiguous tasks

test-driven development enforcement with pre-implementation test generation

Medium confidence

Best for

teams with strict TDD requirements or regulatory compliance needs

projects where test coverage is a non-negotiable quality metric

developers who want to prevent untested code from entering the codebase

Requires

Test framework installed and configured (Jest, pytest, Vitest, etc.)

Test runner accessible from project root

Claude Code (Sonnet 4.6 or Opus 4.6)

Limitations

Test generation quality depends on Claude's understanding of requirements — may generate incomplete or redundant tests

Requires test framework setup in the project (Jest, pytest, etc.) — does not work with projects lacking test infrastructure

Test execution adds latency to the implementation phase (typically 30-60 seconds per test suite run)

codebase-aware context injection with selective token budgeting

Medium confidence

Best for

large codebases with strong architectural patterns

projects with non-obvious conventions or custom tooling

teams wanting to minimize context setup overhead

Requires

Semantic index built via /sync command

Project rules and conventions extracted and stored

Limitations

Context injection is selective and may miss relevant patterns if the context monitor misidentifies task scope

Token budgeting is heuristic-based and may be overly conservative or aggressive

Injected context is static at session start — does not update if task scope changes mid-session
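
To make the idea of selective token budgeting concrete, here is a minimal sketch of one common heuristic: rank candidate snippets by relevance and greedily include those that still fit. It is not Pilot Shell's actual algorithm, which the listing does not describe; `ContextSnippet`, `estimateTokens`, `selectWithinBudget`, and the four-characters-per-token estimate are all assumptions made for illustration.

```typescript
// Hypothetical shape for a candidate piece of context.
interface ContextSnippet {
  id: string;
  text: string;
  relevance: number; // hypothetical score; higher means more relevant
}

// Rough assumption: about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedy selection: take snippets in descending relevance while they fit.
function selectWithinBudget(
  snippets: ContextSnippet[],
  budget: number,
): ContextSnippet[] {
  const ranked = [...snippets].sort((a, b) => b.relevance - a.relevance);
  const chosen: ContextSnippet[] = [];
  let used = 0;
  for (const snippet of ranked) {
    const cost = estimateTokens(snippet.text);
    if (used + cost > budget) continue; // skip; a smaller snippet may still fit
    chosen.push(snippet);
    used += cost;
  }
  return chosen;
}
```

A greedy pass like this shows why such budgeting can be "overly conservative or aggressive": one large, highly ranked snippet can crowd out several smaller ones, and a poor relevance score drops useful context entirely.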

automated code review and style enforcement

Medium confidence

Best for

teams with strict code style and architectural standards

large codebases where architectural consistency is critical

projects where manual code review is a bottleneck

Requires

Project rules and conventions extracted via /sync or /learn

Code style configuration (eslint, black, etc.)

Limitations

Code review agent quality depends on the completeness of extracted project rules — incomplete rules lead to weak reviews

Architectural violations are detected heuristically and may have false positives or false negatives

Code review adds latency to the verification phase (typically 30-60 seconds)

session state persistence and recovery

Medium confidence

Solves for

I want to resume a /spec task if my session crashes

I need to track progress across multiple sessions on the same task

I want to avoid losing work if my machine crashes or connection drops

Best for

developers working on long-running /spec tasks

teams in unreliable network environments

projects where task interruption is common

Requires

Pilot Shell worker service running

Write access to ~/.pilot/sessions/ directory

Limitations

Session recovery is best-effort — some state may be lost if the crash is severe

Persisted state is local to the machine — does not sync across team members

Session state files can accumulate and require periodic cleanup

persistent session memory with semantic codebase indexing

Medium confidence

Best for

teams working on large, complex codebases with strong architectural patterns

projects with non-obvious conventions or custom tooling that Claude needs to learn

developers who want to minimize context-window overhead by pre-indexing the codebase

Requires

Pilot Shell installed with memory subsystem enabled

Git repository with accessible .git directory

Sufficient disk space for semantic index (~100MB per 10k files)

Limitations

Initial /sync indexing can take 2-5 minutes on large codebases (10k+ files)

Semantic index requires embedding generation, which adds ~500ms per file on first run

Index becomes stale if codebase changes significantly between sessions — requires manual /sync refresh
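
The disk and embedding figures quoted above can be turned into a rough estimator. This is a back-of-the-envelope sketch using only the listing's own approximations (~100MB per 10k files, ~500ms per file); `estimateIndexCost` is a hypothetical helper, and the time figure assumes files are embedded strictly one after another.

```typescript
interface IndexEstimate {
  diskMB: number;
  sequentialEmbedSeconds: number;
}

const MB_PER_FILE = 100 / 10000;    // from "~100MB per 10k files"
const EMBED_SECONDS_PER_FILE = 0.5; // from "~500ms per file on first run"

// Assumption: cost scales linearly with file count and embedding is sequential.
function estimateIndexCost(fileCount: number): IndexEstimate {
  return {
    diskMB: fileCount * MB_PER_FILE,
    sequentialEmbedSeconds: fileCount * EMBED_SECONDS_PER_FILE,
  };
}

const estimate = estimateIndexCost(10000);
console.log(estimate.diskMB, estimate.sequentialEmbedSeconds);
```

For 10,000 files the sequential estimate is about 5,000 seconds, well above the quoted 2-5 minutes for initial /sync, so the two figures only agree if embedding is batched, parallelised, or limited to a subset of files. The listing does not say which.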

project-specific rules and conventions extraction via /learn

Medium confidence

Best for

teams with implicit or undocumented project conventions

projects where knowledge is scattered across team members' heads

developers who want to build up a library of reusable project patterns over time

Requires

Pilot Shell installed with skills subsystem

Active session with Claude Code

Write access to ~/.pilot/skills/ directory

Limitations

Skill extraction quality depends on Claude's ability to generalize from examples — may capture overly specific or incorrect patterns

Skills are not automatically validated; incorrect rules can persist and mislead future sessions

Requires manual /learn invocation — does not auto-capture patterns without explicit user action

team knowledge sharing via /vault with git-backed persistence

Medium confidence

Best for

teams with 3+ developers working on the same codebase

organizations wanting to enforce consistent quality standards across projects

teams building a library of reusable Pilot Shell extensions

Requires

Private Git repository (GitHub, GitLab, Gitea, etc.)

Git credentials configured on all team machines

Write access to the vault repository for all team members

Limitations

Requires a private Git repository (GitHub, GitLab, etc.) — adds operational overhead

Vault sync is manual (via /vault command) — does not auto-sync on every session

Merge conflicts in vault files require manual resolution — no built-in conflict resolution UI

hooks-based quality enforcement pipeline

Medium confidence

Best for

teams with strict code quality standards

projects where certain files (config, security, database schema) require special handling

developers who want to prevent Claude from using unapproved tools or APIs

Requires

Pilot Shell installed with hooks subsystem

Hook configuration file in project root or ~/.pilot/hooks/

Linting/formatting tools installed (eslint, prettier, black, etc.)

Limitations

Hooks add latency to code changes (typically 500ms-2s per hook execution)

Hook failures block code commits — can be frustrating if hooks are overly strict

Hooks are project-specific and must be configured per project — no global defaults

worktree-based isolated task execution

Medium confidence

Best for

teams running multiple Claude Code sessions in parallel

projects where task isolation is critical for safety

developers who want automatic rollback on verification failure

Requires

Git 2.7+ (worktree support)

Sufficient disk space for multiple worktree copies

Pilot Shell installed with worktree integration

Limitations

Worktree creation adds ~1-2 seconds of overhead per task

Worktree merging can fail if there are conflicts with the main branch — requires manual conflict resolution

Worktrees consume disk space (one copy of the codebase per active task)
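
The worktree lifecycle implied above (create an isolated checkout, merge on success, clean up) can be sketched as a set of standard Git commands. This only builds the command lines and does not run them; the branch naming (`task/<id>`), the `../worktrees/` path, and the `planWorktreeTask` helper are assumptions, not Pilot Shell's documented behaviour.

```typescript
type GitCommand = string[];

interface WorktreePlan {
  create: GitCommand;  // run from the main checkout
  merge: GitCommand;   // run from the main checkout after verification passes
  cleanup: GitCommand; // run once the task is merged or abandoned
}

// Hypothetical helper: branch and path conventions are illustrative.
function planWorktreeTask(taskId: string, baseBranch: string): WorktreePlan {
  const branch = `task/${taskId}`;
  const path = `../worktrees/${taskId}`;
  return {
    // Creates a new branch from baseBranch and checks it out in its own directory.
    create: ["git", "worktree", "add", "-b", branch, path, baseBranch],
    // A conflicting merge stops here and needs manual resolution.
    merge: ["git", "merge", "--no-ff", branch],
    // Note: `git worktree remove` needs Git 2.17 or newer.
    cleanup: ["git", "worktree", "remove", path],
  };
}

console.log(planWorktreeTask("42", "main").create.join(" "));
// prints: git worktree add -b task/42 ../worktrees/42 main
```

Because each worktree is a full checkout of the working files, the disk cost grows with the number of active tasks, which matches the limitation listed above.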

verification and regression testing agent

Medium confidence

Best for

teams with strict quality requirements

projects where regressions are costly or dangerous

developers who want automated validation before code merges

Requires

Comprehensive test suite covering the feature or bugfix

Acceptance criteria explicitly defined in the /spec plan

Test runner accessible from project root

Limitations

Verification quality depends on test suite completeness — cannot catch bugs not covered by tests

Regression testing requires a comprehensive test suite — projects with poor test coverage will have weak verification

Verification adds latency to the /spec workflow (typically 1-3 minutes per task)

mcp server integration for claude code tool calling

Medium confidence

Best for

teams using Claude Code with Pilot Shell integration

projects with custom tools or APIs that Claude needs to call

developers who want Claude to autonomously invoke Pilot Shell workflows

Requires

Claude Code (Sonnet 4.6 or Opus 4.6) with MCP support

Pilot Shell MCP server running (started automatically by pilot binary)

Tool schemas defined in project configuration

Limitations

MCP server adds ~200ms latency per tool call (network overhead)

Tool schemas must be explicitly defined — no automatic schema generation from code

MCP server is single-threaded — concurrent tool calls are serialized
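
What "concurrent tool calls are serialized" means in practice can be illustrated with a small promise queue. This is a generic sketch of serialized execution, not the Pilot Shell MCP server's code; `SerialQueue` is a hypothetical class introduced only for the example.

```typescript
// Runs async tasks strictly one at a time, in submission order.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after the previously queued task has settled.
    const result = this.tail.then(task);
    // Swallow rejections on the internal chain so one failure
    // does not block later tasks; callers still see the error via `result`.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

async function demo(): Promise<string[]> {
  const queue = new SerialQueue();
  const order: string[] = [];
  const slow = queue.run(async () => {
    await new Promise<void>((resolve) => setTimeout(resolve, 20));
    order.push("slow");
  });
  const fast = queue.run(async () => {
    order.push("fast");
  });
  await Promise.all([slow, fast]);
  return order; // "slow" finishes before "fast" even though "fast" is quicker
}
```

With serialization, latency adds up: if each call costs roughly 200ms as stated above, five tool calls issued together would take on the order of one second rather than 200ms.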

quick mode for low-complexity tasks without planning gates

Medium confidence

Best for

teams with a mix of simple and complex tasks

developers who want fast iteration on low-risk changes

projects where planning overhead is significant relative to task complexity

Requires

Pilot Shell installed with Quick Mode support

Quality hooks and TDD enforcement configured

Limitations

Quick Mode complexity heuristics may misclassify tasks — simple-looking tasks can have hidden complexity

Bypassing planning can lead to missed edge cases or architectural issues

Quick Mode is not suitable for tasks affecting critical systems or security-sensitive code

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts (sharing capabilities)

Qwen2.5 Coder 32B Instruct

Ellipsis

Factory

Codiumate (Qodo Gen)

Claude Opus 4.7, GPT-5.4, Gemini-3.1, Cursor AI, Copilot, Codex,Cline and ChatGPT, AI Copilot, AI Agents and Debugger, Code Assistants, Code Chat, Code Generator, Code Completion, Generative AI, Autoc

Pagetok
