12-factor-agents
What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
Capabilities (17 decomposed)
natural-language-to-structured-tool-call-translation
Medium confidence. Translates unstructured natural-language agent reasoning into deterministic, schema-validated tool calls by enforcing a strict separation between LLM reasoning and tool invocation. The system uses structured output formats (likely JSON Schema validation) to ensure every tool call conforms to a predefined interface before execution, preventing hallucinated or malformed function calls from reaching production code. This implements Factor 1 of the 12-Factor methodology, treating tool calls as the primary interface between LLM decisions and deterministic system behavior.
Implements a strict schema-first approach to tool calling where the LLM operates within a pre-validated tool registry, ensuring every tool call is structurally valid before execution — this differs from systems that allow free-form tool invocation and validate post-hoc
More reliable than naive function calling because it validates tool calls against their schemas before execution rather than catching errors after the fact, substantially reducing the rate of hallucinated tool calls that reach production systems
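The schema-first pattern above can be sketched in a few lines. This is a minimal, hypothetical illustration (the registry, tool names, and validator are invented for this example, not the project's actual API): every call the LLM proposes is checked against a registered schema before any code runs.

```python
# Minimal sketch of schema-first tool calling: every tool call proposed by the
# LLM is validated against a registered schema BEFORE execution.
# Tool names and schema shape are hypothetical.

TOOL_REGISTRY = {
    "create_ticket": {
        "required": {"title": str, "priority": str},
        "allowed_values": {"priority": {"low", "medium", "high"}},
    },
}

def validate_tool_call(call: dict) -> list:
    """Return a list of validation errors; an empty list means the call is safe."""
    errors = []
    spec = TOOL_REGISTRY.get(call.get("tool"))
    if spec is None:
        return [f"unknown tool: {call.get('tool')!r}"]
    args = call.get("args", {})
    for name, typ in spec["required"].items():
        if name not in args:
            errors.append(f"missing argument: {name}")
        elif not isinstance(args[name], typ):
            errors.append(f"wrong type for {name}")
    for name, allowed in spec.get("allowed_values", {}).items():
        if name in args and args[name] not in allowed:
            errors.append(f"invalid value for {name}: {args[name]!r}")
    return errors

# A hallucinated tool name is rejected before it can reach production code.
bad = validate_tool_call({"tool": "delete_database", "args": {}})
good = validate_tool_call(
    {"tool": "create_ticket", "args": {"title": "Bug", "priority": "high"}}
)
```

The point of the design is that the deterministic validator, not the LLM, is the last gate before execution.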
prompt-ownership-and-versioning-system
Medium confidence. Provides a framework for treating prompts as first-class, versioned artifacts rather than embedded strings, enabling teams to own, test, and iterate on prompts independently from application code. Implements Factor 2 by establishing a clear separation between prompt templates, system instructions, and dynamic context injection, with support for prompt versioning, A/B testing, and rollback capabilities. Prompts are stored and managed as configuration rather than hardcoded, allowing non-engineers to modify agent behavior without code changes.
Treats prompts as externalized, versioned configuration artifacts with explicit lifecycle management rather than hardcoded strings, enabling non-technical stakeholders to modify agent behavior and enabling systematic prompt experimentation
Enables faster prompt iteration and A/B testing compared to systems where prompts are embedded in code, reducing time-to-experiment from days (code review cycle) to minutes (config update)
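As a minimal sketch of prompts-as-versioned-configuration (the prompt store, names, and versions here are hypothetical, not the project's actual format), behavior changes become a config lookup rather than a code change:

```python
# Hypothetical sketch: prompts stored as versioned configuration and selected
# at runtime, so agent behavior changes without touching application code.
PROMPTS = {
    "support_agent": {
        "v1": "You are a support agent. Answer briefly.",
        "v2": "You are a support agent for {product}. Answer briefly and cite docs.",
    },
}

def render_prompt(name: str, version: str, **context) -> str:
    # Rolling back is just selecting an earlier version key.
    template = PROMPTS[name][version]
    return template.format(**context)

p = render_prompt("support_agent", "v2", product="AcmeDB")
```

In a real deployment the `PROMPTS` mapping would live in external storage so that non-engineers can edit it, and the version key would be pinned per experiment arm for A/B testing.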
trigger-from-anywhere-event-driven-invocation
Medium confidence. Enables agents to be triggered from any event source (webhooks, message queues, scheduled jobs, user actions) through a unified invocation interface, rather than being tightly coupled to specific trigger mechanisms. Implements Factor 11 by decoupling agent invocation from trigger sources, allowing the same agent to be triggered by multiple sources without modification. Uses an event adapter pattern to normalize different trigger types into a common agent invocation format.
Implements a unified agent invocation interface that abstracts away specific trigger sources, using an event adapter pattern to normalize different trigger types, rather than building trigger-specific agent invocation logic
More flexible than trigger-specific agents because the same agent can be invoked from multiple sources without modification, reducing code duplication and enabling easier addition of new trigger sources
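The event adapter pattern can be sketched as follows; the adapters, field names, and invocation shape are illustrative assumptions, not the project's actual interface:

```python
# Sketch of the event adapter pattern: each trigger source is normalized into
# one common invocation format before it reaches the agent.
# Field names are hypothetical.

def from_webhook(payload: dict) -> dict:
    return {"source": "webhook", "event": payload["event"],
            "data": payload.get("body", {})}

def from_cron(job_name: str) -> dict:
    return {"source": "cron", "event": job_name, "data": {}}

def invoke_agent(invocation: dict) -> str:
    # The agent sees only the normalized shape, never the raw trigger.
    return f"handling {invocation['event']} from {invocation['source']}"

a = invoke_agent(from_webhook({"event": "issue.opened", "body": {"id": 7}}))
b = invoke_agent(from_cron("nightly-summary"))
```

Adding a new trigger source means writing one new adapter; the agent itself is untouched.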
stateless-reducer-agent-execution-model
Medium confidence. Implements agents as pure, stateless reducers that take a state snapshot and an action and produce a new state snapshot, with no side effects beyond the returned state. Implements Factor 12 by treating agent execution as a functional transformation where each step is deterministic and reproducible, enabling perfect replay, time-travel debugging, and easy testing. Uses an immutable state model where every action produces a new state snapshot rather than mutating state in place.
Implements agents as pure, stateless reducers following functional programming principles, where each action produces a deterministic new state snapshot, enabling perfect replay and time-travel debugging rather than imperative state mutation
More debuggable and testable than imperative agent implementations because execution is deterministic and reproducible, enabling time-travel debugging and perfect replay for any execution scenario
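The reducer model above can be sketched concisely. The state shape and action types here are invented for illustration; the property being demonstrated — that replaying the same action log from the same initial state reproduces the final state exactly — is the core of the pattern:

```python
# Sketch of the stateless-reducer model: next_state = reduce(state, action),
# with no mutation, so replaying the action log reproduces any state exactly.
# State and action shapes are hypothetical.

def reduce_agent(state: dict, action: dict) -> dict:
    new_state = {**state, "steps": state["steps"] + [action["type"]]}
    if action["type"] == "tool_result":
        new_state["last_result"] = action["value"]
    return new_state

initial = {"steps": [], "last_result": None}
log = [
    {"type": "llm_decision"},
    {"type": "tool_result", "value": 42},
]

final = initial
for action in log:
    final = reduce_agent(final, action)

# Time-travel debugging falls out for free: replay any prefix of the log.
replayed = initial
for action in log:
    replayed = reduce_agent(replayed, action)
```

Because `reduce_agent` never mutates its input, the initial snapshot survives untouched and any intermediate state can be reconstructed on demand.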
context-prefetching-and-preloading
Medium confidence. Proactively fetches and preloads context data before agent execution begins, reducing latency and ensuring critical information is available without requiring the agent to fetch it during execution. Implements Factor 13 (appendix) by identifying context dependencies upfront and loading them in parallel before the agent starts reasoning, rather than having the agent fetch context on-demand. Uses dependency analysis to determine what context is needed and prefetch strategies to optimize loading.
Implements proactive context prefetching as a first-class concern, analyzing dependencies and loading context in parallel before agent execution, rather than having agents fetch context on-demand during reasoning
Can significantly reduce agent execution latency compared to on-demand context fetching because context is already available when the agent starts reasoning, improving user-facing response times
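A minimal sketch of the prefetch idea, assuming declared context dependencies fetched in parallel (the fetchers and dependency map are stand-ins, not real integrations):

```python
# Sketch of context prefetching: dependencies are declared up front and fetched
# in parallel before the agent starts reasoning. Fetchers are hypothetical
# stand-ins for real I/O calls.
from concurrent.futures import ThreadPoolExecutor

def fetch_user_profile():
    return {"name": "Ada"}

def fetch_open_tickets():
    return [{"id": 1, "title": "login bug"}]

CONTEXT_DEPS = {"profile": fetch_user_profile, "tickets": fetch_open_tickets}

def prefetch_context(deps: dict) -> dict:
    # All fetches run concurrently; total wait is the slowest fetch, not the sum.
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(fn) for key, fn in deps.items()}
        return {key: fut.result() for key, fut in futures.items()}

context = prefetch_context(CONTEXT_DEPS)
```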
agent-template-and-scaffolding-generation
Medium confidence. Provides code generation and scaffolding tools that generate boilerplate agent implementations from high-level specifications, reducing the effort required to implement agents that follow 12-Factor principles. Includes tools like 'walkthroughgen' that analyze existing agent implementations and generate documentation, tests, or new agent variants. Uses code analysis and template-based generation to create consistent, production-ready agent code.
Provides code generation and scaffolding specifically designed for 12-Factor agents, with tools like walkthroughgen that analyze implementations and generate documentation/tests, rather than generic code generation
Accelerates agent development compared to manual implementation because scaffolding generates boilerplate and enforces 12-Factor patterns automatically, reducing time-to-production
agent-testing-and-validation-framework
Medium confidence. Provides testing infrastructure for agents including unit tests, integration tests, and validation of agent behavior against expected outcomes, with support for deterministic replay and scenario-based testing. Enables testing of agent decision-making, tool call validation, and state transitions in isolation without requiring live LLM calls. Uses snapshot testing and scenario-based approaches to validate agent behavior.
Provides testing infrastructure specifically designed for agents, with support for deterministic replay, scenario-based testing, and LLM mocking, rather than treating agents as black boxes that can only be tested end-to-end
Enables faster, cheaper testing than end-to-end testing with live LLM calls because tests can run deterministically without API calls, cutting test cost dramatically while maintaining confidence in agent behavior
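One way to picture deterministic agent testing is a scripted LLM stub; the class and loop below are hypothetical illustrations of the technique, not the framework's actual API:

```python
# Sketch of deterministic agent testing: the LLM is replaced with a scripted
# stub that returns pre-recorded decisions, so no API calls are needed and
# every run is reproducible. Names are hypothetical.

class ScriptedLLM:
    def __init__(self, decisions):
        self.decisions = iter(decisions)

    def next_step(self, _context):
        return next(self.decisions)

def run_agent(llm, context):
    trace = []
    while True:
        step = llm.next_step(context)
        trace.append(step["tool"])
        if step["tool"] == "done":
            return trace

llm = ScriptedLLM([{"tool": "search"}, {"tool": "summarize"}, {"tool": "done"}])
trace = run_agent(llm, context={})
```

The same harness can replay a recorded production trace to reproduce a bug exactly, which is where the pairing with the stateless-reducer model pays off.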
baml-based-structured-output-integration
Medium confidence. Integrates with BAML (BoundaryML's domain-specific language for structured LLM outputs) for defining and validating structured outputs from LLMs, providing a DSL for specifying tool schemas, output formats, and validation rules. BAML integration enables type-safe tool definitions and structured output validation without requiring manual JSON Schema definition. Uses BAML's parsing and validation capabilities to ensure LLM outputs conform to expected schemas.
Integrates BAML as a first-class schema definition language for 12-Factor agents, providing a more readable alternative to JSON Schema with type-safe code generation, rather than requiring manual JSON Schema definition
More readable and maintainable than JSON Schema because BAML uses a domain-specific language designed for structured outputs, reducing schema definition boilerplate while maintaining type safety
thread-and-event-management-system
Medium confidence. Manages agent execution threads and events, tracking the sequence of agent actions, tool calls, and state transitions in a structured event log. Provides mechanisms to query execution history, replay events, and correlate events across multiple agents or execution threads. Uses an event sourcing pattern where every significant action is recorded as an immutable event.
Implements event sourcing as a first-class concern for agent execution, recording every action as an immutable event and enabling replay and correlation across threads, rather than relying on logs or state snapshots alone
Provides better auditability and debuggability than traditional logging because every action is recorded as a structured event that can be replayed and correlated, enabling perfect reconstruction of agent execution
context-window-aware-memory-management
Medium confidence. Implements intelligent context window budgeting and management to prevent token overflow and ensure critical information remains available to the LLM throughout agent execution. Implements Factor 3 by providing explicit control over what gets included in the context window, with strategies for prioritizing recent events, important facts, and error information while dropping less critical context when space is constrained. Uses a sliding window or priority-based eviction strategy rather than naive context truncation.
Implements explicit, configurable context window budgeting with priority-based eviction rather than naive truncation, ensuring critical information (recent events, errors, system state) is preserved while less important context is dropped when space is constrained
More reliable than simple context truncation because it preserves semantically important information (errors, recent decisions) even when overall context is reduced, improving agent decision quality in token-constrained scenarios
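Priority-based eviction can be sketched as a small budgeting function; the priority scale and token estimates are invented for the example:

```python
# Sketch of priority-based context eviction: each context item carries a
# priority and a token estimate; when over budget, lowest-priority items are
# dropped first instead of blindly truncating the tail. Values hypothetical.

def fit_to_budget(items: list, budget: int) -> list:
    kept, used = [], 0
    # Highest priority first; ties broken by recency (later index = more recent).
    for idx, it in sorted(enumerate(items), key=lambda p: (-p[1]["priority"], -p[0])):
        if used + it["tokens"] <= budget:
            kept.append((idx, it))
            used += it["tokens"]
    # Restore chronological order for the surviving items.
    return [it for _, it in sorted(kept, key=lambda p: p[0])]

items = [
    {"text": "old chit-chat", "priority": 1, "tokens": 50},
    {"text": "error: db timeout", "priority": 3, "tokens": 20},
    {"text": "recent decision", "priority": 2, "tokens": 40},
]
window = fit_to_budget(items, budget=60)
```

Note that the low-priority chit-chat is evicted even though it is the oldest item with the most tokens, while the error survives.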
structured-output-tool-definition-framework
Medium confidence. Defines tools as structured output schemas rather than arbitrary functions, ensuring every tool has a well-defined input/output contract that both the LLM and deterministic code can understand. Implements Factor 4 by treating tool definitions as data structures with explicit type information, validation rules, and documentation, enabling the system to generate tool descriptions for the LLM and validate tool responses before execution. Tools are defined declaratively (likely via JSON Schema or similar) rather than as imperative function signatures.
Treats tools as declarative data structures with explicit schemas rather than imperative functions, enabling automatic validation, documentation generation, and type-safe tool invocation across LLM and deterministic code boundaries
More maintainable than function-based tool definitions because schema changes automatically propagate to LLM descriptions and validation logic, reducing inconsistencies between tool documentation and actual behavior
unified-execution-and-business-state-management
Medium confidence. Unifies the agent's execution state (what step it's on, what tools it's called, what errors occurred) with the application's business state (user data, domain objects, transaction state) into a single, consistent state representation. Implements Factor 5 by ensuring the agent's internal reasoning state and the application's persistent state are always synchronized, preventing divergence where the agent thinks it succeeded but the business operation failed. Uses a single state reducer pattern where each agent action produces a new state snapshot.
Implements a single unified state model where agent execution state and business state are merged into one immutable snapshot, using a reducer pattern to ensure every action produces a consistent state transition rather than maintaining separate agent and business state
More reliable than dual-state systems because it eliminates the possibility of agent state and business state diverging, enabling perfect replay and audit trails at the cost of increased state management complexity
agent-lifecycle-control-with-pause-resume
Medium confidence. Provides simple APIs to launch, pause, resume, and cancel agent execution without losing state or context, enabling long-running agents to be interrupted for human review or system maintenance. Implements Factor 6 by treating agent execution as a pausable process with explicit state checkpoints, allowing the agent to be suspended mid-execution, inspected, and resumed from the exact point of interruption. Uses a state-machine approach where pause/resume transitions are explicit and validated.
Implements explicit pause/resume semantics as first-class operations in the agent lifecycle, with state checkpoints that allow interruption and resumption without losing progress, rather than treating agent execution as an atomic, non-interruptible process
Enables human-in-the-loop workflows more naturally than systems without pause/resume, allowing humans to review agent decisions before critical actions without requiring complex workarounds or state management
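Because the agent's state is a serializable snapshot, pause/resume reduces to persisting and reloading that snapshot. A hypothetical sketch (step names and state shape invented for illustration):

```python
# Sketch of pause/resume via checkpoints: agent state is a serializable
# snapshot, so execution can stop at any step and later resume from exactly
# that checkpoint. Step names are hypothetical.
import json

def run_until(state: dict, pause_at: int, steps: list) -> dict:
    while state["step"] < pause_at and state["step"] < len(steps):
        state = {**state, "log": state["log"] + [steps[state["step"]]],
                 "step": state["step"] + 1}
    return state

steps = ["plan", "call_tool", "review", "finish"]
state = {"step": 0, "log": []}

paused = run_until(state, pause_at=2, steps=steps)
checkpoint = json.dumps(paused)          # persist while awaiting human review

resumed = run_until(json.loads(checkpoint), pause_at=len(steps), steps=steps)
```

The checkpoint can sit in a database for hours while a human reviews it; resumption picks up at exactly the interrupted step.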
human-contact-via-tool-calls
Medium confidence. Enables agents to contact humans (request approval, ask clarifying questions, escalate decisions) by treating human contact as a tool call rather than a special case, maintaining consistency with the tool-call abstraction. Implements Factor 7 by allowing the LLM to decide when human input is needed and what information to request, with the human response being treated as a tool result that feeds back into the agent's reasoning loop. Human contact is asynchronous and non-blocking, allowing the agent to be paused while awaiting human response.
Treats human contact as a regular tool call within the agent's decision-making loop rather than a special case, allowing the LLM to decide when and how to contact humans while maintaining consistency with the tool-call abstraction
More flexible than hard-coded approval workflows because the agent can dynamically decide when human input is needed based on reasoning, rather than requiring static rules about which actions require approval
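A sketch of the human-as-tool-call idea, assuming a hypothetical `request_human_approval` tool: when the answer is not yet available the tool returns a pause signal instead of blocking, which pairs naturally with checkpointed pause/resume:

```python
# Sketch of "contact humans via tool calls": requesting approval is just
# another tool; if the response is not yet available, the tool signals a pause
# instead of blocking. Tool name and shapes are hypothetical.
PENDING_HUMAN_INPUT = object()

def call_tool(name: str, args: dict, human_responses: dict):
    if name == "request_human_approval":
        # Asynchronous: return the answer if present, otherwise signal a pause
        # so the agent can be checkpointed while the human decides.
        return human_responses.get(args["question"], PENDING_HUMAN_INPUT)
    return f"ran {name}"

# First pass: no human answer yet, so the agent pauses.
first = call_tool("request_human_approval", {"question": "deploy to prod?"}, {})

# Later, the human responds and the same tool call resolves normally.
second = call_tool("request_human_approval", {"question": "deploy to prod?"},
                   {"deploy to prod?": "approved"})
```

Because the human response re-enters the loop as an ordinary tool result, the agent needs no special-case code paths for approvals versus any other tool.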
explicit-control-flow-ownership
Medium confidence. Provides explicit, deterministic control flow logic that the agent cannot override, ensuring critical business logic and safety constraints are enforced regardless of LLM reasoning. Implements Factor 8 by separating the agent's decision-making (what tool to call) from the application's control flow (whether that tool call is allowed, what happens next), with the control flow implemented in deterministic code rather than left to the LLM. Uses a state machine or workflow engine to define valid state transitions and enforce business rules.
Implements control flow as explicit deterministic code that validates and constrains agent decisions rather than trusting the LLM to follow implicit rules, ensuring business logic and safety constraints are enforced regardless of agent reasoning
More reliable than prompt-based control flow because it uses code-level enforcement rather than relying on the LLM to follow instructions, preventing agents from bypassing constraints through creative reasoning
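The state-machine gate can be sketched as a transition table checked in code; states and tool names are hypothetical:

```python
# Sketch of code-owned control flow: a transition table defines which tools
# are allowed in each state, and the check runs in deterministic code rather
# than relying on prompt instructions. States and tools are hypothetical.
ALLOWED = {
    "drafting": {"search_docs", "write_draft"},
    "review":   {"request_human_approval"},
    "approved": {"send_email"},
}

def guard(state: str, proposed_tool: str) -> bool:
    return proposed_tool in ALLOWED.get(state, set())

# The LLM may propose anything; the guard decides what actually executes.
ok = guard("drafting", "search_docs")
blocked = guard("drafting", "send_email")   # cannot send before approval
```

No amount of creative reasoning in the prompt can make `send_email` execute from the `drafting` state, because the constraint lives outside the LLM.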
compact-error-representation-for-context-window
Medium confidence. Compacts error information into a concise, context-window-efficient format that provides the agent with actionable debugging information without consuming excessive tokens. Implements Factor 9 by extracting the most relevant error details (error type, root cause, suggested remediation) and presenting them in a structured format that the LLM can efficiently process, rather than including full stack traces or verbose error messages. Uses error categorization and templating to ensure consistency.
Implements error compaction as a first-class concern, extracting and structuring error information to be context-window-efficient while remaining actionable for the agent, rather than including full error details that consume excessive tokens
More token-efficient than including full error messages because it extracts only actionable information, sharply reducing context window usage while maintaining the agent's ability to recover from errors
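Error compaction can be sketched as a small summarizer; the hint table and field names are invented for illustration:

```python
# Sketch of error compaction: extract type, a capped message, and a short
# remediation hint instead of feeding the full traceback into the context
# window. The hint table is a hypothetical example of error categorization.
HINTS = {
    "KeyError": "a required field was missing; re-check the tool arguments",
    "TimeoutError": "the upstream call timed out; consider retrying",
}

def compact_error(exc: Exception) -> dict:
    kind = type(exc).__name__
    return {
        "type": kind,
        "message": str(exc)[:120],          # cap verbose messages
        "hint": HINTS.get(kind, "unrecognized error; escalate to a human"),
    }

try:
    {}["user_id"]
except KeyError as e:
    summary = compact_error(e)
```

A few dozen tokens of structured summary replace a multi-hundred-token traceback, while still telling the agent what went wrong and what to try next.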
micro-agent-decomposition-and-composition
Medium confidence. Decomposes complex agent tasks into small, focused agents with single responsibilities that can be composed together, rather than building monolithic agents that handle multiple concerns. Implements Factor 10 by establishing clear boundaries between agent responsibilities, with each agent handling a specific domain or task type, and providing composition mechanisms to orchestrate multiple agents. Uses a DAG (directed acyclic graph) or similar pattern to define agent dependencies and execution order.
Implements agent decomposition as a first-class architectural pattern, with explicit composition mechanisms and DAG-based orchestration, rather than building monolithic agents that handle multiple concerns
More maintainable and testable than monolithic agents because each micro-agent has a single responsibility and can be tested and iterated independently, improving overall system reliability through reduced cognitive load
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with 12-factor-agents, ranked by overlap. Discovered automatically through the match graph.
Cohere: Command R7B (12-2024)
Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning...
OpenAI: GPT-5.2
GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly...
Guidance
Microsoft's language for efficient LLM control flow.
Anthropic: Claude Sonnet 4.6
Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...
Qwen2.5-1.5B-Instruct
text-generation model. 10,591,422 downloads.
Spring AI
AI framework for Spring/Java — portable LLM API, RAG pipeline, vector stores, function calling.
Best For
- ✓ teams building production LLM agents that need strict tool call validation
- ✓ developers migrating from free-form prompt-based tool calling to schema-driven approaches
- ✓ organizations requiring audit trails of agent decisions vs actual system actions
- ✓ product teams managing multiple agent variants with different prompting strategies
- ✓ organizations with separate prompt engineering and software engineering roles
- ✓ teams needing audit trails of prompt changes and their correlation with agent behavior changes
- ✓ systems with multiple trigger sources that need to invoke the same agent
- ✓ teams building event-driven architectures with agents
Known Limitations
- ⚠ Requires explicit schema definition for every tool, adding upfront design overhead
- ⚠ Schema validation adds latency per tool call (~50-200ms depending on complexity)
- ⚠ Does not handle tools with highly dynamic or user-defined signatures without schema regeneration
- ⚠ Requires an external prompt storage or configuration management system (not built-in)
- ⚠ No built-in analytics for measuring prompt effectiveness — requires integration with observability tools
- ⚠ Prompt versioning adds complexity to agent state management if not carefully designed
Repository Details
Last commit: Sep 21, 2025