sense-think-act agent loop with llm-agnostic multi-backend support
Implements a core agent_runner_loop that orchestrates the sense-think-act cycle by accepting LLM responses, parsing tool calls from multiple backend protocols (OpenAI, Anthropic, Gemini), executing atomic tools, and feeding results back to the LLM in a closed feedback loop. The architecture abstracts backend differences through a unified LLM Communication Layer that normalizes function-calling schemas across providers, enabling seamless switching between Claude, GPT, and Gemini without code changes.
Unique: Abstracts LLM provider differences through a unified Communication Layer that normalizes function-calling schemas (OpenAI format, Anthropic format, Gemini format) into a single internal representation, allowing the agent_runner_loop to remain completely provider-agnostic while supporting real-time backend switching
vs alternatives: Where frameworks such as LangChain or AutoGen rely on provider-specific integration classes, GenericAgent's normalized protocol layer is designed to make providers interchangeable without duplicating any core loop logic
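The normalization step described above can be sketched as follows. This is a minimal illustration, not GenericAgent's actual code: `ToolCall` and `normalize_tool_calls` are hypothetical names, and the message shapes are simplified versions of the publicly documented OpenAI, Anthropic, and Gemini function-calling responses.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """Provider-neutral tool call (hypothetical internal representation)."""
    name: str
    arguments: dict[str, Any]


def normalize_tool_calls(provider: str, message: dict[str, Any]) -> list[ToolCall]:
    """Convert one provider-shaped assistant message into a list of ToolCall."""
    if provider == "openai":
        # OpenAI-style: arguments arrive as a JSON-encoded string.
        return [
            ToolCall(c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in message.get("tool_calls") or []
        ]
    if provider == "anthropic":
        # Anthropic-style: tool calls are "tool_use" blocks mixed with text blocks.
        return [
            ToolCall(b["name"], b["input"])
            for b in message.get("content", [])
            if b.get("type") == "tool_use"
        ]
    if provider == "gemini":
        # Gemini-style: tool calls are parts carrying a "functionCall" object.
        return [
            ToolCall(p["functionCall"]["name"], p["functionCall"].get("args", {}))
            for p in message.get("parts", [])
            if "functionCall" in p
        ]
    raise ValueError(f"unknown provider: {provider}")
```

With a function like this at the boundary, the loop only ever sees `ToolCall` objects, so swapping the backend would change the `provider` argument and nothing downstream.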
hierarchical memory system with axiom-based governance and long-term crystallization
Implements a multi-layer memory architecture consisting of working memory (update_working_checkpoint), episodic memory (task execution logs), and long-term memory (crystallized procedures and learned SOPs). The system uses Core Axioms as governance rules that define how the agent thinks and operates, and triggers background memory refinement via start_long_term_update which distills repeated task patterns into reusable procedures. Memory operations are synchronized across layers to maintain consistency and prevent conflicting knowledge states.
Unique: Combines working memory checkpoints with axiom-based governance and asynchronous long-term crystallization, allowing the agent to maintain consistent reasoning principles while autonomously distilling repeated task patterns into reusable procedures without explicit training loops
vs alternatives: Unlike RAG systems that retrieve static knowledge, GenericAgent's memory actively evolves through crystallization; unlike traditional RL agents that require reward signals, it learns from task execution logs and axiom compliance, making it suitable for open-ended autonomous work
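To make the layering concrete, here is a small sketch of the three memory layers. The method names `update_working_checkpoint` and `start_long_term_update` come from the tool list; everything else (the `HierarchicalMemory` class, `log_episode`, the `min_repeats` threshold, and the rule that a procedure is simply a repeated tool sequence) is an assumption made for illustration. The real system is described as running crystallization in the background and under axiom governance, both of which are left out here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    working: dict = field(default_factory=dict)     # current task state
    episodes: list = field(default_factory=list)    # one tool sequence per finished task
    procedures: set = field(default_factory=set)    # crystallized, reusable sequences

    def update_working_checkpoint(self, **facts) -> None:
        """Overwrite or add facts in working memory."""
        self.working.update(facts)

    def log_episode(self, tool_sequence) -> None:
        """Append a finished task's tool sequence to episodic memory."""
        self.episodes.append(tuple(tool_sequence))

    def start_long_term_update(self, min_repeats: int = 2) -> set:
        """Promote sequences seen at least min_repeats times; return the new ones."""
        counts = Counter(self.episodes)
        new = {seq for seq, n in counts.items() if n >= min_repeats} - self.procedures
        self.procedures |= new
        return new
```

Running this synchronously keeps the example testable; a background variant would call `start_long_term_update` from a worker thread or task queue.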
human-in-the-loop confirmation with ask_user tool and interactive decision gates
The ask_user tool enables the agent to request human confirmation before executing irreversible or high-risk actions, implementing interactive decision gates in the agent's workflow. The tool blocks the agent loop until a human responds, allowing humans to inspect the agent's reasoning, provide corrections, or approve/reject proposed actions. This enables safe autonomous operation in domains where human oversight is required.
Unique: Implements interactive decision gates that block the agent loop until human confirmation, enabling safe autonomous operation in high-stakes domains while maintaining human oversight and control
vs alternatives: More flexible than static guardrails — allows humans to make contextual decisions about specific actions rather than enforcing blanket restrictions, enabling nuanced risk management
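A blocking decision gate of this kind might look like the sketch below. In the source, ask_user is a tool the LLM invokes itself; wrapping tool dispatch in a gate, as done here, is one possible wiring and not necessarily the project's. `HIGH_RISK_TOOLS`, `gated_execute`, and the `run_tool` callback are hypothetical names.

```python
from typing import Any, Callable

# Assumption: which tools count as high-risk is a policy choice, not taken from the source.
HIGH_RISK_TOOLS = {"code_run", "file_write", "file_patch"}


def gated_execute(
    tool_name: str,
    args: dict,
    run_tool: Callable[[str, dict], Any],
    ask_user: Callable[[str], str] = input,
) -> dict:
    """Run a tool, pausing for human approval first if it is high-risk."""
    if tool_name in HIGH_RISK_TOOLS:
        # input() blocks, so the agent loop cannot proceed until a human answers.
        answer = ask_user(f"Run {tool_name} with {args}? [y/N] ")
        if answer.strip().lower() not in {"y", "yes"}:
            return {"status": "rejected", "tool": tool_name}
    return {"status": "ok", "result": run_tool(tool_name, args)}
```

Passing `ask_user` as a parameter keeps the gate testable and lets a GUI or chat front end replace the console prompt.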
error handling and retry logic with provider-specific fallback strategies
Handles LLM API failures, tool execution errors, and network timeouts through structured error handling and retry logic. The system applies provider-specific fallback strategies (e.g., exponential backoff for rate limits, circuit breakers for service outages) and can automatically retry failed operations with modified parameters. Error states are logged and fed back into the memory system so the agent can learn from them.
Unique: Implements provider-specific error handling and retry strategies that account for different LLM API semantics (OpenAI rate limits vs. Anthropic vs. Gemini), rather than using generic retry logic
vs alternatives: Goes beyond a single generic exponential-backoff policy by using provider-specific knowledge to make retry decisions and avoid cascading failures
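As an illustration of per-provider retry settings, the sketch below applies exponential backoff with a policy looked up by provider. The names `RetryPolicy`, `POLICIES`, `TransientError`, and `call_with_retry` are hypothetical, and the numeric values are placeholders rather than anything from provider documentation. The circuit breaker mentioned above is omitted to keep the example short.

```python
import time
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_attempts: int
    base_delay: float  # seconds before the first retry


# Illustrative values only; tune per provider in practice.
POLICIES = {
    "openai": RetryPolicy(max_attempts=4, base_delay=1.0),
    "anthropic": RetryPolicy(max_attempts=3, base_delay=2.0),
    "gemini": RetryPolicy(max_attempts=3, base_delay=1.5),
}


class TransientError(Exception):
    """Stand-in for rate-limit and timeout errors raised by a backend client."""


def call_with_retry(fn, policy: RetryPolicy, sleep=time.sleep):
    """Call fn(), retrying on TransientError with delays of base_delay * 2**attempt."""
    for attempt in range(policy.max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == policy.max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(policy.base_delay * 2 ** attempt)
```

Injecting `sleep` makes the delay schedule observable in tests without actually waiting.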
atomic tool execution with code runtime manufacturing and os-level control
Provides 9 atomic tools (code_run, file_read, file_write, file_patch, web_scan, web_execute_js, ask_user, update_working_checkpoint, start_long_term_update) that give the LLM direct, OS-level control over the host system. The code_run tool executes Python or PowerShell code in isolated runtime contexts, which lets the agent manufacture new tools dynamically, interact with the OS, and perform arbitrary computation. Each tool is a minimal, composable primitive that can be combined with the others to accomplish complex tasks.
Unique: Implements a minimal set of 9 truly atomic tools (not convenience wrappers) that can be composed to manufacture new tools at runtime via code_run, enabling the agent to bootstrap its own capabilities without pre-defining every possible action
vs alternatives: Unlike tool-heavy frameworks (AutoGen, LangChain) that ship with 50+ pre-built tools, GenericAgent's atomic approach keeps the core footprint to roughly 3K lines of code (per project claims) while leaving tool creation open-ended through code_run composition
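A rough sketch of what a code_run primitive and a name-based dispatcher could look like is given below. It assumes Python-only execution in a child interpreter; the `TOOLS` registry and `dispatch` function are hypothetical, and a separate process is not a security sandbox, so this should not be read as how the project isolates code.

```python
import subprocess
import sys


def code_run(code: str, timeout: float = 10.0) -> dict:
    """Run Python source in a fresh interpreter process and capture its output."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


# Hypothetical registry mapping tool names to callables; only one tool is shown.
TOOLS = {"code_run": code_run}


def dispatch(name: str, **kwargs) -> dict:
    """Look up a tool by name and call it with the arguments the LLM supplied."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

Because code_run accepts arbitrary source, the agent can write a helper script with file_write and then invoke it through code_run, which is the sense in which new tools are "manufactured" from the primitives.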
token-optimized html extraction and dom perception with pagination
The web_scan tool extracts and tokenizes HTML content from web pages, using pagination and token budgeting to minimize context window consumption. It analyzes page structure, identifies relevant content regions, and returns compressed HTML representations that preserve semantic meaning while substantially reducing token count. This lets the agent perceive large web pages without exhausting the LLM's context window.
Unique: Implements token-aware HTML extraction that actively minimizes LLM context consumption through intelligent pagination and content prioritization, rather than naively sending full HTML dumps like most web automation tools
vs alternatives: Achieves 6x token reduction vs. raw HTML transmission (per project claims) by combining structural analysis, content prioritization, and pagination — enabling agents to browse complex websites within tight context budgets
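The idea of budgeted, paginated page perception can be illustrated with the standard library alone. The sketch below is much simpler than what the description implies: it drops script and style content, keeps visible text, and pages through it using whitespace-separated words as a crude stand-in for tokens. `scan_page` and its parameters are hypothetical names, not the web_scan interface.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of non-content tags."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))  # collapse whitespace


def scan_page(html: str, page: int = 0, token_budget: int = 200) -> dict:
    """Return one page of visible text, at most token_budget words long."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()
    start = page * token_budget
    return {
        "text": " ".join(words[start:start + token_budget]),
        "has_more": start + token_budget < len(words),
    }
```

The `has_more` flag lets the agent decide whether to request the next page, so a long document costs context only for the parts actually read.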
browser dom manipulation via javascript injection with state synchronization
The web_execute_js tool injects and executes arbitrary JavaScript code in the browser's DOM context, enabling the agent to click elements, fill forms, scroll pages, and manipulate application state. The tool maintains synchronization between the agent's mental model of page state and the actual DOM state, returning execution results and updated page snapshots after each operation. This enables complex multi-step browser automation workflows.
Unique: Combines JavaScript injection with state synchronization snapshots, allowing the agent to maintain a consistent mental model of page state across multiple DOM manipulations without requiring explicit polling or wait conditions
vs alternatives: More direct than Selenium's element-based API — allows agents to execute complex JavaScript workflows in a single tool call, reducing round-trips and enabling sophisticated SPA automation
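The execute-then-snapshot pattern can be sketched against any browser driver that exposes an `evaluate(script)` method. The `Page` protocol, `SNAPSHOT_JS`, `click_script`, and the return shape below are all assumptions for illustration; the source does not say which driver or snapshot format web_execute_js uses.

```python
import json
from typing import Any, Protocol


class Page(Protocol):
    """Hypothetical driver interface: run JavaScript in the page, return its value."""
    def evaluate(self, script: str) -> Any: ...


# Minimal page-state snapshot; a real one would likely include DOM content.
SNAPSHOT_JS = "({url: location.href, title: document.title})"


def click_script(selector: str) -> str:
    """Build a click expression; json.dumps escapes the selector as a JS string."""
    return f"document.querySelector({json.dumps(selector)}).click()"


def web_execute_js(page: Page, script: str) -> dict:
    """Run script, then always return a fresh snapshot alongside the result."""
    try:
        result, error = page.evaluate(script), None
    except Exception as exc:  # report script failures to the LLM instead of crashing
        result, error = None, str(exc)
    return {"result": result, "error": error, "snapshot": page.evaluate(SNAPSHOT_JS)}
```

Returning the snapshot on both success and failure means the agent's view of the page is refreshed after every call, even when the injected script throws.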
surgical file patching with line-based diffing and atomic writes
The file_patch tool enables precise, surgical modifications to existing files using line-based diffing. Rather than rewriting entire files, it identifies the exact lines to modify, applies changes atomically, and validates the result. This approach minimizes token consumption (only changed lines are transmitted) and reduces the risk of corrupting files through accidental overwrites. The tool supports multi-line edits and preserves file formatting.
Unique: Uses line-based diffing with atomic writes to enable surgical file modifications that preserve formatting and minimize token transmission, rather than requiring full file rewrites like naive code generation approaches
vs alternatives: More efficient than file_write for large files and more precise than full-file regeneration; enables agents to make targeted edits without risking corruption of unrelated code sections
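A line-range patch with an atomic write could be sketched as below. The signature (1-indexed inclusive `start` and `end`, an optional `expected` check) is an assumption, not the documented file_patch interface. Atomicity here comes from writing a temporary file in the same directory and swapping it in with `os.replace`.

```python
import os
import tempfile


def file_patch(path, start, end, new_lines, expected=None):
    """Replace lines start..end (1-indexed, inclusive) with new_lines, atomically."""
    with open(path, encoding="utf-8", newline="") as f:
        lines = f.readlines()  # newline="" keeps the original line endings
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"line range {start}-{end} is outside the file")
    current = [line.rstrip("\r\n") for line in lines[start - 1:end]]
    if expected is not None and current != list(expected):
        raise ValueError("target lines differ from what the caller expected")
    lines[start - 1:end] = [line + "\n" for line in new_lines]

    # Write to a sibling temp file, then rename over the target in one step.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.writelines(lines)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The `expected` check guards against patching a file that changed after the agent last read it. Note that this sketch always terminates inserted lines with "\n" and that the replaced file takes on the temp file's permissions, details a production tool would need to handle.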