unified multi-provider llm client abstraction
Abstracts 20+ LLM providers (OpenAI, Anthropic (Claude), Gemini, Ollama, etc.) behind a single Client trait with unified request/response handling. Uses a provider registry pattern loaded from models.yaml that maps provider identifiers to concrete client implementations, enabling seamless provider switching without code changes. Token counting and model selection are handled uniformly across all providers through a centralized model registry.
Unique: Uses a declarative models.yaml registry combined with a unified Client trait to support 20+ providers without conditional logic in core code. Token management and model selection are centralized rather than scattered across provider implementations, enabling consistent behavior across all providers.
vs alternatives: More flexible than LangChain's provider abstraction because providers are declared in configuration and can be swapped at runtime without touching code; simpler than writing a custom wrapper for each provider.
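A minimal sketch of the trait-plus-registry shape this describes; the type and field names below are illustrative, not the project's actual API:

```rust
use std::collections::HashMap;

/// Illustrative unified request/response types; the real message and
/// option structs carry far more detail (streaming, tools, options).
struct ChatRequest {
    model: String,
    messages: Vec<(String, String)>, // (role, content) pairs
}

struct ChatResponse {
    text: String,
}

/// Stand-in for the unified Client trait: every provider implements the
/// same surface, so core code never branches on the provider name.
trait Client {
    fn chat(&self, req: &ChatRequest) -> Result<ChatResponse, String>;
}

/// Registry mapping provider identifiers (as declared in models.yaml)
/// to boxed client implementations constructed at startup.
struct ClientRegistry {
    clients: HashMap<String, Box<dyn Client>>,
}

impl ClientRegistry {
    /// Look up a provider by the identifier used in configuration.
    fn get(&self, provider: &str) -> Option<&dyn Client> {
        self.clients.get(provider).map(|c| &**c) // deref Box to &dyn Client
    }
}
```

Swapping providers then means pointing the configuration at a different identifier; callers only ever see the Client trait.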
interactive repl mode with stateful conversation sessions
Provides an interactive shell interface (REPL) that maintains conversation state across multiple turns, with support for role-based context switching and session persistence. The REPL mode loads configuration from GlobalConfig (wrapped in Arc<RwLock<Config>>), manages message history in memory, and supports commands for switching roles, models, and sessions. Sessions can be saved to disk and resumed later, preserving the full conversation context.
Unique: Combines role-based context switching with persistent session management, allowing users to maintain multiple independent conversation threads and switch between them without losing history. The Arc<RwLock<Config>> pattern enables thread-safe configuration updates during REPL execution.
vs alternatives: More stateful than typical ChatGPT-style CLI wrappers because it supports persistent sessions and role switching; simpler than building a custom conversation manager because session persistence is built in.
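A rough sketch of the shared-state pattern in such a REPL loop, assuming a simplified Config and a hypothetical ".model" command; the real command set, types, and client calls differ:

```rust
use std::io::{self, BufRead, Write};
use std::sync::{Arc, RwLock};

/// Simplified stand-in for the real Config, which also holds roles,
/// sessions, clients, and more.
#[derive(Default)]
struct Config {
    model: String,
}

fn main() -> io::Result<()> {
    // Shared, thread-safe configuration: normal turns take a read lock,
    // commands such as the illustrative ".model" take a short write lock.
    let config: Arc<RwLock<Config>> = Arc::new(RwLock::new(Config::default()));
    let mut history: Vec<(String, String)> = Vec::new(); // (role, content)

    let stdin = io::stdin();
    print!("> ");
    io::stdout().flush()?;
    for line in stdin.lock().lines() {
        let line = line?;
        if let Some(model) = line.strip_prefix(".model ") {
            // Mutate the shared config in place; other tasks see the change.
            config.write().unwrap().model = model.to_string();
        } else {
            let model = config.read().unwrap().model.clone();
            history.push(("user".into(), line.clone()));
            // A real REPL would call the LLM client here and stream the reply.
            history.push(("assistant".into(), format!("[{model}] reply to: {line}")));
        }
        print!("> ");
        io::stdout().flush()?;
    }
    Ok(())
}
```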
configuration system with yaml-based model and role definitions
Manages application configuration through YAML files (models.yaml, config.yaml) that define available LLM providers, models, roles, agents, and tools. Configuration is loaded at startup and wrapped in Arc<RwLock<Config>> for thread-safe access across async tasks. The system supports configuration merging from multiple sources (system defaults, user config, environment variables) with clear precedence rules.
Unique: Uses the Arc<RwLock<Config>> pattern for thread-safe configuration access across async tasks, enabling configuration updates without stopping the application. Merging configuration from multiple sources (files, environment, CLI) provides flexibility for different deployment scenarios.
vs alternatives: More flexible than hardcoded configuration because it's declarative; safer than global mutable state because all access goes through Arc<RwLock<>>; more portable than environment-only configuration because it supports YAML files.
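A hedged sketch of layered configuration merging, assuming the serde and serde_yaml crates; the field names, default model id, and the AICHAT_MODEL environment variable are illustrative assumptions, not the project's exact precedence rules:

```rust
use serde::Deserialize;

/// Pared-down configuration; the real Config has many more fields
/// (clients, roles, agents, RAG settings, ...).
#[derive(Deserialize, Default)]
struct Config {
    model: Option<String>,
    temperature: Option<f64>,
}

impl Config {
    /// Later sources win: any field set in `other` overrides `self`.
    fn merge(mut self, other: Config) -> Config {
        if other.model.is_some() {
            self.model = other.model;
        }
        if other.temperature.is_some() {
            self.temperature = other.temperature;
        }
        self
    }
}

fn load() -> Config {
    // 1. Built-in defaults (placeholder values).
    let defaults = Config {
        model: Some("openai:gpt-4o-mini".into()),
        temperature: Some(0.7),
    };
    // 2. User config.yaml, if present and parseable, overrides defaults.
    let user: Config = std::fs::read_to_string("config.yaml")
        .ok()
        .and_then(|s| serde_yaml::from_str(&s).ok())
        .unwrap_or_default();
    // 3. Environment variables override both (variable name is illustrative).
    let env = Config {
        model: std::env::var("AICHAT_MODEL").ok(),
        temperature: None,
    };
    defaults.merge(user).merge(env)
}
```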
token counting and context window management
Implements token counting for different models to ensure prompts fit within context windows. The system uses model-specific tokenizers (or approximations) to count tokens in messages, truncates long inputs to fit within limits, and provides warnings when approaching context limits. Token counting is integrated into the message building pipeline, ensuring all inputs are validated before sending to the LLM.
Unique: Integrates token counting into the message building pipeline before sending to the LLM, preventing context window errors. Uses model-specific tokenizers when available, falling back to approximations for consistency across providers.
vs alternatives: More proactive than waiting for provider errors because it validates before sending; more accurate than character-based truncation because it uses token counts.
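An illustrative truncation routine using the common ~4-characters-per-token approximation as the fallback heuristic; the real pipeline may use model-specific tokenizers and different eviction rules:

```rust
/// Rough token estimate used as a fallback when no model-specific
/// tokenizer is available: roughly 4 characters per token for English text.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Drop the oldest turns until the conversation fits the model's context
/// window, always keeping the system prompt and the latest user input.
fn fit_to_context(
    system: &str,
    mut history: Vec<String>,
    latest: &str,
    max_tokens: usize,
) -> Vec<String> {
    let fixed = estimate_tokens(system) + estimate_tokens(latest);
    loop {
        let total = fixed + history.iter().map(|m| estimate_tokens(m)).sum::<usize>();
        if total <= max_tokens || history.is_empty() {
            break;
        }
        history.remove(0); // evict the oldest turn first
    }
    history
}
```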
macro system for command substitution and templating
Provides a macro system that enables text substitution and templating within prompts and configuration. Macros can reference environment variables, configuration values, or built-in functions (e.g., {{date}}, {{user}}, {{env:VAR_NAME}}). Macros are expanded at runtime before sending prompts to the LLM, enabling dynamic context injection without manual editing.
Unique: Provides a simple but powerful macro system that expands at runtime, enabling dynamic context injection without requiring code changes. Built-in macros ({{date}}, {{user}}, {{env:VAR}}) cover common use cases.
vs alternatives: Simpler than Jinja2 templating because it offers flat {{key}} substitution rather than full template logic; more flexible than hardcoded values because it supports environment variables and built-in functions.
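A small sketch of how this kind of {{key}} expansion could be implemented; the exact macro set and parsing rules are assumptions:

```rust
/// Expand `{{key}}` placeholders in a prompt. `{{env:NAME}}` reads an
/// environment variable; the other keys come from values supplied at runtime.
fn expand_macros(input: &str, date: &str, user: &str) -> String {
    let mut out = input.to_string();
    out = out.replace("{{date}}", date);
    out = out.replace("{{user}}", user);
    // Handle {{env:VAR_NAME}} occurrences one at a time.
    while let Some(start) = out.find("{{env:") {
        let Some(end_rel) = out[start..].find("}}") else { break };
        let end = start + end_rel;
        let var = &out[start + 6..end]; // the VAR_NAME between "{{env:" and "}}"
        let value = std::env::var(var).unwrap_or_default();
        out.replace_range(start..end + 2, &value);
    }
    out
}
```

For example, a prompt containing "Today is {{date}} and the shell is {{env:SHELL}}" would be rendered with the current date and the caller's SHELL value before being sent to the LLM.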
one-shot command mode for non-interactive llm queries
Provides a CMD mode for single-turn LLM interactions where a prompt is passed as a command-line argument, the LLM generates a response, and the process exits. This mode is optimized for scripting and piping, with minimal overhead and no interactive state management. CMD mode uses the same underlying LLM client and configuration system as REPL mode, ensuring consistent behavior.
Unique: Optimized for scripting and piping with minimal overhead — no interactive state management or session persistence. Uses the same Client trait as REPL mode, ensuring consistent LLM behavior across execution modes.
vs alternatives: Faster than starting a REPL session because there's no interactive overhead; more flexible than curl-based API calls because it supports multiple providers and input types.
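An illustrative one-shot entry point showing the argument/stdin handling pattern described above; the actual CLI flags, client wiring, and output handling are not reproduced here:

```rust
use std::io::Read;

fn main() {
    // The prompt comes from the command-line arguments, or from stdin when
    // data is piped in, so the tool composes naturally with shell pipelines.
    let mut prompt = std::env::args().skip(1).collect::<Vec<_>>().join(" ");
    if prompt.is_empty() {
        std::io::stdin().read_to_string(&mut prompt).ok();
    }
    // A real run would build a request from the shared configuration, call
    // the same Client trait used by the REPL, print the reply, and exit.
    println!("(would send to LLM): {prompt}");
}
```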
role-based conversation context with dynamic instructions
Implements a role system where each role encapsulates a set of system instructions, model preferences, and conversation parameters. Roles are defined in configuration files and can be dynamically selected at runtime. The system supports variable substitution within role instructions (e.g., {{date}}, {{user}}) through a dynamic instructions system, enabling context-aware prompting without manual editing.
Unique: Combines role definitions with dynamic variable substitution ({{date}}, {{user}}, etc.) to create context-aware system prompts that adapt to runtime conditions. Roles are composable and can be switched mid-conversation without losing message history.
vs alternatives: More flexible than static system prompts because variables are substituted at runtime; simpler than building custom prompt management because role switching is built into the CLI.
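A minimal sketch of a role definition and how its rendered system prompt might be prepended to the outgoing messages; the struct and field names are illustrative, not the project's actual schema:

```rust
/// Pared-down role definition: a name, a system prompt template, and an
/// optional model override, as a role file might declare.
struct Role {
    name: String,
    prompt: String,        // may contain {{date}}, {{user}}, ...
    model: Option<String>, // overrides the default model when set
}

impl Role {
    /// Render the system prompt with runtime values substituted, then
    /// prepend it to the outgoing message list.
    fn apply(&self, date: &str, user: &str, messages: &mut Vec<(String, String)>) {
        let rendered = self
            .prompt
            .replace("{{date}}", date)
            .replace("{{user}}", user);
        messages.insert(0, ("system".to_string(), rendered));
    }
}
```

Because the role only contributes a rendered system message and optional overrides, switching roles mid-conversation leaves the accumulated user/assistant history untouched.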
hybrid rag system with document ingestion and semantic search
Implements a Retrieval-Augmented Generation (RAG) system that ingests documents through a multi-format pipeline (text, PDF, markdown, URLs), chunks them using configurable strategies, and stores embeddings in a local vector database. The hybrid search system combines keyword-based BM25 search with semantic vector similarity search to retrieve relevant documents. Retrieved documents are automatically injected into the LLM context before generating responses.
Unique: Combines BM25 keyword search with semantic vector similarity in a single hybrid search pipeline, avoiding the need for external vector databases. Document chunking and embedding are handled locally, enabling offline RAG without cloud dependencies.
vs alternatives: Simpler than Pinecone/Weaviate because it's self-contained; more accurate than keyword-only search because it combines BM25 with semantic similarity; faster than cloud-based RAG because embeddings are computed locally.
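An illustrative blend of BM25 and cosine-similarity scores for ranking retrieved chunks; the weighting scheme and normalization are assumptions rather than the project's exact ranking formula:

```rust
/// Combine a BM25 keyword score and a vector-similarity score into one
/// ranking value. `alpha` weights semantic vs. keyword signal; both scores
/// are assumed to be normalized to [0, 1] beforehand.
fn hybrid_score(bm25: f32, cosine: f32, alpha: f32) -> f32 {
    alpha * cosine + (1.0 - alpha) * bm25
}

/// Cosine similarity between a query embedding and a chunk embedding.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Rank chunks: compute both scores per chunk, blend, and sort descending.
fn rank(
    chunks: &[(String, Vec<f32>, f32)], // (text, embedding, bm25 score)
    query_embedding: &[f32],
    alpha: f32,
) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = chunks
        .iter()
        .map(|(text, emb, bm25)| {
            let score = hybrid_score(*bm25, cosine_similarity(query_embedding, emb), alpha);
            (text.clone(), score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored
}
```

The top-ranked chunks would then be injected into the LLM context ahead of the user's question, as described above.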
+6 more capabilities