offline-first voice-to-intent recognition and execution
Leon processes speech input through local speech-to-text engines (supporting multiple STT backends such as Sphinx, Google Cloud Speech, or Azure), converts the recognized text into structured intents via a modular skill-matching system, and executes the corresponding actions without requiring cloud connectivity. The architecture uses a plugin-based skill loader that maps utterances to Python/Node.js modules, enabling offline operation and preserving privacy by keeping audio processing local; the pipeline is sketched after this entry.
Unique: Combines offline STT/TTS with a modular skill plugin system that executes local Python/Node.js code, avoiding cloud dependency entirely while maintaining extensibility through a standardized skill interface that developers can hook into
vs alternatives: Differs from Alexa/Google Assistant by prioritizing offline operation and code-level customization over cloud-scale NLU, making it suitable for privacy-sensitive deployments and custom automation where users control the entire execution stack
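A minimal sketch of that three-stage pipeline in Python, assuming a stubbed STT stage and an in-memory skill registry; the names transcribe, match_intent, execute_skill, and SKILLS are illustrative, not Leon's actual API:

```python
def transcribe(audio_bytes: bytes) -> str:
    # Stage 1: local STT. A real deployment would call an offline engine
    # (e.g. PocketSphinx) here; a canned result keeps the sketch runnable.
    return "hi leon"

def match_intent(text: str, skills: dict):
    # Stage 2: map recognized text to a registered skill by trigger phrase.
    for skill in skills.values():
        if any(trigger in text.lower() for trigger in skill["triggers"]):
            return skill
    return None

def execute_skill(skill: dict, text: str) -> str:
    # Stage 3: run the skill's local handler; no network access required.
    return skill["handler"](text)

SKILLS = {
    "greeting": {
        "triggers": ["hello", "hi leon"],
        "handler": lambda text: "Hello! I am running fully offline.",
    },
}

def handle_utterance(audio_bytes: bytes) -> str:
    text = transcribe(audio_bytes)
    skill = match_intent(text, SKILLS)
    return execute_skill(skill, text) if skill else "Sorry, I did not catch that."

print(handle_utterance(b"\x00"))  # -> Hello! I am running fully offline.
```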
modular skill plugin system with intent routing
Leon implements a skill-based architecture where each capability is a self-contained module (Python or Node.js) that registers itself with a central intent router. Skills declare their trigger phrases, required parameters, and execution logic; the router uses fuzzy string matching or regex patterns to map user utterances to the appropriate skill, then invokes it with the extracted parameters (see the sketch after this entry). This design lets non-developers add new capabilities by dropping a skill file into a directory, without modifying core agent code.
Unique: Uses a declarative skill manifest pattern where each module self-registers with trigger phrases and parameter schemas, combined with a hot-reload skill loader that allows adding/updating skills at runtime without restarting the agent — enabling rapid iteration and community contribution
vs alternatives: More extensible than monolithic chatbots (which require code changes for new features) but less semantically sophisticated than LLM-based agents (which use function calling); trades NLU accuracy for simplicity and offline operation
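A sketch of the declarative manifest and fuzzy router in Python, using the standard-library difflib for matching; the skill decorator and manifest fields are hypothetical stand-ins for Leon's skill interface:

```python
import difflib

REGISTRY = []

def skill(triggers, params=()):
    """Decorator that self-registers a handler with its trigger phrases."""
    def register(fn):
        REGISTRY.append({"triggers": triggers, "params": params, "handler": fn})
        return fn
    return register

@skill(triggers=["what time is it", "tell me the time"])
def current_time(**kwargs):
    from datetime import datetime
    return datetime.now().strftime("It is %H:%M")

def route(utterance, threshold=0.6):
    # Pick the registered skill whose trigger phrase best matches the
    # utterance; below the threshold, nothing fires.
    best, best_score = None, threshold
    for entry in REGISTRY:
        for trigger in entry["triggers"]:
            score = difflib.SequenceMatcher(None, utterance.lower(), trigger).ratio()
            if score > best_score:
                best, best_score = entry, score
    return best["handler"]() if best else "No matching skill."

print(route("what time is it now"))
```

Hot reload could be layered on top by re-scanning the skills directory and rebuilding REGISTRY, which is what makes drop-in contribution possible.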
system command execution and shell integration
Leon skills can execute system commands (shell scripts, executables) through a thin execution wrapper, enabling automation of OS-level tasks such as file operations, process management, or system configuration. Skills invoke commands via a wrapper that captures output and errors and returns the results to the user; a minimal version is sketched after this entry. This enables voice control of system administration tasks, file management, and integration with command-line tools.
Unique: Allows skills to execute arbitrary system commands through a simple wrapper, enabling voice control of OS-level operations without requiring separate APIs or integrations — suitable for power users and system administrators
vs alternatives: More powerful than API-only assistants (can control any command-line tool) but less safe than sandboxed execution; requires careful skill design to avoid security vulnerabilities
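A minimal sketch of such a wrapper using Python's subprocess module; run_command is a hypothetical helper name, not Leon's API, and as noted above a wrapper like this is not a sandbox:

```python
import shlex
import subprocess

def run_command(command: str, timeout: int = 10) -> dict:
    """Run a command and return its output, errors, and exit status.

    shlex.split avoids shell interpolation and a timeout bounds runaway
    processes, but neither substitutes for real sandboxing: skills should
    never pass unvalidated user input here.
    """
    try:
        result = subprocess.run(
            shlex.split(command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {
            "ok": result.returncode == 0,
            "stdout": result.stdout.strip(),
            "stderr": result.stderr.strip(),
        }
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out"}

print(run_command("echo hello from a skill"))
```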
context-aware skill execution with user preferences and state
Leon maintains optional user profiles and skill state (stored in JSON files or external databases) that skills can access during execution. Skills can read user preferences (language, timezone, favorite contacts) and maintain state (reminders, task lists, conversation history) to provide personalized responses; a JSON-backed sketch follows this entry. This lets skills adapt their behavior to user context without requiring explicit parameters in every utterance.
Unique: Provides optional user profile and state management through JSON files or external databases, enabling skills to access user context and maintain state without requiring explicit parameter passing — supporting personalized, stateful automation
vs alternatives: More flexible than stateless assistants but less sophisticated than LLM-based context management; requires manual state design by skill authors, suitable for simple personalization and task tracking
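A sketch of the JSON-file variant, assuming a hypothetical leon_state/ directory and load/save helpers; the file layout and key names are illustrative:

```python
import json
from pathlib import Path

STATE_DIR = Path("leon_state")  # hypothetical location

def load(name: str) -> dict:
    path = STATE_DIR / f"{name}.json"
    return json.loads(path.read_text()) if path.exists() else {}

def save(name: str, data: dict) -> None:
    STATE_DIR.mkdir(exist_ok=True)
    (STATE_DIR / f"{name}.json").write_text(json.dumps(data, indent=2))

# A reminder skill reads the user's timezone from the profile and appends
# to its own state, without the utterance carrying either value explicitly.
profile = load("profile") or {"timezone": "UTC", "language": "en"}
reminders = load("reminders")
reminders.setdefault("items", []).append(
    {"text": "call John", "time": "15:00", "tz": profile["timezone"]}
)
save("reminders", reminders)
```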
text-to-speech synthesis with multiple backend support
Leon generates spoken responses by routing text through configurable TTS backends (local engines like eSpeak, or cloud APIs like Google Cloud Text-to-Speech, Azure, or Amazon Polly). The TTS layer abstracts backend selection, letting users choose between offline synthesis (lower quality, no network dependency) and cloud synthesis (higher quality, requires an API key); a configuration-driven sketch follows this entry. Audio output is streamed or buffered to system speakers, with support for multiple voices and languages depending on backend capabilities.
Unique: Provides a pluggable TTS abstraction layer that allows swapping between offline (eSpeak) and cloud (Google, Azure, Polly) backends via configuration, enabling users to optimize for latency vs. quality without code changes
vs alternatives: More flexible than single-backend solutions (e.g., Alexa locked to Amazon Polly) by supporting multiple TTS providers; trades quality for offline capability compared to cloud-only assistants
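A configuration-driven sketch in Python; the TTS_BACKEND environment variable, the class names, and the say() interface are assumptions for illustration, with eSpeak invoked through its real command-line binary:

```python
import os
import subprocess

class EspeakTTS:
    """Offline backend: shells out to the eSpeak binary if installed."""
    def say(self, text: str) -> None:
        try:
            subprocess.run(["espeak", text], check=False)
        except FileNotFoundError:
            print("espeak binary not installed; skipping audio output")

class CloudTTS:
    """Cloud backend placeholder; a real one would call Google/Azure/Polly."""
    def __init__(self, api_key: str):
        self.api_key = api_key
    def say(self, text: str) -> None:
        print(f"[cloud tts would synthesize] {text}")

def make_tts():
    # Backend selection from configuration: no code changes required to
    # trade offline operation against synthesis quality.
    backend = os.environ.get("TTS_BACKEND", "espeak")
    if backend == "espeak":
        return EspeakTTS()
    return CloudTTS(api_key=os.environ.get("TTS_API_KEY", ""))

make_tts().say("Hello from a pluggable TTS layer")
```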
speech-to-text transcription with offline and cloud backends
Leon converts audio input to text using pluggable STT backends: offline engines (PocketSphinx and other CMU Sphinx engines) for privacy and low-latency, network-free operation, or cloud APIs (Google Cloud Speech-to-Text, Azure Speech Services, Deepgram) for higher accuracy. The STT layer handles audio format conversion, noise filtering, and streaming transcription, returning recognized text with optional confidence scores. Users configure their preferred backend via environment variables or config files; a sketch of this switching pattern follows this entry.
Unique: Abstracts STT backend selection through a unified interface, allowing users to start with offline Sphinx for privacy and seamlessly upgrade to cloud APIs (Google, Azure, Deepgram) for accuracy without code changes — configuration-driven backend switching
vs alternatives: Offers offline-first operation unlike cloud-only solutions (Google Assistant, Alexa), but with lower accuracy than specialized speech models; enables privacy-preserving deployments at the cost of recognition quality
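A sketch of the unified interface with backend switching via an environment variable; the class names, the Transcript shape, and the STT_BACKEND/DEEPGRAM_API_KEY variables are assumptions, not Leon's configuration keys:

```python
import os
from dataclasses import dataclass

@dataclass
class Transcript:
    text: str
    confidence: float  # optional score, as described above

class SphinxSTT:
    """Offline backend placeholder; a real one would wrap pocketsphinx."""
    def transcribe(self, audio: bytes) -> Transcript:
        return Transcript(text="(offline recognition result)", confidence=0.7)

class DeepgramSTT:
    """Cloud backend placeholder; a real one would POST audio to the API."""
    def __init__(self, api_key: str):
        self.api_key = api_key
    def transcribe(self, audio: bytes) -> Transcript:
        return Transcript(text="(cloud recognition result)", confidence=0.95)

def make_stt():
    # Start with offline Sphinx for privacy; switch to a cloud backend for
    # accuracy purely through configuration.
    if os.environ.get("STT_BACKEND", "sphinx") == "sphinx":
        return SphinxSTT()
    return DeepgramSTT(api_key=os.environ["DEEPGRAM_API_KEY"])

print(make_stt().transcribe(b"...").text)
```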
intent-based task automation with parameter extraction
Leon maps recognized user utterances to executable tasks by extracting parameters from the text using regex patterns or simple NLU heuristics, then invoking the corresponding skill with structured parameters. For example, from 'remind me to call John at 3 PM' it extracts the action (remind), the target (John), and the time (3 PM), and passes them to a reminder skill; a regex sketch follows this entry. This lets users trigger complex workflows through natural language without explicit API calls or structured input.
Unique: Combines utterance-to-intent routing with lightweight parameter extraction using regex and pattern matching, avoiding the complexity of full NLU while remaining simple enough for developers to add new intents via skill manifests
vs alternatives: Simpler and faster than LLM-based intent classification (no API calls, no latency) but less flexible — requires explicit pattern definition for each intent variant; suitable for closed-domain automation where utterance patterns are predictable
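A sketch of pattern-based extraction for the reminder example, using Python's re module; the pattern and group names are illustrative, not Leon's:

```python
import re

# One explicit pattern per intent variant, as the comparison above notes.
REMIND_PATTERN = re.compile(
    r"remind me to (?P<action>.+?) at (?P<time>\d{1,2}(:\d{2})?\s?(am|pm)?)",
    re.IGNORECASE,
)

def extract_reminder(utterance: str):
    match = REMIND_PATTERN.search(utterance)
    if not match:
        return None
    return {"action": match.group("action"), "time": match.group("time")}

print(extract_reminder("remind me to call John at 3 PM"))
# -> {'action': 'call John', 'time': '3 PM'}
```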
cross-platform local agent deployment with node.js and python
Leon runs as a standalone agent on Windows, macOS, and Linux using Node.js as the core runtime, with Python support for skill execution. The agent loads skills dynamically from a skills directory, manages audio I/O through system APIs, and exposes a local HTTP API for programmatic control (sketched after this entry). Users can deploy Leon on personal computers, Raspberry Pi, or lightweight servers without cloud infrastructure, keeping full control over data and execution.
Unique: Provides a lightweight, self-contained agent runtime that runs entirely locally using Node.js + Python, with no cloud infrastructure required — enabling true offline operation and data privacy while remaining deployable on consumer hardware
vs alternatives: More privacy-preserving and offline-capable than cloud assistants (Alexa, Google Assistant) but requires manual setup and lacks the scale/sophistication of cloud-based NLU; suitable for power users and developers, not mainstream consumers
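A standard-library sketch of a local HTTP control endpoint of this kind; the JSON payload shape, the handle_text stand-in, and the port are assumptions rather than Leon's documented API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_text(text: str) -> str:
    """Stand-in for the intent router; echoes what a skill would answer."""
    return f"received: {text}"

class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = handle_text(payload.get("text", ""))
        body = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Bind to localhost only: the agent is controlled from the same machine,
    # consistent with the no-cloud deployment model described above.
    HTTPServer(("127.0.0.1", 1337), AgentHandler).serve_forever()
```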
+4 more capabilities