push-to-talk voice dictation with native keyboard interception
Captures audio input via the Fn key (hold-to-record) or a double-tap (hands-free toggle) using a native C++ module (fn_key_monitor.node) that hooks into macOS keyboard events at the system level, bypassing the limitations of Electron's renderer process. The native module runs in the main process and communicates via IPC to trigger audio recording without requiring application focus, enabling dictation in any macOS application.
Unique: Uses a native C++ module (fn_key_monitor.node), compiled with node-gyp, to hook macOS keyboard events at the system level, enabling global Fn key capture across all applications without requiring app focus. Electron's built-in globalShortcut, by contrast, cannot register the Fn key on its own and reports only key-down events, so hold-and-release and tap timing are not observable through it. Implements dual-mode interaction: hold-to-record and a double-tap hands-free toggle, both detected in native code before IPC marshaling.
vs alternatives: More reliable than Whisper Flow's browser-based approach because it hooks keyboard events through native macOS APIs rather than relying on browser APIs, and supports global hotkeys without requiring the Electron window to be focused.
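The hold-vs-double-tap timing logic can be separated from the native hook itself. A minimal TypeScript sketch of that logic (class, event, and threshold names are illustrative, not the project's actual API; in the app the equivalent detection runs in the C++ module before IPC marshaling):

```typescript
// Illustrative dual-mode gesture detector: hold-to-record plus
// double-tap hands-free toggle, driven by timestamped key events.
type GestureEvent = "start-recording" | "stop-recording" | "toggle-hands-free";

class FnGestureDetector {
  private lastTapUpAt = -Infinity;
  private downAt = -Infinity;
  private holding = false;

  constructor(
    private emit: (e: GestureEvent) => void,
    private doubleTapWindowMs = 300, // assumed threshold, not from the source
    private holdThresholdMs = 200,   // assumed threshold, not from the source
  ) {}

  keyDown(now: number): void {
    if (now - this.lastTapUpAt <= this.doubleTapWindowMs) {
      // Second tap within the window: hands-free toggle.
      this.emit("toggle-hands-free");
      this.lastTapUpAt = -Infinity; // consume the tap pair
      return;
    }
    this.downAt = now;
    this.holding = true;
    this.emit("start-recording");
  }

  keyUp(now: number): void {
    if (!this.holding) return;
    this.holding = false;
    this.emit("stop-recording");
    if (now - this.downAt < this.holdThresholdMs) {
      // Short tap: remember it as the potential first half of a double-tap.
      this.lastTapUpAt = now;
    }
  }
}
```

Keeping the timing logic pure like this makes it unit-testable without a keyboard.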
dual-path transcription with local whisper or cloud deepgram
Implements a pluggable transcription architecture that routes audio to either local Whisper models (tiny/base/small via whisper-node-addon) for offline processing or cloud Deepgram API for high-speed transcription. The system abstracts transcription provider selection through a configuration layer, allowing users to toggle between privacy-first local processing and speed-optimized cloud processing without code changes. Audio is buffered in the renderer process and sent to the main process via IPC, which routes to the selected provider.
Unique: Implements a dual-path architecture with runtime provider selection rather than compile-time choice — users can toggle between local Whisper and Deepgram via settings without rebuilding. Uses whisper-node-addon (native C++ binding to OpenAI Whisper) for local processing and Deepgram REST API for cloud path, with unified IPC interface in main process that abstracts provider differences. Configuration persisted in electron-store allows seamless switching.
vs alternatives: More flexible than Whisper Flow (cloud-only) or Talon Voice (local-only) because it offers both paths with runtime selection, and more privacy-preserving than commercial dictation tools (Dragon, Otter) by supporting fully offline local transcription as default.
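The runtime provider selection described above can be sketched as a small registry behind a common interface. The names below (TranscriptionProvider, TranscriptionRouter) are assumptions for illustration, not the project's actual types:

```typescript
// Pluggable transcription layer: providers register under a name and
// the active one can be swapped at runtime, e.g. from a settings change.
interface TranscriptionProvider {
  readonly name: string;
  transcribe(pcm16k: Float32Array): Promise<string>;
}

class TranscriptionRouter {
  private providers = new Map<string, TranscriptionProvider>();
  private active = "whisper-local"; // assumed default: privacy-first local path

  register(p: TranscriptionProvider): void {
    this.providers.set(p.name, p);
  }

  // Runtime switch, e.g. driven by an 'update-settings' IPC message.
  select(name: string): void {
    if (!this.providers.has(name)) throw new Error(`unknown provider: ${name}`);
    this.active = name;
  }

  transcribe(pcm: Float32Array): Promise<string> {
    return this.providers.get(this.active)!.transcribe(pcm);
  }
}
```

In the app, the whisper-node-addon binding and the Deepgram REST client would each sit behind one such provider, so the IPC handler never branches on which one is active.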
zero-telemetry privacy model with no analytics collection
Implements a privacy-first architecture with zero telemetry — no analytics libraries, no tracking pixels, no data collection beyond what's necessary for core functionality. The app does not send usage data, crash reports, or user behavior analytics to any external service. All processing (transcription, LLM post-processing) can be done locally without cloud connectivity, and cloud processing (Deepgram, LLM APIs) only sends audio/text when explicitly configured by the user.
Unique: Explicitly excludes all analytics and telemetry libraries from package.json and implements no tracking code — privacy is enforced by architecture rather than configuration. Supports fully offline processing (local Whisper + Ollama) as the default path, with cloud processing as an optional user-selected feature. No crash reporting, no error tracking, no usage analytics — complete transparency about data flow.
vs alternatives: More privacy-preserving than commercial tools (Otter, Fireflies, Whisper Flow) which collect usage analytics and store transcripts on their servers. More transparent than tools claiming privacy but using third-party SDKs for crash reporting or analytics.
ios beta support with testflight distribution
Extends Jarvis to iOS via a beta version distributed through Apple TestFlight, enabling voice dictation on iPhone and iPad. The iOS implementation (ios/README.md) uses native iOS APIs for audio capture and keyboard integration, with the same dual-path architecture (local Whisper or cloud Deepgram) as the macOS version. TestFlight allows beta testing with up to 10,000 external testers before App Store release.
Unique: Extends the macOS dual-path architecture to iOS using native Swift/Objective-C APIs for audio capture and keyboard integration. Uses TestFlight for beta distribution, allowing community feedback before App Store release. Maintains feature parity with macOS version (local Whisper + Ollama, cloud Deepgram + LLM APIs) while adapting UI and interaction patterns for iOS.
vs alternatives: More privacy-preserving than commercial iOS dictation apps (Otter, Fireflies) because it supports local-only processing. More feature-complete than iOS's built-in dictation because it adds grammar correction and filler removal via LLM post-processing.
ai-powered post-processing with filler removal and grammar correction
Routes transcribed text through an LLM-based post-processing pipeline that removes filler words ('um', 'like', 'uh'), corrects grammar, adds punctuation, and improves readability. The system supports dual-path LLM routing: a local Ollama server (models: sam860/LFM2:1.2b, llama3, mistral) for offline processing or cloud LLMs (Gemini, Claude, OpenAI) for higher quality. Post-processing is triggered automatically after transcription completes, with results cached to avoid re-processing identical transcripts.
Unique: Implements a dual-path LLM chain with provider abstraction — routes transcribed text to either local Ollama server or cloud LLM APIs (Gemini/Claude/OpenAI) via a unified interface. Uses prompt engineering to instruct LLM to remove fillers, fix grammar, and add punctuation in a single pass. Caches results keyed by transcript hash to avoid re-processing identical inputs, reducing latency and API costs on repeated dictation.
vs alternatives: More comprehensive than Whisper Flow's basic punctuation (which only adds periods) because it combines filler removal, grammar correction, and punctuation in an LLM-driven pipeline. More privacy-preserving than commercial tools (Otter, Fireflies) by supporting fully local Ollama processing, and more cost-effective than cloud-only solutions by offering local fallback.
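The transcript-hash cache can be sketched in a few lines. This is an illustrative stand-in (PostProcessCache and its cleaner callback are hypothetical names) showing only the caching contract, not the actual prompt or provider routing:

```typescript
import { createHash } from "node:crypto";

// The LLM call (Ollama or a cloud API in the real app) is injected so the
// cache is independent of which path is selected.
type Cleaner = (raw: string) => Promise<string>;

class PostProcessCache {
  private cache = new Map<string, string>();

  constructor(private clean: Cleaner) {}

  async process(raw: string): Promise<string> {
    // Key by a hash of the transcript so identical dictation skips the LLM.
    const key = createHash("sha256").update(raw).digest("hex");
    const hit = this.cache.get(key);
    if (hit !== undefined) return hit;
    const cleaned = await this.clean(raw);
    this.cache.set(key, cleaned);
    return cleaned;
  }
}
```

Hashing rather than storing the raw transcript as the key keeps the map's keys fixed-size regardless of dictation length.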
ipc-based main-renderer process communication with security sandboxing
Implements a secure inter-process communication (IPC) bridge between Electron's main process (native module access, file I/O, API calls) and renderer process (UI, user interactions) using ipcMain and ipcRenderer with preload script isolation. The preload script (src/preload.ts) exposes a whitelist of safe IPC channels (e.g., 'start-recording', 'transcribe-audio', 'update-settings') to the renderer, preventing direct access to Node.js APIs and enforcing context isolation. Audio buffers and settings are marshaled through IPC as serialized JSON or binary data.
Unique: Uses Electron's preload script (src/preload.ts) with context isolation enabled to expose a whitelist of safe IPC channels to the renderer, preventing direct Node.js API access while maintaining full main process capabilities. Implements channel-based message routing in main.ts that dispatches IPC calls to appropriate handlers (native modules, API clients, file I/O), with error handling and response marshaling. Audio buffers are passed as binary data through IPC using Electron's native serialization.
vs alternatives: More secure than older Electron patterns (nodeIntegration: true) because it enforces process isolation and API whitelisting, preventing a compromised renderer process from accessing the file system or native modules. More maintainable than custom socket-based IPC because it uses Electron's built-in IPC with automatic serialization.
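The channel-whitelisting idea can be shown without Electron itself. In the real preload script the wrapper below would wrap ipcRenderer.invoke and be exposed through contextBridge.exposeInMainWorld; here it is reduced to a pure function (makeSafeInvoke is a hypothetical name) using the channel names mentioned above:

```typescript
// Whitelist of IPC channels the renderer is allowed to reach; anything
// else is rejected before it ever crosses the process boundary.
const ALLOWED_CHANNELS = new Set([
  "start-recording",
  "transcribe-audio",
  "update-settings",
]);

type Invoke = (channel: string, ...args: unknown[]) => Promise<unknown>;

function makeSafeInvoke(invoke: Invoke): Invoke {
  return (channel, ...args) => {
    if (!ALLOWED_CHANNELS.has(channel)) {
      return Promise.reject(new Error(`blocked IPC channel: ${channel}`));
    }
    return invoke(channel, ...args);
  };
}
```

Because the renderer only ever sees the wrapped function, adding a new capability means extending the whitelist deliberately rather than the renderer gaining it by default.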
settings persistence with electron-store and onboarding flow
Persists user configuration (transcription provider, LLM choice, API keys, keyboard shortcuts) to disk using electron-store, a lightweight JSON-based key-value store that encrypts sensitive data (API keys) at rest. The Onboarding Interface component guides first-time users through provider selection (local vs cloud), API key configuration, and keyboard shortcut customization. Settings are loaded on app startup and cached in memory; changes trigger IPC updates to all processes and persist immediately to disk.
Unique: Uses electron-store for lightweight JSON-based persistence with optional encryption for sensitive data (API keys), avoiding the complexity of SQLite or external databases. Onboarding flow (Onboarding Interface component) is built as a separate Electron window that guides users through provider selection and API key configuration before main app launch. Settings changes trigger IPC broadcasts to all processes, ensuring UI and main process stay in sync without manual refresh.
vs alternatives: Simpler than Whisper Flow's cloud-based settings sync because it uses local-only persistence, and more user-friendly than manual config file editing because it provides a guided onboarding UI. Supports both local and cloud provider configuration in a single settings schema, unlike single-path tools.
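The settings-change broadcast pattern can be modeled in memory. A minimal stand-in for the electron-store-backed layer (field names and provider values are assumptions based on the description; persistence and the IPC broadcast are noted in comments rather than implemented):

```typescript
// Single schema covering both local and cloud provider choices.
interface Settings {
  transcriptionProvider: "whisper-local" | "deepgram";
  llmProvider: "ollama" | "gemini" | "claude" | "openai";
  holdKey: string;
}

type Listener = (next: Settings) => void;

class SettingsStore {
  private listeners: Listener[] = [];

  constructor(private current: Settings) {}

  get(): Settings {
    return { ...this.current };
  }

  // In the app, set() would also write through to electron-store and
  // broadcast the new settings over IPC so every process stays in sync.
  set(patch: Partial<Settings>): void {
    this.current = { ...this.current, ...patch };
    for (const l of this.listeners) l(this.get());
  }

  onChange(l: Listener): void {
    this.listeners.push(l);
  }
}
```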
native audio capture with system microphone integration
Captures audio from the system microphone using the Web Audio API (in the renderer process) or native audio APIs (via native modules in the main process), with automatic gain control and noise suppression. Audio is buffered in memory as PCM samples at a 16kHz sample rate, then sent to the transcription pipeline via IPC. The system handles microphone permission requests (macOS Privacy & Security) and degrades gracefully if the microphone is unavailable or permission is denied.
Unique: Uses Web Audio API in renderer process for cross-platform compatibility but can fall back to native audio modules in main process for lower latency and better control. Buffers audio at 16kHz (standard for speech recognition) and implements basic automatic gain control to normalize microphone input levels. Handles macOS microphone permission requests gracefully with user-friendly error messages.
vs alternatives: More integrated than browser-based Whisper Flow because it captures audio at the system level via Electron, avoiding browser tab audio limitations. More flexible than command-line tools (ffmpeg) because it provides real-time audio buffering and automatic format conversion.
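The "basic automatic gain control" mentioned above can be approximated by peak normalization over a PCM buffer. A hedged sketch (the app's actual AGC is not shown here; function name and target level are illustrative):

```typescript
// Scale a 16 kHz PCM buffer so its loudest sample hits targetPeak,
// normalizing quiet microphone input before transcription.
function normalizeGain(samples: Float32Array, targetPeak = 0.9): Float32Array {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return samples; // silence: nothing to scale
  const gain = targetPeak / peak;
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) out[i] = samples[i] * gain;
  return out;
}
```

A production AGC would adapt the gain over time rather than per buffer, but this captures the normalization step in isolation.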
+4 more capabilities