via “real-time audio conversation with streaming speech recognition and synthesis”
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, speech synthesis and recognition, web search, memory, presets, assistants, and more. Linux, Windows, Mac
Unique: Implements full-duplex audio streaming with concurrent transcription, LLM inference, and synthesis using OpenAI's Realtime API or Google Speech services; manages audio I/O asynchronously to prevent UI blocking and enable low-latency voice interaction.
vs others: Compared to ChatGPT's voice mode (cloud-only, limited customization), py-gpt provides a local desktop audio interface with provider flexibility; compared to voice assistants (Siri, Alexa), py-gpt offers LLM-powered reasoning with full conversation history.
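The concurrency pattern described under "Unique" (capture, transcription, LLM inference, and synthesis running at the same time without blocking each other) can be illustrated with a minimal sketch. This is not py-gpt's actual code: the stage functions `capture`, `transcribe`, `respond`, and `play`, and the helper `run_pipeline`, are hypothetical names, and the string transformations stand in for real audio devices and provider calls. It assumes only Python's standard `asyncio` library.

```python
import asyncio


async def capture(frames, out_q):
    # Hypothetical stand-in for a microphone callback: pushes frames downstream.
    for frame in frames:
        await out_q.put(frame)
        await asyncio.sleep(0)  # yield so the other stages can run
    await out_q.put(None)  # sentinel: no more input


async def transcribe(in_q, out_q):
    # Hypothetical stand-in for streaming speech recognition.
    while (frame := await in_q.get()) is not None:
        await out_q.put(f"text({frame})")
    await out_q.put(None)


async def respond(in_q, out_q):
    # Hypothetical stand-in for LLM inference plus speech synthesis.
    while (text := await in_q.get()) is not None:
        await out_q.put(f"reply({text})")
    await out_q.put(None)


async def play(in_q, played):
    # Hypothetical stand-in for the speaker: records what would be played.
    while (chunk := await in_q.get()) is not None:
        played.append(chunk)


async def run_pipeline(frames):
    # Bounded queues decouple the stages and apply backpressure,
    # so a slow stage delays its producer instead of growing memory.
    mic_q, text_q, reply_q = (asyncio.Queue(maxsize=8) for _ in range(3))
    played = []
    await asyncio.gather(
        capture(frames, mic_q),
        transcribe(mic_q, text_q),
        respond(text_q, reply_q),
        play(reply_q, played),
    )
    return played


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["f1", "f2"])))
```

Because every stage awaits its queue rather than blocking, the event loop stays free for other work; in a desktop application the same idea is usually applied by running such a pipeline off the UI thread so the interface remains responsive.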