Smart glasses that tell me when to stop pouring
I've been experimenting with a more proactive AI interface for the physical world. This project is a drink-making assistant for smart glasses. It looks at the ingredients, selects a recipe, shows the steps, and guides me in real time based on what it sees. The behavior I wanted most was simple:
- Best for: real-time video stream processing from smart glasses; real-time object detection and visual reasoning via OpenAI vision API; low-latency audio feedback synthesis and playback on smart glasses
- Type: Repository · Free
- Score: 30/100
- Best alternative: Browser Use
Capabilities (6 decomposed)
real-time video stream processing from smart glasses
Medium confidence
Captures a continuous video feed from Rokid smart glasses hardware via native device APIs and streams frames to the processing pipeline at 30fps. Uses hardware-accelerated video encoding to minimize latency between capture and analysis, enabling sub-100ms feedback loops for real-time visual tasks like pour detection.
Direct integration with Rokid smart glasses hardware APIs for native video capture, bypassing generic USB/HDMI capture methods that add latency and reduce frame quality. Implements hardware-level frame synchronization to ensure consistent timestamps across video and sensor data.
Achieves lower latency than generic webcam capture libraries (OpenCV, ffmpeg) because it uses native Rokid device APIs rather than OS-level video abstractions, reducing frame buffering overhead by ~30-50ms
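The listing describes hardware-level frame synchronization but does not show how it is done. As a rough software-side illustration only, the sketch below pairs a frame timestamp with the nearest sensor sample using a binary search. The function name `nearest_sample`, the millisecond units, and the `max_skew_ms` tolerance are assumptions made for this example and are not taken from the repository or from any Rokid API.

```python
import bisect
from typing import List, Optional


def nearest_sample(sensor_ts_ms: List[int], frame_ts_ms: int,
                   max_skew_ms: int) -> Optional[int]:
    """Return the index of the sensor sample closest to a frame timestamp.

    sensor_ts_ms must be sorted in ascending order. Returns None when there
    are no samples or when the closest sample is further away than
    max_skew_ms (hypothetical tolerance, chosen by the caller).
    """
    if not sensor_ts_ms:
        return None
    # First index whose timestamp is >= the frame timestamp.
    i = bisect.bisect_left(sensor_ts_ms, frame_ts_ms)
    # Only the neighbours on either side of that position can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_ts_ms)]
    best = min(candidates, key=lambda j: abs(sensor_ts_ms[j] - frame_ts_ms))
    if abs(sensor_ts_ms[best] - frame_ts_ms) > max_skew_ms:
        return None
    return best
```

A caller would typically run this once per captured frame and drop frames for which no sample falls inside the tolerance.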
real-time object detection and visual reasoning via openai vision api
Medium confidence
Sends captured video frames to OpenAI's real-time API for multimodal analysis, using GPT-4V or similar vision models to detect pouring actions, liquid levels, and container states. Implements streaming inference where frames are batched and sent asynchronously, with results returned as structured JSON predictions that trigger immediate feedback to the glasses display.
Uses OpenAI's real-time streaming API (not batch processing) to minimize latency between frame capture and inference result, with asynchronous frame submission that doesn't block the video capture pipeline. Implements frame skipping logic to handle API rate limits gracefully.
Achieves better accuracy than local YOLO/TensorFlow models for complex visual reasoning (understanding 'when to stop pouring') because GPT-4V has broader semantic understanding, though at the cost of higher latency and API dependency
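To make the frame skipping and structured JSON predictions described above more concrete, here is a minimal sketch. It makes no network calls and does not reproduce the project's code. The JSON field names (`pouring`, `fill_fraction`, `action`), the `Prediction` type, and the `FrameThrottle` class are all hypothetical; the actual schema and rate-limit handling are not given in the listing.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    pouring: bool
    fill_fraction: float  # 0.0 (empty) to 1.0 (full); assumed field
    action: str           # "continue" or "stop"; assumed field


def parse_prediction(raw: str) -> Optional[Prediction]:
    """Parse a model reply into a Prediction, or None if it is malformed."""
    try:
        data = json.loads(raw)
        pouring = bool(data["pouring"])
        fill = float(data["fill_fraction"])
        action = str(data["action"])
    except (ValueError, KeyError, TypeError):
        return None
    if not 0.0 <= fill <= 1.0 or action not in ("continue", "stop"):
        return None
    return Prediction(pouring, fill, action)


class FrameThrottle:
    """Admit at most max_per_second frames; all other frames are skipped."""

    def __init__(self, max_per_second: float) -> None:
        self.min_interval = 1.0 / max_per_second
        self.last_sent: Optional[float] = None

    def admit(self, timestamp: float) -> bool:
        """Return True if the frame captured at `timestamp` should be sent."""
        if (self.last_sent is not None
                and timestamp - self.last_sent < self.min_interval):
            return False
        self.last_sent = timestamp
        return True
```

Rejecting malformed replies instead of raising keeps one bad response from stalling the feedback loop, and the throttle keeps the request rate bounded regardless of the capture frame rate.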
low-latency audio feedback synthesis and playback on smart glasses
Medium confidence
Converts detection results (e.g., 'stop pouring') into audio cues that are synthesized and played through smart glasses speakers with <200ms end-to-end latency. Uses text-to-speech synthesis (likely OpenAI TTS or similar) combined with audio buffering to ensure immediate auditory feedback without blocking the vision processing pipeline.
Implements asynchronous TTS synthesis that doesn't block the main vision processing loop, with audio queuing to handle rapid successive alerts. Pre-caches common phrases ('stop pouring', 'full') to reduce latency for frequent scenarios.
Faster than generating audio on-demand for every detection because it pre-synthesizes common alerts and uses a priority queue, achieving <150ms feedback latency vs 300-500ms for naive TTS approaches
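The phrase pre-caching and priority queue described above could look roughly like the following sketch. The `AlertQueue` class and its `synthesize` callback are hypothetical stand-ins: the callback represents whatever TTS call the project uses, which the listing does not specify. Lower priority numbers are served first, which is a convention chosen for this example.

```python
import heapq
import itertools
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class AlertQueue:
    """Priority queue of spoken alerts with a cache of synthesized phrases."""

    def __init__(self, synthesize: Callable[[str], bytes],
                 common_phrases: Iterable[str]) -> None:
        self._synthesize = synthesize
        # Pre-synthesize frequent alerts so they need no TTS call later.
        self._cache: Dict[str, bytes] = {p: synthesize(p) for p in common_phrases}
        self._heap: List[Tuple[int, int, str]] = []
        # Tie-breaker that keeps insertion order among equal priorities.
        self._counter = itertools.count()

    def push(self, phrase: str, priority: int) -> None:
        """Queue a phrase; lower priority values are played first."""
        heapq.heappush(self._heap, (priority, next(self._counter), phrase))

    def pop_audio(self) -> Optional[Tuple[str, bytes]]:
        """Return (phrase, audio) for the most urgent alert, or None if empty."""
        if not self._heap:
            return None
        _, _, phrase = heapq.heappop(self._heap)
        audio = self._cache.get(phrase)
        if audio is None:
            # Cache miss: synthesize on demand and remember the result.
            audio = self._synthesize(phrase)
            self._cache[phrase] = audio
        return phrase, audio
```

In a real pipeline the on-demand synthesis would run off the main loop; this sketch keeps it synchronous to stay short.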
real-time ar display overlay rendering on smart glasses
Medium confidence
Renders visual annotations (e.g., 'STOP' indicator, liquid level gauge, confidence scores) directly onto the smart glasses display using native Rokid rendering APIs. Implements frame-synchronized overlay composition where detection results are mapped to screen coordinates and rendered at the glasses' native refresh rate (typically 60Hz) without tearing or latency.
Synchronizes overlay rendering with video capture frame rate using hardware-level vsync, ensuring overlays appear exactly where the user is looking without temporal misalignment. Uses Rokid's native rendering pipeline rather than generic graphics libraries.
Achieves lower latency than software-based overlay composition (OpenCV, PIL) because it uses GPU-accelerated rendering on the glasses' native hardware, reducing overlay-to-display latency from 50-100ms to <16ms
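Mapping detection results to screen coordinates, as described above, can be illustrated with a small helper. This is only a sketch under strong assumptions: detections are taken to be boxes normalized to the range 0 to 1, and the camera image is assumed to line up with the display without any calibration offset. The function name and the display size parameters are hypothetical, and no Rokid rendering API is used.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized 0..1


def to_display_rect(box: Box, display_w: int,
                    display_h: int) -> Tuple[int, int, int, int]:
    """Convert a normalized detection box to pixel coordinates on the display.

    Values outside 0..1 are clamped so the overlay never leaves the screen.
    """
    def clamp(v: float) -> float:
        return min(1.0, max(0.0, v))

    x0, y0, x1, y1 = (clamp(v) for v in box)
    # The last addressable pixel is at index size - 1.
    return (
        round(x0 * (display_w - 1)),
        round(y0 * (display_h - 1)),
        round(x1 * (display_w - 1)),
        round(y1 * (display_h - 1)),
    )
```

On real glasses the camera and display fields of view usually differ, so a calibration transform would be needed before this step.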
end-to-end latency optimization and frame synchronization
Medium confidence
Orchestrates the entire pipeline (video capture → inference → feedback) with explicit latency budgeting and frame synchronization. Implements timestamp tracking across all stages, adaptive frame skipping when inference falls behind, and priority queuing to ensure critical alerts (e.g., 'stop pouring') are never delayed. Uses a state machine to coordinate async operations without blocking.
Implements explicit latency budgeting where each pipeline stage has a maximum allowed latency; if a stage exceeds its budget, subsequent frames are skipped to prevent cascading delays. Uses a priority queue to ensure critical alerts bypass frame skipping.
Achieves more predictable latency than naive sequential processing because it uses adaptive frame skipping and priority queuing, ensuring worst-case latency stays under 500ms even when inference is slow, vs 1-2 second delays in naive approaches
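One way to picture the latency budgeting described above is a small bookkeeping class that converts a stage's overrun into a number of frames to skip. This is a sketch and does not reproduce the project's state machine. The `LatencyBudget` class, the stage names, and the rule of skipping one frame per frame interval of overrun are assumptions made for illustration.

```python
import math
from typing import Dict


class LatencyBudget:
    """Track per-stage latency budgets and decide when frames are skipped."""

    def __init__(self, budgets_ms: Dict[str, float],
                 frame_interval_ms: float) -> None:
        self.budgets_ms = budgets_ms
        self.frame_interval_ms = frame_interval_ms
        self.frames_to_skip = 0

    def record(self, stage: str, elapsed_ms: float) -> None:
        """Record how long a stage took; schedule skips if it ran over budget."""
        overrun_ms = elapsed_ms - self.budgets_ms[stage]
        if overrun_ms > 0:
            # Skip enough upcoming frames to absorb the overrun.
            self.frames_to_skip += math.ceil(overrun_ms / self.frame_interval_ms)

    def should_process(self, critical: bool = False) -> bool:
        """Return False for frames that should be dropped.

        Critical work (for example a pending 'stop pouring' alert) always
        passes and does not use up a scheduled skip.
        """
        if critical:
            return True
        if self.frames_to_skip > 0:
            self.frames_to_skip -= 1
            return False
        return True
```

For example, with a 300 ms inference budget and 40 ms between frames, an inference call that takes 400 ms schedules three skipped frames.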
context-aware pouring detection with liquid level estimation
Medium confidence
Combines object detection (identifying containers, liquids, pouring action) with semantic reasoning to estimate liquid level and predict when the container will be full. Uses vision model analysis to track liquid surface position across frames, applies geometric reasoning to estimate volume, and triggers 'stop pouring' alerts based on configurable thresholds. Handles multiple container types (cups, glasses, bottles) with adaptive detection logic.
Uses multi-frame temporal analysis to track liquid surface movement and estimate volume change rate, rather than single-frame level detection. Combines vision model semantic understanding ('this is a cup being poured') with geometric reasoning to predict overflow before it occurs.
More accurate than simple threshold-based detection (e.g., 'alert when container is 80% full') because it predicts overflow based on pouring rate and container capacity, giving users 1-2 seconds of warning before overflow rather than an alert only once the threshold is reached
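The rate-based prediction described above can be sketched as a least-squares fit of fill level against time, followed by a time-to-full estimate. This is a simplified stand-in and does not reproduce the project's method: it assumes the vision model already yields a fill fraction between 0 and 1 per frame and that the pour rate is roughly constant over the recent window. The function names and the 1.5 second default warning window are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in seconds, fill fraction 0..1)


def seconds_until_full(samples: List[Sample],
                       full_at: float = 1.0) -> Optional[float]:
    """Estimate seconds until the fill level reaches full_at.

    Fits a straight line through the samples (ordinary least squares) and
    extrapolates from the latest sample. Returns None when there are too few
    samples or the level is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Slope of the fitted line: fill fraction gained per second.
    rate = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if rate <= 0:
        return None
    latest_fill = samples[-1][1]
    return max(0.0, (full_at - latest_fill) / rate)


def should_alert(samples: List[Sample], warning_s: float = 1.5,
                 full_at: float = 1.0) -> bool:
    """True when the container is predicted to be full within warning_s."""
    eta = seconds_until_full(samples, full_at)
    return eta is not None and eta <= warning_s
```

Unlike a fixed 80% threshold, the alert time here moves earlier for fast pours and later for slow ones.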
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with Smart glasses that tell me when to stop pouring, ranked by overlap. Discovered automatically through the match graph.
FacePoke_CLONE-THIS-REPO-TO-USE-IT
FacePoke_CLONE-THIS-REPO-TO-USE-IT — AI demo on HuggingFace
OpenAI: GPT Audio
The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced...
generative-ai
Sample code and notebooks for Generative AI on Google Cloud, with Gemini Enterprise Agent Platform
Chooch AI Vision
Advanced visual AI for real-time image and video...
Vercel AI SDK
TypeScript toolkit for AI web apps — streaming, tool calling, generative UI. Works with 20+ LLM providers.
py-gpt
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, speech synthesis and recognition, web search, memory, presets, assistants, and more. Linux, Windows, Mac
Best For
- ✓ hardware engineers building AR/VR applications
- ✓ roboticists integrating wearable sensors with AI pipelines
- ✓ developers creating real-time safety or assistance applications for smart glasses
- ✓ developers building assistive AR applications without local GPU resources
- ✓ teams prototyping vision-based safety systems that need high accuracy
- ✓ builders creating real-time feedback loops that require semantic scene understanding
- ✓ developers building hands-free AR assistants
- ✓ accessibility-focused applications requiring audio-primary interfaces
Known Limitations
- ⚠ Rokid-specific implementation; not portable to other smart glasses brands without refactoring
- ⚠ Real-time processing adds 50-150ms latency depending on network conditions and model inference time
- ⚠ Video stream bandwidth requires stable WiFi or 5G connection; degrades on poor connectivity
- ⚠ Frame rate capped at device hardware capabilities (typically 30fps for Rokid)
- ⚠ Requires API calls to OpenAI, which adds 200-500ms latency per frame depending on network and model load
- ⚠ OpenAI API costs scale with frame rate; continuous 30fps processing costs ~$0.50-1.00 per minute
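The cost figure in the last limitation can be turned into a per-frame estimate. At 30fps there are 1,800 frames per minute, so $0.50-1.00 per minute works out to roughly $0.00028-0.00056 per frame. The sketch below makes that arithmetic reusable for other frame rates. It assumes cost grows linearly with the number of frames sent, which is an assumption of this example; actual pricing depends on the model and image size.

```python
def per_frame_cost(cost_per_minute: float, fps: float) -> float:
    """Cost of one frame, given a per-minute cost at a known frame rate."""
    return cost_per_minute / (fps * 60)


def cost_per_minute(frame_cost: float, fps: float) -> float:
    """Per-minute cost when sending `fps` frames per second (linear model)."""
    return frame_cost * fps * 60


if __name__ == "__main__":
    for listed in (0.50, 1.00):
        frame_cost = per_frame_cost(listed, fps=30)
        # Example: what the same per-frame price implies at 2 frames per second.
        reduced = cost_per_minute(frame_cost, fps=2)
        print(f"${listed:.2f}/min at 30fps -> ${frame_cost:.5f}/frame, "
              f"${reduced:.3f}/min at 2fps")
```

Under this linear model, sending 2 frames per second instead of 30 would cost about $0.03-0.07 per minute.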
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
Repository Details
About
Show HN: Smart glasses that tell me when to stop pouring
Alternatives to Smart glasses that tell me when to stop pouring
Most-starred open-source browser-agent library: agents drive real browsers via Playwright + any LLM.
Stripe's official agent SDK + MCP: payments, invoices, billing, and usage metering as agent tools.
Zapier's hosted MCP: 8,000+ app integrations exposed as allowlisted agent tools.
Atlassian's official hosted MCP: Jira + Confluence with OAuth, permission-bounded agent access.