joinly
MCP Server · Free
Make your meetings accessible to AI Agents
Capabilities (12 decomposed)
browser-based meeting platform joining with platform-specific automation
Medium confidence: Enables AI agents to join Google Meet, Zoom, and Microsoft Teams meetings through Playwright-based browser automation with platform-specific controllers that handle each platform's unique UI patterns, authentication flows, and meeting state management. The BrowserMeetingProvider abstracts platform differences while delegating to GoogleMeetController, ZoomController, and TeamsController for platform-specific interactions, managing virtual display (Xvfb) and audio device routing.
Uses modular platform-specific controllers (GoogleMeetController, ZoomController, TeamsController) that encapsulate UI interaction logic per platform, allowing independent updates without affecting other platforms. Manages virtual display and audio routing at the provider level, abstracting infrastructure complexity from agent code.
More maintainable than monolithic browser automation because platform logic is isolated in controllers; more flexible than API-only solutions because it works with any meeting platform that has a web interface
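The controller-dispatch idea described above can be sketched as follows. The class names (MeetingController, GoogleMeetController, ZoomController) are taken from the capability description, but the method signatures and URL-based dispatch are assumptions for illustration, not joinly's actual API:

```python
# Hypothetical sketch of per-platform controller dispatch; names follow the
# description above but the real joinly interfaces may differ.
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class MeetingController(ABC):
    """Encapsulates one platform's UI interaction logic."""

    @abstractmethod
    def join(self, url: str) -> str: ...


class GoogleMeetController(MeetingController):
    def join(self, url: str) -> str:
        # Real code would drive Playwright selectors specific to Meet's UI.
        return f"joined Google Meet at {url}"


class ZoomController(MeetingController):
    def join(self, url: str) -> str:
        return f"joined Zoom at {url}"


# The provider selects a controller by meeting-URL host, so platform logic
# stays isolated and each controller can be updated independently.
CONTROLLERS = {
    "meet.google.com": GoogleMeetController,
    "zoom.us": ZoomController,
}


def controller_for(url: str) -> MeetingController:
    host = urlparse(url).hostname or ""
    for domain, cls in CONTROLLERS.items():
        if host == domain or host.endswith("." + domain):
            return cls()
    raise ValueError(f"unsupported platform: {host}")
```

When a platform ships a UI change, only its controller needs updating; the dispatch table and the other controllers are untouched.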
real-time audio capture and voice activity detection pipeline
Medium confidence: Captures audio from meeting participants in real-time through PulseAudio integration and applies Voice Activity Detection (VAD) to filter silence and background noise before sending to transcription. The DefaultTranscriptionController orchestrates the VAD → STT pipeline, using pluggable VAD service providers (local or cloud-based) to reduce transcription costs by only processing segments with actual speech.
Implements pluggable VAD service architecture allowing runtime selection between local (privacy-preserving) and cloud-based VAD providers, with configurable sensitivity thresholds. Integrates directly with PulseAudio for low-level audio device control rather than relying on higher-level audio libraries.
More cost-effective than transcribing all audio because VAD pre-filters silence; more privacy-preserving than cloud-only solutions because local VAD options are available; more flexible than fixed VAD implementations because providers are swappable
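The cost argument rests on dropping silent chunks before they reach the STT provider. A minimal energy-gate sketch (generic DSP, not joinly's pluggable VAD providers, and the threshold value is an arbitrary assumption):

```python
# Illustrative energy-based VAD gate. Real VAD models are far more robust;
# this only shows the pre-filtering step that keeps silence away from STT.
import math


def rms(chunk: list[int]) -> float:
    """Root-mean-square energy of one PCM sample chunk."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def speech_chunks(chunks, threshold: float = 100.0):
    """Yield only chunks whose energy exceeds the silence threshold."""
    for chunk in chunks:
        if rms(chunk) >= threshold:
            yield chunk
```

Everything the generator drops is audio the transcription backend never bills for.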
client sdk with joinlyclient api for agent development
Medium confidence: Provides high-level Python SDK (joinly-client package) with JoinlyClient class that abstracts MCP communication and session management, enabling developers to build meeting agents without understanding MCP protocol details. SDK handles connection lifecycle, tool calling, and transcript streaming, providing a simple async API for agent code.
Abstracts MCP protocol complexity through a high-level JoinlyClient API, enabling developers to build agents with simple async methods (join_meeting, send_message, get_transcript) without MCP knowledge. Integrates ConversationalToolAgent for LLM-based agent logic.
More developer-friendly than raw MCP because abstractions hide protocol details; more integrated than generic MCP clients because it understands meeting-specific operations natively
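The agent-development loop the SDK enables might look like the sketch below. The method names (join_meeting, get_transcript, send_message) come from the description above; the stub class stands in for the real joinly-client package so the flow is runnable, and should not be read as the actual SDK surface:

```python
# Sketch of the async lifecycle a JoinlyClient-style SDK abstracts away.
# The stub below is a stand-in, not the real joinly-client implementation.
import asyncio


class JoinlyClient:  # stand-in for the real client class
    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False  # real SDK would tear down the MCP session here

    async def join_meeting(self, url: str) -> None:
        self.url = url

    async def get_transcript(self) -> list[str]:
        return ["Alice: can someone summarize the action items?"]

    async def send_message(self, text: str) -> str:
        return f"spoke: {text}"


async def run_agent(url: str) -> str:
    # Connection lifecycle, tool calls, and transcripts behind simple awaits.
    async with JoinlyClient() as client:
        await client.join_meeting(url)
        lines = await client.get_transcript()
        return await client.send_message(f"I heard {len(lines)} line(s).")
```

The point is the shape: agent code is a handful of awaits inside a context manager, with no MCP framing in sight.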
shared type system and protocol definitions for cross-package consistency
Medium confidence: Defines shared data types (Transcript, AudioFormat, AudioChunk) and service provider protocols in joinly-common package, ensuring consistent interfaces across server and client packages. Protocols define expected behavior for VAD, STT, and TTS providers, enabling type-safe provider implementations and reducing integration errors.
Uses Python protocols to define service provider interfaces (VAD, STT, TTS) without requiring inheritance, enabling flexible provider implementations while maintaining type safety. Shared types (Transcript, AudioFormat) ensure consistent data representation across server and client.
More flexible than inheritance-based interfaces because protocols support structural typing; more maintainable than duplicated type definitions because shared types are defined once in joinly-common
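Structural typing with `typing.Protocol` is what makes "no inheritance required" work. A sketch (the STTProvider name and method are illustrative; the real protocols live in joinly-common):

```python
# A provider satisfies the protocol by shape alone: no base class needed.
from typing import Protocol, runtime_checkable


@runtime_checkable
class STTProvider(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class WhisperSTT:
    # Matching the method signature is enough; no inheritance from STTProvider.
    def transcribe(self, audio: bytes) -> str:
        return "hello world"


def transcribe_with(provider: STTProvider, audio: bytes) -> str:
    return provider.transcribe(audio)
```

Static checkers verify callers against the protocol, and `@runtime_checkable` additionally allows isinstance checks at runtime.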
speech-to-text transcription with pluggable provider support
Medium confidence: Converts filtered audio segments to text using configurable STT service providers (e.g., OpenAI Whisper, Google Cloud Speech, local models). The DefaultTranscriptionController receives VAD-filtered audio chunks and routes them to the selected STT provider, returning Transcript objects with text, confidence scores, and timing metadata for agent consumption.
Abstracts STT provider selection through a pluggable service architecture, allowing runtime provider switching via configuration without code changes. Maintains Transcript data type across all providers, ensuring consistent downstream agent integration regardless of STT backend.
More flexible than single-provider solutions because agents aren't locked into one STT service; more maintainable than custom provider wrappers because the framework handles provider lifecycle and error handling
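Runtime provider selection typically reduces to a registry keyed by configuration. A sketch with hypothetical provider classes and config keys (the real registry and Transcript type are joinly's own):

```python
# Hypothetical STT registry: swapping backends is a config edit, not a code change.
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    confidence: float


class LocalWhisper:
    def transcribe(self, audio: bytes) -> Transcript:
        return Transcript(text="(local result)", confidence=0.9)


class CloudSTT:
    def transcribe(self, audio: bytes) -> Transcript:
        return Transcript(text="(cloud result)", confidence=0.95)


STT_PROVIDERS = {"whisper-local": LocalWhisper, "cloud": CloudSTT}


def make_stt(config: dict):
    # Unknown keys fail fast; absent keys fall back to the local default.
    return STT_PROVIDERS[config.get("stt", "whisper-local")]()
```

Downstream agent code only ever sees Transcript objects, so the backend choice is invisible past this point.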
text-to-speech synthesis with real-time audio output
Medium confidence: Converts agent text responses to speech and outputs audio to the meeting in real-time using configurable TTS service providers (e.g., Resemble, Google Cloud TTS, local TTS engines). The DefaultSpeechController manages the TTS → audio output pipeline, handling audio format conversion, buffering, and PulseAudio device routing to ensure agent speech is heard by meeting participants.
Implements pluggable TTS provider architecture (e.g., Resemble.ai integration in joinly/services/tts/resemble.py) with audio format conversion and PulseAudio sink management, allowing provider swapping without agent code changes. Handles real-time audio buffering and synchronization with meeting audio stream.
More flexible than single-provider TTS because voice quality and cost can be optimized per deployment; more integrated than generic TTS libraries because it handles meeting-specific audio routing and synchronization
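One concrete piece of the "audio format conversion" step: many TTS engines emit float samples, while audio sinks typically expect 16-bit PCM. This is generic DSP code, not joinly's DefaultSpeechController:

```python
# Convert float samples in [-1, 1] to little-endian int16 PCM bytes,
# the kind of format bridge a TTS -> audio-sink pipeline needs.
import struct


def float_to_pcm16(samples: list[float]) -> bytes:
    """Clamp each sample to [-1, 1], scale, and pack as int16 PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```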
mcp-based meeting tool exposure for llm agents
Medium confidence: Exposes meeting capabilities (join, transcribe, speak, get participants, etc.) as standardized Model Context Protocol (MCP) tools that LLM agents can call. The FastMCP server interface wraps meeting operations as callable tools with JSON schemas, enabling any MCP-compatible LLM client to interact with meetings through a standard protocol without needing to understand Joinly's internal APIs.
Implements FastMCP server that wraps Joinly's meeting operations as standardized MCP tools, enabling any MCP-compatible LLM to control meetings without custom integrations. Uses Server-Sent Events for real-time updates (transcripts, participant changes) alongside request-response tool calls.
More interoperable than proprietary APIs because MCP is a standard protocol; more maintainable than custom LLM integrations because tool schemas are defined once and work across all MCP clients
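"Exposing a function as an MCP tool" means publishing a name, description, and JSON schema that any MCP client can discover and call. Joinly uses FastMCP for this; the homemade registry below only illustrates the schema-from-signature idea and is not FastMCP's API:

```python
# Minimal sketch: derive a JSON schema from a function's type hints and
# register it as a callable tool. FastMCP does this (and much more) for real.
import inspect

TOOLS = {}
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool(fn):
    """Register fn with an input schema derived from its annotations."""
    params = {
        name: {"type": _JSON_TYPES.get(p.annotation, "string")}
        for name, p in inspect.signature(fn).parameters.items()
    }
    TOOLS[fn.__name__] = {
        "description": (fn.__doc__ or "").strip(),
        "inputSchema": {"type": "object", "properties": params},
        "fn": fn,
    }
    return fn


@tool
def join_meeting(url: str) -> str:
    """Join the meeting at the given URL."""
    return f"joining {url}"


def call_tool(name: str, **kwargs):
    return TOOLS[name]["fn"](**kwargs)
```

Because the schema is generated once from the function definition, every MCP client sees the same contract without per-client integration code.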
session management and dependency injection for meeting orchestration
Medium confidence: Manages meeting session lifecycle (creation, state tracking, resource cleanup) through the MeetingSession orchestrator class, using dependency injection to wire together platform providers, audio controllers, and service implementations. Sessions maintain state across multiple operations, handle concurrent audio processing, and ensure proper resource cleanup on meeting termination.
Uses dependency injection pattern to wire together platform providers, audio controllers, and service implementations, allowing flexible composition without tight coupling. MeetingSession acts as central orchestrator coordinating browser automation, audio processing, and transcription pipelines.
More maintainable than monolithic session handling because concerns are separated; more testable because dependencies can be mocked; more flexible because service implementations can be swapped without changing session code
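The testability claim follows directly from constructor injection: a session that receives its collaborators can receive fakes. A sketch (class names echo the description; joinly's real wiring differs):

```python
# Constructor-based dependency injection: the session orchestrates whatever
# provider and transcription objects it is handed, so tests pass in fakes.
from dataclasses import dataclass, field


class FakeProvider:
    def join(self, url):
        return f"joined {url}"

    def leave(self):
        return "left"


class FakeTranscription:
    def start(self):
        return "transcribing"


@dataclass
class MeetingSession:
    provider: object        # platform provider (browser automation)
    transcription: object   # VAD -> STT pipeline controller
    events: list = field(default_factory=list)

    def run(self, url: str) -> list:
        # Orchestrate join, pipeline start, and cleanup via injected deps.
        self.events.append(self.provider.join(url))
        self.events.append(self.transcription.start())
        self.events.append(self.provider.leave())
        return self.events
```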
conversational agent framework with llm integration
Medium confidence: Provides ConversationalToolAgent class that wraps LLM integration for building meeting agents that can understand meeting context, call MCP tools, and generate responses. The agent maintains conversation history, handles tool calling loops, and integrates with any LLM provider that supports function calling (OpenAI, Anthropic, local models via Ollama).
Abstracts LLM provider selection through a pluggable interface, supporting OpenAI, Anthropic, and local LLMs via Ollama without code changes. Handles tool calling loops and conversation history management, reducing boilerplate for agent developers.
More flexible than single-LLM solutions because any function-calling LLM can be used; more integrated than generic LLM libraries because it understands meeting context and MCP tools natively
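The "tool calling loop" the agent handles has a standard shape: ask the LLM, execute any tool it requests, feed the result back, and stop when a final answer arrives. A runnable sketch with a scripted stand-in for a real function-calling model (not joinly's ConversationalToolAgent):

```python
# Generic tool-calling loop. The scripted "LLM" requests one tool, then
# answers once it sees the tool result in the history.
def scripted_llm(history):
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "get_transcript", "args": {}}
    return {"answer": "Action item: send the report."}


TOOLS = {"get_transcript": lambda: "Bob: please send the report."}


def run_agent(llm, tools, question: str) -> str:
    history = [{"role": "user", "content": question}]
    for _ in range(5):  # bound the loop defensively
        reply = llm(history)
        if "answer" in reply:
            return reply["answer"]
        result = tools[reply["tool"]](**reply["args"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("no final answer")
```

Swapping the model means swapping the `llm` callable; the loop, history, and tool dispatch stay the same, which is the boilerplate such a framework absorbs.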
multi-provider service abstraction with runtime configuration
Medium confidence: Provides pluggable service provider architecture for VAD, STT, and TTS, allowing runtime selection and configuration without code changes. Service providers are registered in a dependency injection container, enabling easy swapping between local and cloud implementations based on deployment environment (privacy requirements, cost, latency).
Implements service provider abstraction through Python protocols and dependency injection, allowing providers to be swapped at runtime via configuration without code changes. Supports both local (privacy-preserving) and cloud-based implementations for each service type.
More flexible than hardcoded provider implementations because providers are pluggable; more cost-effective than single-provider solutions because optimal provider can be selected per deployment; more privacy-preserving because local options are available
real-time transcript streaming with timing metadata
Medium confidence: Streams transcripts from meetings to connected clients in real-time using Server-Sent Events (SSE), including timing information (start_time, end_time) and speaker metadata. The Transcript data type (from joinly-common) standardizes transcript format across all STT providers, enabling consistent agent consumption regardless of backend.
Uses Server-Sent Events for real-time transcript streaming with standardized Transcript data type across all STT providers, ensuring consistent timing and metadata regardless of backend. Integrates with MCP protocol for seamless agent consumption.
More responsive than polling-based transcript delivery because SSE pushes updates; more standardized than provider-specific transcript formats because Transcript type is consistent across backends
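On the wire, an SSE message is just named text frames terminated by a blank line. A sketch of framing a transcript segment (the payload field names mirror the description above; the framing is standard SSE, not a joinly-specific schema):

```python
# Frame a JSON payload as one Server-Sent Event. Clients receive each
# "event:"/"data:" pair as a push, with the blank line ending the message.
import json


def sse_event(event: str, payload: dict) -> str:
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"
```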
docker-based deployment with virtual display and audio device management
Medium confidence: Provides Docker containerization with pre-configured Xvfb virtual display, PulseAudio daemon, and Playwright browser for headless meeting automation. Multiple Docker image variants support different deployment scenarios (minimal, full-featured, GPU-accelerated), with environment variable configuration for service providers and meeting parameters.
Provides multiple Docker image variants (minimal, full-featured, GPU) with pre-configured Xvfb and PulseAudio, abstracting complex virtual display and audio device setup. Environment variable configuration enables provider selection without rebuilding images.
More deployable than native installations because Docker handles dependency management; more flexible than single-image solutions because variants support different resource/feature trade-offs
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with joinly, ranked by overlap. Discovered automatically through the match graph.
MeetGeek
an AI meeting assistant that automatically video records, transcribes, summarizes, and provides the key points from every meeting.
Scribbl
AI Meeting Notes
Loopin AI
Loopin is a collaborative meeting workspace that not only enables you to record, transcribe & summarize meetings using AI, but also enables you to...
Looppanel
Streamline research with AI transcription, live notetaking, and analysis, facilitating seamless collaboration and data organization across...
tl;dv
AI meeting recorder with clips and CRM sync.
Fireflies.ai
AI notetaker with transcription and CRM integration.
Best For
- ✓ teams building meeting-aware AI agents
- ✓ developers automating meeting participation workflows
- ✓ enterprises needing AI agents in standardized video platforms
- ✓ cost-conscious teams using cloud STT services
- ✓ developers building low-latency meeting agents
- ✓ deployments with bandwidth or compute constraints
- ✓ Python developers building meeting agents
- ✓ teams prioritizing rapid agent development
Known Limitations
- ⚠ Requires headless browser environment with virtual display (Xvfb) and audio device support
- ⚠ Platform UI changes may break automation until controllers are updated
- ⚠ Cannot bypass platform authentication — requires valid meeting links or credentials
- ⚠ Browser automation adds 3-5 second latency for meeting join operations
- ⚠ VAD accuracy varies by audio quality and background noise levels
- ⚠ Local VAD adds ~50-100ms latency per audio chunk
Repository Details
Last commit: Mar 19, 2026