Capability
Voice Activity Detection And Silence Trimming
18 artifacts provide this capability.
Top Matches
via “voice-activity-detection-with-speech-frames”
automatic-speech-recognition model. 10,242,383 downloads.
Unique: Integrates VAD as a learnable component within the pyannote pipeline rather than as a separate preprocessing step, allowing joint optimization with speaker segmentation. Uses a lightweight CNN-based classifier optimized for low-latency frame-level inference (< 5 ms per frame on CPU).
vs others: Achieves 95%+ F1-score on standard VAD benchmarks (TIMIT, LibriSpeech) compared to 88-92% for traditional energy-based or spectral-based VAD methods, particularly in noisy conditions.
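For context, the traditional energy-based approach that the comparison refers to can be sketched in a few lines. This is not the pyannote model or its API: it is a minimal illustration of frame-level speech/non-speech decisions followed by silence trimming, and it assumes mono audio given as a list of floats in [-1, 1]. The function names, the 160-sample frame length (10 ms at 16 kHz), and the -40 dB threshold are all illustrative choices, not values taken from the listing.

```python
import math


def frame_energies_db(samples, frame_len):
    """Return the RMS energy in dB of each non-overlapping full frame.

    Trailing samples that do not fill a whole frame are ignored.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Floor the RMS so that digital silence does not hit log10(0).
        energies.append(20.0 * math.log10(max(rms, 1e-10)))
    return energies


def detect_speech_frames(samples, frame_len=160, threshold_db=-40.0):
    """Flag each frame as speech (True) if its energy exceeds the threshold.

    The threshold is a hypothetical default; real systems usually adapt it
    to the noise floor.
    """
    return [e > threshold_db for e in frame_energies_db(samples, frame_len)]


def trim_silence(samples, frame_len=160, threshold_db=-40.0):
    """Drop leading and trailing frames that were flagged as non-speech."""
    flags = detect_speech_frames(samples, frame_len, threshold_db)
    if not any(flags):
        return []
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return samples[first * frame_len:(last + 1) * frame_len]


# Usage: 0.1 s of silence, 0.2 s of a 440 Hz tone, 0.1 s of silence at 16 kHz.
sample_rate = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(3200)]
signal = silence + tone + silence
trimmed = trim_silence(signal)
```

A fixed energy threshold like this one is easily fooled by background noise, which is the weakness a learned frame classifier is meant to address.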