Capability
Singing Voice Synthesis
20 artifacts provide this capability.
Top Matches
via “zero-shot voice cloning with minimal reference audio”
Text-to-speech model. 661,227 downloads.
Unique: Uses flow matching (continuous normalizing flows) instead of a discrete diffusion schedule, reducing inference from 100+ steps to 20-30 while maintaining voice fidelity. Speaker embeddings are integrated via cross-attention rather than concatenation, enabling smoother voice interpolation and style transfer.
vs others: Faster inference than XTTS-v2 (2-5 s vs 5-10 s) with comparable voice quality, while requiring less reference audio than VALL-E or YourTTS.
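To illustrate why flow matching can get away with 20-30 steps, here is a minimal sketch of fixed-step Euler sampling along a straight-line probability path. It is not this model's code: `toy_velocity` is a hypothetical closed-form stand-in for the learned velocity network, and the 80-dimensional vector is an assumed mel-frame size chosen only for illustration.

```python
import numpy as np


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    On the straight-line path x_t = (1 - t) * x0 + t * target, the velocity
    that carries x_t to the target is (target - x_t) / (1 - t).
    """
    return (target - x) / (1.0 - t)


def euler_sample(x0, target, num_steps=20):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        x = x + dt * toy_velocity(x, t, target)
    return x


rng = np.random.default_rng(0)
noise = rng.standard_normal(80)   # starting point drawn from the prior (assumed frame size)
target = rng.standard_normal(80)  # stand-in for the frame the model should produce
sample = euler_sample(noise, target, num_steps=20)
print(np.allclose(sample, target))
```

Because the path in this toy setup is a straight line, Euler integration lands on the target regardless of step count. A real learned velocity field is only approximately straight, which is why a few dozen steps are typically used in practice rather than one.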