via “autoregressive audio token generation with long-term dependency modeling”
A single-stop code base for generative audio needs, by Meta. Includes MusicGen for music and AudioGen for sounds. #opensource
Unique: Applies standard autoregressive language modeling to discrete audio tokens rather than text, using token interleaving patterns to handle the parallel token streams produced by the neural codec. This lets audio generation reuse established LM research and techniques while keeping a discrete, interpretable representation.
vs others: Provides more explicit control and interpretability than diffusion-based audio generation, and better long-term coherence than parallel generation methods. The trade-off is slower inference than non-autoregressive approaches, because tokens are generated sequentially.
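To make the token interleaving idea concrete, here is a minimal sketch of one possible pattern, a "delay" layout in which codebook k is shifted right by k steps so that all parallel streams can be predicted along a single time axis. It is an illustration only, not the library's actual code: the function names `delay_interleave` and `delay_deinterleave`, the `PAD` placeholder, and the toy token values are all hypothetical.

```python
from typing import List, Optional

# Hypothetical placeholder for positions that hold no token.
PAD: Optional[int] = None


def delay_interleave(codes: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift codebook k right by k steps (illustrative delay pattern).

    `codes` holds K parallel token streams of equal length T, as a neural
    codec with several codebooks might produce. The result has K rows of
    length T + K - 1, padded with PAD where no token exists.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    delayed = [[PAD] * total_steps for _ in range(num_codebooks)]
    for k, stream in enumerate(codes):
        for t, token in enumerate(stream):
            delayed[k][t + k] = token
    return delayed


def delay_deinterleave(
    delayed: List[List[Optional[int]]], num_frames: int
) -> List[List[Optional[int]]]:
    """Undo the shift: row k keeps positions k .. k + num_frames - 1."""
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


# Toy example: 3 codebooks, 3 frames (values are made up).
codes = [
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
]
delayed = delay_interleave(codes)
for row in delayed:
    print(row)
```

With this layout the model takes T + K - 1 sequential steps instead of the K * T steps that fully flattening the streams would need, which is one reason such patterns are attractive despite the sequential decoding cost noted above.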