via “neural audio codec compression with discrete token representation”
A single-stop code base for generative audio needs, by Meta. Includes MusicGen for music and AudioGen for sounds. #opensource
Unique: Learned neural codec that produces parallel discrete token streams rather than single bitstream, enabling multi-stream autoregressive modeling; uses convolutional architecture with learned quantization to balance compression and reconstruction fidelity, treating audio compression as a learned representation problem rather than traditional signal processing
vs others: Provides better reconstruction fidelity than traditional codecs (MP3, AAC) at equivalent bitrates and enables discrete token representation for language modeling, unlike streaming codecs; however, computational cost of neural encoding/decoding is higher than traditional codecs