Meta has unveiled AudioCraft, an open-source framework that lets users generate music and sound entirely through generative AI. It comprises three AI models, each handling a different aspect of sound generation. MusicGen turns text prompts into music and was trained on 20,000 hours of music owned by or licensed to Meta. AudioGen generates environmental audio, such as barking dogs or footsteps, from written prompts and was trained on public sound effects. An improved version of Meta's EnCodec decoder reduces artifacts in the generated sound.
At present, AudioCraft is better suited to producing background music or ambient stock tracks than chart-topping hits. However, Meta believes the model could transform the music industry in the same way synthesisers did when they first gained popularity. In a blog post, the company said it envisions MusicGen becoming a new type of instrument.
Meta acknowledges the challenges involved in building AI models that create music: a typical audio clip contains millions of sample points, whereas a passage of text fed to a model like Llama 2 spans only thousands of tokens. To address this and diversify the data used to train AudioCraft, Meta has decided to open-source the code.
The company recognises that its training datasets lack diversity and hopes that by sharing AudioCraft’s code, other researchers will be able to test new approaches to reduce bias and prevent misuse of generative models. According to Meta, all music used to train MusicGen is either owned by the company or specifically licensed for this purpose.
Check out the code library on GitHub and watch the video below to see how to install it.