Audiocraft 2: Sound Generation
In this tutorial we'll use Audiocraft to build a command-line sound generator: you pass in a text description of the sound you want to hear, and the script generates an audio file from it.
Step 1: Generate some samples from hardcoded descriptions.
Create a file in the root of the repo called 'audiogen-demo.py' and paste in the following:
import torchaudio
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write
model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)  # generate 5 seconds.
descriptions = ['dog barking', 'siren of an emergency vehicle', 'footsteps in a corridor']
wav = model.generate(descriptions)  # generates 3 samples.
for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
This imports AudioGen from the audiocraft library, loads the audiogen-medium model (details are on its Hugging Face model card), and generates three 5-second clips from the prompts in the descriptions list: a dog barking, a siren, and footsteps. The clips are saved as 0.wav, 1.wav, and 2.wav, respectively, in the directory you run the script from (here, the root of the repo).
Run the script with 'python audiogen-demo.py'. Keep in mind that the model, which is almost 4GB, has to download and load before anything is generated, so the first run is slow, and generation itself is slow if you don't have a GPU!
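The comment in the loop says audio_write normalizes each clip to -14 dB LUFS. As a rough illustration of what that means numerically, here is a small sketch of the gain arithmetic. gain_to_target and apply_gain are hypothetical helpers, not Audiocraft code, and the measured loudness is taken as given: a real LUFS meter (as specified in ITU-R BS.1770) applies frequency weighting and gating before you get that number.

```python
def gain_to_target(measured_lufs: float, target_lufs: float = -14.0) -> tuple[float, float]:
    """Hypothetical helper: return (gain in dB, linear amplitude factor) that
    moves a clip from its measured loudness toward the target loudness."""
    # Loudness is on a log scale, so the required change is a simple difference.
    gain_db = target_lufs - measured_lufs
    # Convert dB to an amplitude multiplier: factor = 10^(dB / 20).
    factor = 10 ** (gain_db / 20)
    return gain_db, factor


def apply_gain(samples: list[float], factor: float) -> list[float]:
    # Hypothetical helper: scale every sample by the same factor.
    return [s * factor for s in samples]


gain_db, factor = gain_to_target(-20.0)
print(f"{gain_db:+.1f} dB -> multiply samples by {factor:.3f}")  # +6.0 dB -> multiply samples by 1.995
```

A clip measured at -20 LUFS needs a 6 dB boost, which roughly doubles every sample value. Boosts like that can push peaks past full scale, which is why a normalizing step is often paired with some form of compression or limiting.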
Step 2: Add command-line parameters
Import Python's built-in argparse module at the top of the file: 'import argparse'.
After the lines that load the model, define a function named generate_audio:
model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5) # generate [duration] seconds.
def generate_audio(descriptions):
Then move the model.generate call and the audio_write loop into the new function, indented under it. Delete the hardcoded descriptions list: the descriptions now arrive as the function's argument.
def generate_audio(descriptions):
    wav = model.generate(descriptions)  # generates one sample per description in the list.
    for idx, one_wav in enumerate(wav):
        # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
        audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
        print(f'Generated sample {idx} ({idx}.wav).')
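The loop names each file after its position in the list, and model.generate returns one clip per description in the same order. To see which file holds which sound, you can pair the same indexes with the descriptions. This is a stdlib-only sketch; output_files is a hypothetical helper, not part of Audiocraft.

```python
def output_files(descriptions: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each file name the loop writes to the
    description that produced it."""
    # enumerate's index is the same one generate_audio uses as the file stem.
    return {f"{idx}.wav": text for idx, text in enumerate(descriptions)}


for name, text in output_files(["dog barking", "footsteps in a corridor"]).items():
    print(f"{name}: {text}")
# 0.wav: dog barking
# 1.wav: footsteps in a corridor
```

Because the names depend only on position, every run reuses 0.wav, 1.wav, and so on, so copy any clips you want to keep before generating again.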
At the bottom of the script, add an argument parser so the script accepts descriptions from the command line:
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate audio based on descriptions.")
    parser.add_argument("descriptions", nargs='+', help="List of descriptions for audio generation")
    args = parser.parse_args()
Finally, on the last line of the script (still inside the if block), call the function with the parsed descriptions:
    generate_audio(args.descriptions)
Run it to generate audio, replacing the bracketed text with the sounds you want and wrapping each description in quotes:
python audiogen-demo.py "[audio you want to generate 1]" "[audio you want to generate 2]"
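To see what args.descriptions holds for a given command line, you can hand the same parser an explicit list instead of letting it read sys.argv. This sketch rebuilds only the parser, so it runs without Audiocraft; the example descriptions are made up.

```python
import argparse

# Same parser definition as in audiogen-demo.py.
parser = argparse.ArgumentParser(description="Generate audio based on descriptions.")
parser.add_argument("descriptions", nargs='+', help="List of descriptions for audio generation")

# The shell strips the quotes, so each quoted phrase arrives as one list item.
args = parser.parse_args(["rain on a tin roof", "a door creaking open"])
print(args.descriptions)  # ['rain on a tin roof', 'a door creaking open']

# Without quotes, every word becomes its own description (two sounds, not one).
print(parser.parse_args(["dog", "barking"]).descriptions)  # ['dog', 'barking']

# nargs='+' requires at least one description; with none, argparse
# prints a usage error and exits with status 2.
try:
    parser.parse_args([])
except SystemExit as exc:
    print("exit code:", exc.code)  # exit code: 2
```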
Final code (audiogen-demo.py):
import torchaudio
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write
import argparse

model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)  # generate [duration] seconds.

def generate_audio(descriptions):
    wav = model.generate(descriptions)  # generates one sample per description in the list.
    for idx, one_wav in enumerate(wav):
        # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
        audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
        print(f'Generated sample {idx} ({idx}.wav).')

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate audio based on descriptions.")
    parser.add_argument("descriptions", nargs='+', help="List of descriptions for audio generation")
    args = parser.parse_args()
    generate_audio(args.descriptions)
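The final script always generates 5-second clips. If you also want to set the clip length from the command line, the parser could take an extra option. This is a minimal sketch: --duration is a hypothetical flag that the tutorial's script does not define, and only the parser is shown so it runs without Audiocraft installed.

```python
import argparse

parser = argparse.ArgumentParser(description="Generate audio based on descriptions.")
parser.add_argument("descriptions", nargs='+', help="List of descriptions for audio generation")
# Hypothetical option, not part of audiogen-demo.py as written above.
parser.add_argument("--duration", type=float, default=5.0,
                    help="Length of each clip in seconds (default: 5)")

args = parser.parse_args(["--duration", "8", "dog barking", "thunder"])
print(args.duration, args.descriptions)  # 8.0 ['dog barking', 'thunder']
```

In audiogen-demo.py you would then call model.set_generation_params(duration=args.duration) inside the if block, before generate_audio(args.descriptions). Longer clips take longer to generate.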