Open-source text-to-audio model for generating samples and sound effects from text descriptions. It also supports audio variations and style transfer of existing audio samples. The creators claim it is ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design. It generates stereo audio at 44.1 kHz.
Year: 2023
Website: https://stability.ai/news/introducing-stable-audio-open
Input types: Audio, Text
Output types: Audio
Output length: 47s
AI Technique: Diffusion
Dataset: freesound.org, freemusicarchive.org
License type:
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
The Stable Audio Open model weights are available on Hugging Face. The Hugging Face page provides instructions and example Python scripts demonstrating how to use the model locally.
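The following is a minimal sketch of local inference, not the script from the Hugging Face page. It assumes the `diffusers` library's `StableAudioPipeline` integration, the repository id `stabilityai/stable-audio-open-1.0`, a CUDA GPU when one is available, and `soundfile` for writing WAV files. `clamp_duration` and `generate_clip` are hypothetical helper names, and the prompt, negative prompt and step count are illustrative values.

```python
MAX_SECONDS = 47.0  # taken from the "Output length: 47s" field above


def clamp_duration(seconds: float) -> float:
    """Hypothetical helper: keep the requested length within (0, MAX_SECONDS]."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return min(float(seconds), MAX_SECONDS)


def generate_clip(prompt: str, seconds: float, out_path: str, seed: int = 0) -> str:
    """Hypothetical helper: generate one stereo clip and save it as a WAV file."""
    # Heavy imports are deferred so clamp_duration works without PyTorch installed.
    import soundfile as sf
    import torch
    from diffusers import StableAudioPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Assumed repository id; the weights may require accepting the license
    # and logging in to Hugging Face before the download succeeds.
    pipe = StableAudioPipeline.from_pretrained(
        "stabilityai/stable-audio-open-1.0", torch_dtype=dtype
    ).to(device)

    # A seeded generator makes the output reproducible for a given prompt.
    generator = torch.Generator(device).manual_seed(seed)
    audios = pipe(
        prompt,
        negative_prompt="Low quality.",
        num_inference_steps=100,
        audio_end_in_s=clamp_duration(seconds),
        generator=generator,
    ).audios  # tensor shaped (batch, channels, samples)

    # soundfile expects (samples, channels); the VAE reports the sampling rate.
    stereo = audios[0].T.float().cpu().numpy()
    sf.write(out_path, stereo, pipe.vae.sampling_rate)
    return out_path
```

For example, `generate_clip("128 BPM tech house drum loop", 10, "drum_loop.wav")` would download the weights on first use and write a 10-second stereo file. Requests longer than the model's maximum are clamped rather than passed through, since the pipeline rejects lengths it cannot produce.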
Ready-made Jupyter notebooks also let you try the model immediately in Google Colab or Kaggle.
A comprehensive video tutorial on fine-tuning the model, contributed by active community member lyraaaa, is available on YouTube: Finetuning Stable Audio Open.
lyraaaa's Jupyter notebook for fine-tuning the model is available on Google Drive and can either be downloaded and run locally or opened directly in Google Colab.
A lively community of practitioners uses the model and communicates on a dedicated Discord server. The model's authors at Stability AI are also active there, frequently joining discussions and answering questions.