AI Music Generation - Model Explorer

Open source text-to-audio model for generating samples and sound effects from text descriptions. The model enables audio variations and style transfer of audio samples. The creators claim it is ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design. Generates stereo audio at 44.1kHz.

Year: 2024

Website: https://stability.ai/news/introducing-stable-audio-open

Input types: Audio, Text

Output types: Audio

Output length: 47s

AI Technique: Diffusion

Dataset: freesound.org, freemusicarchive.org

License type:

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Guide to using the model

Running the model

The Stable Audio Open model weights are available on Hugging Face. The Hugging Face page provides instructions and example Python scripts demonstrating how to use the model locally.
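As a rough illustration of local use, the sketch below follows the pattern of the example script on the model's Hugging Face page. It assumes the `stable-audio-tools`, `torch`, `torchaudio` and `einops` packages are installed and that your Hugging Face account has access to the `stabilityai/stable-audio-open-1.0` weights. The helper names `build_conditioning` and `generate_clip`, the default prompt length, and the sampler settings are illustrative choices, not an official interface.

```python
MAX_SECONDS = 47  # maximum output length listed for this model


def build_conditioning(prompt, seconds_total):
    """Build the text-and-timing conditioning list for a single clip (hypothetical helper)."""
    if seconds_total <= 0:
        raise ValueError("seconds_total must be positive")
    return [{
        "prompt": prompt,
        "seconds_start": 0,
        # Requests longer than the model's maximum are clamped.
        "seconds_total": min(seconds_total, MAX_SECONDS),
    }]


def generate_clip(prompt, seconds_total=30, out_path="output.wav", steps=100, cfg_scale=7):
    """Generate one clip from a text prompt and save it as a 16-bit WAV file (hypothetical helper)."""
    # Heavy dependencies are imported lazily so build_conditioning works without them.
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Downloads the weights from Hugging Face on first use.
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)

    # Sampler settings are assumptions taken from the published example; tune as needed.
    output = generate_diffusion_cond(
        model,
        steps=steps,
        cfg_scale=cfg_scale,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Collapse the batch dimension: (batch, channels, samples) -> (channels, samples).
    output = rearrange(output, "b d n -> d (b n)")

    # Peak-normalize, clip, and convert to 16-bit integers before saving.
    output = output.to(torch.float32)
    output = output.div(output.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
    torchaudio.save(out_path, output, model_config["sample_rate"])
    return out_path
```

Calling `generate_clip("128 BPM tech house drum loop", 30)` would then write `output.wav` to the working directory. A GPU is strongly recommended; generation on CPU is likely to be slow.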

You can also try the model right away in Google Colab or Kaggle using the readily available Jupyter notebooks.

Fine-tuning Stable Audio Open

A comprehensive video tutorial on fine-tuning the model, contributed by lyraaaa, an active community member, is available: Finetuning Stable Audio Open on YouTube.

lyraaaa's Jupyter notebook for fine-tuning the model is available on Google Drive and can either be downloaded to run locally or opened directly in Google Colab.

Community

A lively community of practitioners uses the model and communicates on a dedicated Discord server. The model's authors at Stability AI are active there as well, frequently joining discussions and answering questions.

Running Stable Audio Open in MaxMSP/PureData

There is also an option to export the Stable Audio Open 1.0 autoencoder for real-time continuous inference in MaxMSP/PureData: Streaming Stable Audio Open.