Name:
Description: Open-source text-to-audio model for generating samples and sound effects from text descriptions. The model enables audio variations and style transfer of audio samples. The creators claim it is ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings, and other audio samples for music production and sound design. It generates stereo audio at 44.1 kHz.
Year:
Website:
Input types: Audio MIDI Text None Genre Metadata Image
Output types: Audio MIDI
Output length:
Technology: Not Specified Latent Consistency Model Latent Diffusion LSTM VAE Sequence-to-sequence neural network Transformer Suite of AI tools Diffusion Hierarchical Recurrent Neural Network (RNN) Autoregressive Convolutional Neural Network
Dataset:
License type:
Has real time inference: Yes No Not known
Is free: Yes No Yes and No, depending on the plan Not known
Is open source: Yes No Not known
Are checkpoints available: Yes No Not known
Can finetune: Yes No Not known
Can train from scratch: Yes No Not known
Tags: text-to-audio MIDI text-prompt small-dataset open-source low-resource free checkpoints proprietary no-input image-to-audio
Guide:

### Running the model

The Stable Audio Open model weights are available on [Hugging Face](https://huggingface.co/stabilityai/stable-audio-open-1.0). The Hugging Face page provides instructions and example Python scripts showing how to run the model locally. You can also try the model right away using the ready-made Jupyter notebooks in [Google Colab](https://huggingface.co/stabilityai/stable-audio-open-1.0/colab) or [Kaggle](https://huggingface.co/stabilityai/stable-audio-open-1.0/kaggle).

### Fine-tuning Stable Audio Open

Community member lyraaaa has contributed a comprehensive video tutorial on fine-tuning the model: [Finetuning Stable Audio Open on YouTube](https://www.youtube.com/watch?v=ex4OBD_lrds). lyraaaa's Jupyter notebook for fine-tuning the model is available on [Google Drive](https://drive.google.com/file/d/1EG2faHovvfU6SJyn-3tl9dKUZLASrwri/view) and can either be downloaded and run locally or opened directly in Google Colab.

### Community

A lively community of practitioners using the model communicates on a dedicated [Discord server](https://discord.gg/stablediffusion). The model's authors at Stability AI are active there as well and frequently join discussions and answer questions.
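
As a rough illustration of the local usage mentioned under "Running the model", here is a minimal sketch that assumes the `diffusers`, `torch`, and `soundfile` packages are installed and that your Hugging Face account can download the weights (the repository may require accepting its license first). The helper names `clamp_duration` and `generate`, the 47-second cap, and the default step count are assumptions made for this example; the scripts on the Hugging Face page remain the authoritative reference.

```python
MODEL_ID = "stabilityai/stable-audio-open-1.0"
MAX_SECONDS = 47.0  # assumption: maximum clip length for this model


def clamp_duration(seconds: float, max_seconds: float = MAX_SECONDS) -> float:
    """Keep the requested clip length within (0, max_seconds]."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return min(float(seconds), max_seconds)


def generate(prompt: str, seconds: float = 10.0, out_path: str = "sample.wav",
             steps: int = 100, seed: int = 0) -> str:
    """Generate one stereo clip from a text prompt and save it as a WAV file."""
    # Heavy imports are deferred so that importing this module stays cheap.
    import torch
    import soundfile as sf
    from diffusers import StableAudioPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableAudioPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)

    generator = torch.Generator(device).manual_seed(seed)
    audio = pipe(
        prompt,
        num_inference_steps=steps,
        audio_end_in_s=clamp_duration(seconds),
        generator=generator,
    ).audios

    # audios[0] is (channels, samples); soundfile expects (samples, channels).
    waveform = audio[0].T.float().cpu().numpy()
    sf.write(out_path, waveform, pipe.vae.sampling_rate)
    return out_path


# Example call (downloads the weights on first use):
# generate("128 BPM tech house drum loop", seconds=8.0)
```

Running on a GPU is strongly advisable; the CPU fallback in the sketch works in principle but is likely to be very slow.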