AI Music Generation - Model Explorer

AudioLCM

Efficient and high-quality text-to-audio generation with Latent Consistency Model.

Year: 2023

Website: https://audiolcm.github.io/

Input types: Text

Output types: Audio

Output length: Variable

AI Technique: Latent Diffusion

Dataset: Not disclosed for the "teacher" model; AudioCaps dataset (Kim et al., 2019) for the AudioLCM model

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Dance Diffusion

The first in a suite of generative audio tools for producers and musicians to be released by Harmonai. The provided Jupyter notebooks allow users to perform:
- Unconditional random audio sample generation
- Audio sample regeneration/style transfer using a single audio file or recording
- Audio interpolation between two audio files

Year: 2022

Website: https://github.com/Harmonai-org/sample-generator

Input types: Audio Text

Output types: Audio

Output length: Variable

AI Technique: Latent Diffusion

Dataset: Online sources - glitch.cool, songaday.world, MAESTRO dataset, Unlocked Recordings, xeno-canto.org

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

MusicGen

Language Model for conditional music generation developed by Meta. The output can be prompted by a text description and additionally conditioned on a melody.

Year: 2023

Website: https://ai.honu.io/papers/musicgen/

Input types: Text

Output types: Audio

Output length: 30s

AI Technique: Transformer

Dataset: NSynth Dataset; Others not disclosed

License type: MIT/CC-BY-NC

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Okio Nendo

Open-source platform capable of generating music from text prompts.

Year: 2023

Website: https://okio.ai/

Input types: Audio Text

Output types: Audio

Output length: Variable

AI Technique: Suite of AI tools

Dataset: Not disclosed

License type: MIT for core tools

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free

Riffusion

Music generation from text descriptions, based on Stable Diffusion. Can be conditioned on an image.

Year: 2022

Website: https://www.riffusion.com/

Input types: Text Image

Output types: Audio

Output length: around 3 min

AI Technique: Diffusion

Dataset: Not disclosed

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Stable Audio Open

Open-source text-to-audio model for generating samples and sound effects from text descriptions. The model enables audio variations and style transfer of audio samples. The creators claim it is ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings, and other audio samples for music production and sound design. Generates stereo audio at 44.1 kHz.

Year: 2024

Website: https://stability.ai/news/introducing-stable-audio-open

Input types: Audio Text

Output types: Audio

Output length: 47s

AI Technique: Diffusion

Dataset: freesound.org, freemusicarchive.org

License type:

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Mustango

Mustango is an open-source text-to-music model focused on fine-grained controllability, allowing users to specify musical attributes such as key or chord sequences.

Year: 2023

Website: https://amaai-lab.github.io/mustango/

Input types: Text

Output types: Audio

Output length: 10 sec

AI Technique: Latent Diffusion

Dataset: MusicBench

License type: MIT/CC-BY-SA

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #small-dataset #open-source #free #checkpoints

Yue AI

Yue AI is an open-source music generation model. The user provides lyrics and genre information as text, optionally with audio clips of around 30 seconds for context. Supported input languages include English, Mandarin Chinese, Cantonese, Japanese, and Korean. The generated music is output as an audio file up to 5 minutes long.

Year: 2025

Website: https://map-yue.github.io/

Input types: Audio Text

Output types: Audio

Output length: 5 minutes

AI Technique: Transformer

Dataset: WeNetSpeech, LibriHeavy, GigaSpeech, 650K hours of internet mined data

License type: Apache-2.0

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

DiffRhythm

DiffRhythm is an open-source music generation model. The user provides lyrics and style information as text, optionally with an audio prompt of less than 10 seconds for context. Supported languages include English and Chinese. The generated music is output as an audio file (MP3, WAV, or OGG) up to 285 seconds long.

Year: 2025

Website: https://aslp-lab.github.io/DiffRhythm.github.io/

Input types: Audio Text

Output types: Audio

Output length: 95-285 seconds

AI Technique: Latent Diffusion

Dataset: 300,000 hours of internet scraped music, cleaned to 25,000 hours of top-quality music

License type: Apache-2.0

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Muzic ROC

Muzic ROC is an open-source music generation model. The user provides lyrics and a chord progression as text, and the model outputs a melody. ROC is language-insensitive, so it can be adapted to other lyric languages with minor changes to the code.

Year: 2022

Website: https://github.com/microsoft/muzic/tree/main/roc

Input types: Text

Output types: MIDI

Output length: dependent on lyrics provided

AI Technique:

Dataset: LMD-matched MIDI dataset

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints