AI Music Generation - Model Explorer

Audialab

A collection of AI-powered generative tools: a sample generator (Deep Sampler), a drum sampler specialising in drum sound generation (Emergent Drums), a sample pack generator (Infinite Packs), and a drum sound variation generator (Humanize).

Year: 2021

Website: https://audialab.com/

Input types: Text

Output types: Audio

Output length: Short

AI Technique: Not Specified

Dataset: Not disclosed

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #proprietary

AudioLCM

Efficient and high-quality text-to-audio generation with Latent Consistency Model.

Year: 2023

Website: https://audiolcm.github.io/

Input types: Text

Output types: Audio

Output length: Variable

AI Technique: Latent Diffusion

Dataset: "Teacher" model not disclosed; AudioCaps dataset (Kim et al., 2019) for the AudioLCM model

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Dance Diffusion

The first in a suite of generative audio tools for producers and musicians to be released by Harmonai. The provided Jupyter notebooks allow users to perform:
- Unconditional random audio sample generation
- Audio sample regeneration/style transfer using a single audio file or recording
- Audio interpolation between two audio files

Year: 2022

Website: https://github.com/Harmonai-org/sample-generator

Input types: Audio Text

Output types: Audio

Output length: Variable

AI Technique: Latent Diffusion

Dataset: Online sources - glitch.cool, songaday.world, MAESTRO dataset, Unlocked Recordings, xeno-canto.org

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints
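
As a rough illustration of the audio interpolation mentioned in the Dance Diffusion entry: diffusion-based tools commonly blend two inputs by interpolating between their noise (or latent) vectors along a sphere rather than along a straight line. The sketch below shows spherical linear interpolation in plain Python. Whether the Harmonai notebooks use exactly this formulation is an assumption, and the function name `slerp` and the toy vectors are hypothetical.

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b, 0 <= t <= 1."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Cosine of the angle between the two vectors, clamped against rounding.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # Nearly parallel or anti-parallel: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]

# Toy stand-ins for the noise vectors of two audio clips (hypothetical values).
noise_a = [1.0, 0.0]
noise_b = [0.0, 1.0]
midpoint = slerp(noise_a, noise_b, 0.5)
```

In a real pipeline the interpolated vector would then be passed through the model's denoising process to produce audio; that step is omitted here.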

Loudly Generator

Loudly Generator lets you select genre and other options to generate music or use a text prompt. The new beta version provides an option of including your own audio clips.

Year: 2023

Website: https://www.loudly.com/

Input types: Text Metadata

Output types: Audio

Output length: 7 min

AI Technique: Not Specified

Dataset: Not disclosed

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #free #proprietary #API

Mubert

Platform for generating tracks, loops, mixes and jingles between 5 seconds and 5 minutes long. The desired output can be prompted by text, conditioned on an image (uploaded or linked to), or guided by additional information such as genre, mood or activity. These labels can be chosen from a long list of predefined options.

Year: 2016

Website: https://mubert.com/

Input types: Text Genre Metadata Image

Output types: Audio

Output length: 5s-5min

AI Technique: Not Specified

Dataset: Not disclosed

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #free #proprietary #image-to-audio #API

MusicGen

Language Model for conditional music generation developed by Meta. The output can be prompted by a text description and additionally conditioned on a melody.

Year: 2023

Website: https://ai.honu.io/papers/musicgen/

Input types: Text

Output types: Audio

Output length: 30s

AI Technique: Transformer

Dataset: NSynth Dataset; Others not disclosed

License type: MIT/CC-BY-NC

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

MusicLM

Music generation from text descriptions developed by Google. The output can be prompted by a text description and additionally conditioned on a melody.

Year: 2023

Website: https://google-research.github.io/seanet/musiclm/examples/

Input types: Text

Output types: Audio

Output length: 30s

AI Technique: Transformer

Dataset: MusicCaps, AudioSet

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #free #proprietary

Okio Nendo

Open source platform capable of generating music from text prompts.

Year: 2023

Website: https://okio.ai/

Input types: Audio Text

Output types: Audio

Output length: Variable

AI Technique: Suite of AI tools

Dataset: Not disclosed

License type: MIT for core tools

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free

Riffusion

Music generation from text descriptions based on Stable Diffusion. Can be conditioned on an image.

Year: 2022

Website: https://www.riffusion.com/

Input types: Text Image

Output types: Audio

Output length: around 3min

AI Technique: Diffusion

Dataset: Not disclosed

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Soundful

An online platform for music generation based on a predefined genre or style. Users can select a format suitable for a specific type of content (e.g. social media, gaming, vlogs), or type of output (e.g. loops, sfx).

Year: 2023

Website: https://soundful.com/

Input types: Text Metadata

Output types: Audio MIDI

Output length: 2.5min

AI Technique: Not Specified

Dataset: Not disclosed

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #MIDI #text-prompt #free #proprietary

Splash Pro

Splash Pro is a platform for music generation from text descriptions. You can specify the desired BPM and key. The platform contains a text-to-vocals model for synthesising realistic vocals. The output can be downloaded in MP3 format. From Splash's website: "We have been developing our own proprietary technology and high quality audio datasets since 2017. Our AI research and capabilities include Text-to-Singing, Text-to-Rap, Generative Text-to-Music, Composition, Melody, Voice Transfer, Lyrics and Mastering."

Year: 2023

Website: https://www.splashmusic.com/

Input types: Text

Output types: Audio

Output length: 30s-3min

AI Technique: Not Specified

Dataset: Data collected and owned by Splash as well as data freely available under Creative Commons license

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #free #proprietary

Stable Audio Open

Open source text-to-audio model for generating samples and sound effects from text descriptions. The model enables audio variations and style transfer of audio samples. The creators claim it is ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design. Generates stereo audio at 44.1kHz.

Year: 2023

Website: https://stability.ai/news/introducing-stable-audio-open

Input types: Audio Text

Output types: Audio

Output length: 47s

AI Technique: Diffusion

Dataset: freesound.org, freemusicarchive.org

License type:

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints
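
To give a sense of scale for the figures in the Stable Audio Open entry, a 47-second stereo clip at 44.1 kHz can be sized with simple arithmetic. The sketch below is illustrative only; the 16-bit PCM assumption used for the file size is not from the source.

```python
SAMPLE_RATE_HZ = 44_100   # from the entry: 44.1 kHz
CHANNELS = 2              # from the entry: stereo
DURATION_S = 47           # from the entry: output length
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM, not stated in the source

frames = SAMPLE_RATE_HZ * DURATION_S          # sample frames per channel
total_samples = frames * CHANNELS             # samples across both channels
pcm_bytes = total_samples * BYTES_PER_SAMPLE  # uncompressed size in bytes

print(frames, total_samples, pcm_bytes)
```

That works out to about 2.07 million frames per channel, or roughly 8.3 MB as uncompressed 16-bit PCM.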

Suno

Full song generation from a text description. Can generate songs with lyrics. The generated songs can then be remixed or extended.

Year: 2023

Website: https://suno.com

Input types: Text

Output types: Audio

Output length: around 2-4min

AI Technique: Not Specified

Dataset: Not disclosed

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #free #proprietary

Udio

Full song generation using text prompts. The songs can contain lyrics, prompted separately. The model can only be accessed through a proprietary platform which also offers clip and lyrics editing tools.

Year: 2024

Website: https://www.udio.com/

Input types: Audio Text

Output types: Audio

Output length: 30s

AI Technique: Not Specified

Dataset: Not disclosed

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #free #proprietary

TwoShot Coproducer

TwoShot's Coproducer is an all-in-one AI assistant that helps creators produce high-quality, commercially-safe audio. The platform enables users to:
* Generate full tracks from hummed melodies or simple text prompts.
* Remix existing songs, split audio into stems, and create unique samples.
* Automatically score video scenes with context-aware sound effects.
Designed for both beginners and professionals, the Coproducer integrates seamlessly with industry-standard DAWs (e.g., Ableton, Logic) and is built on a 100% ethically-sourced, rights-cleared foundation.

Year: 2025

Website: https://twoshot.ai/coproducer

Input types: Audio MIDI Text Genre Metadata

Output types: Audio MIDI

Output length:

AI Technique: Suite of AI tools

Dataset: Proprietary licenced dataset

License type: Depends on inputs/models. Possible to generate royalty free content in many ways

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #MIDI #text-prompt #free #proprietary

Mustango

Mustango is an open-source text-to-music model focused on fine-grained controllability, allowing users to specify musical attributes such as key or chord sequences.

Year: 2023

Website: https://amaai-lab.github.io/mustango/

Input types: Text

Output types: Audio

Output length: 10 sec

AI Technique: Latent Diffusion

Dataset: MusicBench

License type: MIT/CC-BY-SA

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #small-dataset #open-source #free #checkpoints

Yue AI

Yue AI is an open-source music generation model. The user can input lyrics and genre information as text, along with optional audio clips for context, which need to be around 30 seconds long. Supported input languages include English, Mandarin Chinese, Cantonese, Japanese, and Korean. Yue AI outputs its generated music as an audio file up to 5 minutes long.

Year: 2025

Website: https://map-yue.github.io/

Input types: Audio Text

Output types: Audio

Output length: 5 minutes

AI Technique: Transformer

Dataset: WeNetSpeech, LibriHeavy, GigaSpeech, 650K hours of internet mined data

License type: Apache-2.0 license

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

DiffRhythm

DiffRhythm is an open-source music generation model. The user can input lyrics and style information as text, along with an optional audio prompt for context, which needs to be less than 10 seconds long. Supported languages include English and Chinese. DiffRhythm outputs its generated music as an audio file (MP3, WAV, or OGG) up to 285 seconds long.

Year: 2025

Website: https://aslp-lab.github.io/DiffRhythm.github.io/

Input types: Audio Text

Output types: Audio

Output length: 95-285 seconds

AI Technique: Latent Diffusion

Dataset: 300,000 hours of internet scraped music, cleaned to 25,000 hours of top-quality music

License type: Apache-2.0 license

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints
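
The figures in the DiffRhythm entry are easier to compare once converted. The short sketch below, using only numbers stated in the entry, computes the share of scraped data kept after cleaning and expresses the output length range in minutes and seconds; the variable names are illustrative.

```python
raw_hours = 300_000     # from the entry: internet-scraped music
clean_hours = 25_000    # from the entry: retained after cleaning

kept_fraction = clean_hours / raw_hours
print(f"kept {kept_fraction:.1%} of the raw data")

# Output length range from the entry, as (minutes, seconds).
shortest = divmod(95, 60)
longest = divmod(285, 60)
print(shortest, longest)
```

In other words, about one hour in twelve survives cleaning, and outputs range from 1 min 35 s to 4 min 45 s.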

Muzic ROC

Muzic ROC is an open-source music generation model. The user can input lyrics and a chord progression as text, and the model will output a melody. ROC is language-insensitive, so slightly editing the code can modify its output language.

Year: 2022

Website: https://github.com/microsoft/muzic/tree/main/roc

Input types: Text

Output types: MIDI

Output length: dependent on lyrics provided

AI Technique:

Dataset: LMD-matched MIDI dataset

License type: MIT License

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

SongGen

SongGen is an open-source music generation model. The user can input lyrics and description information as text, along with an optional voice clip for cloning, which needs to be around 3 seconds long. SongGen outputs its generated music as an audio file up to 30 seconds long.

Year: 2025

Website: https://liuzh-19.github.io/SongGen/

Input types: Audio Text

Output types: Audio

Output length: 30 seconds

AI Technique: Transformer

Dataset: 8,000 hours of audio from Million Song Dataset (MSD), Free Music Archive (FMA), and MTG-Jamendo Dataset

License type: Apache-2.0 license

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #free #checkpoints
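
Each entry in this list follows the same layout: a name, a description, `Key: value` fields, and a line of hashtags. A minimal sketch of reading one entry into a dictionary and checking for a tag is shown below. The helper `parse_entry` and the dictionary shape are hypothetical and are not part of the Model Explorer itself.

```python
def parse_entry(text):
    """Parse one catalog entry: name, description, 'Key: value' lines, tags."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entry = {"name": lines[0], "description": lines[1], "fields": {}, "tags": []}
    for line in lines[2:]:
        if line.startswith("#"):
            # Tag line such as "#text-to-audio #free".
            entry["tags"] = [tag.lstrip("#") for tag in line.split()]
        else:
            # Split on the first colon only, so URLs in values stay intact.
            key, _, value = line.partition(":")
            entry["fields"][key.strip()] = value.strip()
    return entry

# Abbreviated copy of the SongGen entry, used as sample input.
sample = """SongGen

SongGen is an open-source music generation model.

Year: 2025

Output length: 30 seconds

Free:

#text-to-audio #text-prompt #free #checkpoints
"""

entry = parse_entry(sample)
has_checkpoints = "checkpoints" in entry["tags"]
```

Fields left blank in the list (such as `Free:`) come through as empty strings, so a caller can tell a missing value apart from a missing key.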