AIVA is an AI music generation assistant that lets you generate new songs in more than 250 different styles in a matter of seconds. Whether you are a complete beginner or a seasoned professional in music making, you can use generative AI to create your own songs.
Year: 2019
Website: https://www.aiva.ai/
Input types: Audio MIDI
Output types: Audio MIDI
Output length: 5:30
AI Technique: Not Specified
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
AI-powered songwriting assistant as a mobile app. Generates an entire song sketch or just a section with a single tap. The music can be downloaded or exported as MIDI for easy loading in a DAW.
Year: 2019
Website: https://amadeuscode.com/
Input types: Genre
Output types: Audio MIDI
Output length: Variable
AI Technique: Not Specified
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Efficient and high-quality text-to-audio generation with a Latent Consistency Model.
Year: 2023
Website: https://audiolcm.github.io/
Input types: Text
Output types: Audio
Output length: Variable
AI Technique: Latent Diffusion
Dataset: "Teacher" model: not disclosed; AudioLCM model: AudioCaps dataset (Kim et al., 2019)
License type: MIT
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
A web portal that lets users create original songs, then share and monetize them with full commercial rights.
Year: 2019
Website: https://boomy.com/
Input types: Audio Text Genre
Output types: Audio
Output length: 1-2 mins / track
AI Technique: Not Specified
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
The first in a suite of generative audio tools for producers and musicians to be released by Harmonai. The provided Jupyter notebooks allow users to perform:
- Unconditional random audio sample generation
- Audio sample regeneration/style transfer using a single audio file or recording
- Audio interpolation between two audio files
Year: 2022
Website: https://github.com/Harmonai-org/sample-generator
Input types: Audio Text
Output types: Audio
Output length: Variable
AI Technique: Latent Diffusion
Dataset: Online sources - glitch.cool, songaday.world, MAESTRO dataset, Unlocked Recordings, xeno-canto.org
License type: MIT
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Rhythm generator using a Variational Autoencoder (VAE). Based on M4L.RhythmVAE by Nao Tokui, modified and extended to support simple and compound meter rhythms with a minimal amount of training data. Like RhythmVAE, R-VAE aims to explore latent spaces of musical rhythms. Unlike most previous work in rhythm modeling, R-VAE can be trained with small datasets, enabling rapid customization and exploration by individual users. R-VAE employs a data representation that encodes simple and compound meter rhythms. Models and latent space visualizations for R-VAE are available on the project's GitHub page: https://github.com/vigliensoni/R-VAE-models.
Year: 2022
Website: https://github.com/vigliensoni/R-VAE
Input types: MIDI
Output types: MIDI
Output length: 2 bars
AI Technique: VAE
Dataset: "The Future Sample Pack"
License type: GPLv3
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
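Exploring a latent space, as described in the entry above, usually means moving between latent points and decoding each one into a rhythm. The following is a minimal sketch of the interpolation step only, not R-VAE's code: the two-dimensional latent vectors and their names are hypothetical, and the trained decoder that would turn each point into a MIDI pattern is omitted.

```python
def lerp(z_a, z_b, t):
    """Linearly interpolate between two latent vectors (t=0 gives z_a, t=1 gives z_b)."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, steps):
    """Return `steps` evenly spaced latent points from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]


# Hypothetical latent coordinates of two rhythms (illustrative values only).
z_rhythm_a = [-1.0, 0.5]
z_rhythm_b = [1.0, -0.5]

# Five points along the straight line between the two rhythms.
path = interpolation_path(z_rhythm_a, z_rhythm_b, 5)
```

In a real setup, each point in `path` would be passed to the model's decoder to obtain a rhythm that gradually morphs from the first pattern to the second.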
A spectral approach to audio analysis and generation with neural networks (LSTM). The techniques included here were used in the Mezzanine Vs. MAGNet project, featured in the Barbican's AI: More than Human exhibition. It represents ongoing work from researchers at The Creative Computing Institute, UAL and Goldsmiths, University of London. MAGNet trains on the magnitude spectra of acoustic audio signals and produces entirely new magnitude spectra that can be turned back into sound using phase reconstruction, with very high audio fidelity. This repo gives people a chance to train their own models with their own source audio and generate new sounds. Both given projects are designed to be simple to understand and easy to run.
Year: 2019
Website: https://github.com/Louismac/MAGNet
Input types: Audio
Output types: Audio
Output length: Input length
AI Technique: LSTM
Dataset: N/A
License type: BSD 3-Clause
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
RAVE (Realtime Audio Variational autoEncoder) is a deep-learning framework for audio processing and generation that trains a neural network model from audio data. RAVE allows both fast and high-quality audio waveform synthesis (20x real-time at 48 kHz sampling rate on a standard CPU). In Max and Pd, it is accompanied by its nn~ decoder, which enables these models to be used in real time for applications such as audio generation and timbre transformation/transfer.
Year: 2022
Website: https://forum.ircam.fr/collections/detail/rave/
Input types: Audio
Output types: Audio
Output length: Variable / Audio buffer size
AI Technique: VAE
Dataset: N/A
License type: MIT
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Generates MIDI notes that are likely to follow the input drum beat or melody. Can extend the input of a specified MIDI clip by up to 32 measures. This can be helpful for adding variation to a drum beat or creating new material for a melodic track. It typically picks up on things like durations, key signatures and timing. It can be used to produce more random outputs by increasing the temperature. Ready to use as a Max for Live device. If you want to train the model on your own data or try different pre-trained models provided by the Magenta team, refer to the instructions on the team's GitHub page: https://github.com/magenta/magenta/tree/main/magenta/models/melody_rnn
Year: 2018
Website: https://magenta.tensorflow.org/studio#continue
Input types: MIDI
Output types: MIDI
Output length: 32 bars
AI Technique: LSTM
Dataset: MelodyRNN - Not disclosed; PerformanceRNN - The Piano-e-Competition dataset
License type: Apache 2.0
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
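The temperature control mentioned in the entry above can be illustrated with a minimal sketch of temperature-scaled sampling. This is not Magenta's implementation: the function names, candidate pitches and scores below are hypothetical, and a real model would produce the scores from the preceding notes.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_note(pitches, logits, temperature=1.0, rng=None):
    """Draw one pitch according to the temperature-scaled probabilities."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(pitches, weights=probs, k=1)[0]


# Hypothetical scores for three candidate MIDI pitches (C4, E4, G4).
pitches = [60, 64, 67]
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.5)  # concentrates on the top choice
hot = softmax_with_temperature(logits, 2.0)   # spreads probability more evenly
```

With a low temperature the most likely pitch dominates, so continuations stay close to the input; with a high temperature less likely pitches are drawn more often, which gives the more random outputs the description refers to.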
Creates grooves based on the rhythm of any input. Can be used to generate a drum accompaniment to a bassline or melody, or to create a drum track from a tapped rhythm. Drumify works best with performed inputs, but it can also handle quantized clips. Ready to use as a Max for Live device. If you want to train the model on your own data or try different pre-trained models provided by the Magenta team, refer to the instructions on the team's GitHub page: https://github.com/magenta/magenta/tree/main/magenta/models/drums_rnn
Year: 2018
Website: https://magenta.tensorflow.org/studio#drumify
Input types: MIDI
Output types: MIDI
Output length: Input length
AI Technique: LSTM
Dataset: Expanded Groove MIDI dataset; Groove MIDI dataset
License type: Apache 2.0
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Generates a 4-bar phrase with no input necessary. The number of variations and the temperature can be controlled. The model can be helpful for breaking a creative block or as a source of inspiration for an original sample. Under the hood it uses MusicVAE. You can learn more about it here: https://magenta.tensorflow.org/music-vae. Ready to use as a Max for Live device. If you want to train the model on your own data or try different pre-trained models provided by the Magenta team, refer to the instructions on the team's GitHub page: https://github.com/magenta/magenta/tree/main/magenta/models/music_vae
Year: 2018
Website: https://magenta.tensorflow.org/studio#generate
Input types: None
Output types: MIDI
Output length: 4 bars
AI Technique: VAE
Dataset: "Millions of melodies and rhythms", including NSynth Dataset, MAESTRO dataset, Lakh MIDI dataset
License type: Apache 2.0
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Platform for generating tracks, loops, mixes and jingles between 5 seconds and 5 minutes long. The desired output can be prompted by text or conditioned on an image (uploaded or linked to), or on additional information such as genre, mood or activity. These labels can be chosen from a long list of predefined options.
Year: 2016
Website: https://mubert.com/
Input types: Text Genre Metadata Image
Output types: Audio
Output length: 5s-5min
AI Technique: Not Specified
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Language Model for conditional music generation developed by Meta. The output can be prompted by a text description and additionally conditioned on a melody.
Year: 2023
Website: https://ai.honu.io/papers/musicgen/
Input types: Text
Output types: Audio
Output length: 30s
AI Technique: Transformer
Dataset: NSynth Dataset; Others not disclosed
License type: MIT/CC-BY-NC
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Music generation from text descriptions developed by Google. The output can be prompted by a text description and additionally conditioned on a melody.
Year: 2023
Website: https://google-research.github.io/seanet/musiclm/examples/
Input types: Text
Output types: Audio
Output length: 30s
AI Technique: Transformer
Dataset: MusicCaps, AudioSet
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Open source platform capable of generating music from text prompts.
Year: 2023
Website: https://okio.ai/
Input types: Audio Text
Output types: Audio
Output length: Variable
AI Technique: Suite of AI tools
Dataset: Not disclosed
License type: MIT for core tools
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Music generation from text descriptions based on Stable Diffusion. Can be conditioned on an image.
Year: 2022
Website: https://www.riffusion.com/
Input types: Text Image
Output types: Audio
Output length: around 3min
AI Technique: Diffusion
Dataset: Not disclosed
License type: MIT
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
WIP
Year: 2016
Website: https://github.com/soroushmehr/sampleRNN_ICLR2017
Input types: Audio
Output types: Audio
Output length: Variable
AI Technique: Hierarchical Recurrent Neural Network (RNN)
Dataset: Not disclosed
License type: MIT
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
An online platform for music generation based on a predefined genre or style. Users can select a format suitable for a specific type of content (e.g. social media, gaming, vlogs) or a type of output (e.g. loops, sfx).
Year: 2023
Website: https://soundful.com/
Input types: Text Metadata
Output types: Audio MIDI
Output length: 2.5min
AI Technique: Not Specified
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Music generation based on selected metadata on a proprietary web platform. Users can specify the length (from 10s to 5 min), tempo (slow, normal, fast) and select genre and/or mood, as well as a theme (corporate, ads, cinematic, etc.).
Year: 2024
Website: https://soundraw.io/
Input types: Genre Metadata
Output types: Audio
Output length: 10s-5min
AI Technique: Not Specified
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Splash Pro is a platform for music generation from text descriptions. You can specify the desired BPM and key. The platform contains a text-to-vocals model for synthesising realistic vocals. The output can be downloaded in MP3 format. From Splash's website: "We have been developing our own proprietary technology and high quality audio datasets since 2017. Our AI research and capabilities include Text-to-Singing, Text-to-Rap, Generative Text-to-Music, Composition, Melody, Voice Transfer, Lyrics and Mastering."
Year: 2023
Website: https://www.splashmusic.com/
Input types: Text
Output types: Audio
Output length: 30s-3min
AI Technique: Not Specified
Dataset: Data collected and owned by Splash as well as data freely available under Creative Commons license
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Open source text-to-audio model for generating samples and sound effects from text descriptions. The model enables audio variations and style transfer of audio samples. The creators claim it is ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design. Generates stereo audio at 44.1 kHz.
Year: 2023
Website: https://stability.ai/news/introducing-stable-audio-open
Input types: Audio Text
Output types: Audio
Output length: 47s
AI Technique: Diffusion
Dataset: freesound.org, freemusicarchive.org
License type:
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Full song generation from a text description. Can generate songs with lyrics. The generated songs can then be remixed or extended.
Year: 2023
Website: https://suno.com
Input types: Text
Output types: Audio
Output length: around 2-4min
AI Technique: Not Specified
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Full song generation using text prompts. The songs can contain lyrics, prompted separately. The model can only be accessed through a proprietary platform which also offers clip and lyrics editing tools.
Year: 2024
Website: https://www.udio.com/
Input types: Audio Text
Output types: Audio
Output length: 30s
AI Technique: Not Specified
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
A model developed by Google DeepMind for generating speech and other audio signals such as music. It operates directly on raw waveforms, which is more computationally expensive but produces more natural results. It was one of the early models to use a convolutional neural network (CNN) to generate coherent musical structures.
Year: 2016
Website: https://deepmind.google/discover/blog/wavenet-a-generative-model-for-raw-audio/
Input types: Audio
Output types: Audio
Output length: Variable
AI Technique: Autoregressive Convolutional Neural Network
Dataset: Not disclosed
License type: Proprietary
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Encode and decode audio samples to/from compressed representations! Useful for efficient generative modeling applications and for other downstream tasks.
Year: 2024
Website: https://github.com/SonyCSLParis/music2latent
Input types: Audio
Output types: Audio
Output length: Variable / Audio input length
AI Technique: Latent Consistency Model
Dataset: MTG Jamendo and DNS Challenge 4
License type: CC-BY-NC
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
Music FaderNets is a controllable MIDI generation framework that models high-level musical qualities, such as emotional attributes like arousal. Drawing inspiration from the concept of sliding faders on a mixing console, the model offers intuitive and continuous control over these characteristics. Given an input MIDI, Music FaderNets can produce multiple variations with different levels of arousal, adjusted according to the position of the fader.
Year: 2020
Website: https://music-fadernets.github.io/
Input types: MIDI
Output types: MIDI
Output length:
AI Technique: VAE
Dataset: VGMIDI, Yamaha Piano-e-Competition
License type: MIT
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch:
TwoShot's Coproducer is an all-in-one AI assistant that helps creators produce high-quality, commercially safe audio. The platform enables users to:
* Generate full tracks from hummed melodies or simple text prompts.
* Remix existing songs, split audio into stems, and create unique samples.
* Automatically score video scenes with context-aware sound effects.
Designed for both beginners and professionals, Coproducer integrates with industry-standard DAWs (e.g., Ableton, Logic) and is built on a 100% ethically sourced, rights-cleared foundation.
Year: 2025
Website: https://twoshot.ai/coproducer
Input types: Audio MIDI Text Genre Metadata
Output types: Audio MIDI
Output length:
AI Technique: Suite of AI tools
Dataset: Proprietary licensed dataset
License type: Depends on inputs/models. Royalty-free content can be generated in many ways
Real time:
Free:
Open source:
Checkpoints:
Fine-tune:
Train from scratch: