AI Music Generation - Model Explorer

AudioLCM

Efficient and high-quality text-to-audio generation with Latent Consistency Model.

Year: 2023

Website: https://audiolcm.github.io/

Input types: Text

Output types: Audio

Output length: Variable

AI Technique: Latent Diffusion

Dataset: "Teacher" model: not disclosed; AudioLCM model: AudioCaps dataset (Kim et al., 2019)

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Dance Diffusion

The first in a suite of generative audio tools for producers and musicians to be released by Harmonai. The provided Jupyter notebooks allow users to perform:
- Unconditional random audio sample generation
- Audio sample regeneration/style transfer using a single audio file or recording
- Audio interpolation between two audio files

Year: 2022

Website: https://github.com/Harmonai-org/sample-generator

Input types: Audio, Text

Output types: Audio

Output length: Variable

AI Technique: Latent Diffusion

Dataset: Online sources - glitch.cool, songaday.world, MAESTRO dataset, Unlocked Recordings, xeno-canto.org

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

R-VAE

Rhythm generator using a Variational Autoencoder (VAE). Based on M4L.RhythmVAE by Nao Tokui, modded and extended to support simple and compound meter rhythms with a minimal amount of training data. Like RhythmVAE, the goal of R-VAE is the exploration of latent spaces of musical rhythms. Unlike most previous work in rhythm modeling, R-VAE can be trained with small datasets, enabling rapid customization and exploration by individual users. R-VAE employs a data representation that encodes simple and compound meter rhythms. Models and latent space visualizations for R-VAE are available on the project's GitHub page: https://github.com/vigliensoni/R-VAE-models.

Year: 2022

Website: https://github.com/vigliensoni/R-VAE

Input types: MIDI

Output types: MIDI

Output length: 2 bars

AI Technique: VAE

Dataset: "The Future Sample Pack"

License type: GPLv3

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#MIDI #small-dataset #open-source #low-resource #free #checkpoints

MAGNet

A spectral approach to audio analysis and generation with neural networks (LSTM). The techniques included here were used in the Mezzanine Vs. MAGNet project, featured in the Barbican's AI: More than Human exhibition. It represents ongoing work from researchers at The Creative Computing Institute, UAL and Goldsmiths, University of London. MAGNet trains on the magnitude spectra of acoustic audio signals and produces entirely new magnitude spectra that can be turned back into sound using phase reconstruction, with very high audio fidelity. This repo gives people a chance to train their own models with their own source audio and generate new sounds. Both given projects are designed to be simple to understand and easy to run.

Year: 2019

Website: https://github.com/Louismac/MAGNet

Input types: Audio

Output types: Audio

Output length: Input length

AI Technique: LSTM

Dataset: N/A

License type: BSD 3-Clause

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#small-dataset #open-source #low-resource #free
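
The MAGNet entry above mentions turning generated magnitude spectra back into sound through phase reconstruction. The source does not say which algorithm MAGNet uses, so the sketch below assumes a Griffin-Lim style iteration purely for illustration. The function names, FFT size, hop size and test tone are all hypothetical choices, not taken from the MAGNet repository.

```python
import numpy as np

def stft(signal, n_fft=256, hop=64):
    """Short-time Fourier transform with a Hann window (frames x bins)."""
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.array([signal[s:s + n_fft] * window for s in starts])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Least-squares inverse STFT via windowed overlap-add."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for k in range(n_frames):
        start = k * hop
        out[start:start + n_fft] += frames[k] * window
        norm[start:start + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=30, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase, since only magnitudes are known.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        # Keep the estimated phase, discard the estimated magnitude.
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)

# Illustrative input: a 440 Hz tone stands in for a generated magnitude spectrum.
t = np.arange(4096) / 8000.0
tone = np.sin(2 * np.pi * 440 * t)
target = np.abs(stft(tone))
rebuilt_audio = griffin_lim(target, n_iter=20)
```

In a real pipeline the `target` array would come from the trained network rather than from an existing recording; the reconstruction step itself would stay the same under these assumptions.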

RAVE

RAVE (Realtime Audio Variational autoEncoder) is a deep-learning framework for audio processing and generation that learns a neural network model from audio data. RAVE allows both fast and high-quality audio waveform synthesis (20x real-time at a 48 kHz sampling rate on a standard CPU). In Max and Pd, it is accompanied by its nn~ decoder, which enables these models to be used in real time for applications such as audio generation and timbre transformation/transfer.

Year: 2022

Website: https://forum.ircam.fr/collections/detail/rave/

Input types: Audio

Output types: Audio

Output length: Variable / Audio buffer size

AI Technique: VAE

Dataset: N/A

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#small-dataset #open-source #free #checkpoints
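
To put the "20x real-time at 48 kHz" figure from the RAVE entry above in concrete terms, here is a small arithmetic sketch. It assumes that "20x real-time" means 20 seconds of audio are synthesized per second of compute; the helper name and the 2048-sample buffer size are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000
REAL_TIME_FACTOR = 20  # assumed meaning: 20 s of audio per 1 s of compute

def compute_time_ms(buffer_size, sample_rate=SAMPLE_RATE_HZ, factor=REAL_TIME_FACTOR):
    """Estimated compute time, in milliseconds, for one audio buffer."""
    buffer_duration_s = buffer_size / sample_rate
    return 1000.0 * buffer_duration_s / factor

buffer_ms = 1000.0 * 2048 / SAMPLE_RATE_HZ   # how long the buffer lasts when played
budget_ms = compute_time_ms(2048)            # how long it would take to synthesize
```

Under these assumptions a 2048-sample buffer lasts about 42.7 ms and would need roughly 2.1 ms of compute, which is the headroom that makes real-time use plausible.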

Magenta Continue

Generates MIDI notes that are likely to follow the input drum beat or melody. Can extend the input of a specified MIDI clip by up to 32 measures. This can be helpful for adding variation to a drum beat or creating new material for a melodic track. It typically picks up on things like durations, key signatures and timing. It can be used to produce more random outputs by increasing the temperature. Ready to use as a Max for Live device. If you want to train the model on your own data or try different pre-trained models provided by the Magenta team, refer to the instructions on the team's GitHub page: https://github.com/magenta/magenta/tree/main/magenta/models/melody_rnn

Year: 2018

Website: https://magenta.tensorflow.org/studio#continue

Input types: MIDI

Output types: MIDI

Output length: 32 bars

AI Technique: LSTM

Dataset: MelodyRNN - Not disclosed; PerformanceRNN - The Piano-e-Competition dataset

License type: Apache 2.0

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#MIDI #open-source #free #checkpoints
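
The Magenta Continue entry above notes that raising the temperature produces more random outputs. The sketch below shows the general idea with a generic softmax-with-temperature sampler. It is not Magenta's implementation; the function names and the example scores are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def sample_next_note(logits, temperature=1.0, rng=None):
    """Draw the index of the next note from the tempered distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate notes.
note_logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(note_logits, temperature=0.5)
warm = softmax_with_temperature(note_logits, temperature=2.0)
```

At a low temperature the most likely note dominates, so continuations stay close to the input; at a high temperature the probabilities even out and less likely notes are picked more often.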

Magenta Drumify

Creates grooves based on the rhythm of any input. Can be used to generate a drum accompaniment to a bassline or melody, or to create a drum track from a tapped rhythm. Drumify works best with performed inputs, but it can also handle quantized clips. Ready to use as a Max for Live device. If you want to train the model on your own data or try different pre-trained models provided by the Magenta team, refer to the instructions on the team's GitHub page: https://github.com/magenta/magenta/tree/main/magenta/models/drums_rnn

Year: 2018

Website: https://magenta.tensorflow.org/studio#drumify

Input types: MIDI

Output types: MIDI

Output length: Input length

AI Technique: LSTM

Dataset: Expanded Groove MIDI dataset; Groove MIDI dataset

License type: Apache 2.0

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#MIDI #open-source #free #checkpoints

Magenta Generate

Generates a 4-bar phrase with no input necessary. The number of variations and the temperature can be controlled. The model can be helpful for breaking a creative block or as a source of inspiration for an original sample. Under the hood it uses MusicVAE. You can learn more about it here: https://magenta.tensorflow.org/music-vae. Ready to use as a Max for Live device. If you want to train the model on your own data or try different pre-trained models provided by the Magenta team, refer to the instructions on the team's GitHub page: https://github.com/magenta/magenta/tree/main/magenta/models/music_vae

Year: 2018

Website: https://magenta.tensorflow.org/studio#generate

Input types: None

Output types: MIDI

Output length: 4 bars

AI Technique: VAE

Dataset: "Millions of melodies and rhythms", including NSynth Dataset, MAESTRO dataset, Lakh MIDI dataset

License type: Apache 2.0

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#MIDI #open-source #free #checkpoints #no-input

MusicGen

Language Model for conditional music generation developed by Meta. The output can be prompted by a text description and additionally conditioned on a melody.

Year: 2023

Website: https://ai.honu.io/papers/musicgen/

Input types: Text

Output types: Audio

Output length: 30s

AI Technique: Transformer

Dataset: NSynth Dataset; Others not disclosed

License type: MIT/CC-BY-NC

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Okio Nendo

Open source platform capable of generating music from text prompts.

Year: 2023

Website: https://okio.ai/

Input types: Audio, Text

Output types: Audio

Output length: Variable

AI Technique: Suite of AI tools

Dataset: Not disclosed

License type: MIT for core tools

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free

Riffusion

Music generation from text descriptions based on Stable Diffusion. Can be conditioned on an image.

Year: 2022

Website: https://www.riffusion.com/

Input types: Text, Image

Output types: Audio

Output length: around 3 min

AI Technique: Diffusion

Dataset: Not disclosed

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

SampleRNN

WIP

Year: 2016

Website: https://github.com/soroushmehr/sampleRNN_ICLR2017

Input types: Audio

Output types: Audio

Output length: Variable

AI Technique: Hierarchical Recurrent Neural Network (RNN)

Dataset: Not disclosed

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#small-dataset #open-source #free #checkpoints

Stable Audio Open

Open source text-to-audio model for generating samples and sound effects from text descriptions. The model enables audio variations and style transfer of audio samples. The creators claim it is ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design. Generates stereo audio at 44.1 kHz.

Year: 2023

Website: https://stability.ai/news/introducing-stable-audio-open

Input types: Audio, Text

Output types: Audio

Output length: 47s

AI Technique: Diffusion

Dataset: freesound.org, freemusicarchive.org

License type:

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

WaveNet

A model developed by Google DeepMind for generating speech and other audio signals such as music. It operates directly on raw waveforms, which is more computationally expensive but produces more natural-sounding results. It was one of the early models to use a convolutional neural network (CNN) to generate coherent musical structures.

Year: 2016

Website: https://deepmind.google/discover/blog/wavenet-a-generative-model-for-raw-audio/

Input types: Audio

Output types: Audio

Output length: Variable

AI Technique: Autoregressive Convolutional Neural Network

Dataset: Not disclosed

License type: Proprietary

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#open-source #free #checkpoints

Music2Latent

Encode and decode audio samples to/from compressed representations! Useful for efficient generative modeling applications and for other downstream tasks.

Year: 2024

Website: https://github.com/SonyCSLParis/music2latent

Input types: Audio

Output types: Audio

Output length: Variable / Audio input length

AI Technique: Latent Consistency Model

Dataset: MTG Jamendo and DNS Challenge 4

License type: CC-BY-NC

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#open-source #free #checkpoints

Music FaderNets

Music FaderNets is a controllable MIDI generation framework that models high-level musical qualities, such as emotional attributes like arousal. Drawing inspiration from the concept of sliding faders on a mixing console, the model offers intuitive and continuous control over these characteristics. Given an input MIDI, Music FaderNets can produce multiple variations with different levels of arousal, adjusted according to the position of the fader.

Year: 2020

Website: https://music-fadernets.github.io/

Input types: MIDI

Output types: MIDI

Output length:

AI Technique: VAE

Dataset: VGMIDI, Yamaha Piano-e-Competition

License type: MIT

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#MIDI #open-source #free #checkpoints