DiffRhythm

DiffRhythm is an open-source music generation model. Users provide lyrics and style information as text, optionally with an audio prompt for context; audio prompts must be shorter than 10 seconds. English and Chinese are supported. The model outputs the generated music as an audio file (MP3, WAV, or OGG) up to 285 seconds long.

Year: 2025

Website: https://aslp-lab.github.io/DiffRhythm.github.io/

Input types: Audio, Text

Output types: Audio

Output length: 95-285 seconds

AI Technique: Latent Diffusion

Dataset: 300,000 hours of internet-scraped music, filtered down to 25,000 hours of high-quality music

License type: Apache-2.0

Real time:

Free:

Open source:

Checkpoints:

Fine-tune:

Train from scratch:

#text-to-audio #text-prompt #open-source #free #checkpoints

Guide to using the model

A demo is available online. To run the model locally on macOS, Windows, or Linux, follow the instructions in the GitHub repository. Note that DiffRhythm requires at least 8 GB of VRAM.
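
Before passing an audio prompt to the model, it can help to confirm that the clip respects the length limit quoted in the description. The following is a minimal sketch using only Python's standard `wave` module, so it handles uncompressed WAV files only. The helper names (`wav_duration_seconds`, `is_valid_prompt`, `write_silence`) are hypothetical and are not part of DiffRhythm; the 10-second threshold is taken from the description above, not from the model's code.

```python
import wave

# Limit quoted in the description above; an assumption, not read from DiffRhythm itself.
MAX_PROMPT_SECONDS = 10.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_prompt(path, max_seconds=MAX_PROMPT_SECONDS):
    """Hypothetical helper: True if the clip is shorter than the stated limit."""
    return wav_duration_seconds(path) < max_seconds


def write_silence(path, seconds, sample_rate=8000):
    """Write a mono 16-bit silent WAV, used here only to demonstrate the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))
```

A typical use would be to call `is_valid_prompt("prompt.wav")` and trim the clip if the check fails. MP3 or OGG prompts would need a third-party decoder, which this sketch deliberately leaves out.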
