DiffRhythm is an open-source music generation model. It takes lyrics and a style description as text input, optionally together with an audio prompt that provides style context; audio prompt clips must be shorter than 10 seconds. Supported languages include English and Chinese. DiffRhythm outputs the generated music as an audio file (MP3, WAV, or OGG) up to 285 seconds long.
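The 10-second limit on the audio prompt can be checked before a clip is submitted. The Python sketch below is a hypothetical helper, not part of DiffRhythm. It uses the standard-library `wave` module, so it assumes the prompt is an uncompressed PCM WAV file. MP3 or OGG prompts would need a separate decoder to measure their duration.

```python
import wave

# Upper bound on the audio prompt length, as stated in this entry (assumption: strict "<").
MAX_PROMPT_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_valid_audio_prompt(path: str, max_seconds: float = MAX_PROMPT_SECONDS) -> bool:
    """Hypothetical pre-check: True if the WAV clip is shorter than the prompt limit."""
    return wav_duration_seconds(path) < max_seconds
```

For example, `is_valid_audio_prompt("prompt.wav")` returns `False` for a 12-second clip, which could then be trimmed before use.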
Year: 2025
Website: https://aslp-lab.github.io/DiffRhythm.github.io/
Input types: Audio, Text
Output types: Audio
Output length: 95-285 seconds
AI Technique: Latent Diffusion
Dataset: 300,000 hours of internet-scraped music, filtered down to 25,000 hours of top-quality music
License type: Apache-2.0
Real time:
Free: Yes
Open source: Yes
Checkpoints:
Fine-tune:
Train from scratch:
A demo is available online. Instructions for running DiffRhythm locally on macOS, Windows, or Linux are given in the GitHub repository. Note that DiffRhythm requires at least 8 GB of VRAM.