Name: DiffRhythm
Description: DiffRhythm is an open-source music generation model. Users provide lyrics and style information as text, along with an optional audio prompt for stylistic context; input audio clips must be shorter than 10 seconds. Supported languages are English and Chinese. DiffRhythm outputs its generated music as an audio file (MP3, WAV, or OGG) up to 285 seconds long.
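Since the model's public interfaces may change, below is a minimal, hypothetical pre-flight check in Python that enforces the input constraints stated above (non-empty lyrics and style text, optional audio prompt under 10 seconds). The function name `validate_inputs` and the use of the `soundfile` library are illustrative assumptions, not part of DiffRhythm's actual API:

```python
# Hypothetical input validation based on the constraints described above.
# `validate_inputs` and the soundfile dependency are illustrative assumptions;
# they are not part of DiffRhythm's own codebase.
from typing import Optional

import soundfile as sf

MAX_PROMPT_SECONDS = 10.0   # audio prompts must be shorter than 10 s
MAX_OUTPUT_SECONDS = 285.0  # generated songs are capped at 285 s

def validate_inputs(lyrics: str, style: str,
                    audio_prompt_path: Optional[str] = None) -> None:
    """Raise ValueError if the inputs violate the documented limits."""
    if not lyrics.strip():
        raise ValueError("Lyrics text must be non-empty.")
    if not style.strip():
        raise ValueError("Style description must be non-empty.")
    if audio_prompt_path is not None:
        duration = sf.info(audio_prompt_path).duration  # clip length in seconds
        if duration >= MAX_PROMPT_SECONDS:
            raise ValueError(
                f"Audio prompt is {duration:.1f} s; it must be under "
                f"{MAX_PROMPT_SECONDS:.0f} s."
            )
```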
Year: 2025
Website: https://github.com/ASLP-lab/DiffRhythm
Input types: Audio, Text
Output types: Audio
Output length: Up to 285 seconds
Technology: Latent Diffusion
Dataset:
License type:
Has real time inference: Yes
Is free: Yes
Is open source: Yes
Are checkpoints available: Yes
Can finetune: Yes
Can train from scratch: Not known
Tags: text-to-audio text-prompt open-source free checkpoints
Guide: A demo is available [here](https://huggingface.co/spaces/ASLP-lab/DiffRhythm). To run it locally on macOS, Windows, or Linux, follow the instructions in the [GitHub repository](https://github.com/ASLP-lab/DiffRhythm). Note that DiffRhythm requires at least 8 GB of VRAM.
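As a quick sanity check before a local run, the sketch below uses PyTorch (which a local DiffRhythm install is assumed to depend on) to confirm the GPU meets the 8 GB VRAM requirement mentioned above; the helper name `has_enough_vram` is hypothetical:

```python
# Pre-flight check for the 8 GB VRAM requirement; the helper name is
# hypothetical, and a CUDA-capable PyTorch install is assumed.
import torch

MIN_VRAM_GB = 8  # minimum VRAM stated in the setup instructions

def has_enough_vram(device_index: int = 0) -> bool:
    """Return True if the chosen CUDA device has at least MIN_VRAM_GB of memory."""
    if not torch.cuda.is_available():
        return False
    total_bytes = torch.cuda.get_device_properties(device_index).total_memory
    return total_bytes / (1024 ** 3) >= MIN_VRAM_GB

if __name__ == "__main__":
    ok = has_enough_vram()
    print("VRAM check passed" if ok else "Fewer than 8 GB of VRAM detected")
```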