Name: WaveNet
Description: A model for generating speech and other audio signals, such as music, developed by Google DeepMind. It operates directly on raw waveforms, which is computationally expensive but produces more natural-sounding results. It was one of the first models to use an autoregressive convolutional neural network to generate coherent audio, including musical structure.
Year: 2016
Website:
Input types: Audio Text None
Output types: Audio
Output length:
Technology: Autoregressive Convolutional Neural Network
Dataset:
License type:
Has real time inference: No
Is free: Yes
Is open source: Yes
Are checkpoints available: Not known
Can finetune: Not known
Can train from scratch: Yes
Tags: text-to-audio open-source free no-input
Guide: An open-source TensorFlow implementation of DeepMind's WaveNet paper is available at [https://github.com/ibab/tensorflow-wavenet](https://github.com/ibab/tensorflow-wavenet). The repository's README explains how to set up the project, train the network on a dataset of your choice, and generate audio from a trained model.
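The autoregressive, raw-waveform idea behind WaveNet can be sketched in plain Python. This is a toy illustration, not the trained model: `toy_predictor`, the window size, and the sampling loop are placeholders standing in for the stack of dilated causal convolutions, and only the 8-bit mu-law companding follows the paper.

```python
import math

MU = 255  # 8-bit mu-law quantization, as used in the WaveNet paper

def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to one of mu+1 integer levels."""
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((compressed + 1) / 2 * mu))

def mu_law_decode(level, mu=MU):
    """Invert the companding back to a float sample in [-1, 1]."""
    y = 2 * level / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def toy_predictor(window):
    """Placeholder for the network: a decaying average of the causal
    window. A real WaveNet uses dilated causal convolutions here."""
    if not window:
        return 0.0
    weights = [0.5 ** (len(window) - i) for i in range(len(window))]
    total = sum(w * s for w, s in zip(weights, window))
    return max(-1.0, min(1.0, total / sum(weights)))

def generate(n_samples, receptive_field=16):
    """Autoregressive loop: each sample depends only on earlier samples,
    and every prediction is fed back in as input for the next step."""
    samples = []
    for _ in range(n_samples):
        window = samples[-receptive_field:]
        level = mu_law_encode(toy_predictor(window))  # model picks a level
        samples.append(mu_law_decode(level))          # feed prediction back
    return samples

audio = generate(100)
```

The nonlinear mu-law companding is what lets the model treat waveform generation as a 256-way classification problem per sample while keeping quantization noise perceptually small.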