❗Note: This repository is a work in progress. Please open an issue or contact us if you run into any problems.
Welcome to the official implementation repository of Llambada version 0! This project provides the tools and resources to use the Llambada model, an advanced system for music generation.
- Project page: Llambada demo
This model was trained on a dataset of roughly 4.4k hours of music using 2x A100 GPUs. Training took about 5 days and cost roughly 720 USD across the two stages: the semantic stage and the coarse stage.
⭐ We want to open AI to everyone, so all of the model source code, the training scripts, and the hyperparameters will be released :)
Please note: At this time, the repository includes only the inference code and a pre-trained model checkpoint. Training scripts will be added in a future update.
- Model code
- Inference script
- Checkpoint
- Audio mixing script for vocals and accompaniment
- Training script
- Gradio inference
- Model serving
Some of our demos can be found here, with the following input and output:
- Input: Vocal + prompt
- Output: Accompaniment
We then mix them together into the final song, which you can listen to in the mixed results below.
Prompt: Music beat for movie with acoustic, female vocals, piano, guitar, bass
Vocal:
vocal.mp4
Mixed Result:
result.mp4
Prompt: Music beat with romantic, female vocals, piano, bass, love song, movie soundtrack
Vocal:
vocal2.mp4
Mixed Result:
result2.mp4
Follow the steps below to set up your Python 3.10 environment using Conda and install the required dependencies.
Step 1: Create the environment
conda env create -f environment.yml
conda activate llambada
Step 2: Install dependencies. Install ffmpeg (for Ubuntu, the commands are shown below) and the Python dependencies.
apt update && apt install ffmpeg
pip install -r requirements.txt
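Optionally, you can sanity-check the setup before moving on (a quick check we suggest, assuming torch is pulled in by requirements.txt):
ffmpeg -version
python -c "import torch; print(torch.cuda.is_available())"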
Instructions and scripts for training will be provided in a future release.
All of the checkpoints for the semantic stage and the coarse stage can be downloaded from the SongGen page on Hugging Face.
After downloading the checkpoints, create a ckpts/ folder and move all of the downloaded files into it.
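For example (a minimal sketch; <songgen-repo-id> is a placeholder for the actual SongGen repository on Hugging Face):
mkdir -p ckpts
huggingface-cli download <songgen-repo-id> --local-dir ckpts/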
Regarding the tokenizer, copy bpe_simple_vocab_16e6.txt.gz into /workspace/llambada_test/LLambada/models/base/tokenizers/laion_clap/clap_module to complete the setup.
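For example (assuming the tokenizer file is in your current directory):
cp bpe_simple_vocab_16e6.txt.gz /workspace/llambada_test/LLambada/models/base/tokenizers/laion_clap/clap_module/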
Utilize the pre-trained Llambada model to generate music easily.
To run inference, execute the Python script below:
python demo.py
Create stunning music compositions with Llambada effortlessly!
Moreover, you can choose the GPU used for inference by prepending CUDA_VISIBLE_DEVICES=<your device id> to the command.
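For example, to run on GPU 0 (the device id here is illustrative):
CUDA_VISIBLE_DEVICES=0 python demo.py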
Total inference time for a 10-second singing accompaniment is about 1 minute and 30 seconds on 1x H100.
If you have any further questions or new ideas for model features, please raise an issue or contact us at songgen.ai, and we will support you as best we can!
Many thanks to MERT, Open-musiclm, Encodec, AudioLM-pytorch, and CLAP for their published works, which helped us complete this repo.
@article{trinh2024sing,
title={Sing-On-Your-Beat: Simple Text-Controllable Accompaniment Generations},
author={Trinh, Quoc-Huy and Nguyen, Minh-Van and Mau, Trong-Hieu Nguyen and Tran, Khoa and Do, Thanh},
journal={arXiv preprint arXiv:2411.01661},
year={2024}
}
Copyright 2025 Songgen.ai
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.