WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning
Mentors: Kejun Zhang*, Tan Xu, Lingyun Sun
Authors: Xinda Wu*, Tieyao Zhang, Zhijie Huang, Qihao Liang, and Songruoyao Wu
∗ Equal contribution
WuYun (悟韵): Paper arXiv | Demo Page | ...
Official PyTorch implementation of the preprint "WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning" (updated to version 3 in 2024-02, which adds chord tone analysis).
WuYun (悟韵) is a knowledge-enhanced deep learning architecture for improving the structure of generated melodies. Inspired by the hierarchical organization principle of structure and prolongation, we decompose the melody generation process into two stages, melodic skeleton construction and melody inpainting: the model first generates the most structurally important notes to construct a melodic skeleton and then infills it with dynamically decorative notes to form a full-fledged melody. Specifically, we introduce a melodic skeleton extraction framework that works along the rhythm and pitch dimensions and is based on music domain knowledge, to help the sequence learning model hallucinate or predict a novel melodic skeleton. The reconstructed melodic skeletons serve as additional knowledge that provides auxiliary guidance for the melody generation process, and they are kept as the underlying framework of the final generated melody.
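The two-stage idea can be illustrated with a toy sketch. This is not the WuYun model: `extract_skeleton` and `toy_infill` are hypothetical names, the "important note" flags are supplied by hand, and the infilling rule (one passing note halfway between neighbours) only stands in for the learned inpainting stage. It shows just one property described above: the skeleton notes survive unchanged in the final melody.

```python
from typing import List, Tuple

# A note is (onset_tick, midi_pitch, duration_ticks); this layout is an assumption.
Note = Tuple[int, int, int]


def extract_skeleton(melody: List[Note], keep: List[bool]) -> List[Note]:
    """Stage 1 (toy): keep only the notes flagged as structurally important."""
    return [note for note, flag in zip(melody, keep) if flag]


def toy_infill(skeleton: List[Note]) -> List[Note]:
    """Stage 2 (toy): add one passing note between each pair of skeleton notes.

    A real inpainting model would predict the decorative notes; here the
    passing note simply sits halfway in time and pitch.
    """
    out: List[Note] = []
    for cur, nxt in zip(skeleton, skeleton[1:]):
        out.append(cur)
        mid_onset = (cur[0] + nxt[0]) // 2
        mid_pitch = (cur[1] + nxt[1]) // 2
        if mid_onset > cur[0]:
            out.append((mid_onset, mid_pitch, nxt[0] - mid_onset))
    if skeleton:
        out.append(skeleton[-1])
    return out


melody = [(0, 60, 480), (480, 62, 240), (720, 65, 240), (960, 64, 480)]
skeleton = extract_skeleton(melody, [True, False, False, True])
full_melody = toy_infill(skeleton)
```

Every note in `skeleton` also appears in `full_melody`, which mirrors the statement that the skeleton is kept as the framework of the generated melody.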
Clone this repository, then enter it:
cd WuYun-Torch
- NVIDIA GPU + CUDA + cuDNN
- Python 3.8.5
- Required packages:
  - miditoolkit
  - torch 2.0.1
  - others... (install whatever you are missing)
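Since the package list is open-ended, a quick check of what is importable can save a failed run. The helper below is a small sketch using only the standard library; `missing_packages` is a hypothetical name, and the two package names are the ones listed above.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", sys.version.split()[0])
print("Missing:", missing_packages(["miditoolkit", "torch"]))
```

An empty list means both packages are importable; it does not verify their versions or that CUDA is usable.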
core code: ./preprocessing/mdp_wuyun.py
doc: ./preprocessing/README.md
Core functions:
- Select segments in 4/4 time signature (at least 8 bars)
- Track Classification (midi-miner): lead melody, chord, bass, drum, and others.
- MIDI Quantization (straight notes and triplets) (WuYun)
- Octave Transposition
- Filter midis by heuristic rules
- Deduplication (pitch interval)
- Chord Recognition (Magenta)
- Tonality Unification (WuYun)
- ...
Note: For the detailed melody data processing procedure, please refer to WuYun and MelodyGLM.
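Two of the steps listed above, quantization to straight and triplet grids and deduplication by pitch interval, can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `mdp_wuyun.py`: the grid resolutions (sixteenth notes and eighth-note triplets at 480 ticks per beat) and the function names are hypothetical.

```python
def quantize_tick(tick, ticks_per_beat=480, straight_div=4, triplet_div=3):
    """Snap a tick to the nearest point on a straight or a triplet grid.

    Assumed grids: sixteenth notes (4 per beat) and eighth triplets (3 per beat).
    """
    steps = (ticks_per_beat // straight_div, ticks_per_beat // triplet_div)
    candidates = [round(tick / step) * step for step in steps]
    # Prefer the closest candidate; break ties towards the earlier position.
    return min(candidates, key=lambda c: (abs(c - tick), c))


def interval_signature(pitches):
    """Successive pitch differences, so transposed copies share a signature."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def deduplicate(melodies):
    """Keep the first melody seen for each interval signature."""
    seen, unique = set(), []
    for pitches in melodies:
        signature = interval_signature(pitches)
        if signature not in seen:
            seen.add(signature)
            unique.append(pitches)
    return unique
```

For example, `deduplicate([[60, 62, 64], [62, 64, 66]])` keeps only the first melody, because the second is the same interval pattern transposed up a whole tone.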
Extract the type of melodic skeleton you need using the class Melody_Skeleton_Extractor in the code directory ./preprocessing/utils/melodic_skeleton
Type is the kind of melodic skeleton; Ratio is the proportion of all notes that belong to it.
No. | Type | Ratio | Code |
---|---|---|---|
0 | Down Beat | ~39.79% | melodic_skeleton_analysis_rhythm.py |
1 | Long Note | ~22.13% | melodic_skeleton_analysis_rhythm.py |
2 | Rhythm | ~44.49% | melodic_skeleton_analysis_rhythm.py |
3 | Rhythm ∩ Chord Tones ∩ Tonal Tones | ~14.76% | melodic_skeleton.py |
4 | Rhythm ∩ Chord Tones | ~35.24% | melodic_skeleton.py |
5 | Rhythm ∩ Tonal Tones | ~17.6% | melodic_skeleton.py |
6 | Syncopation | ~8.7% | melodic_skeleton_analysis_rhythm.py |
7 | Tonal Tones | ~28.46% | melodic_skeleton_analysis_tonal_tones.py |
For the latest version of the popular music melodic skeleton extraction algorithm, please refer to the code.
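Types 3 to 5 in the table are intersections of simpler note sets. The sketch below shows that composition only. It assumes a chord tone is a note whose pitch class belongs to the chord sounding at that note, and it takes the rhythm skeleton as a given set of note indices; the function name and the data layout are hypothetical and do not reflect the interface of `Melody_Skeleton_Extractor`.

```python
def chord_tone_ids(notes, chord_pitch_classes):
    """Indices of notes whose pitch class is in the chord active at that note.

    notes: list of (midi_pitch, chord_name) pairs (assumed layout).
    chord_pitch_classes: chord_name -> set of pitch classes 0..11.
    """
    return {
        i
        for i, (pitch, chord) in enumerate(notes)
        if pitch % 12 in chord_pitch_classes[chord]
    }


chords = {"C": {0, 4, 7}, "G": {7, 11, 2}}
notes = [(60, "C"), (62, "C"), (64, "C"), (65, "G"), (67, "G")]

rhythm_ids = {0, 2, 3}          # hypothetical output of the rhythm analysis
chord_ids = chord_tone_ids(notes, chords)

rhythm_and_chord = rhythm_ids & chord_ids      # corresponds to type 4
ratio = len(rhythm_and_chord) / len(notes)     # share of all notes
```

Here `rhythm_and_chord` is `{0, 2}` and `ratio` is `0.4`; the ratios in the table are the same kind of proportion measured on the authors' dataset.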
1. build dictionary
# prepare your chord vocabulary (optional)
python3 dataset/statistic.py
# build your pre-defined vocabulary
python3 modules/build_dictionary.py
2. tokenization
python3 models/skeleton/dataloader.py
3. train skeleton generation model
# to use another kind of melodic skeleton, change the type number to match your dataset
# for example
python3 models/skeleton/main.py --type 4 --gpu_id 4 # 'Rhythm ∩ Chord'
4. inference melodic skeleton from scratch
Note: Objective metrics do not directly reflect subjective results, so try several model checkpoints after the model converges.
# for example
python3 models/skeleton/inference.py --type 4 --gpu_id 2 --ckpt_fn 'ckpt_epoch_400.pth.tar' --epoch 400
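Trying several checkpoints by hand is tedious, so the commands can be generated. The sketch below only builds and prints command lines using the flags shown above; `inference_commands` is a hypothetical helper, and it assumes every saved checkpoint is named `ckpt_epoch_<N>.pth.tar`, which the example only confirms for epoch 400.

```python
import shlex


def inference_commands(epochs, skeleton_type=4, gpu_id=0):
    """Build one skeleton-inference command per checkpoint epoch."""
    commands = []
    for epoch in epochs:
        commands.append([
            "python3", "models/skeleton/inference.py",
            "--type", str(skeleton_type),
            "--gpu_id", str(gpu_id),
            # Assumed naming pattern, taken from the example above.
            "--ckpt_fn", f"ckpt_epoch_{epoch}.pth.tar",
            "--epoch", str(epoch),
        ])
    return commands


for command in inference_commands([300, 350, 400]):
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` from the repository root once the checkpoints exist.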
1. tokenization
python3 models/prolongation/dataloader.py
2. train melodic prolongation model
# for example
python3 models/prolongation/main.py --type 4 --gpu_id 8 # 'Rhythm ∩ Chord'
3. inference from real melodic skeletons (decorate melodic skeletons extracted from human-composed music)
# for example
python3 models/prolongation/inference_real.py --type 4 --gpu_id 0 --ckpt_fn 'ckpt_epoch_25.pt' --epoch '25'
4. inference from generated melodic skeletons (decorate AI-generated melodic skeletons)
# for example
python3 models/prolongation/inference_scratch.py --type 4 --gpu_id 0 --ckpt_fn 'ckpt_epoch_25.pt' --pro_epoch '25' --ske_epoch '400'
Evaluation Metrics list:
- OA(PCH)
- OA(IOI)
- SE
code dir: './evaluation'
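To give a feel for the first metric, the sketch below compares two pitch class histograms. It assumes that OA stands for overlapped area between feature distributions and PCH for pitch class histogram, and it replaces the actual computation with a plain histogram intersection. It is a simplified illustration, not the code in `./evaluation`, and the function names are hypothetical.

```python
from collections import Counter


def pitch_class_histogram(pitches):
    """Normalized 12-bin histogram of pitch classes (C=0, ..., B=11)."""
    counts = Counter(pitch % 12 for pitch in pitches)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 12
    return [counts.get(pc, 0) / total for pc in range(12)]


def histogram_overlap(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


generated = pitch_class_histogram([60, 62, 64, 67, 72])
reference = pitch_class_histogram([60, 64, 65, 67, 69])
print(round(histogram_overlap(generated, reference), 3))
```

A value closer to 1 means the generated melodies use pitch classes in proportions similar to the reference set.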
You can write chord and bass tracks if the task is melody generation with a chord progression.
python3 utils/add_chord_bass_track.py
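As a rough picture of what such tracks contain, the sketch below turns a chord progression into chord and bass note events. It is not the logic of `add_chord_bass_track.py`: the one-chord-per-bar layout, the triad tables, the octave choices, and the function name are all assumptions made for illustration.

```python
ROOTS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
QUALITIES = {"maj": (0, 4, 7), "min": (0, 3, 7)}


def chord_and_bass_notes(progression, ticks_per_bar=1920, chord_octave=4, bass_octave=2):
    """Return (chord_notes, bass_notes) as (onset_tick, midi_pitch, duration_ticks).

    Assumes one chord per 4/4 bar and the convention that C4 is MIDI pitch 60.
    """
    chord_notes, bass_notes = [], []
    for bar, (root, quality) in enumerate(progression):
        start = bar * ticks_per_bar
        chord_root = 12 * (chord_octave + 1) + ROOTS[root]
        for interval in QUALITIES[quality]:
            chord_notes.append((start, chord_root + interval, ticks_per_bar))
        # The bass doubles the chord root in a lower octave.
        bass_root = 12 * (bass_octave + 1) + ROOTS[root]
        bass_notes.append((start, bass_root, ticks_per_bar))
    return chord_notes, bass_notes


chord_notes, bass_notes = chord_and_bass_notes([("C", "maj"), ("G", "maj")])
```

The resulting events could then be written to separate MIDI tracks with a library such as miditoolkit.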
@article{zhang2023wuyun,
title={WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning},
author={Zhang, Kejun and Wu, Xinda and Zhang, Tieyao and Huang, Zhijie and Tan, Xu and Liang, Qihao and Wu, Songruoyao and Sun, Lingyun},
journal={arXiv preprint arXiv:2301.04488},
year={2023}
}
@article{wu2023melodyglm,
title={MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation},
author={Wu, Xinda and Huang, Zhijie and Zhang, Kejun and Yu, Jiaxing and Tan, Xu and Zhang, Tieyao and Wang, Zihao and Sun, Lingyun},
journal={arXiv preprint arXiv:2309.10738},
year={2023}
}
We thank the following authors, who made their code available or provided technical support:
- Music Transformer: https://github.com/gwinndr/MusicTransformer-Pytorch
- Compound Word Transformer: https://github.com/YatingMusic/compound-word-transformer
- Melons: Yi Zou.