
review #1

Merged: 68 commits, Jan 16, 2024

Commits (68)
fa276ca
Initial commit
maks00170 Oct 9, 2023
b5a193f
init_commit
maks00170 Oct 9, 2023
3ceef8d
clean pl.py
maks00170 Oct 9, 2023
754051f
init stream demo
d-a-yakovlev Oct 9, 2023
30b21c5
streaming: demo structure
d-a-yakovlev Oct 9, 2023
f3ce9a6
streaming: IPython commenting
d-a-yakovlev Oct 9, 2023
b65e32e
update
maks00170 Oct 27, 2023
10c2578
Update README.md
maks00170 Oct 27, 2023
4e50daf
add req
maks00170 Oct 27, 2023
1bb5043
merge
maks00170 Oct 27, 2023
9127466
del pl
maks00170 Oct 27, 2023
9a95a9c
Update README.md
maks00170 Oct 27, 2023
c3b6044
Update README.md
maks00170 Oct 27, 2023
f266701
up readme
maks00170 Oct 27, 2023
2a1191c
refactor dataset.py - flake8 and del distrib dep
d-a-yakovlev Nov 8, 2023
7f5bd64
refactor dataset.py - print -> logging
d-a-yakovlev Nov 8, 2023
c63da0e
update code: 1)add comments 2)black, 3)refactor code 4)test notebook
maks00170 Nov 9, 2023
fcc35f4
update Readme
maks00170 Nov 9, 2023
be68670
relative path -> absolute, some annotations [NTC]
d-a-yakovlev Nov 10, 2023
c21557d
separator: add config
d-a-yakovlev Nov 14, 2023
5e01862
separator: update config
d-a-yakovlev Nov 14, 2023
3d096e9
separator: eval script, update config, gdown to requirements
d-a-yakovlev Nov 14, 2023
bc57088
separator: one download default sample
d-a-yakovlev Nov 14, 2023
97f6f10
separator: eval fix
d-a-yakovlev Nov 14, 2023
6b64282
streaming: raw converter script
d-a-yakovlev Nov 16, 2023
510fff9
general: update README
d-a-yakovlev Nov 16, 2023
a6a9506
general: update README
d-a-yakovlev Nov 16, 2023
67cae64
streaming: add stream class
d-a-yakovlev Nov 16, 2023
cda2178
separator: config,pl_model,readme update & delete test
maks00170 Nov 16, 2023
60ab5a2
general: init dockerfile
d-a-yakovlev Nov 16, 2023
88822e2
general: add docker section, docker run script, fix configs paths
d-a-yakovlev Nov 22, 2023
42f850c
general: add colab link stand, fix run_docker
d-a-yakovlev Nov 22, 2023
a659d93
general: minor fix in Dockerfile
d-a-yakovlev Nov 23, 2023
87d6ccd
separator: eval->inference, del distrib, config
maks00170 Nov 27, 2023
87dae59
separator: black
maks00170 Nov 27, 2023
97578fb
general: docker tune
d-a-yakovlev Nov 28, 2023
ddb459b
init_weights
maks00170 Dec 17, 2023
4cc3311
general: minor change with {} works in Win
d-a-yakovlev Dec 17, 2023
95e8ad1
separator: cleaning and add some attributes in model
d-a-yakovlev Dec 17, 2023
78f396f
streaming: deprecated clean
d-a-yakovlev Dec 17, 2023
06057a8
streaming: converter script
d-a-yakovlev Dec 17, 2023
e08c9b1
general: slightly updated readme
d-a-yakovlev Dec 17, 2023
163df3e
general: add CI
d-a-yakovlev Dec 18, 2023
fe8d679
general: configuring CI 1
d-a-yakovlev Dec 18, 2023
c063333
general: configuring CI 2
d-a-yakovlev Dec 18, 2023
2344f3d
streaming: pushing all sources
d-a-yakovlev Dec 18, 2023
d18a30a
streaming: take out deprecated 2
d-a-yakovlev Dec 18, 2023
234c634
general: fix typos in readme
d-a-yakovlev Dec 18, 2023
1639898
general: black apply
d-a-yakovlev Dec 18, 2023
98df7c2
general: configuring CI 2
d-a-yakovlev Dec 18, 2023
6b87c1d
general: configuring CI 3
d-a-yakovlev Dec 18, 2023
8a328c6
general: configuring CI 4
d-a-yakovlev Dec 18, 2023
8a8e6d5
streaming: take out trash 3
d-a-yakovlev Dec 18, 2023
062f28b
Merge branch 'main' into project
Lallapallooza Dec 18, 2023
def592a
refac: full - run_docker, almost - dataset, inference
d-a-yakovlev Dec 29, 2023
2f869eb
refac: full - runner, stream_class -> tf_lite_stream, almost - converter
d-a-yakovlev Dec 29, 2023
6cec29a
refac: runner
d-a-yakovlev Dec 29, 2023
1fdc0da
rafactor_separator: modules, PM_unet, pl_model,STFT
maks00170 Dec 29, 2023
f46b0e0
black and refac config
maks00170 Dec 29, 2023
d8ea2f8
one del space config
maks00170 Dec 29, 2023
d4c60de
refac stream config
maks00170 Dec 30, 2023
08fefd1
ref comment config
maks00170 Dec 30, 2023
ae7eeea
refac: docker debugged 1
d-a-yakovlev Jan 9, 2024
fd090eb
refac: configuring CI 5
d-a-yakovlev Jan 9, 2024
20d2466
refac: inference output 1
d-a-yakovlev Jan 15, 2024
0d7eb3b
refac: inference output 2
d-a-yakovlev Jan 15, 2024
50c93aa
refac: inference output 3
d-a-yakovlev Jan 16, 2024
26b6b2d
refac: fix comment
maks00170 Jan 16, 2024
9 changes: 9 additions & 0 deletions .dockerignore
@@ -0,0 +1,9 @@
.github
**/__pycache__/
separator/inference/
streaming/weights/
streaming/input/
streaming/streams/
streaming/model/
streaming/tflite_model/

18 changes: 18 additions & 0 deletions .github/workflows/main.yaml
@@ -0,0 +1,18 @@
name: Main

on: [push, pull_request]

jobs:
  main:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
        with:
          python-version: 3.10.12
          cache: "pip"
      - name: "installation"
        run: |
          pip install -r requirements-dev.txt
      - name: "black"
        run: black . --check --diff --color --exclude .*/config/
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
.vscode
**/__pycache__/
separator/inference/
streaming/weights
streaming/input/
streaming/streams/
streaming/model/
streaming/tflite_model/
16 changes: 16 additions & 0 deletions Dockerfile
@@ -0,0 +1,16 @@
FROM nvcr.io/nvidia/tensorrt:22.08-py3

ENV PYTHONUNBUFFERED=1

RUN apt-get -y update && apt-get -y upgrade
RUN apt-get install -y --no-install-recommends ffmpeg
RUN apt-get install -y python3-pip
RUN echo 'alias python=python3' >> ~/.bashrc
RUN echo 'NCCL_SOCKET_IFNAME=lo' >> ~/.bashrc


WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

ENTRYPOINT [ "bash" ]
73 changes: 73 additions & 0 deletions README.md
@@ -0,0 +1,73 @@
# PM-Unet: phase and magnitude aware model for music source separation
[![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://d-a-yakovlev.github.io/test/)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OXlCZgd5KidMDZDUItOIT9ZA4IUJHXsZ?usp=sharing)

## Navigation
1. [Structure](#structure)
2. [Docker](#docker)
3. [Data](#data)
4. [Training](#training)
5. [Inference](#inference)

## Structure
- [`separator`](./separator) ‒ main source code: model and dataset implementations plus the training code.
- [`streaming`](./streaming/demo) ‒ source code for inference with the tf-lite version of the model.

## Docker
#### To set up the environment with Docker

If you don't have Docker installed, please follow the links to find installation instructions for [Ubuntu](https://docs.docker.com/desktop/install/linux-install/), [Mac](https://docs.docker.com/desktop/install/mac-install/) or [Windows](https://docs.docker.com/desktop/install/windows-install/).

Build the Docker image:

```
docker build -t pmunet .
```

Run the Docker container:

```
bash run_docker.sh
```

## Data
The dataset used is [MUSDB18-HQ](https://sigsep.github.io/datasets/musdb.html#musdb18-hq-uncompressed-wav).

[![Download dataset](https://img.shields.io/badge/Download%20dataset-65c73b)](https://zenodo.org/record/3338373/files/musdb18hq.zip?download=1)

The dataset consists of 150 full-length stereo tracks sampled at 44.1 kHz. Each track provides the complete mix and four stems: "vocals", "bass", "drums" and "other", any of which can serve as a target for source separation. The set is split into 100 training compositions and 50 validation compositions.
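
For quick inspection, here is a minimal sketch (not part of this PR) of loading MUSDB18-HQ with the `musdb` package pinned in `requirements.txt`; the `root` path is an assumption:

```python
# Hedged sketch: reading MUSDB18-HQ with the musdb package.
import musdb

mus = musdb.DB(root="musdb18hq", subsets="train", is_wav=True)  # HQ release ships as WAV
track = mus[0]
mix = track.audio                       # (samples, channels) float array at 44.1 kHz
vocals = track.targets["vocals"].audio  # isolated stem, same shape as the mix
print(track.name, mix.shape, track.rate)
```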

## Training
1. Configure arguments in `separator/config/config.py` (a sketch for adjusting them follows below).
2. `cd separator`.
3. Run `python3 pl_model.py`.
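
Training is driven by the `TrainConfig` dataclass added in this PR's `separator/config/config.py`. As a hedged sketch (the import path is an assumption, depending on where you run it), you could override a few fields before launching:

```python
# Hedged sketch: tweaking TrainConfig fields; field names come from this PR.
from config.config import TrainConfig  # import path assumes you are inside separator/

cfg = TrainConfig()
cfg.batch_size = 4      # shrink the batch for a smaller GPU
cfg.max_epochs = 50     # shorter run than the default 100
cfg.device = "cpu"      # fall back from the default "cuda"
print(cfg.segment, cfg.nfft)  # 5-second segments and 4096 STFT bins by default
```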

## Inference

### Auto local
1. Configure arguments in `separator/config/config.py`.
2. `cd separator`.
3. Run `python3 inference.py [-IO]`:
   - `-I` path to the input mixture,
   - `-O` output directory; both are optional.

By default, the script downloads a `.pt` weights file and `sample.wav` from Google Drive.

#### Example
```
python3 inference.py -I path/to/mix -O out_dir
```
After a successful run, four audio files (`vocals.wav`, `drums.wav`, `bass.wav`, `other.wav`) appear in `out_dir` (by default `separator/inference/output`).

**You can download weights manually**

Download one of the `.pt` files below (a loading sketch follows):
* [LSTM-bottleneck version](https://drive.google.com/file/d/18jT2TYffdRD1fL7wecAiM5nJPM_OKpNB/view?usp=drive_link)
* [Without LSTM-bottleneck version](https://drive.google.com/file/d/1VO07OYbsnCuEJYRSuA8HhjlQnx6dbWX7/view?usp=drive_link)
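
As a minimal sketch (this is not the repository's loader), a manually downloaded checkpoint can be inspected with plain `torch`; the file name matches `InferenceConfig` later in this PR:

```python
# Hedged sketch: inspecting a downloaded checkpoint; the path is wherever you saved it.
import torch

state = torch.load("weight_LSTM.pt", map_location="cpu")
print(type(state))  # typically a state_dict (an OrderedDict of parameter tensors)
```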

### Streaming
This section contains scripts to convert the model to `tflite` format and to run the `tflite` model in streaming mode.

1. Configure arguments in `streaming/config/config.py`.
2. `cd streaming`.
3. Run `python3 runner.py` (a usage sketch for the converted model follows).
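
The converter and runner internals live in `streaming/`; as a hedged sketch using TensorFlow's standard `tf.lite` API (the model path and input shape below are assumptions, not values from `runner.py`), a converted model can be driven chunk by chunk:

```python
# Hedged sketch: running a tflite model with TensorFlow's Interpreter API.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="streaming/tflite_model/model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

chunk = np.zeros(inp["shape"], dtype=np.float32)  # feed one audio chunk at a time
interpreter.set_tensor(inp["index"], chunk)
interpreter.invoke()
separated = interpreter.get_tensor(out["index"])
print(separated.shape)
```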
3 changes: 3 additions & 0 deletions requirements-dev.txt
@@ -0,0 +1,3 @@
black
mypy
pytest
19 changes: 19 additions & 0 deletions requirements.txt
@@ -0,0 +1,19 @@
ffmpeg-python==0.2.0
gdown==4.6.3
julius==0.2.7
lpips==0.1.4
musdb==0.4.0
nobuco
omegaconf==2.3.0
openunmix==1.2.1
soundfile==0.12.1
sox==1.4.1
stempeg==0.2.3
sympy==1.12
tensorflow>=2.13.0
torch==2.0.1
torch-audiomentations==0.11.0
torchaudio==2.0.2
torchmetrics==0.11.4
pytorch-lightning==2.0.3
tqdm==4.65.0
9 changes: 9 additions & 0 deletions run_docker.sh
@@ -0,0 +1,9 @@
#!/bin/bash

app=$(pwd)

docker run --name pmunet -it --rm \
    --net=host --ipc=host \
    --gpus "all" \
    -v ${app}:/app \
    pmunet
101 changes: 101 additions & 0 deletions separator/config/config.py
@@ -0,0 +1,101 @@
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class TrainConfig:

    # DATA OPTIONS
    musdb_path : str = "musdb18hq"          # Directory where the MUSDB18-HQ dataset is stored.
    metadata_train_path : str = "metadata"  # Directory for saving training metadata, like track names and lengths.
    metadata_test_path : str = "metadata1"  # Directory for saving testing metadata.
    segment : int = 5                       # Length (in seconds) of each audio segment used during training.

    # MODEL OPTIONS
    model_source : tuple = ("drums", "bass", "other", "vocals")  # Sources to target in source separation.
    model_depth : int = 4         # Depth of the U-Net architecture.
    model_channel : int = 28      # Number of initial channels in the U-Net layers.
    is_mono : bool = False        # Treat the input audio as mono (True) or stereo (False).
    mask_mode : bool = False      # Whether to utilize masking within the model.
    skip_mode : str = "concat"    # Mode of skip connections in the U-Net ('concat' for concatenation, 'add' for summation).
    nfft : int = 4096             # Number of bins used in the STFT.
    bottlneck_lstm : bool = True  # Whether to use LSTM layers as the bottleneck of the U-Net.
    layers : int = 2              # Number of LSTM layers in the bottleneck.
    stft_flag : bool = True       # Whether to apply the STFT inside the model (relevant for tflite conversion).

    # TRAIN OPTIONS
    device : str = "cuda"              # Computing platform for training: 'cuda' for NVIDIA GPUs or 'cpu'.
    batch_size : int = 6               # Batch size for training.
    shuffle_train : bool = True        # Whether to shuffle the training dataset.
    shuffle_valid : bool = False       # Whether to shuffle the validation dataset.
    drop_last : bool = True            # Whether to drop the last incomplete batch of training data.
    num_workers : int = 2              # Number of worker processes used for loading data.
    metric_monitor_mode : str = "min"  # Strategy for monitoring metrics to save model checkpoints.
    save_top_k_model_weights : int = 1 # Number of best-performing model weights to save, based on the monitored metric.

    factor : int = 1    # Factors for the components of the loss function.
    c_factor : int = 1

    loss_nfft : tuple = (4096,)  # Number of FFT bins for calculating the loss.
    gamma : float = 0.3          # Gamma parameter adjusting the loss's focus on certain parts of the audio spectrum.
    lr : float = 0.5 * 3e-3      # Learning rate for the optimizer.
    T_0 : int = 40               # Period of the cosine annealing learning-rate schedule.
    max_epochs : int = 100       # Maximum number of training epochs.
    precision : str = 16         # Precision of training computations.
    grad_clip : float = 0.5      # Gradient clipping value.

    # AUGMENTATION OPTIONS
    proba_shift : float = 0.5             # Probability of applying the shift.
    shift : int = 8192                    # Maximum number of samples for the shift.
    proba_flip_channel : float = 1        # Probability of flipping the left-right channels.
    proba_flip_sign : float = 1           # Probability of applying the sign flip.
    pitchshift_proba : float = 0.2        # Probability of applying pitch shift.
    vocals_min_semitones : int = -5       # Lower limit of vocal pitch shift, in semitones.
    vocals_max_semitones : int = 5        # Upper limit of vocal pitch shift, in semitones.
    other_min_semitones : int = -2        # Lower limit of non-vocal pitch shift, in semitones.
    other_max_semitones : int = 2         # Upper limit of non-vocal pitch shift, in semitones.
    pitchshift_flag_other : bool = False  # Enable pitch-shift augmentation on non-vocal sources.
    time_change_proba : float = 0.2       # Probability of applying time stretching.
    time_change_factors : tuple = (0.8, 0.85, 0.9, 0.95, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3)  # Factors for time stretching/compression, defining the range and intensity of this augmentation.
    remix_proba : float = 1               # Probability of remixing audio tracks.
    remix_group_size : int = batch_size   # Size of the groups within which shuffling occurs.
    scale_proba : float = 1               # Probability of applying scaling.
    scale_min : float = 0.25              # Minimum scaling factor.
    scale_max : float = 1.25              # Maximum scaling factor.
    fade_mask_proba : float = 0.1         # Probability of applying a fade effect.
    double_proba : float = 0.1            # Probability of copying one channel's audio to both channels.
    reverse_proba : float = 0.2           # Probability of reversing a segment of the audio track.
    mushap_proba : float = 0.0            # Probability of creating mashups.
    mushap_depth : int = 2                # Number of tracks to mix.


@dataclass
class InferenceConfig:
    GDRIVE_PREFIX = "https://drive.google.com/uc?id="  # Google Drive direct-download URL prefix.

    # MODEL OPTIONS
    weights_dir : Path = Path("/app/separator/inference/weights")  # Directory where weights are saved.
    weights_LSTM_filename : str = "weight_LSTM.pt"  # File name of the model with the LSTM bottleneck.
    weights_conv_filename : str = "weight_conv.pt"  # File name of the model without the LSTM bottleneck.
    gdrive_weights_LSTM : str = f"{GDRIVE_PREFIX}1uhAVMvW3x-KL2T2-VkjKjn9K7dTJnoyo"  # Google Drive URL for the LSTM weights.
    gdrive_weights_conv : str = f"{GDRIVE_PREFIX}1VO07OYbsnCuEJYRSuA8HhjlQnx6dbWX7"  # Google Drive URL for the weights without LSTM.
    device : str = "cpu"  # Computing platform for inference.

    # INFERENCE OPTIONS
    segment : int = 7        # Length (in seconds) of each audio segment used during inference.
    overlap : float = 0.2    # Overlap between consecutive segments.
    offset : Union[int, None] = None    # Start (in seconds) of the segment to process.
    duration : Union[int, None] = None  # Duration (in seconds) of the segment to process; use with `offset`.
    sample_rate : int = 44100  # Sample rate of the track.
    num_channels : int = 2     # Number of channels in the audio track.
    default_result_dir : str = "/app/separator/inference/output"  # Directory for output tracks.
    default_input_dir : str = "/app/separator/inference/input"    # Directory for the input track.

    # TEST TRACK
    gdrive_mix : str = f"{GDRIVE_PREFIX}1zJpyW1fYxHKXDcDH9s5DiBCYiRpraDB3"  # Google Drive URL for the test track.
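
For context on the `segment` and `overlap` options above, here is a hedged sketch (not this PR's `inference.py`; the helper name is hypothetical) of how they could drive overlapped chunking:

```python
# Hedged sketch: splitting a waveform using segment/overlap as in InferenceConfig.
import torch

SEGMENT_S, OVERLAP, SAMPLE_RATE = 7, 0.2, 44100
chunk_len = SEGMENT_S * SAMPLE_RATE      # 308,700 samples per 7-second chunk
stride = int(chunk_len * (1 - OVERLAP))  # advance 80% of a chunk each step

def iter_chunks(wav: torch.Tensor):
    """Yield overlapping (start, chunk) views of a (channels, samples) tensor."""
    for start in range(0, wav.shape[-1], stride):
        yield start, wav[..., start:start + chunk_len]

mix = torch.zeros(2, 44100 * 30)            # 30 s of silent stereo as a stand-in
print(sum(1 for _ in iter_chunks(mix)))     # number of overlapping chunks
```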