- Download the data sample from: http://repo.pi.ingv.it/instance/Instance_sample_dataset_v2.tar.bz2
- Create an environment variable file called `.env` at the root of the project. This sets the `data` and `output` paths. Change the variable values to your actual data and output locations. The folders and files must exist before you run the code.
- EVENT_HDF5_FILE="data/instance_samples/Instance_events_counts_10k.hdf5"
- EVENT_METADATA_FILE="data/instance_samples/metadata_Instance_events_10k.csv"
- NOISE_HDF5_FILE="data/instance_samples/Instance_noise_1k.hdf5"
- NOISE_METADATA_FILE="data/instance_samples/metadata_Instance_noise_1k.csv"
- FINAL_OUTPUT_DIR="output"
- TEMP_DIR="temp"
- Create a `venv` environment using Python 3.11 and install the packages from the `requirements.txt` found at the root of the project. Use this environment to run the code.
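A minimal sketch of that setup, assuming `python3.11` is available on your PATH (the environment name `.venv` is an arbitrary choice):

```bash
# Create and activate a Python 3.11 virtual environment, then install dependencies
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```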
- Run `python train_my_eq.py` from the root of the project.
- Check progress in the terminal. At the end, the results will be added to the `output` folder mentioned in the `.env` file.
- Run `python train_my_cnn1.py` or `python train_my_cnn2.py`.
- Check progress in the terminal. At the end, the results will be added to the `output` folder mentioned in the `.env` file.
- Copy one of the files, e.g. `train_my_eq.py`, and edit it to change the model and hyperparameters.
- Train your model by running `python train_my_own_model.py`
Instance data have already been downloaded and are available here: `~/projects/def-sponsor00/earthquake/data/instance`
STEAD data are not yet downloaded but can be added here: `~/projects/def-sponsor00/earthquake/data/stead`
To download the STEAD files in parallel:
- put all the URLs in a file (`files.txt`)
- and run: `cat files.txt | xargs -n 1 -P 0 wget -q`

`-P 0` lets xargs choose the number of parallel workers. You can assign a hard number if you want, as in the sketch below.
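For example, to use a fixed number of workers instead (the value 8 here is arbitrary):

```bash
# Download each URL in files.txt, running at most 8 wget processes at a time
cat files.txt | xargs -n 1 -P 8 wget -q
```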
- Use `tmux` to run your session (a short example session is sketched after this list). Once connected to the server:
  - Type `tmux` to start a new session or `tmux attach` to recover an old session. I suggest using it, as tmux will keep your terminal session running even if you lose the connection; otherwise you might need to start from scratch.
  - Type `ctrl-b + %` to split your screen and `ctrl-b <arrow>` to navigate through the panes. I use it to run multiple terminals simultaneously, since one might be blocked by a long-running task.
  - Use `ctrl-b + z` to toggle a pane between full screen and normal size.
  - Type `exit` to close a tmux pane.
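A typical session might look like this (the session name `train` is an arbitrary example):

```bash
tmux new -s train      # start a named session
# ... launch your long-running job, then detach with ctrl-b d ...
tmux attach -t train   # reattach later, even after a dropped connection
```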
- Using the terminal, log in to your cluster, preferably through ssh (e.g. `ssh username@ift6759.calculquebec.cloud`)
- Create a folder where you will clone your repo: `mkdir documents`
- Get into the folder and clone the repo, or pull if it was already cloned (preferably using ssh): `cd documents` then `git clone git@github.com:damoursm/earthquake.git` OR `git pull`
- Get into the scripts folder and run the setup script. It will create the sbatch and scratch folders and move the code and scripts to their proper locations: `cd scripts` then `./setup.sh`
- Go back to home and add your `.env` file in the code folder: `cd ~` then `vim scratch/code-snapshots/earthquake/.env`
- Set the variables as in the section above. You can leave all variables except `FINAL_OUTPUT_DIR` with empty values (`""`), as the code detects that it is running on the cluster and selects the data locations itself.
- Set `FINAL_OUTPUT_DIR="scratch/<your username>/output/default-train"`. `default-train` is used as the default, but you can change it if you want to save the output of different experiments. Just make sure the folder exists. A full example follows.
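Putting that together, a cluster `.env` might look like this (replace `<your username>` with your actual username):

```bash
EVENT_HDF5_FILE=""
EVENT_METADATA_FILE=""
NOISE_HDF5_FILE=""
NOISE_METADATA_FILE=""
TEMP_DIR=""
FINAL_OUTPUT_DIR="scratch/<your username>/output/default-train"
```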
- Stay in the home folder and start training. Replace `train_transformer_elisee.py` with the file containing the code (for EqModel, use `train_my_eq.py`; for Cnn, use `train_my_cnn.py`): `cd ~` then `./sbatch/run.sh -p train_transformer_elisee.py`. Here you can optionally specify a few arguments: `-m 16Gb` for memory (by default `8Gb`); `-t hh:mm:ss` for how long to run (by default 1h); `-p /train_xxx.py` for the file to execute (by default it will run train.py); `-c 1` for the number of CPUs to use; `-g 1` for the number of GPUs to use. See the example below.
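For example, a submission requesting 16Gb of memory, a three-hour time limit, 4 CPUs, and 1 GPU might look like this (the resource values are illustrative):

```bash
cd ~
./sbatch/run.sh -p train_my_eq.py -m 16Gb -t 03:00:00 -c 4 -g 1
```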
- Once training is done, the files will be in the `FINAL_OUTPUT_DIR` specified in the `.env`. To download them one by one to your local computer, use this command line: `scp <username>@ift6759.calculquebec.cloud:/scratch/<username>/output/default-train/<filename> <local path, e.g. /Users/ekabore/Downloads>`
- Useful slurm commands:
  - `squeue -u username`: shows your currently submitted jobs
  - `scontrol show job jobid`: shows details about a job
  - `scancel jobid`: cancels a job
- Start the MLflow server by running `mlflow server` in the terminal.
- Fill in your hyperparameters and configuration in the config file `config.py`.
- Activate the `earthquake` environment.
- Run the script: `python main.py`
- You can access the MLflow experiment in the UI by going to `http://localhost:5000` in your browser. The full sequence is sketched below.
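A sketch of the sequence above in one terminal (assuming `earthquake` is a conda environment; adjust the activation command if you use a venv):

```bash
conda activate earthquake   # assumption: the environment is managed by conda
mlflow server &             # serves the tracking UI, by default at http://localhost:5000
python main.py              # runs the experiment configured in config.py
# then open http://localhost:5000 in your browser to inspect the runs
```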