Merge branch 'main' into VASP-podman
tonykew committed Sep 19, 2024
2 parents 45ab638 + 8617fb9 commit 4d60a5f
Showing 30 changed files with 208 additions and 7 deletions.
21 changes: 16 additions & 5 deletions README.md
@@ -1,21 +1,32 @@
# CCR Examples

This repository contains examples for use on [UB CCR's](https://buffalo.edu/ccr) high-performance computing clusters. These should be used in conjunction with [CCR's documentation](https://docs.ccr.buffalo.edu), where the concepts and policies for using CCR's systems are explained. As a supplement to the documentation, there are recorded workshops on a variety of topics available on [CCR's YouTube channel](https://youtube.com/@ubccr) and an [Intro to CCR course](https://ublearns.buffalo.edu/d2l/home/209035) in UB Learns.

This repo is updated regularly, though there is always a chance some information contained herein is inaccurate. Please report any issues by filing a [bug report](https://github.com/ubccr/ccr-examples/issues/new) or, better yet, if you can correct the error, file a pull request with a fix! Please note that although you will see some application-specific Slurm scripts and container instructions, this repo does not contain an example for every piece of software installed on CCR's systems. The application-specific examples included here exist because those applications require special settings or data; for everything else, adapt the examples in the introductory section to your own applications. If you have questions or run into problems using these examples, please submit a ticket to CCR Help rather than opening an issue on GitHub.

## Example Directories

- slurm/ - Example Slurm jobs
- scripts/ - Misc example scripts (coming soon!)
- containers/ - Examples for using containers (coming soon!)

## How to use the examples

Log in to a CCR login node, or use the terminal app in OnDemand to access one.
In your $HOME directory or your group's project directory, clone this repo:
```
git clone https://github.com/ubccr/ccr-examples.git
cd ccr-examples
```
Navigate to the directory with the example that you'd like to use and copy that script to your working directory. Modify the script as appropriate for your workflow or applications.
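
For example, to copy one of the introductory ub-hpc scripts into a working directory and submit it (the `~/my-job` directory here is just a placeholder):

```
mkdir -p ~/my-job
cp slurm/0_Introductory/ub-hpc/general-compute.sh ~/my-job/
cd ~/my-job
# edit general-compute.sh for your application, then submit it
sbatch general-compute.sh
```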


## Coding style, tips and conventions

- Keep examples organized in their respective per-example directories
- Do not include large data sets. Scripts should use ENV variables to specify
  the path to data/supplemental files.
- Use $SLURM variables to specify Slurm-specific information (e.g. $SLURM_JOB_ID, $SLURM_NPROCS, $SLURM_NODEFILE, $SLURMTMPDIR, $SLURM_SUBMIT_DIR, etc.)

## License

11 changes: 11 additions & 0 deletions slurm/0_Introductory/faculty/README.md
@@ -0,0 +1,11 @@
# Example faculty cluster job

This is an example of how to set up a Slurm job on the faculty cluster. Replace `partition_name` with the partition you'd like to run your job on, and make sure the same name is used in the `--qos` line. To see which faculty partitions you have access to, view your allocations in [ColdFront](https://coldfront.ccr.buffalo.edu).

## How to use

TODO: write me

## How to launch an interactive job on the faculty cluster

Use the `salloc` command and the same Slurm directives as you use in a batch script to request an interactive job session. Please refer to our [documentation](https://docs.ccr.buffalo.edu/en/latest/hpc/jobs/#interactive-job-submission) for proper setup of the request and command to use to access the allocated node.
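
For example, a minimal interactive request on the faculty cluster might look like the following sketch; the partition and QOS names are placeholders and the resource amounts are only illustrative:

```
salloc --clusters=faculty --partition=partition_name --qos=partition_name \
       --time=01:00:00 --ntasks=1 --cpus-per-task=4 --mem=16G
```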
13 changes: 13 additions & 0 deletions slurm/0_Introductory/faculty/example.sh
@@ -0,0 +1,13 @@
#!/bin/bash -l

#SBATCH --clusters=faculty
#SBATCH --partition=partition_name
#SBATCH --qos=partition_name
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=64000
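
## Your commands go below the Slurm directives. The module and program names
## here are hypothetical placeholders; replace them with your own workflow:
# module load foo
# srun ./my_program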




11 changes: 11 additions & 0 deletions slurm/0_Introductory/ub-hpc/README.md
@@ -0,0 +1,11 @@
# Example ub-hpc cluster job

These are examples of how to set up a Slurm job on the debug and general-compute partitions of the ub-hpc cluster. Refer to our documentation on [requesting cores and nodes](https://docs.ccr.buffalo.edu/en/latest/hpc/jobs/#requesting-cores-and-nodes) to understand these options.

## How to use

TODO: write me

## How to launch an interactive job on the ub-hpc cluster

Use the `salloc` command and the same Slurm directives as you use in a batch script to request an interactive job session. Please refer to our [documentation](https://docs.ccr.buffalo.edu/en/latest/hpc/jobs/#interactive-job-submission) for proper setup of the request and command to use to access the allocated node.
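
For example, a minimal interactive request on the debug partition might look like the following sketch; the resource amounts are only illustrative:

```
salloc --clusters=ub-hpc --partition=debug --qos=debug \
       --time=01:00:00 --ntasks=1 --cpus-per-task=4 --mem=16G
```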
13 changes: 13 additions & 0 deletions slurm/0_Introductory/ub-hpc/debug.sh
@@ -0,0 +1,13 @@
#!/bin/bash -l

#SBATCH --clusters=ub-hpc
#SBATCH --partition=debug
#SBATCH --qos=debug
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=64G
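
## Add your commands below the directives, for example (hypothetical placeholders):
# module load foo
# srun ./my_program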




13 changes: 13 additions & 0 deletions slurm/0_Introductory/ub-hpc/general-compute.sh
@@ -0,0 +1,13 @@
#!/bin/bash -l

#SBATCH --clusters=ub-hpc
#SBATCH --partition=general-compute
#SBATCH --qos=general-compute
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
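
## Add your commands below the directives, for example (hypothetical placeholders):
# module load foo
# srun ./my_program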




8 changes: 8 additions & 0 deletions slurm/1_Advanced/JobArrays/README.md
@@ -0,0 +1,8 @@
# Example script for job arrays

TODO: write me

## How to use

TODO: write me

7 changes: 7 additions & 0 deletions slurm/1_Advanced/JobArrays/example.sh
@@ -0,0 +1,7 @@
#!/bin/bash -l
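
## A minimal job array sketch: the cluster, partition, QOS, walltime, and
## program name below are assumptions; adjust them for your access and workflow.
#SBATCH --clusters=ub-hpc
#SBATCH --partition=general-compute
#SBATCH --qos=general-compute
#SBATCH --time=01:00:00
#SBATCH --array=1-10

## Slurm runs this script once per array index, exposing $SLURM_ARRAY_TASK_ID
echo "Array task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
# srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat   # hypothetical program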






8 changes: 8 additions & 0 deletions slurm/1_Advanced/Scavenger/README.md
@@ -0,0 +1,8 @@
# Example scavenger job

TODO: write me

## How to use

TODO: write me

7 changes: 7 additions & 0 deletions slurm/1_Advanced/Scavenger/example.sh
@@ -0,0 +1,7 @@
#!/bin/bash -l
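
## A minimal scavenger sketch: the cluster, QOS, walltime, and program name are
## assumptions; check CCR's documentation for current scavenger partition policies.
#SBATCH --clusters=ub-hpc
#SBATCH --partition=scavenger
#SBATCH --qos=scavenger
#SBATCH --time=01:00:00
#SBATCH --requeue

## Scavenger jobs run on otherwise idle nodes and can be preempted, so make sure
## your program checkpoints its progress and is safe to requeue.
# srun ./my_program   # hypothetical program name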






File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -6,11 +6,11 @@ This runs the full NiftyPET Jupyter notebook demo in a Slurm job.

1. Download raw data files to your working directory:

```
$ mkdir niftypet-demo
$ cd niftypet-demo
$ wget -O amyloidPET_FBP_TP0_extra.zip 'https://zenodo.org/records/1472951/files/amyloidPET_FBP_TP0.zip?download=1'
```

2. Unzip data:

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
18 changes: 18 additions & 0 deletions slurm/README.md
@@ -0,0 +1,18 @@
# Example Slurm scripts

These are examples of how to set up a Slurm job on CCR's clusters. Refer to our documentation on [running and monitoring jobs](https://docs.ccr.buffalo.edu/en/latest/hpc/jobs/) for detailed information. These examples supplement the documentation; it's important to understand the concepts of batch computing and CCR's cluster usage and limits before using them.

## How to use

The `slurm-options.sh` file in this directory provides a list of the most commonly used Slurm directives with a short explanation of each one. It is not necessary to use all of these directives in every job script. In the sample scripts throughout this repository, we list the required Slurm directives and a few others as examples. Refer to the `slurm-options.sh` file for a more complete list of directives and to our [documentation](https://docs.ccr.buffalo.edu/en/latest/hpc/jobs/#slurm-directives-partitions-qos) for specific cluster and partition limits. Keep in mind that the more specific you get when requesting resources on CCR's clusters, the fewer options the job scheduler has to place your job. When possible, it's best to specify only what you need and let the scheduler do its job. If you're unsure what resources your program will require, we recommend starting small, [monitoring the progress](https://docs.ccr.buffalo.edu/en/latest/hpc/jobs/#monitoring-jobs) of the job, and then scaling up.
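
For example, you might submit a script and then check on it with standard Slurm commands; the script name below is just a placeholder:

```
sbatch --clusters=ub-hpc my-job.sh
squeue --clusters=ub-hpc -u $USER
sacct --clusters=ub-hpc -j <jobid>
```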

At CCR you should use the bash shell for your Slurm scripts; you'll see this on the first line of every example we share. In a bash script, anything after a `#` is treated as a comment and is not interpreted when the script runs. In Slurm scripts, however, the Slurm scheduler specifically looks for lines that start with `#SBATCH` and interprets those as requests for your job. Do NOT remove the `#` in front of the `SBATCH` directive or your batch script will not work properly. If you don't want Slurm to read a particular `SBATCH` line in your script, put two `#` in front of the line.
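
As a quick illustration of that convention:

```
#SBATCH --time=01:00:00   ## active directive, read by Slurm
##SBATCH --mem=64G        ## disabled directive, ignored by Slurm
# regular bash comment, also ignored by Slurm
```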

## Navigating these directories

- `0_Introductory` - contains beginner batch scripts for the ub-hpc and faculty clusters
- `1_Advanced` - contains batch scripts for more complicated use cases such as job arrays, parallel computing, and using the scavenger partition
- `2_ApplicationSpecific` - contains batch scripts for a variety of applications that have special setup requirements. You will not find an example script for every piece of software installed on CCR's systems



81 changes: 81 additions & 0 deletions slurm/slurm-options.sh
@@ -0,0 +1,81 @@
#!/bin/bash -l
##
## How long do you want to reserve the node(s) for? By default, if you don't specify,
## you will get 24 hours. Referred to as walltime, this is how long the job will be
## scheduled to run for once it begins. If your program runs longer than what is
## requested here, the job will be cancelled by Slurm when time runs out.
## If you make the expected time too long, it may take longer for resources to
## become available and for the job to start. The various partitions in CCR's
## clusters have various maximum walltimes. Refer to the documentation for more info.
## Walltime format: dd-hh:mm:ss (days-hours:minutes:seconds)
#SBATCH --time=00:01:00

## Define how many nodes you need. We ask for 1 node
#SBATCH --nodes=1

## Refer to the docs on proper usage of the next 3 Slurm directives: https://docs.ccr.buffalo.edu/en/latest/hpc/jobs/#requesting-cores-and-nodes
## Number of "tasks" (use with distributed parallelism)
#SBATCH --ntasks=12

## Number of "tasks" per node (use with distributed parallelism)
#SBATCH --ntasks-per-node=12

## Number of CPUs allocated to each task (use with shared memory parallelism)
#SBATCH --cpus-per-task=32

## Specify the real memory required per node. Default units are megabytes.
## Different units can be specified using the suffix [K|M|G|T]
#SBATCH --mem=20G

## Give your job a name, so you can recognize it in the queue
#SBATCH --job-name="example-debug-job"

## Tell Slurm the names of the files to write to. If not specified, output files are named output.log and output.err
#SBATCH --output=example-job.out
#SBATCH --error=example-job.err

## Tell Slurm where to send emails about this job
#SBATCH --mail-user=myemailaddress@institution.edu

## Tell Slurm which types of emails to send.
## Options: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-type=end

## Tell Slurm which cluster, partition and qos to use to schedule this job.
#SBATCH --clusters=ub-hpc
## or, for the faculty cluster:
##SBATCH --clusters=faculty

## Refer to documentation on what partitions are available and determining what you have access to
#SBATCH --partition=[partition_name]

## QOS usually matches partition name but some users have access to priority boost QOS values.
#SBATCH --qos=[qos]

## Request exclusive access to the node you're assigned, even if you haven't requested all of the node's resources.
## This prevents other users' jobs from running on the same node as you. Only recommended if you're having trouble
## with network bandwidth and sharing the node is causing problems for your job.
#SBATCH --exclusive

## Use the snodes command to see the node tags that allow requesting specific types of hardware,
## such as specific GPUs, CPUs, high speed networks, or rack locations.
#SBATCH --constraint=[Slurm tag]

## Multiple options for requesting GPUs
## Request GPU - refer to snodes output for breakdown of node capabilities
#SBATCH --gpus-per-node=1

## Request a specific type of GPU
#SBATCH --gpus-per-node=1
#SBATCH --constraint=V100

## Request a specific GPU & GPU memory configuration
#SBATCH --gpus-per-node=tesla_v100-pcie-32gb:1

## Request a specific GPU, GPU memory, and GPU slot location (S:0 or S:1)
#SBATCH --gpus-per-node=tesla_v100-pcie-16gb:1(S:0)

## To use all cores on a node with more than 1 GPU, you must disable CPU binding
#SBATCH --gres-flags=disable-binding

## For more Slurm directives, refer to the Slurm documentation https://slurm.schedmd.com/documentation.html
