-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vasp GPU (single node) podman #4
base: main
Are you sure you want to change the base?
Conversation
Includes build & run information Note: "wget" targets will need to be updated once this is checked into main at https://github.com/ubccr/ccr-examples Tony
Tony
Vasp podman
Tony
Fixed wget URLs
Made build script executable
Added copying the VASP tarball to ${SLURMTMPDIR}
Make fix_nvhpc_.pc_files.bash script executable
32 process parallel builds requires 32 cores in the salloc command. Tony
Update salloc for parallel build
Fix typos
Add output with the path to the sample Slurm script
Fix (another!) typo
Includes build & run information Note: "wget" targets will need to be updated once this is checked into main at https://github.com/ubccr/ccr-examples Tony
Vasp podman
${LOGNAME} may not be set, or may be changed, whereas $(id -un) will always return the username Tony
17b5a15
to
60d479f
Compare
Vasp podman
commit 60d479f Merge: ba1abe3 91e6dec Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 15:49:34 2024 -0400 Merge branch 'main' into VASP-podman commit ba1abe3 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 15:48:05 2024 -0400 Use "id -un" for the username rather than the LOGNAME variable ${LOGNAME} may not be set, or may be changed, whereas $(id -un) will always return the username Tony commit 91e6dec Merge: 8617fb9 4d60a5f Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 09:54:27 2024 -0400 Merge pull request #11 from tonykew/VASP-podman Vasp podman commit 4d60a5f Merge: 45ab638 8617fb9 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 09:52:16 2024 -0400 Merge branch 'main' into VASP-podman commit 45ab638 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:30:59 2024 -0400 Initial check in of VASP-GPU-podman example Includes build & run information Note: "wget" targets will need to be updated once this is checked into main at https://github.com/ubccr/ccr-examples Tony commit 8617fb9 Merge: 8f7a600 9a31f57 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 02:44:05 2024 -0400 Merge pull request #10 from tonykew/VASP-podman Fix (another!) typo commit 9a31f57 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 02:42:58 2024 -0400 Fix (another!) typo Tony commit 8f7a600 Merge: 1c0f4d8 0add716 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 02:42:19 2024 -0400 Merge pull request #9 from tonykew/VASP-podman Fix typo commit 0add716 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 02:41:30 2024 -0400 Fix typo Tony commit 1c0f4d8 Merge: e7ee61c 1abbe2d Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 02:31:07 2024 -0400 Merge pull request #8 from tonykew/VASP-podman Add output with the path to the sample Slurm script commit 1abbe2d Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 02:29:51 2024 -0400 Add output with the path to the sample Slurm script Tony commit e7ee61c Merge: 30ac9f2 3f69eb9 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 02:15:39 2024 -0400 Merge pull request #7 from tonykew/VASP-podman Fix typos commit 3f69eb9 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 02:14:54 2024 -0400 Fix typos Tony commit 30ac9f2 Merge: 726c196 9b9df0c Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 01:24:15 2024 -0400 Merge pull request #6 from tonykew/VASP-podman Update salloc for parallel build commit 9b9df0c Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Thu Sep 19 01:21:33 2024 -0400 Update salloc for parallel build 32 process parallel builds requires 32 cores in the salloc command. Tony commit 726c196 Merge: b61401a b03025f Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 19:17:21 2024 -0400 Merge pull request #5 from tonykew/VASP-podman Make fix_nvhpc_.pc_files.bash script executable commit b03025f Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 19:15:51 2024 -0400 Make fix_nvhpc_.pc_files.bash script executable Tony commit b61401a Merge: ec5024c b1a8270 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:56:57 2024 -0400 Merge pull request ubccr#4 from tonykew/VASP-podman Added copying the VASP tarball to ${SLURMTMPDIR} commit b1a8270 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:55:15 2024 -0400 Added copying the VASP tarball to ${SLURMTMPDIR} Tony commit ec5024c Merge: 5fdda10 68f63ad Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:51:34 2024 -0400 Merge pull request ubccr#3 from tonykew/VASP-podman Made build script executable commit 68f63ad Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:50:43 2024 -0400 Made build script executable Tony commit 5fdda10 Merge: 22f6ee0 1380cbd Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:48:34 2024 -0400 Merge pull request ubccr#2 from tonykew/VASP-podman Fixed wget URLs commit 1380cbd Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:45:38 2024 -0400 Fixed wget URLs Tony commit 22f6ee0 Merge: 8c5caec a4bcf15 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:41:53 2024 -0400 Merge pull request ubccr#1 from tonykew/VASP-podman Vasp podman commit a4bcf15 Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:37:26 2024 -0400 Fixed typos Tony commit 40c8ddd Author: Tony Kew <tony.d.r.kew@gmail.com> Date: Wed Sep 18 18:30:59 2024 -0400 Initial check in of VASP-GPU-podman example Includes build & run information Note: "wget" targets will need to be updated once this is checked into main at https://github.com/ubccr/ccr-examples Tony
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are sort of my initial comments from validating the build pipeline. My meta question is do we need three example run files? I think having a containerfile, build file, and slurm file should be more than plenty for users. I personally don't think we should support any other kinds of runs with scripts, if they want to run VASP GPU locally with a shell I say we leave this up to them. Please correct me if I'm misunderstanding the scope of the run scripts.
data_dir="/projects/academic/tonykew/tonykew/VASP/data" | ||
############################################################################## | ||
|
||
echo "-------------------------------------------------------------------------------" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment kinda pertains to this file. I have concerns that there is just too much going on for a user. As a user, if I were looking for a slurm script to try, I would probably keep looking until I found an extremely simple example which I didn't have to read closely to understand.
My concern is that this script has far too many echo and conditional statements to check for directory structure and print out job info. I would rather an example of "hey this is how you can print out a ton of job info in a slurm script" example live somewhere else, uncoupled to trying to run VASP.
I don't think we should check to see if they have all the correct files when running VASP, just to have one less thing that will fail quietly. I'd rather VASP throw it's errors, rather than us check for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes I've made have resolved this?
fi | ||
|
||
# run vasp_std in the data directory | ||
vasp_std |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer keeping this line, and maybe a one-line comment about directory and file structure so a user can easily identify which lines are running and which lines are failing. Otherwise, we're going to get help desk tickets to try and troubleshoot these scripts, which defeats the purpose of having an example repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only sections the user needs to change (beyond the SLURM directives)
are clearly marked now.
I think these changes resolve this?
Text and variable name for the OCI image targall directory changed Sample directory paths changed to follow the CCR standard path naming conventions Added a check for the podman binary - bail if missing Tidied some logic for the sample data directory path Renamed the sample SLURM script to sample_SLURM_vasp_std.bash Tony
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some formatting comments and some questions
slurm/2_ApplicationSpecific/VASP-GPU-podman/sample_SLURM_vasp_std.bash
Outdated
Show resolved
Hide resolved
slurm/2_ApplicationSpecific/VASP-GPU-podman/sample_SLURM_vasp_std.bash
Outdated
Show resolved
Hide resolved
then | ||
default_acct="$(sacctmgr -rnp show User "$(id -un)" | awk -F'|' '{print $2}')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe what this section and the next are intended for? If the Slurm script exists then an account is included in it and a data path should be set in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are referring to this section:
if [ -f sample_SLURM_vasp_std.bash ]
then
default_acct="$(sacctmgr -rnp show User "$(id -un)" | awk -F'|' '{print $2}')"
sed -E -i -e "/#SBATCH[[:space:]]+--account=/s|\".*\"|\"${default_acct}\"|" \
-e "/#SBATCH[[:space:]]+--chdir=/s|\".*\"|\"${base_dir}\"|" \
-e "s|^bin_dir=.*|bin_dir=\"${bin_dir}\"|" \
"sample_SLURM_vasp_std.bash"
# try to provide a reasonable data dir path
grp_group="$(groups | sed 's/ /\n/g' | grep ^grp- | head -1 | sed 's/^grp-//')"
if [ "${grp_group}" != "" ]
then
if [ -d "/projects/academic/${grp_group}" ]
then
base_dir="/projects/academic/${grp_group}/$(id -un)/VASP"
sed -E -i -e "s|^data_dir=.*|data_dir=\"${base_dir}/data\"|" \
"sample_SLURM_vasp_std.bash"
fi
fi
cp "sample_SLURM_vasp_std.bash" "${bin_dir}/sample_SLURM_vasp_std.bash"
else
echo "sample script \"sample_SLURM_vasp_std.bash\" missing?" >&2
fi
This substitutes the users installed OCI directory in bin directory path into
the SLURM sample script, set the default SLURM account to the user's
group & sets sample paths based on the user's username & group, and copies
this sample script to the user's script dir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, exactly. Thanks for the explanation. I personally think this is too much for the examples repo. I think the simpler the better, less to break and less to explain to users. However, others may have a different opinion.
Removed references to the individual back end scripts Trimmed down the sample SLURM script Tony
Replaced "ccrgroup" with "[YourGroupName]" Replaced "ccruser" with "[CCRusername]" Tony
Instructions to build VASP with Nvidia GPU support in podman
and a sample Slurm script to run the podman tarball image.
Tony