Update XDMoD to version 11.0.0 #192

Open: wants to merge 2 commits into base: master
2,664 changes: 1,641 additions & 1,023 deletions database/xdmod.dump

Large diffs are not rendered by default.

36 changes: 34 additions & 2 deletions xdmod/README.md
@@ -555,8 +555,12 @@ Resources Added
Resource: hpc
Name: HPC
Type: hpc
Node count: 2
Processor count: 2
Resource Allocation Type: cpu
CPU Node count: 2
CPU Processor count: 2
GPU Node count: 0
GPU Processor count: 0
Resource Start Date: 2023-05-09
------------------------------------------------------------------------

Press ENTER to continue.
@@ -609,6 +613,12 @@ US - User Support
Gateway - Web-based access to CI resources


Available resource allocation types are:
CPU - CPU Allocated
GPU - GPU Allocated
CPUNode - CPU Node Allocated
GPUNode - GPU Node Allocated

Resource Name: ondemand
```

@@ -637,6 +647,28 @@ Resource Type (hpc, htc, dic, grid, cloud, vis, vm, tape, disk, stgrid, us, gate

- Type `gateway` and press the `Enter` key.

#### Resource Allocation Type

The resource allocation type is used in the Resource Specifications realm for compute resources (such as HPC or cloud
resources). It can be left at its default value for gateway resources such as OnDemand.

```shell
Resource Allocation Type (cpu, gpu, cpunode, gpunode): [cpu]
```

- Press the `Enter` key to accept the default value, `cpu`.
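
As a rough sketch of how an answer at this prompt relates to the allocation types listed by the setup tool, the Python below maps each accepted value to its description and falls back to `cpu` when the answer is empty. The helper name `normalize_allocation_type` is hypothetical and is not part of XDMoD, and accepting only the lowercase spellings shown in the prompt is an assumption.

```python
# Allocation types as listed by the setup tool; descriptions copied from its output.
ALLOCATION_TYPES = {
    "cpu": "CPU Allocated",
    "gpu": "GPU Allocated",
    "cpunode": "CPU Node Allocated",
    "gpunode": "GPU Node Allocated",
}

# The value shown in brackets at the prompt.
DEFAULT_ALLOCATION_TYPE = "cpu"


def normalize_allocation_type(answer: str) -> str:
    """Return the allocation type for a prompt answer (hypothetical helper)."""
    value = answer.strip()
    if not value:
        # Pressing Enter without typing anything selects the default.
        return DEFAULT_ALLOCATION_TYPE
    if value not in ALLOCATION_TYPES:
        raise ValueError(f"unknown resource allocation type: {value!r}")
    return value


if __name__ == "__main__":
    chosen = normalize_allocation_type("")
    print(chosen, "-", ALLOCATION_TYPES[chosen])
```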

#### Resource Start Date

The resource start date is used by the Resource Specifications realm. Set this to the date
that the resource was installed.

```shell
Resource Start Date, in YYYY-mm-dd format [2024-11-25]
```

- Type `2023-10-01` and press the `Enter` key.
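
A minimal sketch of checking a start date before typing it at the prompt, assuming the `YYYY-mm-dd` format in the prompt corresponds to the `strptime` pattern `%Y-%m-%d`. The helper `parse_start_date` is hypothetical and not part of XDMoD; the default argument stands in for the date shown in brackets.

```python
from datetime import date, datetime


def parse_start_date(answer: str, default: date) -> date:
    """Parse a prompt answer into a date, using the default for an empty answer."""
    value = answer.strip()
    if not value:
        return default
    # Raises ValueError when the answer does not follow year-month-day order.
    return datetime.strptime(value, "%Y-%m-%d").date()


if __name__ == "__main__":
    prompt_default = date(2024, 11, 25)  # the bracketed value in the prompt above
    print(parse_start_date("2023-10-01", prompt_default).isoformat())
```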

#### Resource Nodes
For a general HPC-type resource you will want to enter the number of nodes the resource has here. In our case, as we're
adding a gateway resource it doesn't really have nodes or cores per se, so we'll enter 0 for both as if it were a
Binary file modified xdmod/conf/xdmod_etc.tar.gz
Binary file not shown.
8 changes: 2 additions & 6 deletions xdmod/entrypoint.sh
@@ -74,18 +74,14 @@ then
sudo -u xdmod aggregate_supremm.sh

echo "---> supremm setup"
export TERMINFO=/bin/bash
export TERM=linux
/srv/xdmod/scripts/supremm.py
TERM=linux /srv/xdmod/scripts/supremm.py

echo "---> XDMoD Open OnDemand module setup"
expect /srv/xdmod/scripts/xdmod-setup-ondemand.tcl | col -b

echo "---> XDMoD Open OnDemand ingest historical data"
chown hpcadmin:xdmod -R /scratch/ondemand/logs
sudo -u xdmod xdmod-ondemand-ingestor -r ondemand -u https://localhost:3443 -d /scratch/ondemand/logs
sudo -u xdmod xdmod-ingestor
sudo -u xdmod xdmod-ondemand-ingestor -r ondemand -u https://localhost:3443 -d /scratch/ondemand/logs
sudo -u xdmod xdmod-ondemand-ingestor -r ondemand -d /scratch/ondemand/logs

echo "---> Make sure we have a place to keep our backups"
mkdir -p /srv/xdmod/backups
10 changes: 4 additions & 6 deletions xdmod/install.sh
@@ -46,12 +46,10 @@ dnf install -y \
# be installed in the same container. In a production deployment they may be installed
# on separate hosts.
#------------------------
dnf install -y https://github.com/ubccr/xdmod/releases/download/v10.0.2-2-el8/xdmod-10.0.2-2.0.el8.noarch.rpm \
https://github.com/ubccr/xdmod-ondemand/releases/download/v10.0.0/xdmod-ondemand-10.0.0-1.0.beta1.el8.noarch.rpm \
https://github.com/ubccr/xdmod-supremm/releases/download/v10.0.1-rc.1/xdmod-supremm-10.0.1-1.0.rc01.el8.noarch.rpm

# supremm rpm has broken deps so we force install the rpm and install the deps via pip
rpm --nodeps -ivh https://github.com/ubccr/supremm/releases/download/2.0.0-beta3/supremm-2.0.0-1.0_beta3.el8."$ARCHTYPE".rpm
dnf install -y https://github.com/ubccr/xdmod/releases/download/v11.0.0-1.0/xdmod-11.0.0-1.0.el8.noarch.rpm \
https://github.com/ubccr/xdmod-ondemand/releases/download/v11.0.0-1.0/xdmod-ondemand-11.0.0-1.0.el8.noarch.rpm \
https://github.com/ubccr/xdmod-supremm/releases/download/v11.0.0-1.0/xdmod-supremm-11.0.0-1.0.el8.noarch.rpm \
https://github.com/ubccr/supremm/releases/download/2.0.0/supremm-2.0.0-1.el8.${ARCHTYPE}.rpm

#------------------------
# The Job Performance software uses MongoDB to store the job-level performance
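
The four 11.0.0 package URLs in this hunk share a visible pattern, which the Python sketch below reproduces. The function `release_rpm_urls` and its parameters are hypothetical, and the pattern is inferred only from these four URLs; the 10.x URLs in the same hunk use different tag and file naming, so it should not be assumed to hold for other releases. The `x86_64` value is an illustrative stand-in for `ARCHTYPE`.

```python
from typing import List


def release_rpm_urls(version: str, supremm_version: str, arch: str) -> List[str]:
    """Build the RPM URLs following the naming seen in the 11.0.0 install lines."""
    base = "https://github.com/ubccr"
    # The three XDMoD packages are noarch and share one release tag format.
    urls = [
        f"{base}/{name}/releases/download/v{version}/{name}-{version}.el8.noarch.rpm"
        for name in ("xdmod", "xdmod-ondemand", "xdmod-supremm")
    ]
    # The supremm package is architecture-specific and its tag has no "v" prefix.
    urls.append(
        f"{base}/supremm/releases/download/{supremm_version}/"
        f"supremm-{supremm_version}-1.el8.{arch}.rpm"
    )
    return urls


if __name__ == "__main__":
    for url in release_rpm_urls("11.0.0-1.0", "2.0.0", "x86_64"):
        print(url)
```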
21 changes: 10 additions & 11 deletions xdmod/scripts/supremm.py
@@ -4,8 +4,6 @@

def main():

scriptsettings = ['start', 'start', 'start', 'end', 'submit']

with open("supremm_expect_log", "wb") as f:
p = pexpect.spawn('supremm-setup')
p.logfile = f
@@ -24,13 +22,14 @@ def main():

while True:
i = p.expect(["Overwrite config file", "hpc", pexpect.EOF, pexpect.TIMEOUT])
if i > 1:
if i == 0:
p.sendline("y")
break
elif i == 1:
p.expect('Enable SUPReMM summarization for this resource?')
if i > 5:
p.sendline("n")
continue
p.sendline("y")
if i != 0:
p.sendline("y")
p.expect("Data collector backend \(pcp or prometheus\)")
p.sendline("pcp")
p.expect("Directory containing node-level PCP archives")
p.sendline("/home/pcp")
p.expect("Source of accounting data")
@@ -40,9 +39,9 @@ def main():
p.expect("Directory containing job launch scripts")
p.sendline()
p.expect("Job launch script timestamp lookup mode \('submit', 'start' or 'none'\)")
p.sendline(scriptsettings[i-1])
else:
break
p.sendline('start')
elif i > 1:
p.sendline("n")

p.expect("Press ENTER to continue")
p.sendline()
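
The prompt strings passed to `p.expect` are interpreted by pexpect as regular expressions, which is why the parentheses in them are backslash-escaped. The sketch below uses only the standard `re` module to show the difference between an escaped and an unescaped pattern; the `PROMPTS` list and the `matches` helper are illustrative and not part of `supremm.py`, and the trailing `": "` is an assumed stand-in for real terminal output.

```python
import re

# Prompt text as it appears in the expect patterns above, without escaping.
PROMPTS = [
    "Data collector backend (pcp or prometheus)",
    "Job launch script timestamp lookup mode ('submit', 'start' or 'none')",
]


def matches(pattern: str, output: str) -> bool:
    """Return True if the regular expression occurs anywhere in the output."""
    return re.search(pattern, output) is not None


if __name__ == "__main__":
    for prompt in PROMPTS:
        output = prompt + ": "  # hypothetical terminal output ending in the prompt
        # Unescaped, "(...)" is a capture group, so the literal parentheses never match.
        assert not matches(prompt, output)
        # re.escape yields a pattern that matches the prompt text literally.
        assert matches(re.escape(prompt), output)
    print("escaped patterns match; unescaped patterns do not")
```

When a prompt should be matched literally, pexpect's `expect_exact` avoids the escaping altogether.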
8 changes: 5 additions & 3 deletions xdmod/scripts/xdmod-setup-jobs.tcl
@@ -9,7 +9,7 @@
set resources [list]

# Job Resources
lappend resources [list hpc HPC hpc 2 2]
lappend resources [list hpc HPC hpc cpu 2016-01-01 2 2]
# -------------

#-------------------------------------------------------------------------------
@@ -32,8 +32,10 @@ foreach resource $resources {
provideInput {Resource Name:} [lindex $resource 0]
provideInput {Formal Name:} [lindex $resource 1]
provideInput {Resource Type*} [lindex $resource 2]
provideInput {How many nodes does this resource have?} [lindex $resource 3]
provideInput {How many total processors (cpu cores) does this resource have?} [lindex $resource 4]
provideInput {Resource Allocation Type*} [lindex $resource 3]
provideInput {Resource Start Date*} [lindex $resource 4]
provideInput {How many CPU nodes does this resource have?} [lindex $resource 5]
provideInput {How many total CPU processors (cpu cores) does this resource have?} [lindex $resource 6]
}

selectMenuOption s
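
With the two new fields, each resource entry carries seven positional values, and the `lindex` positions for the node and processor counts shift from 3 and 4 to 5 and 6. The Python sketch below restates that mapping with named fields; it is an illustration of the index layout only, not part of the Tcl script, and the `JobResource` type and `prompt_answers` helper are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class JobResource(NamedTuple):
    name: str             # index 0
    formal_name: str      # index 1
    resource_type: str    # index 2
    allocation_type: str  # index 3 (new field)
    start_date: str       # index 4 (new field)
    cpu_nodes: int        # index 5 (was index 3)
    cpu_processors: int   # index 6 (was index 4)


# Prompt text copied from the provideInput calls, in field order.
PROMPTS = (
    "Resource Name:",
    "Formal Name:",
    "Resource Type*",
    "Resource Allocation Type*",
    "Resource Start Date*",
    "How many CPU nodes does this resource have?",
    "How many total CPU processors (cpu cores) does this resource have?",
)


def prompt_answers(resource: JobResource) -> List[Tuple[str, str]]:
    """Pair each prompt with the answer taken from the same position."""
    return [(prompt, str(value)) for prompt, value in zip(PROMPTS, resource)]


if __name__ == "__main__":
    hpc = JobResource("hpc", "HPC", "hpc", "cpu", "2016-01-01", 2, 2)
    for prompt, answer in prompt_answers(hpc):
        print(f"{prompt} {answer}")
```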
8 changes: 5 additions & 3 deletions xdmod/scripts/xdmod-setup-ondemand.tcl
@@ -25,9 +25,11 @@ selectMenuOption 4
selectMenuOption 1
provideInput {Resource Name:} ondemand
provideInput {Formal Name:} {Open OnDemand Instance}
provideInput {Resource Type*} Gateway
provideInput {How many nodes does this resource have?} 0
provideInput {How many total processors (cpu cores) does this resource have?} 0
provideInput {Resource Type*} gateway
provideInput {Resource Allocation Type*} cpu
provideInput {Resource Start Date*} 2016-01-01
provideInput {How many CPU nodes does this resource have?} 0
provideInput {How many total CPU processors (cpu cores) does this resource have?} 0

selectMenuOption s
confirmFileWrite yes