# jemdoc: menu{MENU}{batch-task.html}
== Batch Task
Submitting a batch task on the cluster requires a SLURM script that specifies the requested resources and the programs to run.
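The following is a minimal sketch of such a script, assuming a bash shell. The job name, resource amounts and time limit are example assumptions; the partition name is taken from the +sinfo+ listing in the next section. Lines starting with +#SBATCH+ are read by SLURM as resource requests and are treated as comments by the shell. The remaining lines are the commands to run.

~~~
{Example job script}{shell}
#!/bin/bash
# Minimal SLURM batch script (sketch). All values are example assumptions;
# adjust them to your job and to the partition limits.
#SBATCH --job-name=example
#SBATCH --partition=CPU-Compute-1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Wall-clock limit; must not exceed the partition's TIMELIMIT (see sinfo)
#SBATCH --time=01:00:00
# Output file: %x expands to the job name, %j to the job ID
#SBATCH --output=%x-%j.out

# Everything below runs on the allocated compute node.
# Use one thread per allocated CPU (falls back to 1 outside SLURM).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-none} on ${HOSTNAME} with ${OMP_NUM_THREADS} thread(s)"

# Replace the line below with your own program (hypothetical name):
# ./my_program input.dat
~~~

Saved as, for example, +job.sh+ (a hypothetical file name), the script is submitted with +sbatch job.sh+, and sbatch replies with a line of the form +Submitted batch job+ followed by the new job ID. sbatch stops reading +#SBATCH+ lines at the first executable command, so all directives must come before it.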
=== Check Available Resources
Before submitting, check the status of the compute nodes to make sure they have enough resources for the upcoming jobs.
~~~
{resource}{shell}
user_name@Control:~$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
CPU-Compute-1* up 1-00:00:00 1 idle Compute2030005000
CPU-Compute-2 up 7-00:00:00 1 idle Compute2030005001
GPU-Compute-1 up 7-00:00:00 1 mix Compute2030005002
GPU-Compute-2 up infinite 1 alloc Compute2030005003
~~~
As the example shows, +sinfo+ gives an overview of all partitions: their availability, time limit, number of nodes, node state and node names. The STATE column can be read as follows:
- If the STATE column is "idle", the node is free and no jobs are running on it
- If the STATE column is "alloc", all of the node's resources are allocated, so new jobs have to wait
- If the STATE column is "mix", some of the node's CPUs are allocated while the rest are still available
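The STATE column describes whole nodes. A minimal sketch for seeing how many CPUs are still free on each node, assuming a bash shell: +sinfo -N+ prints one line per node, and the +%C+ field reports CPU counts as allocated, idle, other and total. The function names +nodecpus+ and +freenodes+ are hypothetical.

~~~
{Free CPUs per node}{shell}
# Hypothetical helper names (nodecpus, freenodes); not part of SLURM.
# -N: one line per node, -h: no header line.
# %N node name, %P partition, %t compact state,
# %C CPU counts as allocated/idle/other/total.
nodecpus() {
    sinfo -N -h -o "%N %P %t %C"
}

# Keep only nodes whose idle-CPU count (2nd part of %C) is non-zero.
freenodes() {
    nodecpus | awk '{ split($4, c, "/"); if (c[2] > 0) print $1, $2, $3, c[2], "idle CPUs" }'
}
~~~

With the nodes listed above, +freenodes+ would report Compute2030005000, Compute2030005001 and Compute2030005002, but not the fully allocated Compute2030005003.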
You can also use +pestat+ to check the details of each node:
~~~
{resource with pestat}{shell}
user_name@Control:~$ pestat
Hostname Partition Node Num_CPU CPUload Memsize Freemem Joblist
State Use/Tot (MB) (MB) JobId User ...
Compute2030005000 CPU-Compute-1* idle 0 40 1.11* 128000 35643
Compute2030005001 CPU-Compute-2 idle 0 40 0.11 128000 34578
Compute2030005002 GPU-Compute-1 mix 2 40 16.30 128000 37085 1422 user_2
Compute2030005003 GPU-Compute-2 alloc 40 40 35.50* 128000 29748 1454 user_1
~~~
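+pestat+ is an add-on script rather than one of the core SLURM commands, so it may not be installed on every cluster. A similar per-node view can be sketched with +scontrol+, which ships with SLURM, assuming a bash shell; the helper name +nodeinfo+ is hypothetical. RealMemory and FreeMem are reported in megabytes, like the Memsize and Freemem columns above.

~~~
{Node details with scontrol}{shell}
# Hypothetical helper name (nodeinfo); not part of SLURM.
# scontrol -o prints the node record on one line of Key=Value pairs;
# split it into one pair per line and keep only the fields of interest.
nodeinfo() {
    scontrol -o show node "$1" | tr ' ' '\n' |
        grep -E '^(NodeName|State|CPUAlloc|CPUTot|CPULoad|RealMemory|FreeMem)='
}
~~~

For example, +nodeinfo Compute2030005002+ would show the allocated and total CPUs and the free memory of the GPU node that is in the "mix" state.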
=== Check Task's Status
After the submission, you can use +squeue+ to check the status of your task:
~~~
{task status}{shell}
user_name@Control:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1454 GPU-Compu test user_1 R 22-17:04:56 1 Compute2030005003
1422 GPU-Compu debug user_2 R 01:30 1 Compute2030005002
~~~
- +squeue -l+: show more detailed (long-format) information
- +squeue -u user_name+: show only the tasks of user_name
- +squeue -t state+: show only the tasks in the given state, e.g. +R+ (running) or +PD+ (pending)
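These options can be combined. A minimal sketch, assuming a bash shell and a hypothetical helper named +myjobs+: it lists only your own tasks, optionally filtered by state, and uses +-o+ to choose the columns (the field codes are described in the +squeue+ manual page).

~~~
{List your own tasks}{shell}
# Hypothetical helper name (myjobs); not part of SLURM.
# Lists your own jobs; an optional argument restricts the state,
# e.g. "myjobs R" (running) or "myjobs PD" (pending).
myjobs() {
    local me="${USER:-$(id -un)}"
    # %i job ID, %P partition, %j name, %t compact state,
    # %M time used, %D node count, %R node list or pending reason
    local fmt="%.8i %.15P %.12j %.2t %.12M %.5D %R"
    if [ -n "${1:-}" ]; then
        squeue -u "$me" -t "$1" -o "$fmt"
    else
        squeue -u "$me" -o "$fmt"
    fi
}
~~~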
To cancel tasks:
- +scancel jobid+: cancel the task whose job ID is jobid
- +scancel -u user_name+: cancel all tasks belonging to user_name
- +scancel -t state -u user_name+: cancel all of user_name's tasks in the given state
Users can only cancel their own tasks, not those of other users.
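The same filters can be combined for +scancel+. A minimal sketch, assuming a bash shell and a hypothetical helper named +cancel_state+, that cancels your own tasks in one state and defaults to pending ones, so tasks that are already running are kept:

~~~
{Cancel tasks by state}{shell}
# Hypothetical helper name (cancel_state); not part of SLURM.
# Cancels your own jobs in one state; defaults to PENDING so that
# jobs which are already running are left untouched.
cancel_state() {
    local state="${1:-PENDING}"
    scancel -u "${USER:-$(id -un)}" -t "$state"
}
~~~

Running +cancel_state+ cancels every pending task of yours, while +cancel_state RUNNING+ would cancel the running ones instead.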