Dedicated partitions
Sometimes you don't remember:
- the number of nodes you own,
- their characteristics,
or you want to know their load or who is working on a specific node. Here are some useful commands to retrieve this kind of information.
Description of the nodes belonging to a partition#
The template for this kind of request is:
partition description (text)
$ sinfo -p <partition name> -e -O nodes,nodelist,memory,cpus,features:35,gres:30
The output looks like
partition description (text)
$ sinfo -p gpu -e -O nodes,nodelist,memory,cpus,features:35,gres:30
NODES NODELIST MEMORY CPUS AVAIL_FEATURES GRES
1 maestro-3000 376000 28 gpu,V100,intel,avx.avx2,bechtle gpu:V100:4,disk:890000
3 maestro-[3002-3004] 500000 96 gpu,A100,amd,avx2,bechtle gpu:A100:4,disk:890000
If you need a more concise view with only the total number of cores, look at the last figure (T, for Total) of the cpusstate field:
core state of a specific partition (text)
$ sinfo -p gpu -O cpusstate
CPUS(A/I/O/T)
0/316/0/316
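If the core counters need to be reused in a script, the CPUS(A/I/O/T) value can be split on its slashes. The following is a minimal sketch rather than anything provided by Slurm: cpus_field is a hypothetical helper name, and it assumes awk is available on the submission host.

```shell
# Print one counter of a CPUS(A/I/O/T) value such as "0/316/0/316".
# Usage: cpus_field <A|I|O|T> <value>
# cpus_field is a hypothetical helper, not a Slurm command.
cpus_field() {
    case "$1" in
        A) index=1 ;;   # allocated cores
        I) index=2 ;;   # idle cores
        O) index=3 ;;   # other (unavailable) cores
        T) index=4 ;;   # total number of cores
        *) echo "unknown field: $1" >&2; return 1 ;;
    esac
    # "+ 0" forces a numeric conversion, which drops the padding spaces
    # that sinfo adds around the value.
    printf '%s\n' "$2" | awk -F/ -v i="$index" '{ print $i + 0 }'
}

# Possible use on the cluster (-h suppresses the header line):
#   cpus_field T "$(sinfo -h -p gpu -O cpusstate)"
```

With the value shown above, `cpus_field T 0/316/0/316` prints 316 and `cpus_field I 0/316/0/316` prints 316 as well, since every core of the partition is idle.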
Load on specific nodes#
For that, you can also use sinfo with different options. The template looks like:
node load (text)
$ sinfo -O nodelist:15,partition:12,statelong:12,cpusstate:15,cpusload:12,allocmem:12 -Ne -n <node list>
Example:
node load (text)
$ sinfo -O nodelist:15,partition:12,statelong:12,cpusstate:15,cpusload:12,allocmem:12 -Ne -n maestro-[1021-1025]
NODELIST PARTITION STATE CPUS(A/I/O/T) CPU_LOAD ALLOCMEM
maestro-1021 common* mixed 64/32/0/96 2.07 262144
maestro-1022 common* mixed 64/32/0/96 2.11 262144
maestro-1023 common* mixed 64/32/0/96 2.09 262144
maestro-1024 common* mixed 64/32/0/96 2.12 262144
maestro-1025 common* mixed 64/32/0/96 2.14 262144
- STATE is the state of the node:
  - allocated means that all the cores are used
  - mixed means that only part of the cores are used
  - idle means that the node is empty
- CPUS(A/I/O/T) indicates how many cores are Allocated/Idle/Other (out of order), followed by the Total number of cores of the node
- ALLOCMEM is the total memory (in MB) allocated by all the jobs running on the node
Beware: a node can be labeled mixed and still be unavailable for any other computation, because all its RAM is already allocated by the running jobs.
That's why ALLOCMEM is also a useful piece of information to display in your sinfo output.
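To spot such nodes automatically, the allocated memory can be compared with the node's configured memory. The sketch below is only an illustration: saturated_mixed_nodes is a hypothetical helper name, the extra memory field is the same one used in the partition description above, and the "allocated >= configured" threshold is an assumption (a site that reserves memory for the system may need a lower threshold).

```shell
# Read lines of the form "<node> <state> <allocmem> <memory>" on stdin and
# print the nodes that are reported as mixed although all their memory
# is already allocated.
# saturated_mixed_nodes is a hypothetical helper, not a Slurm command.
saturated_mixed_nodes() {
    # $2 ~ /^mixed/ also accepts a state carrying a flag suffix.
    awk '$2 ~ /^mixed/ && $3 + 0 >= $4 + 0 { print $1 }'
}

# Possible use on the cluster (-h suppresses the header line):
#   sinfo -h -Ne -O nodelist:15,statelong:12,allocmem:12,memory:12 \
#       -n 'maestro-[1021-1025]' | saturated_mixed_nodes
```

Reading from standard input keeps the filter independent of sinfo, so it can be tried on a saved listing before being used on live data.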
Who is running on a specific node#
To know who is running jobs on a specific node or set of nodes, use squeue and indicate the node(s) you are interested in with the -w option:
jobs running on specific node(s) (text)
$ squeue -w maestro-[3002-3004]
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
42963708 gpu jupyter- user1 R 8:41:27 1 maestro-3002
42987153_11 gpu 300SAM4X user2 R 4:00:52 1 maestro-3004
42987153_8 gpu 300SAM4X user2 R 4:47:54 1 maestro-3002
42991187_7 gpu 600SAM4X user2 R 2:00:10 1 maestro-3004
42991361_4 gpu 1200_2SA user2 R 2:02:45 2 maestro-[3002-3003]
42991187_1 gpu 600SAM4X user2 R 2:52:02 2 maestro-[3000,3003]
42991187_2 gpu 600SAM4X user2 R 2:52:02 1 maestro-3003
If you want more information, pass the fields you are interested in to --Format. Append :X to the name of a field to set the width of its column to X characters.
Example: if you want to know:
- when the jobs started (START_TIME) and for how long they have been running (TIME),
- the resources they have allocated (CPUS, NODES, MIN_MEMORY),
- in which partition they have been submitted (yours or the dedicated one),
you can use the following template:
resources allocated by running jobs on specific node(s) (text)
$ squeue --Format=jobid:10,name:20,username:10,partition:10,qos:10,statecompact:3,numcpus:6,numnodes:6,minmemory:10,starttime:22,timeused:12,nodelist:30 -w <node list>
The output looks like
resources allocated by running jobs on specific node(s) (text)
$ squeue --Format=jobid:10,name:20,username:10,partition:10,qos:10,statecompact:3,numcpus:6,numnodes:6,minmemory:10,starttime:22,timeused:12,nodelist:30 -w maestro-[3002-3004]
JOBID NAME USER PARTITION QOS ST CPUS NODES MIN_MEMORYSTART_TIME TIME NODELIST
42963708 jupyter-notebook user1 gpu gpu R 32 1 150G 2023-09-18T09:42:12 8:43:16 maestro-3002
42988946 300SAM4X user2 gpu gpu R 5 1 4G 2023-09-18T14:22:47 4:02:41 maestro-3004
42988452 300SAM4X user2 gpu gpu R 5 1 4G 2023-09-18T13:35:45 4:49:43 maestro-3002
42992878 600SAM4X user2 gpu gpu R 5 1 4G 2023-09-18T16:23:29 2:01:59 maestro-3004
42992758 1200_2SAM4X user2 gpu gpu R 10 2 4G 2023-09-18T16:20:54 2:04:34 maestro-[3002-3003]
42991188 600SAM4X user2 gpu gpu R 10 2 4G 2023-09-18T15:31:37 2:53:51 maestro-[3000,3003]
42991189 600SAM4X user2 gpu gpu R 5 1 4G 2023-09-18T15:31:37 2:53:51 maestro-3003
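When many jobs share the nodes, a per-user total can be easier to read than the raw listing. The following is a minimal sketch under stated assumptions: cpus_per_user is a hypothetical helper name, and it only sums the username and numcpus fields already used above. Note that numcpus is the total for the whole job, so a job spanning several nodes also counts the cores it holds on nodes outside the requested list.

```shell
# Read "<user> <cpus>" lines on stdin and print the CPU total per user,
# sorted by user name.
# cpus_per_user is a hypothetical helper, not a Slurm command.
cpus_per_user() {
    awk '{ total[$1] += $2 }
         END { for (user in total) print user, total[user] }' | sort
}

# Possible use on the cluster (-h suppresses the header line):
#   squeue -h -w 'maestro-[3002-3004]' --Format=username:20,numcpus:10 | cpus_per_user
```

Fed with the USER and CPUS columns of the example output above, this prints 32 for user1 and 40 for user2.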