Dedicated partitions

Sometimes you no longer remember:

  • the number of nodes you own,
  • their characteristics,

or you want to know how loaded they are or who is working on a specific node. Here are some useful commands to retrieve this kind of information.

Description of the nodes belonging to a partition

The general form of this kind of request is:

partition description (text)

$ sinfo  -p  <partition name>  -e  -O  nodes,nodelist,memory,cpus,features:35,gres

The output looks like

partition description (text)

$ sinfo -p gpu -e -O nodes,nodelist,memory,cpus,features:35,gres:30
NODES               NODELIST            MEMORY              CPUS                AVAIL_FEATURES                     GRES                          
1                   maestro-3000        376000              28                  gpu,V100,intel,avx.avx2,bechtle    gpu:V100:4,disk:890000        
3                   maestro-[3002-3004] 500000              96                  gpu,A100,amd,avx2,bechtle          gpu:A100:4,disk:890000

If you only need the total number of cores in the partition, look at the last figure of the cpusstate field (the T of CPUS(A/I/O/T)):

core state of a specific partition (text)

$ sinfo -p gpu -O cpusstate    
CPUS(A/I/O/T)       
0/316/0/316
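
As a minimal sketch, the four figures can be split into labelled counts with a small shell helper. The function name cpus_summary is hypothetical (it is not part of Slurm); it only assumes the A/I/O/T layout shown above.

```shell
# Hypothetical helper, not part of Slurm: label the four counts of a
# CPUS(A/I/O/T) value such as "0/316/0/316".
cpus_summary() {
    printf '%s\n' "$1" |
        awk -F/ '{ printf "allocated=%d idle=%d other=%d total=%d\n", $1, $2, $3, $4 }'
}

# Value taken from the example output above.
cpus_summary "0/316/0/316"
```

On the cluster, the value could be supplied with cpus_summary "$(sinfo -h -p gpu -O cpusstate)", where -h suppresses the header line.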

Load on specific nodes

For that, you can also use sinfo, with different options. The template looks like:

node load (text)

$ sinfo -O nodelist:15,partition:12,statelong:12,cpusstate:15,cpusload:12,allocmem:12 -Ne -n <node list>

Example:

node load (text)

$ sinfo -O nodelist:15,partition:12,statelong:12,cpusstate:15,cpusload:12,allocmem:12 -Ne -n maestro-[1021-1025] 
NODELIST       PARTITION   STATE       CPUS(A/I/O/T)  CPU_LOAD    ALLOCMEM    
maestro-1021   common*     mixed       64/32/0/96     2.07        262144      
maestro-1022   common*     mixed       64/32/0/96     2.11        262144      
maestro-1023   common*     mixed       64/32/0/96     2.09        262144      
maestro-1024   common*     mixed       64/32/0/96     2.12        262144      
maestro-1025   common*     mixed       64/32/0/96     2.14        262144

In this output:

  • STATE is the state of the node:
      • allocated means that all the cores are in use,
      • mixed means that only some of the cores are in use,
      • idle means that the node is empty;
  • CPUS(A/I/O/T) gives the number of cores that are Allocated/Idle/Other (unavailable, for instance on a down or drained node), followed by the Total number of cores of the node;
  • ALLOCMEM is the total memory, in MB, allocated to all the jobs running on the node.

Beware: a node can be labeled mixed and still be unavailable for any other computation, because all of its RAM is already allocated to the running jobs.

That is why ALLOCMEM is also a useful piece of information to display in your sinfo output.
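
The check described above can be sketched as a small filter. It assumes you also request the memory field (the one used in the partition description) next to allocmem, and that the header is turned off with -h. The function name memory_full_nodes and the sample node names and values are made up for illustration.

```shell
# Hypothetical helper: read lines of "node state memory allocmem" and print
# the nodes that are mixed although all their memory is already allocated.
memory_full_nodes() {
    awk '$2 == "mixed" && $4 + 0 >= $3 + 0 { print $1 }'
}

# Made-up sample lines, in the layout one would expect from
#   sinfo -h -N -O nodelist:15,statelong:12,memory:12,allocmem:12 -n <node list>
memory_full_nodes <<'EOF'
node-a         mixed       500000      500000
node-b         mixed       500000      262144
node-c         idle        500000      0
EOF
```

Only node-a is reported here: it still has idle cores, but no memory left for a new job. The filter matches the state string exactly, so a state printed with a suffix would need a looser test.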

Who is running on a specific node

To find out who is running jobs on a specific node or set of nodes, use squeue and give the node(s) you are interested in with the -w option:

jobs running on specific node(s) (text)

$ squeue -w maestro-[3002-3004]
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          42963708       gpu jupyter-    user1  R    8:41:27      1 maestro-3002
       42987153_11       gpu 300SAM4X    user2  R    4:00:52      1 maestro-3004
        42987153_8       gpu 300SAM4X    user2  R    4:47:54      1 maestro-3002
        42991187_7       gpu 600SAM4X    user2  R    2:00:10      1 maestro-3004
        42991361_4       gpu 1200_2SA    user2  R    2:02:45      2 maestro-[3002-3003]
        42991187_1       gpu 600SAM4X    user2  R    2:52:02      2 maestro-[3000,3003]
        42991187_2       gpu 600SAM4X    user2  R    2:52:02      1 maestro-3003
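
If you only need a count of jobs per user, the listing can be reduced with standard tools. This is a sketch, not a Slurm feature: the function name jobs_per_user is hypothetical, and it assumes one user name per line, as printed by squeue with -h (no header) and -o '%u' (user name only).

```shell
# Hypothetical helper: read one user name per line and print "user count".
jobs_per_user() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# User column of the example output above (one line per job).
printf '%s\n' user1 user2 user2 user2 user2 user2 user2 | jobs_per_user
```

On the cluster, the input could come from squeue -h -w <node list> -o '%u' piped into jobs_per_user.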

If you want more information, pass the fields you are interested in to --Format. Append :X to a field name to set the width of its column to X characters.

Example: if you want to know:

  • when the jobs started (START_TIME) and for how long they have been running (TIME)
  • the resources they have allocated (CPUS, NODES, MIN_MEMORY),
  • in which partition they have been submitted (yours or the dedicated one)

you can use the following template

resources allocated by running jobs on specific node(s) (text)

$ squeue   --Format=jobid:10,name:20,username:10,partition:10,qos:10,statecompact:3,numcpus:6,numnodes:6,minmemory:10,starttime:22,timeused:12,nodelist:30  -w <node list>

The output looks like

resources allocated by running jobs on specific node(s) (text)

$ squeue   --Format=jobid:10,name:20,username:10,partition:10,qos:10,statecompact:3,numcpus:6,numnodes:6,minmemory:10,starttime:22,timeused:12,nodelist:30  -w maestro-[3002-3004]
JOBID     NAME                USER      PARTITION QOS       ST CPUS  NODES MIN_MEMORYSTART_TIME            TIME        NODELIST                      
42963708  jupyter-notebook       user1  gpu       gpu       R  32    1     150G      2023-09-18T09:42:12   8:43:16     maestro-3002                  
42988946  300SAM4X               user2  gpu       gpu       R  5     1     4G        2023-09-18T14:22:47   4:02:41     maestro-3004                  
42988452  300SAM4X               user2  gpu       gpu       R  5     1     4G        2023-09-18T13:35:45   4:49:43     maestro-3002                  
42992878  600SAM4X               user2  gpu       gpu       R  5     1     4G        2023-09-18T16:23:29   2:01:59     maestro-3004                  
42992758  1200_2SAM4X            user2  gpu       gpu       R  10    2     4G        2023-09-18T16:20:54   2:04:34     maestro-[3002-3003]           
42991188  600SAM4X               user2  gpu       gpu       R  10    2     4G        2023-09-18T15:31:37   2:53:51     maestro-[3000,3003]           
42991189  600SAM4X               user2  gpu       gpu       R  5     1     4G        2023-09-18T15:31:37   2:53:51     maestro-3003
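
To see at a glance how many cores each user holds, the CPUS column can be summed per user. This is a sketch under assumptions: the function name cpus_per_user is hypothetical, and the input is expected to be "user cpus" pairs, for instance from squeue -h -w <node list> --Format=username:20,numcpus:8.

```shell
# Hypothetical helper: read "user cpus" pairs and print the total per user.
cpus_per_user() {
    awk '{ cpus[$1] += $2 } END { for (u in cpus) print u, cpus[u] }' | sort
}

# USER and CPUS columns of the example output above.
cpus_per_user <<'EOF'
user1 32
user2 5
user2 5
user2 5
user2 10
user2 10
user2 5
EOF
```

Note that CPUS is the total for the whole job, so a job spanning several nodes also counts the cores it holds on nodes outside the list given to -w.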
