How to Retrieve Technical Info (Nodes, OS, Storage)
Operating system#
Nodes run the CentOS 8 Linux operating system. You can check the minor version with the command:
Code Block (bash)
login@maestro ~ $ cat /etc/redhat-release
CentOS Linux release 8.3.2011
Type and number of nodes#
To get a global view of the common resources available in the cluster, you can use the sinfo command like this:
Global view of the cluster with sinfo (bash)
login@maestro ~ $ sinfo -e -O nodes,cpus,memory,gres:30 -p common,gpu
NODES CPUS MEMORY GRES
39 96 500000 disk:915266
1 28 376000 gpu:V100:4,disk:890000
1 96 500000 gpu:A100:4,disk:890000
The -e option groups nodes by identical configuration, so the output lists one line per node type:
- NODES: number of nodes of that type
- CPUS: number of cores
- MEMORY: RAM in MB
- GRES: generic resources available on this type of node
Nodes are added to the cluster throughout the year, so please use the above sinfo command to obtain up-to-date figures.
If you need an output that can be easily parsed, prefer the -o/--format option to -O/--Format and add --noheader to remove the header:
Global view of the cluster with sinfo (bash)
login@maestro ~ $ sinfo -e -o %D,%c,%m,%G -p common,gpu --noheader
39,96,500000,disk:915266
1,28,376000,gpu:V100:4,disk:890000
1,96,500000,gpu:A100:4,disk:890000
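Such output is easy to feed into standard text tools. As a minimal sketch, assuming the figures shown above, the following pipeline sums the total number of cores over all node types (%D is the node count, %c the number of cores per node):
Parsing sinfo output (bash)
login@maestro ~ $ sinfo -e -o "%D %c" -p common,gpu --noheader | awk '{sum += $1 * $2} END {print sum}'
3868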
Temporary local disk space on nodes#
Each node has 2 temporary disk spaces:
- /tmp, of 500 MB, that must not be used
- /local/scratch/tmp, of 890 GB, as shown in the GRES column above
These spaces are local to each node and can't be accessed by a process running on another node.
To write temporary files on the temporary disk space of a node, you must:
- reserve that space (as you do for RAM). For that, use the --gres=disk:<size in MB> option as in
Code Block (bash)
login@maestro ~ $ srun --gres=disk:50000 fastqc -o outdir reads.fq
- remove the temporary files at the end of the process (see the example script below)
Do not confuse these local spaces with the temporary scratch space /pasteur/appa/scratch, which is accessible from every node and from the submit node maestro.pasteur.fr.
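Putting both steps together, here is a minimal sketch of a batch script (the tool, file names and sizes are only examples) that reserves 50 GB of local disk, writes its temporary files under /local/scratch/tmp and cleans them up when the job ends, even on failure:
Code Block (bash)
#!/bin/bash
#SBATCH --gres=disk:50000                       # reserve 50 GB (size in MB) of local disk
# create a private temporary directory on the node-local scratch space
MYTMP=$(mktemp -d /local/scratch/tmp/${USER}_XXXXXX)
# remove the temporary files when the script exits, even on error
trap 'rm -rf "$MYTMP"' EXIT
fastqc -o "$MYTMP" reads.fq                     # write results on the local disk first
cp "$MYTMP"/* outdir/                           # then copy them back to shared storage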
Global number of cores#
To retrieve the total number of cores in the whole cluster, you can use the following command:
global core usage (bash)
login@maestro ~ $ sinfo -O cpusstate
CPUS(A/I/O/T)
5561/1539/0/7100
The output must be read as follows:
- 5561 cores are in use (Allocated),
- 1539 are available (Idle),
- 0 are out of order or in maintenance (Other),
- the total is 7100 cores (Total).
So here, you are only interested in the last figure, T, the total number of cores.
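If you only need that total figure in a script, a minimal sketch is to drop the header and keep the fourth slash-separated field (%C prints the same A/I/O/T quadruplet):
global core usage (bash)
login@maestro ~ $ sinfo -o "%C" --noheader | cut -d/ -f4
7100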
Cluster load#
Directly on maestro.pasteur.fr, to get a concise view of the core usage on a partition, say the common one, you can do
global core usage (bash)
login@maestro ~ $ sinfo -O cpusstate -p common
CPUS(A/I/O/T)
2589/11/0/2600
As said before,
- 2589 cores are in use (Allocated),
- 11 are available (Idle),
- 0 are out of order or in maintenance (Other),
- the total is 2600 cores (Total).
Doing the same on the dedicated partition can help you determine in which partition you want to launch your job.
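For instance, a small shell loop (a sketch; adapt the partition names to those you have access to) prints one A/I/O/T line per partition so you can compare them at a glance:
global core usage (bash)
login@maestro ~ $ for p in common gpu; do echo -n "$p: "; sinfo -h -p "$p" -o "%C"; done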
To retrieve more information, such as the state of the nodes or the memory already allocated, you can customize the output with the --Format/-O option:
sinfo extended command (bash)
login@maestro ~ $ sinfo -e -p common -O nodelist:15,partition:12,statelong:12,cpusstate:15,cpusload:12,freemem:12
Here is an excerpt of the output
Output of sinfo (bash)
NODELIST PARTITION STATE CPUS(A/I/O/T) CPU_LOAD FREE_MEM
maestro-1003 common* mixed 57/39/0/96 56.60 148902
maestro-1005 common* mixed 27/69/0/96 28.94 37561
maestro-1006 common* mixed 38/58/0/96 48.08 24258
maestro-1007 common* mixed 16/80/0/96 23.82 9356
maestro-1009 common* mixed 95/1/0/96 15.97 42351
maestro-1010 common* mixed 95/1/0/96 11.13 103762
maestro-1011 common* mixed 39/57/0/96 44.53 10950
maestro-1012 common* mixed 88/8/0/96 95.71 22515
maestro-1013 common* mixed 25/71/0/96 28.76 34791
If you want a format that you can easily parse, prefer the --format/-o option and add the --noheader option. For example, to retrieve the state of CPUs of each node of the common partition, you would do:
sinfo extended command (bash)
login@maestro ~ $ sinfo -p common -N -o "%N %C %e" --noheader
When you use %N, the -N option is useful to obtain the information for each node separately. The output looks like:
sinfo extended command output (bash)
maestro-1003 57/39/0/96 148902
maestro-1005 26/70/0/96 37561
maestro-1006 38/58/0/96 24258
maestro-1007 15/81/0/96 9356
maestro-1009 95/1/0/96 42351
maestro-1010 95/1/0/96 103762
maestro-1011 39/57/0/96 10950
maestro-1012 87/9/0/96 22515
maestro-1013 24/72/0/96 34791
maestro-1014 49/47/0/96 59144
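Building on that output, a short awk sketch (splitting fields on both spaces and slashes) lists the nodes of the excerpt above that still have more than 40 idle cores:
sinfo extended command (bash)
login@maestro ~ $ sinfo -p common -N -o "%N %C" --noheader | awk -F'[ /]' '$3 > 40 {print $1, $3}'
maestro-1005 70
maestro-1006 58
maestro-1007 81
maestro-1011 57
maestro-1013 72
maestro-1014 47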
Graphical view of the load#
It is possible to see in "real-time" what resources (cores, memory, nodes...) are currently being used on the cluster.
For that, go to: http://ganglia.pasteur.fr/?c=maestro-compute
Historical data on cluster usage is also available at the same link.
Project space access#
The Maestro cluster is connected to the Zeus storage by fast Ethernet links (10 Gb/s). Any project space on Zeus is potentially accessible. If a project space is not already available, send an email to informatique@pasteur.fr.
Backup policies for project spaces are described on the dedicated page.
Fast storage for high input/output rate programs (/pasteur/appa/scratch)#
/pasteur/appa/scratch is a temporary, non-backed-up space of 350 TB mounted on every node of Maestro, with fast access for I/O-bound programs. You can create your own sub-directory on it (/pasteur/appa/scratch/<yourlogin>) to write your temporary files.
Please do not forget to remove useless data as soon as possible.
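For example, assuming your login is held in $USER, you can create your sub-directory and point your jobs at it like this (many programs honor the TMPDIR environment variable for their temporary files, but check yours):
Code Block (bash)
login@maestro ~ $ mkdir -p /pasteur/appa/scratch/$USER
login@maestro ~ $ export TMPDIR=/pasteur/appa/scratch/$USER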
Internet access#
Only the submission node maestro.pasteur.fr has access to the internet.
Scheduler#
The scheduler is SLURM (http://slurm.schedmd.com/).