How to Retrieve Technical Info (Nodes, OS, Storage)
Operating system#
Nodes run the CentOS 8 Linux operating system. You can check the minor version with the command:
Code Block (bash)
login@maestro ~ $ cat /etc/redhat-release
CentOS Linux release 8.3.2011
Type and number of nodes#
To get a global view of the common resources available in the cluster, you can use the sinfo command like this:
Global view of the cluster with sinfo (bash)
login@maestro ~ $ sinfo -e -O nodes,cpus,memory,gres:30 -p common,gpu
NODES CPUS MEMORY GRES
39 96 500000 disk:915266
1 28 376000 gpu:V100:4,disk:890000
1 96 500000 gpu:A100:4,disk:890000
The -e option groups nodes by identical configuration, so the output lists one line per node type:
- NODES: number of nodes of that type
- CPUS: number of cores
- MEMORY: RAM in MB
- GRES: generic resources available on this type of node
Nodes are added to the cluster throughout the year, so please use the above sinfo command to obtain up-to-date figures.
If you need an output that can be easily parsed, prefer the -o/--format option to -O/--Format and add --noheader to remove the header:
Global view of the cluster with sinfo (bash)
login@maestro ~ $ sinfo -e -o %D,%c,%m,%G -p common,gpu --noheader
39,96,500000,disk:915266
1,28,376000,gpu:V100:4,disk:890000
1,96,500000,gpu:A100:4,disk:890000
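Such output is easy to feed into standard text tools. As a minimal sketch, assuming the figures shown above, the following pipeline sums the total number of cores over all node types (%D is the node count, %c the number of cores per node):
Parsing sinfo output (bash)
login@maestro ~ $ sinfo -e -o "%D %c" -p common,gpu --noheader | awk '{sum += $1 * $2} END {print sum}'
3868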
Temporary local disk space on nodes#
Each node has 2 temporary disk spaces:
- /tmp, of 500 MB, that must not be used
- /local/scratch/tmp, of 890 GB, as shown in the GRES column above
These spaces are local to each node and can't be accessed by a process running on another node.
To write temporary files on the temporary disk space of a node, you must:
- reserve that space (as you do for RAM). For that, use the --gres=disk:<size in MB> option as in
Code Block (bash)
login@maestro ~ $ srun --gres=disk:50000 fastqc -o outdir reads.fq
- remove the temporary files at the end of the process (see the example script below)
Do not confuse these local spaces with the temporary scratch space /pasteur/appa/scratch, which is accessible from every node and from the submit node maestro.pasteur.fr.
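Putting both steps together, here is a minimal sketch of a batch script (the tool, file names and sizes are only examples) that reserves 50 GB of local disk, writes its temporary files under /local/scratch/tmp and cleans them up when the job ends, even on failure:
Code Block (bash)
#!/bin/bash
#SBATCH --gres=disk:50000                       # reserve 50 GB (size in MB) of local disk
# create a private temporary directory on the node-local scratch space
MYTMP=$(mktemp -d /local/scratch/tmp/${USER}_XXXXXX)
# remove the temporary files when the script exits, even on error
trap 'rm -rf "$MYTMP"' EXIT
fastqc -o "$MYTMP" reads.fq                     # write results on the local disk first
cp "$MYTMP"/* outdir/                           # then copy them back to shared storage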
Global number of cores#
To retrieve the total number of cores in the whole cluster, you can use the following command:
global core usage (bash)
login@maestro ~ $ sinfo -O cpusstate
CPUS(A/I/O/T)
5561/1539/0/7100
The output must be read as follows:
- 5561 cores are in use (Allocated),
- 1539 are available (Idle),
- 0 are out of order or in maintenance (Other),
- the total is 7100 cores (Total).
So here, you are only interested in the last figure, T, the total number of cores.
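If you only need that total figure in a script, a minimal sketch is to drop the header and keep the fourth slash-separated field (%C prints the same A/I/O/T quadruplet):
global core usage (bash)
login@maestro ~ $ sinfo -o "%C" --noheader | cut -d/ -f4
7100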
Cluster load#
Directly on maestro.pasteur.fr, to get a concise view of the core usage on a partition, say the common one, you can do
global core usage (bash)
login@maestro ~ $ sinfo -O cpusstate -p common
CPUS(A/I/O/T)
2589/11/0/2600
As said before,
- 2589 cores are in use (Allocated),
- 11 are available (Idle),
- 0 are out of order or in maintenance (Other),
- the total is 2600 cores (Total).
Doing the same on the dedicated partition can help you determine in which partition you want to launch your job.
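For instance, a small shell loop (a sketch; adapt the partition names to those you have access to) prints one A/I/O/T line per partition so you can compare them at a glance:
global core usage (bash)
login@maestro ~ $ for p in common gpu; do echo -n "$p: "; sinfo -h -p "$p" -o "%C"; done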
To retrieve more information, such as the state of the nodes or the memory already allocated, you can customize the output with the --Format/-O option:
sinfo extended command (bash)
login@maestro ~ $ sinfo -e -p common -O nodelist:15,partition:12,statelong:12,cpusstate:15,cpusload:12,freemem:12
Here is an excerpt of the output
Output of sinfo (bash)
NODELIST PARTITION STATE CPUS(A/I/O/T) CPU_LOAD FREE_MEM
maestro-1003 common* mixed 57/39/0/96 56.60 148902
maestro-1005 common* mixed 27/69/0/96 28.94 37561
maestro-1006 common* mixed 38/58/0/96 48.08 24258
maestro-1007 common* mixed 16/80/0/96 23.82 9356
maestro-1009 common* mixed 95/1/0/96 15.97 42351
maestro-1010 common* mixed 95/1/0/96 11.13 103762
maestro-1011 common* mixed 39/57/0/96 44.53 10950
maestro-1012 common* mixed 88/8/0/96 95.71 22515
maestro-1013 common* mixed 25/71/0/96 28.76 34791
If you want a format that you can easily parse, prefer the --format/-o option and add the --noheader option. For example, to retrieve the state of CPUs of each node of the common partition, you would do:
sinfo extended command (bash)
login@maestro ~ $ sinfo -p common -N -o "%N %C %e" --noheader
When you use %N, the -N option is useful to obtain the information for each node separately. The output looks like:
sinfo extended command output (bash)
maestro-1003 57/39/0/96 148902
maestro-1005 26/70/0/96 37561
maestro-1006 38/58/0/96 24258
maestro-1007 15/81/0/96 9356
maestro-1009 95/1/0/96 42351
maestro-1010 95/1/0/96 103762
maestro-1011 39/57/0/96 10950
maestro-1012 87/9/0/96 22515
maestro-1013 24/72/0/96 34791
maestro-1014 49/47/0/96 59144
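Building on that output, a short awk sketch (splitting fields on both spaces and slashes) lists the nodes of the excerpt above that still have more than 40 idle cores:
sinfo extended command (bash)
login@maestro ~ $ sinfo -p common -N -o "%N %C" --noheader | awk -F'[ /]' '$3 > 40 {print $1, $3}'
maestro-1005 70
maestro-1006 58
maestro-1007 81
maestro-1011 57
maestro-1013 72
maestro-1014 47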
Graphical view of the load#
It is possible to see in "real-time" what resources (cores, memory, nodes...) are currently being used on the cluster.
For that, go to: http://ganglia.pasteur.fr/?c=maestro-compute
Historical data on cluster usage is also available at the same link.
Project space access#
The Maestro cluster is connected to the Zeus storage by fast Ethernet links (10 Gb/s). Any project space on Zeus is potentially accessible. If a project space is not already available, send an email to informatique@pasteur.fr.
Backup policies for project spaces are described on the dedicated page.
Fast storage for high input/output rate programs (/pasteur/appa/scratch)#
/pasteur/appa/scratch is a temporary, non-backed-up space of 350 TB mounted on every node of Maestro, with fast access for I/O-bound programs. You can create your own sub-directory on it (/pasteur/appa/scratch/<yourlogin>) to write your temporary files.
Please do not forget to remove useless data as soon as possible.
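For example, assuming your login is held in $USER, you can create your sub-directory and point your jobs at it like this (many programs honor the TMPDIR environment variable for their temporary files, but check yours):
Code Block (bash)
login@maestro ~ $ mkdir -p /pasteur/appa/scratch/$USER
login@maestro ~ $ export TMPDIR=/pasteur/appa/scratch/$USER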
Internet access#
Only the submission node maestro.pasteur.fr has access to the internet.
Scheduler#
The scheduler is SLURM (http://slurm.schedmd.com/).