How to Monitor a Program on a Node

There are several ways to monitor a program running on a node. In every case, your monitoring command is attached to a job allocation.

Log on to the node running your job

You can log on to a node with ssh if you already have a job running on it. Your ssh session on that node is then attached to the allocation of one of the jobs running on it.

Code Block (bash)

login@maestro-submit ~ $ sbatch -J myscript /path/to/your_script.sh
Submitted batch job 264562
login@maestro-submit ~ $ squeue -j 264562
             JOBID PARTITION     NAME      USER       ST      TIME  NODES NODELIST(REASON)
            264562    common     myscript  yourlogin  R       0:05      1 maestro-1022
login@maestro-submit ~ $ ssh maestro-1022
login@maestro-1022's password: 
                           _             
 _ __ ___   __ _  ___  ___| |_ _ __ ___  
| '_ ` _ \ / _` |/ _ \/ __| __| '__/ _ \ 
| | | | | | (_| |  __/\__ \ |_| | | (_) |
|_| |_| |_|\__,_|\___||___/\__|_|  \___/ 

login@maestro-1022 ~ $
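
The node lookup shown above can also be scripted. The following is a minimal sketch, assuming squeue is available on the submit host; the function name job_nodes is hypothetical and not part of Slurm. It relies on the squeue options -h (no header) and -o "%N" (print only the node list).

```bash
# Hypothetical helper (not a Slurm command): print the node list of a job.
job_nodes() {
    local jobid="$1"
    # -h suppresses the header line, -o "%N" prints only the NODELIST column.
    squeue -h -j "$jobid" -o "%N"
}
```

For a single-node job such as the one above, ssh "$(job_nodes 264562)" would open the session on maestro-1022. For a multi-node job the list is returned in compressed form, so one node has to be picked first (for example after expanding the list with scontrol show hostnames).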

Once you are logged in, you can run whatever monitoring commands you want:

Code Block (bash)

login@maestro-1022 ~ $ ps wwwux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
yourlogin  15367  0.0  0.0   9772  2852 ?        S    15:47   0:00 /bin/sh /var/spool/slurmd/job264562/slurm_script
yourlogin  15368 98.0  0.0   4356   820 ?        R    15:47   0:32 /path/to/your_script.sh
yourlogin  15374  0.0  0.0 161500  5200 ?        S    15:47   0:00 sshd: yourlogin@pts/0
yourlogin  15375  0.0  0.0  20452  3636 pts/0    Ss   15:47   0:00 -bash
yourlogin  15409  0.0  0.0  54260  3656 pts/0    R+   15:47   0:00 ps wwwux

The ssh session is automatically closed when the job is over.

Code Block (bash)

login@maestro-1022 ~ $ Connection to maestro-1022 closed by remote host.
Connection to maestro-1022 closed.
login@maestro-submit ~ $

Note that if you have several jobs on the same node, the ssh session is attached to the youngest job. As a consequence, it is closed when the most recently started job on that node ends, even if older jobs are still running on the node.
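
To know in advance which job an ssh session would follow, you can list your running jobs on the node ordered by start time. This is only a sketch, assuming that "youngest" means the job with the latest start time; the function name youngest_job_on is hypothetical.

```bash
# Hypothetical helper: print the ID of your most recently started job on a node.
youngest_job_on() {
    node="$1"
    # -w restricts the listing to the given node, -t R to running jobs,
    # -o "%i" prints only the job ID, --sort=-S orders by start time, latest first.
    squeue -h -u "$(id -un)" -w "$node" -t R -o "%i" --sort=-S | head -n 1
}
```

For example, youngest_job_on maestro-1022 prints the ID of the job whose end would also close the ssh session.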

Run monitoring commands from the allocation of the job

You can also run monitoring commands from within the allocation of the monitored job. You can see an example with nvidia-smi and GPUs on that page.
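
As a rough illustration, one common way to do this in Slurm is to start an extra job step with srun --jobid. The sketch below is an assumption, not necessarily the method used on the page mentioned above: the function name run_in_job is hypothetical, and the --overlap option is only available in recent Slurm versions.

```bash
# Hypothetical helper: run a command inside the allocation of an existing job.
run_in_job() {
    jobid="$1"
    shift
    # --jobid attaches the new step to the existing allocation;
    # --overlap lets the step share resources already used by the running step.
    srun --jobid="$jobid" --overlap "$@"
}
```

With the job from the first example, run_in_job 264562 nvidia-smi would run nvidia-smi on the job's node, limited to the resources of that allocation.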
