How to Monitor a Program on a Node
There are several ways to monitor a program running on a node. In every case, the monitoring command is attached to a job allocation.
Log in to the node running your job
You can log in to a node with ssh if you already have a job running on it. The ssh session is then attached to the allocation of that job or, if you have several jobs on the node, to the allocation of one of them.
Code Block (bash)
login@maestro-submit ~ $ sbatch -J myscript /path/to/your_script.sh
Submitted batch job 264562
login@maestro-submit ~ $ squeue -j 264562
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
264562 common myscript yourlogin R 0:05 1 maestro-1022
login@maestro-submit ~ $ ssh maestro-1022
login@maestro-1022's password:
                           _
 _ __ ___   __ _  ___  ___| |_ _ __ ___
| '_ ` _ \ / _` |/ _ \/ __| __| '__/ _ \
| | | | | | (_| |  __/\__ \ |_| | | (_) |
|_| |_| |_|\__,_|\___||___/\__|_|  \___/
login@maestro-1022 ~ $
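The lookup-then-connect steps above can be wrapped in a small helper. This is a minimal sketch, not a tool provided on the cluster: the function names `job_node` and `ssh_to_job` are hypothetical, and it assumes a single-node job and the standard `squeue` options `-h` (no header) and `-o %N` (print only the node list).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not provided by the cluster): find the node of a
# running job and open an ssh session on it.

# Print the node list of a job. Assumes a single-node job, so that the
# %N field holds one hostname rather than a range of nodes.
job_node() {
    local jobid="$1"
    squeue -j "$jobid" -h -o %N
}

# Connect to the node of the given job; fails if the job has no node yet
# (for example because it is still pending).
ssh_to_job() {
    local node
    node="$(job_node "$1")"
    if [ -z "$node" ]; then
        echo "job $1 is not running on any node" >&2
        return 1
    fi
    ssh "$node"
}
```

With these functions loaded in your shell, `ssh_to_job 264562` would look up the node and connect to it in one step.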
Once you are logged in, you can run any monitoring command you need, for example ps:
Code Block (bash)
login@maestro-1022 ~ $ ps wwwux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
yourlogin 15367 0.0 0.0 9772 2852 ? S 15:47 0:00 /bin/sh /var/spool/slurmd/job264562/slurm_script
yourlogin 15368 98.0 0.0 4356 820 ? R 15:47 0:32 /path/to/your_script.sh
yourlogin 15374 0.0 0.0 161500 5200 ? S 15:47 0:00 sshd: yourlogin@pts/0
yourlogin 15375 0.0 0.0 20452 3636 pts/0 Ss 15:47 0:00 -bash
yourlogin 15409 0.0 0.0 54260 3656 pts/0 R+ 15:47 0:00 ps wwwux
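On a busy node the `ps` listing can be long. The sketch below shows one possible way to keep only the processes that are actually consuming CPU. The function name `busy_procs` and the default threshold of 50 are hypothetical choices, and the filter assumes the column layout of `ps ux`, where %CPU is the third column.

```bash
# Hypothetical helper: read `ps ux`-style output on stdin and print the
# header line plus every process whose %CPU (column 3) is at or above
# the threshold given as first argument (default: 50).
busy_procs() {
    local threshold="${1:-50}"
    awk -v min="$threshold" 'NR == 1 || $3 + 0 >= min + 0'
}
```

For example, `ps wwwux | busy_procs 90` would keep the header and the processes using at least 90% of a CPU.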
The ssh session is automatically closed when the job ends.
Code Block (bash)
login@maestro-1022 ~ $ Connection to maestro-1022 closed by remote host.
Connection to maestro-1022 closed.
login@maestro-submit ~ $
Note that if you have several jobs on the same node, the ssh session is attached to the youngest one, that is, the job that started last. As a consequence, the session is closed as soon as that job ends, even if older jobs are still running on the node.
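To anticipate which job an ssh session would be tied to, you can compare the start times of your jobs on the node. The following is only a sketch under stated assumptions: the function names are hypothetical, it relies on the `squeue` format fields `%i` (job ID) and `%S` (start time) printing ISO-like timestamps, and it takes "youngest" to mean the latest start time.

```bash
# Hypothetical helper: list your running jobs on a node, one per line,
# as "JOBID START_TIME" (squeue format fields %i and %S).
my_jobs_on_node() {
    squeue -u "$USER" -w "$1" -t RUNNING -h -o '%i %S'
}

# Hypothetical helper: read "JOBID START_TIME" lines on stdin and print
# the ID of the most recently started job. ISO-like timestamps sort
# correctly as plain text, so a lexical sort on column 2 is enough.
youngest_job() {
    LC_ALL=C sort -k2,2 | tail -n 1 | cut -d' ' -f1
}
```

Running `my_jobs_on_node maestro-1022 | youngest_job` would then print the ID of the job whose end is expected to close the ssh session.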
Run monitoring commands from the allocation of the job
You can also run monitoring commands from the allocation of the monitored job. You can see an example with nvidia-smi and GPUs on that page.
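One common way to do this with Slurm is to start an extra job step inside the existing allocation with `srun --jobid`. The sketch below is an assumption about how this could look, not necessarily the method shown on the referenced page: the function name `run_in_job` is hypothetical, and the `--overlap` option requires Slurm 20.11 or later.

```bash
# Hypothetical helper: run one command inside the allocation of a running job.
# --jobid attaches the new step to the existing job; --overlap (Slurm 20.11
# and later) lets the step share resources already in use by the job's steps.
run_in_job() {
    local jobid="$1"
    shift
    srun --jobid="$jobid" --overlap "$@"
}
```

For instance, `run_in_job 264562 nvidia-smi` would run `nvidia-smi` on the resources allocated to job 264562, assuming that job was allocated GPUs.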