reportseff: Generate Resource Usage Reports
reportseff is a wrapper around the Slurm tool sacct that reports efficiency metrics for Slurm jobs. It is similar to seff, but it can query multiple jobs at once and also report GPU accounting.
Get information about a user#
The most common use of reportseff is to query a user. By default, it returns all of that user's jobs from the last week.
Code Block (text)
[login@maestro-submit ~]$ reportseff -u <yourlogin>
JobID State Elapsed TimeEff CPUEff MemEff
13542903 CANCELLED 00:00:06 0.0% 25.0% 0.4%
13421907 COMPLETED 00:00:04 0.0% --- 0.0%
[...]
13543902 COMPLETED 00:02:41 0.2% 98.8% 31.5%
13548702 CANCELLED 00:00:05 0.0% 60.0% 74.4%
13548703 COMPLETED 00:01:01 0.1% 96.7% 97.8%
13841472 RUNNING 23:30:23 97.9% --- ---
A '---' value in a column means one of the following:
- The job did not run long enough for the accounting to be reliable. Jobs that ran for less than 10 seconds are not taken into account.
- The job is currently RUNNING or PENDING.
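If you post-process the report in a script, the '---' placeholder has to be treated as a missing value rather than as a number. The following is a minimal sketch in Python, assuming the report is whitespace-separated as in the example above; the helper names `parse_percent` and `parse_report` are illustrative and not part of reportseff.

```python
from typing import Dict, List, Optional

# A few rows copied from the example output above.
SAMPLE = """\
JobID State Elapsed TimeEff CPUEff MemEff
13542903 CANCELLED 00:00:06 0.0% 25.0% 0.4%
13421907 COMPLETED 00:00:04 0.0% --- 0.0%
13841472 RUNNING 23:30:23 97.9% --- ---
"""


def parse_percent(cell: str) -> Optional[float]:
    """Return the percentage as a float, or None for the '---' placeholder."""
    if cell == "---":
        return None
    return float(cell.rstrip("%"))


def parse_report(text: str) -> List[Dict[str, object]]:
    """Turn a whitespace-separated report into one dict per job."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    jobs = []
    for row in rows:
        job: Dict[str, object] = dict(zip(header, row))
        for column in ("TimeEff", "CPUEff", "MemEff"):
            job[column] = parse_percent(str(job[column]))
        jobs.append(job)
    return jobs


jobs = parse_report(SAMPLE)
# Jobs whose CPU efficiency is not available (too short, running or pending).
missing_cpu = [job["JobID"] for job in jobs if job["CPUEff"] is None]
print(missing_cpu)  # ['13421907', '13841472']
```

This simple split only works for the default columns; fields that can contain spaces, such as job names, would need a different approach.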
By default, reportseff auto-detects whether the output can be displayed in color.
To disable colors in the output and avoid the pipe, add the --no-color option:
Code Block (text)
[login@maestro-submit ~]$ reportseff -u <yourlogin> --no-color
Get information for a time range#
You can also query a specific time range with --since and --until (they correspond to the --starttime and --endtime options of sacct). Multiple time formats are supported:
- the sacct style:
Code Block (text)
[login@maestro-submit ~]$ reportseff --user <yourlogin> --since 03/13 --until 03/14
- a new format that asks for the last X hours, days or weeks, using the --since option with the value h=X, d=X or w=X. For example, to get all jobs of the user braffest from the last hour:
Code Block (text)
[login@maestro-submit ~]$ reportseff --user braffest --since h=1
JobID State Elapsed TimeEff CPUEff MemEff
13977670 CANCELLED 00:45:40 0.0% 88.1% 0.4%
13982486 COMPLETED 00:36:29 0.0% 2.7% 9.5%
13986303 RUNNING 00:01:05 0.0% --- ---
Warning: if the query is too broad, the result may be incomplete. Be as precise as possible (user, partition, etc.) and avoid large date ranges!
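When reportseff is called from a script, the relative time value can be built programmatically. The sketch below only assembles the command line shown above and does not run it; `since_value` and `build_command` are hypothetical helper names, and only the --user and --since options described in this page are used.

```python
import shlex
from typing import List

# Unit letters accepted by --since, as described above.
UNITS = {"hours": "h", "days": "d", "weeks": "w"}


def since_value(amount: int, unit: str) -> str:
    """Build the relative value for --since, e.g. 'h=1' for the last hour."""
    if unit not in UNITS:
        raise ValueError(f"unit must be one of {sorted(UNITS)}")
    if amount <= 0:
        raise ValueError("amount must be a positive integer")
    return f"{UNITS[unit]}={amount}"


def build_command(user: str, amount: int, unit: str) -> List[str]:
    """Assemble the argument list for a user query over a recent period."""
    return ["reportseff", "--user", user, "--since", since_value(amount, unit)]


command = build_command("braffest", 1, "hours")
print(shlex.join(command))  # reportseff --user braffest --since h=1
```

Keeping the period short in such scripts follows the warning above: a narrow query is less likely to return an incomplete result.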
Get information about a partition#
You can query efficiency metrics for a whole partition with the --partition option (in that case, --since or --user is mandatory).
For example, to display all jobs of the common partition from the last hour:
Code Block (text)
[login@maestro-submit ~]$ reportseff --partition common --since h=1
JobID State Elapsed TimeEff CPUEff MemEff
10661528 RUNNING 23-12:52:50 6.4% --- ---
13839966 TIMEOUT 1-00:00:25 100.0% 99.5% 5.4%
13841472 RUNNING 23:30:23 97.9% --- ---
[...]
13850916 RUNNING 17:32:21 73.1% --- ---
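The Elapsed column uses the Slurm time format, with an optional day count before the hours (for example 23-12:52:50). As a rough sketch, the Python below converts such a value to seconds and, assuming that TimeEff is the elapsed time divided by the requested time limit (this page does not state the definition), estimates the time limit of a job. The function names are illustrative.

```python
def elapsed_seconds(elapsed: str) -> int:
    """Convert a Slurm elapsed time ([D-][HH:]MM:SS) to seconds."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(part) for part in elapsed.split(":")]
    # Pad missing leading fields (e.g. MM:SS) with zeros.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def estimated_limit_hours(elapsed: str, time_eff_percent: float) -> float:
    """Estimate the time limit in hours, assuming TimeEff = elapsed / limit."""
    return elapsed_seconds(elapsed) / (time_eff_percent / 100) / 3600


print(elapsed_seconds("23-12:52:50"))  # 2033570
print(round(estimated_limit_hours("23:30:23", 97.9), 1))  # 24.0
```

Under that assumption, job 13841472 above (23:30:23 elapsed, 97.9% TimeEff) would have been submitted with a time limit of about 24 hours.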
Get information about jobs from their output files#
reportseff can look up jobs based on their Slurm output log files. By default, it checks the current directory for files matching the default log file name pattern, slurm-%j.out. The pattern can be changed with the --slurm-format option.
In the example below, the directory contains 6 job logs that follow the default file name pattern. When run without arguments, reportseff looks for matching files in the current directory and queries the Slurm database for the efficiency of the corresponding jobs.
Code Block (text)
[braffest@maestro-submit output_slurm]$ ls
slurm-13988085.out slurm-13988099.out slurm-13988105.out slurm-13988107.out slurm-13988112.out slurm-13988113.out
[braffest@maestro-submit output_slurm]$ reportseff
JobID State Elapsed TimeEff CPUEff MemEff
slurm-13988085.out FAILED 00:00:00 0.0% --- 0.0%
slurm-13988099.out COMPLETED 00:01:00 0.1% 95.0% 0.0%
slurm-13988105.out COMPLETED 00:01:00 0.1% 99.2% 0.0%
slurm-13988107.out COMPLETED 00:01:00 0.1% 66.1% 0.0%
slurm-13988112.out COMPLETED 00:01:00 0.1% 49.6% 0.0%
slurm-13988113.out COMPLETED 00:01:01 0.1% 24.4% 0.0%
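To illustrate how a file name pattern maps log files to job IDs, here is a small Python sketch that turns a pattern containing %j into a regular expression. It shows the idea only and is not reportseff's actual implementation; `pattern_to_regex` and `job_ids` are hypothetical names, and only the %j placeholder is handled.

```python
import re
from typing import Iterable, List


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a pattern such as 'slurm-%j.out' into a compiled regex."""
    if "%j" not in pattern:
        raise ValueError("pattern must contain the %j placeholder")
    # Escape the literal parts and put a digit group where %j stood.
    literal_parts = [re.escape(part) for part in pattern.split("%j")]
    return re.compile(r"(\d+)".join(literal_parts))


def job_ids(filenames: Iterable[str], pattern: str = "slurm-%j.out") -> List[str]:
    """Return the job IDs of the file names that match the pattern."""
    regex = pattern_to_regex(pattern)
    ids = []
    for name in filenames:
        match = regex.fullmatch(name)
        if match:
            ids.append(match.group(1))
    return ids


files = ["slurm-13988085.out", "slurm-13988099.out", "notes.txt"]
print(job_ids(files))  # ['13988085', '13988099']
```

Files that do not match the pattern, such as notes.txt here, are simply ignored.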
Custom output format#
You can customize the output of reportseff with the --format option, listing the desired fields.
Code Block (text)
[login@maestro-submit ~]$ reportseff -u braffest --format=jobid,cpueff
JobID CPUEff
13421907 ---
13421913 ---
13421915 ---
13429970 ---
13430056 ---
13430478 8.3%
13430482 12.5%
13440364 60.0%
You can also add extra fields to the default output by starting the --format value with +.
Code Block (text)
[login@maestro-submit ~]$ reportseff -u braffest --format=+jobname,start
JobID State Elapsed TimeEff CPUEff MemEff JobName Start
13421907 COMPLETED 00:00:04 0.0% --- 0.0% interactive 2023-03-08T19:27:43
13421913 COMPLETED 00:00:01 0.0% --- 0.0% wrap 2023-03-08T19:29:20
13421915 COMPLETED 00:00:00 0.0% --- 0.0% wrap 2023-03-08T19:29:30
13429970 CANCELLED 00:00:04 0.0% --- 0.0% sleep 2023-03-09T01:12:07
13430056 CANCELLED 00:00:06 0.0% --- 0.0% sleep 2023-03-09T01:12:47
13430478 COMPLETED 00:00:24 0.0% 8.3% 1.5% [RStudio Launcher] Session ed9efcfc78a3162b13abb (braffest) - RStudio Pro Session 2023-03-09T01:41:56
13430482 COMPLETED 00:00:16 0.0% 12.5% 1.4% [RStudio Launcher] Session ed9efcfc78a31e1504b98 (braffest) - RStudio Pro Session 2023-03-09T01:42:29
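The two forms of --format differ only by the leading +. As a small illustration, the hypothetical Python helper below builds the option string for both cases; it uses nothing beyond the syntax shown in the examples above.

```python
from typing import Sequence


def format_option(fields: Sequence[str], extend_default: bool = False) -> str:
    """Build a --format option, either replacing or extending the default fields."""
    if not fields:
        raise ValueError("at least one field is required")
    # A leading '+' keeps the default columns and appends the listed fields.
    prefix = "+" if extend_default else ""
    return "--format=" + prefix + ",".join(fields)


print(format_option(["jobid", "cpueff"]))
# --format=jobid,cpueff
print(format_option(["jobname", "start"], extend_default=True))
# --format=+jobname,start
```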
GPU accounting#
reportseff can report the usage of the GPU cards and of their dedicated memory for a Slurm job when given the -g option.
In the following example, the job used 8 GPUs on two different nodes, with different efficiencies depending on the GPU card type of each node.
Code Block (text)
[braffest@maestro-submit ~]$ reportseff 13979617 -g
JobID           State      Elapsed   TimeEff  CPUEff  MemEff  GPUEff  GPUMem
13979617        COMPLETED  00:01:06  0.0%     46.4%   64.8%   58.5%   80.2%
  maestro-3003                                46.4%   64.8%   93.3%   88.9%
    1                                                         94%     88.9%
    2                                                         95%     88.9%
    3                                                         91%     88.9%
  maestro-3013                                46.4%   64.8%   23.6%   71.4%
    0                                                         25%     88.9%
    1                                                         32%     88.9%
    2                                                         31%     88.9%
    3                                                         30%     88.9%
    7                                                         0%      1.3%
As shown above, on maestro-3013 (A40 GPU cards), the GPU cards used were the first, second, third, fourth and eighth ones.
For this job, the overall GPUEff was 58.5% because the efficiency was lower on the A40 cards than on the A100 cards of maestro-3003. The GPUMem was 80.2%, roughly the same regardless of the card type.
Note that GPUEff and GPUMem are not updated when the job ends, but up to 5 minutes later.
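The GPUEff figures in this example are consistent with simple averaging: each node value matches the mean of its cards, and the job value matches the mean of the two node values. The Python sketch below reproduces that arithmetic from the numbers in the example. This is an observation about this one output, not a documented description of how reportseff computes the values.

```python
from statistics import mean

# Per-card GPUEff values (in percent) copied from the example above.
gpu_eff = {
    "maestro-3003": [94, 95, 91],
    "maestro-3013": [25, 32, 31, 30, 0],
}

# Assumed aggregation: mean over the cards of a node, then mean over the nodes.
node_eff = {node: mean(values) for node, values in gpu_eff.items()}
job_eff = mean(list(node_eff.values()))

for node, value in node_eff.items():
    print(node, round(value, 1))
# maestro-3003 93.3
# maestro-3013 23.6
print("job", round(job_eff, 1))
# job 58.5
```

The idle eighth card on maestro-3013 (0%) pulls that node's average down, which in turn lowers the value reported for the whole job.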
For further information on reportseff, see the tool project page at https://github.com/troycomi/reportseff