reportseff Generate Resource Usage Reports

reportseff is a wrapper around the Slurm tool sacct. It reports efficiency metrics for Slurm jobs, can query multiple jobs at once, and can also report GPU accounting.

Get information about a user#

The most common use of reportseff is to query a user's jobs. By default, it returns all of that user's jobs from the last week.

Code Block (text)

[login@maestro-submit ~]$ reportseff -u <yourlogin>
     JobID    State       Elapsed  TimeEff   CPUEff   MemEff 
  13542903  CANCELLED    00:00:06   0.0%     25.0%     0.4%
  13421907  COMPLETED    00:00:04   0.0%      ---      0.0%  
  [...] 
  13543902  COMPLETED    00:02:41   0.2%     98.8%    31.5%
  13548702  CANCELLED    00:00:05   0.0%     60.0%    74.4%  
  13548703  COMPLETED    00:01:01   0.1%     96.7%    97.8%
  13841472   RUNNING     23:30:23   97.9%     ---      ---

A '---' value in a column has one of the following causes:

  • The job did not run long enough to produce reliable accounting data. Jobs that ran for less than 10 seconds are not taken into account.
  • The job is currently RUNNING or PENDING.
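
When this output is post-processed in a script, the '---' placeholder has to be handled separately from real percentages. The following is a minimal Python sketch, not part of reportseff: the helper names parse_percent and parse_report are hypothetical, and it assumes the default six-column layout shown above, captured as plain text.

```python
from typing import Dict, List, Optional

# Sample rows copied from the example output above.
SAMPLE = """\
     JobID    State       Elapsed  TimeEff   CPUEff   MemEff
  13542903  CANCELLED    00:00:06   0.0%     25.0%     0.4%
  13421907  COMPLETED    00:00:04   0.0%      ---      0.0%
  13841472   RUNNING     23:30:23   97.9%     ---      ---
"""


def parse_percent(cell: str) -> Optional[float]:
    """Return the percentage as a float, or None for the '---' placeholder."""
    if cell == "---":
        return None
    return float(cell.rstrip("%"))


def parse_report(text: str) -> List[Dict[str, object]]:
    """Parse the default table (assumes whitespace-separated columns)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        cells = line.split()
        if len(cells) != len(header):
            continue  # skip rows such as '[...]' that do not match the header
        row: Dict[str, object] = dict(zip(header, cells))
        for key in ("TimeEff", "CPUEff", "MemEff"):
            row[key] = parse_percent(str(row[key]))
        jobs.append(row)
    return jobs


jobs = parse_report(SAMPLE)
# Jobs for which no CPU efficiency is available yet.
no_cpu_data = [job["JobID"] for job in jobs if job["CPUEff"] is None]
print(no_cpu_data)  # ['13421907', '13841472']
```

Mapping '---' to None rather than to 0.0 keeps "no data" distinct from a job that really used 0% of its resources.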

By default, reportseff auto-detects whether the output can be displayed with colors.

If you do not want any color in the output, you can disable it, without having to pipe the output through another command, by adding the --no-color option:

Code Block (text)

[login@maestro-submit ~]$ reportseff -u <yourlogin> --no-color

Get information for a time range#

It is also possible to query a specific time range with --since and --until (the equivalents of the --starttime and --endtime options of sacct). Multiple time formats are supported:

  • sacct style

Code Block (text)

[login@maestro-submit ~]$ reportseff --user <yourlogin> --since 03/13 --until 03/14

  • a new format that asks for the last X hours, days or weeks by passing h=X, d=X or w=X to the --since option. For example, to get all jobs of the user braffest from the last hour:

Code Block (text)

[login@maestro-submit ~]$ reportseff --user braffest --since h=1
     JobID    State       Elapsed  TimeEff   CPUEff   MemEff 
  13977670  CANCELLED    00:45:40   0.0%     88.1%     0.4%  
  13982486  COMPLETED    00:36:29   0.0%      2.7%     9.5%  
  13986303   RUNNING     00:01:05   0.0%      ---      ---
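
To make the shorthand concrete, here is a small Python sketch of the time window that a value such as h=1 or w=1 stands for. It is only an illustration, assuming a single key=value pair with the h, d and w keys described above; the functions since_to_timedelta and window_start are hypothetical and are not reportseff's own parser.

```python
from datetime import datetime, timedelta

# Map each shorthand key from the text to a timedelta keyword.
UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def since_to_timedelta(value: str) -> timedelta:
    """Convert a shorthand such as 'h=1' into the time span it describes."""
    key, _, amount = value.partition("=")
    if key not in UNITS or not amount.isdigit():
        raise ValueError(f"unsupported --since shorthand: {value!r}")
    return timedelta(**{UNITS[key]: int(amount)})


def window_start(value: str, now: datetime) -> datetime:
    """Return the start of the window that ends at 'now'."""
    return now - since_to_timedelta(value)


now = datetime(2023, 3, 14, 12, 0, 0)  # fixed time so the example is reproducible
print(window_start("h=1", now))  # 2023-03-14 11:00:00
print(window_start("w=1", now))  # 2023-03-07 12:00:00
```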

Warning: if the query is too large, the result can be incomplete. Be as precise as possible (user, partition, etc.) and avoid large date ranges!

Get information about a partition#

It is possible to query efficiency metrics for a whole partition with the --partition option (--since or --user is mandatory in that case).
For example, to display all jobs of the common partition from the last hour:

Code Block (text)

[login@maestro-submit ~]$ reportseff --partition common --since h=1
    JobID    State          Elapsed  TimeEff   CPUEff   MemEff 
  10661528   RUNNING     23-12:52:50   6.4%      ---      ---
  13839966   TIMEOUT      1-00:00:25  100.0%    99.5%     5.4%  
  13841472   RUNNING        23:30:23   97.9%     ---      ---    
  [...]
  13850916   RUNNING        17:32:21   73.1%     ---      ---
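
When reportseff is called from a script, the rule that a partition query needs --since or --user can be checked before the command runs. The sketch below is one hypothetical way to do that in Python: build_reportseff_command is an illustrative helper, and it only uses options mentioned on this page.

```python
import shlex
from typing import List, Optional


def build_reportseff_command(
    user: Optional[str] = None,
    partition: Optional[str] = None,
    since: Optional[str] = None,
    until: Optional[str] = None,
) -> List[str]:
    """Assemble a reportseff argument list from the options described above."""
    if partition and not (user or since):
        # Mirrors the documented rule: --partition needs --since or --user.
        raise ValueError("--partition requires --since or --user")
    command = ["reportseff"]
    if user:
        command += ["--user", user]
    if partition:
        command += ["--partition", partition]
    if since:
        command += ["--since", since]
    if until:
        command += ["--until", until]
    return command


command = build_reportseff_command(partition="common", since="h=1")
print(shlex.join(command))  # reportseff --partition common --since h=1
```

On a host where reportseff is installed, the resulting list could be passed to subprocess.run; building it as a list avoids shell quoting problems.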

Get information about jobs from their output files#

reportseff can also find jobs from their Slurm output log files. By default, it looks in the current directory for files matching the default log file name pattern, slurm-%j.out. The pattern can be changed with the --slurm-format option.

In the example below, the current directory contains 6 job output files that follow the default file name pattern. When called without any argument, reportseff looks for these files in the current directory and queries the Slurm database to get the efficiency of the corresponding jobs.

Code Block (text)

[braffest@maestro-submit output_slurm]$ ls
slurm-13988085.out  slurm-13988099.out  slurm-13988105.out  slurm-13988107.out  slurm-13988112.out  slurm-13988113.out
[braffest@maestro-submit output_slurm]$ reportseff 
               JobID    State       Elapsed  TimeEff   CPUEff   MemEff 
  slurm-13988085.out   FAILED      00:00:00   0.0%      ---      0.0%  
  slurm-13988099.out  COMPLETED    00:01:00   0.1%     95.0%     0.0%  
  slurm-13988105.out  COMPLETED    00:01:00   0.1%     99.2%     0.0%  
  slurm-13988107.out  COMPLETED    00:01:00   0.1%     66.1%     0.0%  
  slurm-13988112.out  COMPLETED    00:01:00   0.1%     49.6%     0.0%  
  slurm-13988113.out  COMPLETED    00:01:01   0.1%     24.4%     0.0%
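
The link between a file name pattern and a job ID can be sketched in a few lines of Python. This is an illustration rather than reportseff's implementation: the helpers pattern_to_regex and job_ids are hypothetical, and they only handle the %j placeholder (the job ID), which is the one used in the pattern above.

```python
import re
from typing import List


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a pattern such as 'slurm-%j.out' into a regex capturing the job ID."""
    prefix, placeholder, suffix = pattern.partition("%j")
    if not placeholder:
        raise ValueError("pattern must contain %j")
    return re.compile(re.escape(prefix) + r"(?P<jobid>\d+)" + re.escape(suffix))


def job_ids(filenames: List[str], pattern: str = "slurm-%j.out") -> List[str]:
    """Return the job IDs of the file names that match the pattern."""
    regex = pattern_to_regex(pattern)
    matches = (regex.fullmatch(name) for name in filenames)
    return [m.group("jobid") for m in matches if m]


files = ["slurm-13988085.out", "slurm-13988099.out", "notes.txt"]
print(job_ids(files))  # ['13988085', '13988099']
```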

Custom format of output#

It is possible to customize the output format of reportseff. To do that, use the --format option and specify the desired fields.

Code Block (text)

[login@maestro-submit ~]$ reportseff  -u braffest --format=jobid,cpueff
  JobID     CPUEff 
 13421907    ---
 13421913    ---
 13421915    ---
 13429970    ---
 13430056    ---
 13430478    8.3%  
 13430482   12.5%  
 13440364   60.0%

It is also possible to add extra fields to the default output by prefixing the field list with +, as in --format=+jobname,start.

Code Block (text)

[login@maestro-submit ~]$ reportseff  -u braffest --format=+jobname,start
    JobID    State       Elapsed  TimeEff   CPUEff   MemEff                                        JobName                                               Start        
  13421907  COMPLETED    00:00:04   0.0%      ---      0.0%                                       interactive                                      2023-03-08T19:27:43
  13421913  COMPLETED    00:00:01   0.0%      ---      0.0%                                          wrap                                          2023-03-08T19:29:20
  13421915  COMPLETED    00:00:00   0.0%      ---      0.0%                                          wrap                                          2023-03-08T19:29:30
  13429970  CANCELLED    00:00:04   0.0%      ---      0.0%                                          sleep                                         2023-03-09T01:12:07
  13430056  CANCELLED    00:00:06   0.0%      ---      0.0%                                          sleep                                         2023-03-09T01:12:47
  13430478  COMPLETED    00:00:24   0.0%      8.3%     1.5%    [RStudio Launcher] Session ed9efcfc78a3162b13abb (braffest) - RStudio Pro Session   2023-03-09T01:41:56
  13430482  COMPLETED    00:00:16   0.0%     12.5%     1.4%    [RStudio Launcher] Session ed9efcfc78a31e1504b98 (braffest) - RStudio Pro Session   2023-03-09T01:42:29

GPU accounting#

reportseff can report the usage of the GPU cards and of their dedicated memory for a Slurm job when the -g option is specified.

In the following example, the job used 8 GPUs on two different nodes, with a different efficiency depending on the GPU card type of each node.

Code Block (text)

[braffest@maestro-submit ~]$ reportseff 13979617 -g
JobID                 State         Elapsed  TimeEff   CPUEff   MemEff   GPUEff   GPUMem 
13979617            COMPLETED      00:01:06    0.0%    46.4%    64.8%    58.5%    80.2%  
  maestro-3003                                         46.4%    64.8%    93.3%    88.9%  
    1                                                                     94%     88.9%  
    2                                                                     95%     88.9%  
    3                                                                     91%     88.9%  
  maestro-3013                                         46.4%    64.8%    23.6%    71.4%  
    0                                                                     25%     88.9%  
    1                                                                     32%     88.9%  
    2                                                                     31%     88.9%  
    3                                                                     30%     88.9%  
    7                                                                      0%      1.3%

As shown above, on maestro-3013 (A40 GPU cards), the GPU cards used were the first, second, third, fourth and eighth (indices 0, 1, 2, 3 and 7).

For this job, the overall GPUEff was 58.5% because the efficiency was lower on the A40 cards than on the A100 cards of maestro-3003. The overall GPUMem was 80.2%, with memory usage being mostly the same on both card types.
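
The figures in this example are consistent with each node-level value being the mean of its per-card values, and the job-level value being the mean of the node-level values. That is an observation from the numbers shown here, not a documented statement of how reportseff computes them. The short Python check below, using only the values printed above, makes the arithmetic explicit.

```python
from statistics import mean

# Per-card values copied from the example output (percent).
gpu_eff = {
    "maestro-3003": [94, 95, 91],
    "maestro-3013": [25, 32, 31, 30, 0],
}
gpu_mem = {
    "maestro-3003": [88.9, 88.9, 88.9],
    "maestro-3013": [88.9, 88.9, 88.9, 88.9, 1.3],
}

# Node-level value: mean over the cards of that node.
node_eff = {node: mean(cards) for node, cards in gpu_eff.items()}
node_mem = {node: mean(cards) for node, cards in gpu_mem.items()}

# Job-level value: mean over the nodes (assumed from the numbers, see above).
job_eff = mean(node_eff.values())
job_mem = mean(node_mem.values())

print({node: round(value, 1) for node, value in node_eff.items()})
print(round(job_eff, 1))  # 58.5, matching the reported GPUEff
print(round(job_mem, 1))  # 80.1, within rounding of the reported 80.2
```

The small gap on GPUMem is expected under this assumption, because the per-card inputs used here are already rounded to one decimal.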

Note that GPUEff and GPUMem are not available as soon as the job ends; they are updated up to 5 minutes later.

For further information on reportseff, see the tool project page at https://github.com/troycomi/reportseff