sacct: Check State & Resource Usage of Ended Jobs

Most commonly used options

  • -j <jobid1,jobid2> : display the state of given jobs only
  • -D : for jobs that have been requeued, display information for the first run too
  • -S <starttime> : display the state of jobs started after that date (format mm/dd or mm/dd/yy or hh:mm)
  • -E <endtime> : display the state of jobs ended before that date (format mm/dd or mm/dd/yy or hh:mm)
  • --state=<STATE1,STATE2> : display only the jobs with these states (comma-separated). The most common states are CA or CANCELLED, CD or COMPLETED, F or FAILED, PD or PENDING, R or RUNNING, TO or TIMEOUT
  • --partition <partition name> : display the state of jobs submitted in that partition
  • --qos <qos name> : display the state of jobs submitted in that qos
  • --format=<field1,field2> : display the specific fields only (comma-separated)
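
The options above can be combined in a single call. The following is a minimal sketch, not a site-specific recipe: the dates, the partition name and the field list are illustrative values borrowed from the examples on this page, and the command is only executed when the sacct client is found on the machine.

```bash
# Illustrative values only: adapt the dates, partition and fields to your needs.
fields="jobid,jobname,state,elapsed,exitcode"

# Failed or timed-out jobs of the "common" partition between 08/08 and 08/10.
sacct_cmd="sacct -S 08/08 -E 08/10 --state=F,TO --partition=common --format=$fields"
echo "$sacct_cmd"

# Run the command only where the sacct client is installed.
if command -v sacct >/dev/null 2>&1; then
    $sacct_cmd
fi
```

Giving both -S and -E together with --state keeps the time window explicit instead of relying on sacct's defaults.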

Examples

Display all of today's jobs submitted to the dedicated partition with the fast qos, using a custom output format. Ended, running and pending jobs are all displayed:

Code Block (bash)

login@maestro-submit ~ $ sacct --partition=dedicated --qos=fast --format=jobid,jobname,user,partition,qos,state,start,end,elapsed,exitcode,ncpus,nodelist

 JobID          JobName      User  Partition        QOS     State                Start                End    Elapsed  ExitCode      NCPUS        NodeList 
------------ ---------- --------- ---------- ---------- ---------- ------------------- ------------------- ---------- -------- ---------- --------------- 
17131680          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:22:49   00:15:50      0:0         12        maestro-1008 
17131682          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:29:19   00:22:20      0:0         12        maestro-1049 
17131683          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:20:28   00:13:29      0:0         12        maestro-1057 
17131684       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:12:38   00:05:39      1:0          2        maestro-1058 
17131685       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:21:51   00:14:52      1:0          2        maestro-1059
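
When the listing gets long, a per-state count is often easier to read than the full table. The sketch below tallies states with awk. It assumes pipe-separated input such as sacct produces with its --parsable2 and --noheader options; the helper name count_states is hypothetical, and the sample rows are copied from the listing above so that the snippet runs without a cluster.

```bash
# Tally jobs per state from pipe-separated "JobID|State" lines read on stdin.
count_states() {
    awk -F'|' '{ n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample rows copied from the listing above. On a cluster the input could come from:
#   sacct -X --noheader --parsable2 --format=jobid,state | count_states
# (-X restricts the output to job allocations, leaving out job steps).
summary=$(count_states <<'EOF'
17131680|COMPLETED
17131682|COMPLETED
17131683|COMPLETED
17131684|FAILED
17131685|FAILED
EOF
)
echo "$summary"
```

For the sample rows this prints one line for COMPLETED with a count of 3 and one for FAILED with a count of 2.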

Display all jobs that ran between the 8th and 10th of August and that failed:

Code Block (bash)

login@maestro-submit ~ $ sacct --state=FAILED -S 08/08 -E 08/10

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
16394545         mummer     common       fast          1     FAILED      2:0 
16394545.ba+      batch                  fast          1     FAILED      2:0 
16401727     prokka_14+     common       fast          4     FAILED      2:0 
16401727.ba+      batch                  fast          4     FAILED      2:0

Display all jobs that ran between the 15th and 17th of November and that failed or were cancelled. Add the requested memory (ReqMem) and the maximum resident set size (MaxRSS) to the output to check whether the jobs tried to exceed their allocated memory. Depending on when they did so:

  • either SLURM cancelled them by sending a SIGTERM (ExitCode 15) to the programs run by these jobs,
  • or the OS was faster (because there was no memory left on the node, it had to sacrifice a process at once) and sent them a SIGKILL (ExitCode 9) to terminate the programs run by these jobs immediately.

Code Block (bash)

login@maestro-submit ~ $ sacct --state=F,CA -S 11/15 -E 11/17 --format=jobid,jobname,partition,qos,state,start,elapsed,end,exitcode,ncpus,nnodes,nodelist,reqmem,maxrss

      JobID    JobName   Partition        QOS      State               Start    Elapsed                 End ExitCode      NCPUS   NNodes        NodeList     ReqMem     MaxRSS 
------------ ---------- ---------- ---------- ---------- ------------------- ---------- ------------------- -------- ---------- -------- --------------- ---------- ---------- 
38697769     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1        maestro-1021        4Gc            
38697769.ba+      batch                           FAILED 2019-11-15T10:29:04   00:14:03 2019-11-15T10:43:07     15:0          1        1        maestro-1021        4Gc   6392132K 
38697769.ex+     extern                        COMPLETED 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1        maestro-1021        4Gc        84K 
38697782     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:26   00:13:39 2019-11-15T10:43:05      0:0          1        1        maestro-1020        4Gc            
38697782.ba+      batch                           FAILED 2019-11-15T10:29:26   00:13:40 2019-11-15T10:43:06     15:0          1        1        maestro-1020        4Gc   5257816K 
38697782.ex+     extern                        COMPLETED 2019-11-15T10:29:26   00:13:39 2019-11-15T10:43:05      0:0          1        1        maestro-1020        4Gc        88K 
38697789     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:33   00:13:32 2019-11-15T10:43:05      0:0          1        1        maestro-1020        4Gc            
38697789.ba+      batch                           FAILED 2019-11-15T10:29:33   00:13:34 2019-11-15T10:43:07     15:0          1        1        maestro-1020        4Gc   8647140K 
38697789.ex+     extern                        COMPLETED 2019-11-15T10:29:33   00:13:33 2019-11-15T10:43:06      0:0          1        1        maestro-1020        4Gc        84K 
38697801     nf-Clean_+     common     normal     FAILED 2019-11-15T10:30:01   00:06:15 2019-11-15T10:36:16      9:0          1        1        maestro-1019        4Gc            
38697801.ba+      batch                           FAILED 2019-11-15T10:30:01   00:06:15 2019-11-15T10:36:16      9:0          1        1        maestro-1019        4Gc   7339244K 
38697801.ex+     extern                        COMPLETED 2019-11-15T10:30:01   00:06:15 2019-11-15T10:36:16      0:0          1        1        maestro-1019        4Gc        84K 
38697805     nf-Clean_+     common     normal     FAILED 2019-11-15T10:30:02   00:07:08 2019-11-15T10:37:10      9:0          1        1        maestro-1019        4Gc            
38697805.ba+      batch                           FAILED 2019-11-15T10:30:02   00:07:08 2019-11-15T10:37:10      9:0          1        1        maestro-1019        4Gc   7860676K 
38697805.ex+     extern                        COMPLETED 2019-11-15T10:30:02   00:07:09 2019-11-15T10:37:11      0:0          1        1        maestro-1019        4Gc        88K
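
The comparison between ReqMem and MaxRSS can also be automated. The sketch below is one possible approach, under stated assumptions: the input is pipe-separated JobID|NCPUS|ReqMem|MaxRSS lines (as sacct --parsable2 --noheader would produce for these fields), a trailing c on ReqMem means "per CPU" as in the listing above (other Slurm releases may report the value differently), and only the K, M, G and T unit suffixes are handled. The helper name check_mem is hypothetical, and the sample rows are taken from the listing above.

```bash
# Flag job steps whose MaxRSS exceeds the requested memory.
# Input on stdin: JobID|NCPUS|ReqMem|MaxRSS (pipe-separated).
check_mem() {
    awk -F'|' '
    # Convert a value such as "4Gc" or "6392132K" to KiB (assumed units: K, M, G, T).
    function to_kib(s,    n, u) {
        n = s + 0                      # leading numeric part
        u = s; gsub(/[0-9.]/, "", u)   # what remains is the unit and optional suffix
        u = substr(u, 1, 1)
        if (u == "M") return n * 1024
        if (u == "G") return n * 1024 * 1024
        if (u == "T") return n * 1024 * 1024 * 1024
        return n                       # "K" is already in KiB
    }
    $4 == "" { next }                  # skip rows without a MaxRSS value
    {
        limit = to_kib($3)
        if ($3 ~ /c$/) limit *= $2     # assumption: "c" suffix means per CPU
        used = to_kib($4)
        printf "%s %d %d %s\n", $1, used, limit, (used > limit ? "OVER" : "ok")
    }'
}

# Sample rows taken from the listing above. On a cluster the input could come from:
#   sacct --state=F,CA -S 11/15 -E 11/17 --parsable2 --noheader \
#         --format=jobid,ncpus,reqmem,maxrss | check_mem
report=$(check_mem <<'EOF'
38697769.batch|1|4Gc|6392132K
38697769.extern|1|4Gc|84K
38697801.batch|1|4Gc|7339244K
EOF
)
echo "$report"
```

Each output line shows the job step, its MaxRSS in KiB, the computed limit in KiB and a verdict. With the sample rows, both batch steps are flagged OVER (4Gc on one CPU is 4194304 KiB) and the extern step is reported as ok.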
