sacct: Check State & Resource Usage of Ended Jobs
Most commonly used options
- -j <jobid1,jobid2>: display the state of the given jobs only
- -D: for jobs that have been requeued, also display information about the earlier run(s)
- -S <starttime>: display the state of jobs started after that date (format mm/dd, mm/dd/yy or hh:mm)
- -E <endtime>: display the state of jobs ended before that date (format mm/dd, mm/dd/yy or hh:mm)
- --state=<STATE1,STATE2>: display only the jobs in these states (comma-separated). The most common states are CA or CANCELLED, CD or COMPLETED, F or FAILED, PD or PENDING, R or RUNNING, TO or TIMEOUT
- --partition <partition name>: display the state of jobs submitted to that partition
- --qos <qos name>: display the state of jobs submitted with that QoS
- --format=<field1,field2>: display only the specified fields (comma-separated)
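It is easy to mistype the -S and -E dates by hand. Here is a minimal sketch that assumes GNU date, which most Linux login nodes provide. A small shell function computes the mm/dd value for the date N days back. The name days_ago is hypothetical and is not part of Slurm.

```bash
#!/usr/bin/env bash
# days_ago N: print the date N days before today in the mm/dd format accepted by sacct -S / -E.
# Assumption: GNU date (the -d option); BSD/macOS date uses different flags.
days_ago() {
  local n=${1:?usage: days_ago N}
  date -d "${n} days ago" +%m/%d
}

# Example on the cluster (not run here): failed jobs from the last 3 days
# sacct --state=FAILED -S "$(days_ago 3)"
```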
Examples
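Display all of today's jobs submitted to the dedicated partition with the fast QoS (without -S, sacct starts from midnight of the current day). No --state filter is given, so ended, running and pending jobs are all displayed, with the fields listed in --format: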
```bash
login@maestro-submit ~ $ sacct --partition=dedicated --qos=fast --format=jobid,jobname,user,partition,qos,state,start,end,elapsed,exitcode,ncpus,nodelist
       JobID    JobName      User  Partition        QOS      State               Start                 End    Elapsed ExitCode      NCPUS        NodeList
------------ ---------- --------- ---------- ---------- ---------- ------------------- ------------------- ---------- -------- ---------- ---------------
17131680          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:22:49   00:15:50      0:0         12    maestro-1008
17131682          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:29:19   00:22:20      0:0         12    maestro-1049
17131683          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:20:28   00:13:29      0:0         12    maestro-1057
17131684       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:12:38   00:05:39      1:0          2    maestro-1058
17131685       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:21:51   00:14:52      1:0          2    maestro-1059
```
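A long listing is easier to read if you tally the states instead of scanning it line by line. Below is a sketch of a hypothetical helper, count_states. It reads one state per line, as printed by sacct with the -n (--noheader), -P (--parsable2) and -X (--allocations, one line per job without its steps) options. It groups the lines by their first word, so an entry reported as "CANCELLED by <uid>" is counted as CANCELLED.

```bash
#!/usr/bin/env bash
# count_states: read one job state per line on stdin and print "STATE COUNT" pairs,
# sorted by state name. Only the first word of each line is used, so a state
# reported as "CANCELLED by <uid>" is counted as CANCELLED.
count_states() {
  awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}

# Example on the cluster (not run here):
# sacct --partition=dedicated --qos=fast -X -n -P --format=state | count_states
```

Given the State column of the five jobs listed above, it prints COMPLETED 3 and FAILED 2.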
Display all jobs that ran between the 8th and 10th of August and that failed:
```bash
login@maestro-submit ~ $ sacct --state=FAILED -S 08/08 -E 08/10
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
16394545         mummer     common       fast          1     FAILED      2:0
16394545.ba+      batch                  fast          1     FAILED      2:0
16401727     prokka_14+     common       fast          4     FAILED      2:0
16401727.ba+      batch                  fast          4     FAILED      2:0
```
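To look at these failed jobs in more detail, you can turn their IDs into the comma-separated list that -j expects. The sketch below uses a hypothetical function, join_job_ids. It reads JobID values one per line (for example from sacct -n -P --format=jobid), drops job steps such as 16394545.batch, and joins the remaining IDs with commas. Alternatively, sacct's -X (--allocations) option leaves out the steps directly.

```bash
#!/usr/bin/env bash
# join_job_ids: read JobID values (one per line) on stdin, skip job steps
# (IDs containing a dot, e.g. 16394545.batch) and print the remaining IDs
# as a single comma-separated list suitable for sacct -j.
join_job_ids() {
  awk -F'|' '$1 != "" && $1 !~ /\./ { print $1 }' | paste -sd, -
}

# Example on the cluster (not run here):
# ids=$(sacct --state=FAILED -S 08/08 -E 08/10 -n -P --format=jobid | join_job_ids)
# sacct -j "$ids" --format=jobid,jobname,state,exitcode,maxrss
```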
Display all jobs that ran between the 15th and 17th of November and that failed or were cancelled. Add the required memory (ReqMem) and the maximum resident set size (MaxRSS) to check whether the jobs tried to use more memory than they were allocated. In the output below, every batch step reached a MaxRSS above 5 GB (from 5257816K to 8647140K), whereas these single-CPU jobs requested only 4 GB per CPU (ReqMem 4Gc). Depending on when the memory limit was exceeded:
- either Slurm cancelled the job by sending a SIGTERM (ExitCode 15) to the programs it was running,
- or the OS was faster (there was no memory left on the node, so it had to sacrifice a process at once) and sent a SIGKILL (ExitCode 9) to terminate the job's programs immediately.
```bash
login@maestro-submit ~ $ sacct --state=F,CA -S 11/15 -E 11/17 --format=jobid,jobname,partition,qos,state,start,elapsed,end,exitcode,ncpus,nnodes,nodelist,reqmem,maxrss
       JobID    JobName  Partition        QOS      State               Start    Elapsed                 End ExitCode      NCPUS   NNodes        NodeList     ReqMem     MaxRSS
------------ ---------- ---------- ---------- ---------- ------------------- ---------- ------------------- -------- ---------- -------- --------------- ---------- ----------
38697769     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1    maestro-1021        4Gc
38697769.ba+      batch                           FAILED 2019-11-15T10:29:04   00:14:03 2019-11-15T10:43:07     15:0          1        1    maestro-1021        4Gc   6392132K
38697769.ex+     extern                        COMPLETED 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1    maestro-1021        4Gc        84K
38697782     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:26   00:13:39 2019-11-15T10:43:05      0:0          1        1    maestro-1020        4Gc
38697782.ba+      batch                           FAILED 2019-11-15T10:29:26   00:13:40 2019-11-15T10:43:06     15:0          1        1    maestro-1020        4Gc   5257816K
38697782.ex+     extern                        COMPLETED 2019-11-15T10:29:26   00:13:39 2019-11-15T10:43:05      0:0          1        1    maestro-1020        4Gc        88K
38697789     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:33   00:13:32 2019-11-15T10:43:05      0:0          1        1    maestro-1020        4Gc
38697789.ba+      batch                           FAILED 2019-11-15T10:29:33   00:13:34 2019-11-15T10:43:07     15:0          1        1    maestro-1020        4Gc   8647140K
38697789.ex+     extern                        COMPLETED 2019-11-15T10:29:33   00:13:33 2019-11-15T10:43:06      0:0          1        1    maestro-1020        4Gc        84K
38697801     nf-Clean_+     common     normal     FAILED 2019-11-15T10:30:01   00:06:15 2019-11-15T10:36:16      9:0          1        1    maestro-1019        4Gc
38697801.ba+      batch                           FAILED 2019-11-15T10:30:01   00:06:15 2019-11-15T10:36:16      9:0          1        1    maestro-1019        4Gc   7339244K
38697801.ex+     extern                        COMPLETED 2019-11-15T10:30:01   00:06:15 2019-11-15T10:36:16      0:0          1        1    maestro-1019        4Gc        84K
38697805     nf-Clean_+     common     normal     FAILED 2019-11-15T10:30:02   00:07:08 2019-11-15T10:37:10      9:0          1        1    maestro-1019        4Gc
38697805.ba+      batch                           FAILED 2019-11-15T10:30:02   00:07:08 2019-11-15T10:37:10      9:0          1        1    maestro-1019        4Gc   7860676K
38697805.ex+     extern                        COMPLETED 2019-11-15T10:30:02   00:07:09 2019-11-15T10:37:11      0:0          1        1    maestro-1019        4Gc        88K
```
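Comparing MaxRSS with ReqMem by eye is error-prone when sizes are given in kilobytes. The sketch below defines a hypothetical check_mem function. It reads sacct -n -P --format=jobid,ncpus,reqmem,maxrss output and flags the steps whose peak memory exceeded the request. It makes the following assumptions:
- Sizes use binary units (1G = 1024 × 1024 K).
- ReqMem follows the convention shown above, where a trailing c means per CPU, so the limit is multiplied by NCPUS.
- Any other ReqMem value, including a trailing n (per node), is treated as the limit for the whole step. This is only exact for single-node jobs.

Newer Slurm releases may print ReqMem differently, so check the format on your cluster first.

```bash
#!/usr/bin/env bash
# check_mem: read "JobID|NCPUS|ReqMem|MaxRSS" lines (sacct -n -P output) on stdin and
# print "JobID used_KiB limit_KiB status" for every line that has a MaxRSS value.
# Assumptions: binary units; a ReqMem ending in "c" is per CPU; anything else
# (including a trailing "n") is treated as the limit for the whole step.
check_mem() {
  awk -F'|' '
    # Convert a size such as 6392132K, 512M or 4G to KiB; a bare number is taken as bytes.
    function to_kib(v,    n, u) {
      n = v + 0
      u = substr(v, length(v), 1)
      if (u == "K") return n
      if (u == "M") return n * 1024
      if (u == "G") return n * 1024 * 1024
      if (u == "T") return n * 1024 * 1024 * 1024
      return n / 1024
    }
    # Job lines have an empty MaxRSS; only steps are checked.
    $4 != "" {
      req = $3
      per_cpu = (req ~ /c$/)
      sub(/[cn]$/, "", req)
      limit = to_kib(req) * (per_cpu ? $2 : 1)
      used = to_kib($4)
      printf "%s %d %d %s\n", $1, used, limit, (used > limit ? "EXCEEDED" : "ok")
    }'
}

# Example on the cluster (not run here):
# sacct --state=F,CA -S 11/15 -E 11/17 -n -P --format=jobid,ncpus,reqmem,maxrss | check_mem
```

Given the three 38697769 lines from the output above in -P form, it reports 38697769.batch 6392132 4194304 EXCEEDED (about 6.1 GiB used against 4 GiB requested) and 38697769.extern 84 4194304 ok.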