scontrol to Update, Hold, or Release Pending Jobs

Checking the characteristics of a job#

To see all the details of a running or pending job, use the scontrol show job command:

Code Block (bash)

login@maestro-submit ~ $ scontrol show job <job id>

The output looks like this:

Code Block (bash)

login@maestro-submit ~ $ scontrol show job 16876320

JobId=16876320 JobName=J401
   UserId=<login>(<userID>) GroupId=<GroupName>(<groupID>) MCS_label=N/A
   Priority=5458 Nice=0 Account=<account name> QOS=normal
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2017-08-09T20:16:32 EligibleTime=2017-08-09T20:16:32
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=common AllocNode:Sid=maestro-submit0:28063
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=5000,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=5000M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/pasteur/appa/homes/login/script.sh
   WorkDir=/pasteur/appa/homes/login
   StdErr=/pasteur/appa/homes/login/J401.err
   StdIn=/dev/null
   StdOut=/pasteur/appa/homes/login/J401.out
   Power=
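
When only one or two of these fields are needed, for instance in a script, they can be extracted from the output. The following is a minimal sketch, not part of the original procedure: job_field is a hypothetical helper name, and it assumes the KEY=value layout shown above, with values that contain no spaces.

```bash
# Print the value of one KEY=value field from `scontrol show job` output read on stdin.
# Assumption: the value itself contains no space (true for JobState, Reason, TRES, ...).
job_field() {
  local key="$1"
  # Put each KEY=value pair on its own line, then print what follows "KEY=".
  tr ' ' '\n' | awk -F= -v k="$key" '$1 == k { print substr($0, length(k) + 2); exit }'
}
```

It would be used as scontrol show job <job id> | job_field Reason, which prints Priority for the job shown above.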

Update jobs#

Update a job#

If you made a mistake when submitting a job and the job is still pending, you can correct it with the scontrol update command:

Code Block (bash)

login@maestro-submit ~ $ scontrol update jobid=<jobid> <field>=<value>

This changes the attributes of the pending job(s), and therefore how Slurm will schedule them. You can update:

  • the qos,
  • the partition,
  • the gres,
  • the licenses,
  • the time limit (timelimit),
  • the priority, through the nice field.

For example, suppose you submitted your job to the default partition (common) with the default QoS (normal, 24 hours), but you know that your job won't take more than 30 minutes. You can then update it and change:

  • the partition, to allow the job to run in either the common or the dedicated partition,
  • the QoS, to run the job in the higher-priority QoS fast,
  • and even the time limit, so that the scheduler can try to fit the job into a matching time window when cores are available.

The command then looks like this:

Code Block (bash)

login@maestro-submit ~ $ scontrol update jobid=<jobid> partition=common,dedicated  qos=fast  timelimit=00:30:00

Unfortunately, job characteristics such as memory, CPUs per task, or number of tasks can't be updated.

If the job is a job array, then all its pending tasks are modified but the running ones remain untouched.
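
Since only pending jobs can be corrected this way, a script may want to check the state first. Here is a minimal sketch under stated assumptions: update_if_pending is a hypothetical helper name, and it targets a single regular job (for a job array, squeue may print one state per task).

```bash
# Update a job only if it is still pending.
# Usage: update_if_pending <jobid> <field>=<value> [...]
update_if_pending() {
  local jobid="$1"; shift
  local state
  # %T prints the job state in its long form (PENDING, RUNNING, ...).
  state=$(squeue --noheader --jobs="$jobid" --format=%T)
  if [ "$state" = "PENDING" ]; then
    scontrol update jobid="$jobid" "$@"
  else
    echo "job $jobid is in state '${state:-unknown}', not updating" >&2
    return 1
  fi
}
```

For example, update_if_pending <jobid> qos=fast timelimit=00:30:00 would run the scontrol update command only while the job is still waiting in the queue.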

Change the priority order of your jobs#

You can't increase the priority of your own jobs, but you can lower the priority of one of them using nice:

Code Block (bash)

$ scontrol update jobid=<jobid> nice

The scheduling priority of the job is then decreased by 100 (the default). You can also order your jobs exactly the way you want by specifying a value:

Code Block (bash)

$ scontrol update jobid=<jobid> nice=<positive integer value>

Let's say that you have 2 jobs with exactly the same priority:

Code Block (bash)

login@maestro-submit ~ $  squeue -u <your login> -O jobid,name,partition,qos,state,reason,prioritylong,nice          
JOBID               NAME                PARTITION           QOS                 STATE               REASON              PRIORITY            NICE               
57191049            first               common              normal              PENDING             Resources           5511                0                  
57191050            second              common              normal              PENDING             Priority            5511                0

You can make the job called second (57191050) start first by lowering the priority of the job called first (57191049):

Code Block (bash)

login@maestro-submit ~ $ scontrol update jobid=57191049 nice
login@maestro-submit ~ $ squeue -u <login> -O jobid,name,partition,qos,state,reason,prioritylong,nice
JOBID               NAME                PARTITION           QOS                 STATE               REASON              PRIORITY            NICE     
57191050            second              common              normal              PENDING             Resources           5511                0                  
57191049            first               common              normal              PENDING             Priority            5412                100

Now the job called second has a higher priority than the job called first. As a consequence, the pending REASON of second is Resources, meaning that it will start as soon as resources are available, while first, which now has a lower priority, has Priority as its pending REASON.

You can reverse that starting order again by lowering the priority of the job called second (57191050) even further:

Code Block (bash)

login@maestro-submit ~ $ scontrol update jobid=57191050 nice=200
login@maestro-submit ~ $ squeue -u <your login> -O jobid,name,partition,qos,state,reason,prioritylong,nice
JOBID               NAME                PARTITION           QOS                 STATE               REASON              PRIORITY            NICE
57191049            first               common              normal              PENDING             Resources           5412                100
57191050            second              common              normal              PENDING             Priority            5312                200

Note that if you make a mistake, you can always correct it afterwards:

Code Block (bash)

login@maestro-submit ~ $ scontrol update jobid=57191050 nice=555 
login@maestro-submit ~ $ squeue -u <login> -O jobid,name,partition,qos,state,reason,prioritylong,nice
JOBID               NAME                PARTITION           QOS                 STATE               REASON              PRIORITY            NICE               
57191049            first               common              normal              PENDING             Resources           5413                100            
57191050            second              common              normal              PENDING             None                4958                555
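
The same idea extends to more than two jobs. The sketch below is only an illustration: order_jobs and NICE_STEP are hypothetical names, and it assumes the jobs would otherwise have (almost) the same priority, as in the example above, so that the nice value alone decides the order.

```bash
# Give increasing nice values to the job ids passed as arguments, so that
# they are considered in the order listed (the first id keeps nice=0).
# NICE_STEP is a hypothetical variable of this sketch, not a Slurm setting.
order_jobs() {
  local step="${NICE_STEP:-100}"
  local nice=0
  local jobid
  for jobid in "$@"; do
    scontrol update jobid="$jobid" nice="$nice"
    nice=$((nice + step))
  done
}
```

Calling order_jobs 57191050 57191049 would set nice=0 on the job called second and nice=100 on the job called first.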

If you have several jobs with the same name (given with -J/--job-name), you can change the priority of all of them at once by replacing jobid= with jobname= in the scontrol command. For example:

Code Block (bash)

login@maestro-submit ~ $ squeue -u <your login> -O jobid,name,partition,qos,state,reason,prioritylong,nice --name=third
JOBID               NAME                PARTITION           QOS                 STATE               REASON              PRIORITY            NICE               
57191391            third               common              normal              PENDING             Resources           5512                0                  
57191392            third               common              normal              PENDING             Priority            5512                0

Code Block (bash)

login@maestro-submit ~ $ scontrol update jobname=third  nice=333
login@maestro-submit ~ $ squeue -u <your login> -O jobid,name,partition,qos,state,reason,prioritylong,nice --name=third
JOBID               NAME                PARTITION           QOS                 STATE               REASON              PRIORITY            NICE               
57191391            third               common              normal              PENDING             None                5179                333                
57191392            third               common              normal              PENDING             None                5179                333

Update a list of jobs#

Let's imagine that you have submitted a list of jobs to the common partition with the fast QoS. The common partition is very busy, and your jobs have a low priority compared to those of other users because of your resource consumption over the last 7 days. You wish you had submitted them to the dedicated partition as well, to maximize their chances of starting, even though they could be killed and resubmitted automatically there.

You can use the squeue command to retrieve the job ids of your (-u <your login>) pending jobs (-t PD) submitted to the common partition (-p common) with the fast QoS (-q fast) this way:

Code Block (bash)

login@maestro-submit ~ $ squeue -u <your login> -t PD -p common -q fast  --Format=jobid  --noheader

Note the use of:

  • --Format=jobid to only output 1 column containing the job ids,
  • --noheader option to suppress the header of the column which is only useful to humans.

That command returns 1 job id per line:

Code Block (bash)

1108497
1108499
1108493
1108495
1108496

For each of them, you want to perform the following scontrol update command:

Code Block (bash)

login@maestro-submit ~ $ scontrol update jobid=<jobid> partition=common,dedicated

to tell Slurm that, from now on, it can launch the job with the provided job id on any node of one of these partitions (as long as the resources requested in the original submission command are available, of course).

That command must be applied to every job id returned by the previous squeue command. To do so, use a "for" loop (for/do/done): each job id from the list returned by squeue is assigned in turn to a variable (named jobid here), and the content of this variable (accessed with ${jobid}) is used to build the scontrol update command placed in the "do ... done" block. In a script, you would write it this way:

Code Block (bash)

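# Loop over the ids of your pending jobs in the common partition with the fast QoS
for jobid in $(squeue -u <your login> -t PD -p common -q fast --Format=jobid --noheader)
do
  # Allow each job to run in either partition
  scontrol update jobid=${jobid} partition=common,dedicated
done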

If you prefer a single line that is easy to copy and paste into a terminal, write it this way:

Code Block (bash)

login@maestro-submit ~ $ for jobid in $(squeue -u <your login> -t PD -p common -q fast --Format=jobid --noheader); do scontrol update jobid=${jobid} partition=common,dedicated; done

It's exactly the same, but with ";" instead of newlines to separate the instructions. If you want to display the result of the scontrol update command, you can add another instruction such as:

Code Block (bash)

login@maestro-submit ~ $ squeue --Format="jobid,name:20,username,partition,qos,statecompact,reason,starttime" -j ${jobid} --noheader

The output looks like:

Code Block (bash)

<jobid> <job name> <your login> dedicated fast R None 2020-04-07T10:30:40

if the job can start immediately or

Code Block (bash)

<jobid> <job name> <your login> dedicated fast PD Priority 2020-04-08T00:37:00

if the job must wait (pending state PD) because of its low priority (Priority) compared to other jobs. In that case, the time displayed in the last column is the start time of the job in the worst case.

Inserted in the previous do/done block, it looks like this:

Code Block (bash)

for jobid in $(squeue -u <your login> -t PD  -p common -q fast --Format=jobid  --noheader)
do
  scontrol update jobid=${jobid} partition=common,dedicated
  squeue --Format="jobid,name:20,username,partition,qos,statecompact,reason,starttime" -j ${jobid} --noheader
done

The one-liner version is then:

Code Block (bash)

login@maestro-submit ~ $ for jobid in $(squeue -u <your login> --Format=jobid -t PD --noheader -p common -q fast); do scontrol update jobid=${jobid} partition=common,dedicated; squeue --Format="jobid,name:20,username,partition,qos,statecompact,reason,starttime" -j ${jobid} --noheader; done
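
If you run this kind of bulk update often, the loop can be wrapped in a function that first shows what it would do. This is a hedged sketch rather than an official tool: repartition_pending and the DRY_RUN variable are hypothetical names introduced here, and the squeue options are the ones used above.

```bash
# Change the partition list of every pending job of a user in a given partition and QoS.
# Usage: repartition_pending <login> <current partition> <qos> <new partition list>
# Set DRY_RUN=1 (hypothetical variable of this sketch) to print the commands
# instead of running them.
repartition_pending() {
  local user="$1" from="$2" qos="$3" to="$4"
  local jobid
  for jobid in $(squeue -u "$user" -t PD -p "$from" -q "$qos" --Format=jobid --noheader); do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scontrol update jobid=${jobid} partition=${to}"
    else
      scontrol update jobid="${jobid}" partition="${to}"
    fi
  done
}
```

Running DRY_RUN=1 repartition_pending <your login> common fast common,dedicated lists the scontrol commands without changing anything; the same call without DRY_RUN applies them.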

Update one or more job(s) using a jobname instead of job id(s)#

You can replace the job id with the job name. With a name, all jobs with that name are processed. When using jobname, give your login as well to avoid trying to update jobs from other people that have the same job name (typically wrap). Example:

Code Block (bash)

login@maestro-submit ~ $ scontrol update jobname=<job name> userid=<yourlogin> qos=fast

Hold and release jobs#

  • hold: puts a lock on specific pending jobs to prevent them from starting and to let other jobs pass first. It can be used with a job name or a job id. With a name, all jobs with that name are prevented from starting:

Code Block (bash)

login@maestro-submit ~ $ scontrol hold  name=<name of your job>

or

Code Block (bash)

login@maestro-submit ~ $ scontrol hold <jobid>

If the job is a job array, the hold only applies to its pending tasks. The running ones are neither suspended, killed, nor requeued. This is what No error means in the output of the command for the corresponding tasks:

Code Block (bash)

login@maestro-submit ~ $ scontrol hold <job array jobid>
<job array jobid>_2520,2536-2997,2999-4045: No error
<job array jobid>_2526,2528-2535: Job has already finished

The running tasks are left alone even though the REASON "JobHeldUser" appears in squeue output for them. Of course, completed tasks are not affected by the lock.

Note, however, that if a job array task is requeued by Slurm (due to a node failure or because the task was running on the dedicated partition), it will remain pending with REASON "JobHeldUser" in squeue output, like the job array itself:

Code Block (bash)

login@maestro-submit ~ $ squeue -j <job array jobid> -t pd
                        JOBID PARTITION         NAME   USER   ST   TIME  NODES NODELIST(REASON)
<job array jobid>_[3342-4045]    common   <job name>  login   PD   0:00      1 (JobHeldUser)

  • release: releases held jobs so that they can be scheduled again. It can be used with a job name or a job id. With a name, all jobs with that name are released:

Code Block (bash)

login@maestro-submit ~ $ scontrol release  name=<name of your job>

or

Code Block (bash)

login@maestro-submit ~ $ scontrol release  <jobid>
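
Hold and release can be combined with squeue in the same way as the updates above, for example to freeze all your pending jobs and free them later. The following sketch makes several assumptions: hold_pending and release_held are hypothetical helper names, jobs are regular jobs rather than job arrays, and held jobs are recognized by the REASON JobHeldUser shown earlier.

```bash
# Hold every pending job of a user.
hold_pending() {
  local user="$1" jobid
  for jobid in $(squeue -u "$user" -t PD --Format=jobid --noheader); do
    scontrol hold "$jobid"
  done
}

# Release the pending jobs of a user whose reason is JobHeldUser.
release_held() {
  local user="$1"
  squeue -u "$user" -t PD --Format=jobid,reason --noheader |
    awk '$2 == "JobHeldUser" { print $1 }' |
    while read -r jobid; do
      scontrol release "$jobid"
    done
}
```

Filtering on the reason in release_held avoids calling scontrol release on jobs that are pending for other reasons, such as Priority or Resources.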
