srun / salloc / sbatch to Submit Jobs
Job submission commands#
- srun: creates an allocation with the requested resources and executes the command given as parameter.
- sbatch: creates an allocation with the requested resources and executes the script given as parameter.
- salloc: creates an allocation and gives the user an interactive shell on one of the nodes. You can close/free the allocation with the exit or logout commands. Do not put a command at the end of the salloc line: it would be launched on the submit host and only produce confusion. Instead, use srun inside the interactive shell.
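For example, a minimal interactive session could look like this (the partition, core count and node prompt are only illustrative):
Code Block (bash)
login@maestro-submit ~ $ salloc -p common -c 4
login@<allocated node> ~ $ srun <your command>
login@<allocated node> ~ $ exit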
Most commonly used options#
- -J/--job-name <job name>: assign a name to the job. If not set, the command name becomes the name of the job.
- -A/--account <account name>: if you belong to several accounts, specify which one to use. Generally used in combination with --qos and -p/--partition. If not set, the default account is used.
- -p/--partition <partition name>: name of the partition to submit the job to. If not set, the default common partition is used.
- --qos <qos name>: name of the QoS to submit the job with. If not set, the default normal QoS is used.
- -t/--time <time limit>: time limit of the job, if it is less than the one of the QoS. Useful to help SLURM schedule your job more rapidly. The format is hh:mm:ss, or dd-hh:mm:ss if you submit to the long partition or to a unit's partition.
- -n/--ntasks <number of tasks>: number of tasks to be launched in parallel. The cores can be scattered over several nodes. If not set, 1 is used.
- -c/--cpus-per-task <number of cores>: number of cores required for each task. All cores are on the same node. If not set, 1 is used.
- --mem-per-cpu <memory in MB>: required memory size for each core (incompatible with --mem). If not set, the default value of 4000 MB is used.
- --mem <memory in MB>: memory size required on each node for the job (incompatible with --mem-per-cpu).
- -N/--nodes <minimum number of nodes[:maximum number of nodes]>: number of required nodes. If the maximum number of nodes is not specified, it is set to the minimum number of nodes.
- -w/--nodelist <node1,node2>: put ALL these nodes in the allocation.
- -x/--exclude <node1,node2>: exclude specific compute nodes (comma-separated).
- -C/--constraint <feature1&feature2>: list of required features. Use & to combine features.
- --gres=<gres name>:<value>: use the corresponding amount of the generic resource (examples: --gres=gpu:2, --gres=disk:10000).
- -L/--licenses <name@licserv[:number]>: type and number of licenses required. If the number is not given, the default value 1 is used. Examples: -L matlab@licserv:2 or -L matlab@licserv.
- -d/--dependency <after condition:jobid1:jobid2,after condition:jobid3:jobid4>: create a dependency between this job and the ending state of previous ones. The most commonly used "after conditions" are afterok and afternotok. "After conditions" are combined with commas; jobs concerned by the same "after condition" are separated by colons.
- --mail-type=<STATE1,STATE2>: send an email when the listed states are encountered (comma-separated). States are BEGIN, END, FAIL, REQUEUE. Use ALL if you want an email for each of them. Other states are TIME_LIMIT, TIME_LIMIT_50 (50% of time limit reached), TIME_LIMIT_80 (80% of time limit reached), TIME_LIMIT_90 (90% of time limit reached).
- --mail-user=<login1,login2>: send the email to the users with the given logins (comma-separated). To be used in combination with the --mail-type option. If --mail-type is set and --mail-user is not, the email is sent to the job's owner.
- --exclusive[=user|mcs]: the job allocation cannot share nodes with other running jobs (or only with jobs of the same user with the "=user" option, or of the same MCS group with the "=mcs" option). The default shared/exclusive behavior depends on the system configuration, and the partition's OverSubscribe option takes precedence over the job's option.
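For example, a sketch combining several of these options (the job name, time limit and script path are placeholders):
Code Block (bash)
login@maestro-submit ~ $ sbatch -J myjob -p common --qos=normal -c 8 --mem-per-cpu=4000 -t 02:00:00 --mail-type=END,FAIL --mail-user=<yourlogin> /path/to/script.sh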
srun only#
- --x11: enables the use of a graphical interface.
- -o/--output </path/to/output_file>: send the output to a file instead of the terminal.
- -e/--error </path/to/error_file>: send the error messages to a file instead of the terminal.
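For instance (the file paths are placeholders):
Code Block (bash)
login@maestro-submit ~ $ srun -o /path/to/output_file -e /path/to/error_file <your command>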
sbatch only#
- --array=<start>-<stop>:<step>%<running>: job array specification. The %<running> part restricts the number of "array tasks" running simultaneously.
- --wait: remain synchronous (i.e. wait until the end of the job). Sometimes useful for array jobs.
- -o/--output </path/to/output_file>: send the output to a file instead of the default slurm-<jobid>.out. Warning: in the case of a job array, make sure each task does not write to the same file. Use %A to indicate the job array ID and %a to indicate the task number. Example: -o /path/to/output_file_%A_%a.out.
- -e/--error </path/to/error_file>: send the error messages to a file instead of the default slurm-<jobid>.out. Warning: in the case of a job array, make sure each task does not write to the same file. Use %A to indicate the job array ID and %a to indicate the task number. Example: -e /path/to/error_file_%A_%a.err.
- --wrap="command line": useful to launch a simple command asynchronously (without waiting for the job to complete) without creating a script. Beware of quotes when you use variables. In double quotes ", a variable is replaced by its value before arrival on the execution node. Thus, if you use a SLURM variable such as $SLURM_JOB_ID, $SLURM_ARRAY_TASK_ID, $SLURM_CPUS_PER_TASK and so on, do not forget to protect the $ sign with a \. Example:
Code Block (bash)
login@maestro-submit ~ $ sbatch -c 4 --wrap="pigz -p \$SLURM_CPUS_PER_TASK myfile"
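And a sketch of a job array using the %A/%a placeholders described above (the array range and paths are only illustrative):
Code Block (bash)
login@maestro-submit ~ $ sbatch --array=1-10%4 -o /path/to/output_file_%A_%a.out -e /path/to/error_file_%A_%a.err /path/to/script.sh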
Other specificities of sbatch: use of the dedicated partition and requeue#
To avoid remaining PENDING for too long with a short job, you are allowed to submit to the dedicated partition with the fast QoS. Nodes in dedicated belong to research units that have priority on them. Thus, if necessary, your job can be killed to let their jobs run. But if you submitted your job with sbatch, that job is automatically requeued. So, if nodes belonging to another unit in the dedicated partition are still idle, your job can restart immediately on those nodes.
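For example (the script path is a placeholder):
Code Block (bash)
login@maestro-submit ~ $ sbatch -p dedicated --qos=fast -c 2 /path/to/short_job.sh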
srun examples#
Once logged on maestro.pasteur.fr:
- load into your current environment on maestro.pasteur.fr the tools you need, using module,
- launch your command.
Template
Code Block (bash)
module load <tool1 name>/<version>
module load <tool2 name>/<version>
module load <tooln name>/<version>
srun [options] <your command>
Examples
Code Block (bash)
login@maestro-submit ~ $ module load SPAdes/3.7.0
login@maestro-submit ~ $ srun -c 4 spades.py -t 4 -k 21,29,37 -e error_filename -o output_filename --pe1-1 read1.fastq --pe1-2 read2.fastq --pe1-fr -o output_directory_name
Code Block (bash)
login@maestro-submit ~ $ module load matlab/R2016b
login@maestro-submit ~ $ srun --qos=fast -L matlab@licserv --x11 matlab
Data stream (redirections '>' and pipes '|')#
Beware of data streams (file redirections and pipes). They can quite easily harm the cluster network and heavily impact performance if the data stream comes back to the submit node before being written to the file system.
Whenever the program provides a corresponding option, use it to let the program handle writing the output file directly on the compute node.
For example, replace
Code Block (bash)
login@maestro-submit ~ $ srun bwa aln genome.fa reads.fq > aln.sai
with
Code Block (bash)
login@maestro-submit ~ $ srun bwa aln -f aln.sai genome.fa reads.fq
or
use sh -c '<your program with its options and arguments + the redirection or pipe>' so that the whole command (including the redirection or pipe) will be executed on the compute node.
For example, replace
Code Block (bash)
login@maestro-submit ~ $ srun grep '^>' reads.fa | wc -l
with
Code Block (bash)
login@maestro-submit ~ $ srun sh -c 'grep "^>" reads.fa | wc -l'
or with the corresponding option if it exists:
Code Block (bash)
login@maestro-submit ~ $ srun grep -c '^>' reads.fa
Use of variables in srun#
When you want to use variables with srun, pay attention to the moment when each variable must be expanded. For example, if you want to use awk/gawk variables in a pipe, they must be expanded at the same moment as the command awk/gawk is applied to, that is to say on the compute node where srun runs.
By default, a variable is expanded by the current shell, that is to say on the submit node. But at that time, the command (here samtools view) hasn't started to run, so $3 (the gawk variable) is obviously empty since it doesn't exist in the submit node shell:
Code Block (bash)
login@maestro-submit ~ $ srun sh -c "samtools view $MYSCRATCH/genome.bam 2>&1 | head -5 | gawk '{print $3}'"
To prevent a variable from being expanded on the submit node, protect it with \:
Code Block (bash)
login@maestro-submit ~ $ srun sh -c "samtools view $MYSCRATCH/genome.bam 2>&1 | head -5 | gawk '{print \$3}'"
Note that for other variables that already exist at submission time, like $HOME, $MYSCRATCH..., you can let the shell on the submit node expand them right away, so it's not necessary to protect them.
If you want to use the SLURM_CPUS_PER_TASK environment variable with srun, please use -c even if you want only 1 core per task. If you don't, the environment variable won't be set:
Code Block (bash)
login@maestro-submit ~ $ srun --quiet sh -c 'echo "allocated cpus=${SLURM_CPUS_PER_TASK}"'
allocated cpus=
while
Code Block (bash)
login@maestro-submit ~ $ srun -c 1 --quiet sh -c 'echo "allocated cpus=${SLURM_CPUS_PER_TASK}"'
allocated cpus=1
sbatch examples with a single command (--wrap)#
If the previous command is long-running, do not hesitate to launch it with sbatch instead of srun.
Template
Code Block (bash)
module load <tool1 name>/<version>
module load <tool2 name>/<version>
module load <tooln name>/<version>
sbatch [options] --wrap="<your command>"
Example
Code Block (bash)
login@maestro-submit ~ $ module load SPAdes/3.7.0
login@maestro-submit ~ $ sbatch -c 4 --mail-user=<yourlogin> --mail-type=begin,end,fail -e spades.err -o spade.out --wrap="spades.py -t \$SLURM_CPUS_PER_TASK -k 21,29,37 -e error_filename -o output_filename --pe1-1 read1.fastq --pe1-2 read2.fastq --pe1-fr -o output_directory_name"
Note that since you are using sbatch, you have access to SLURM environment variables such as $SLURM_CPUS_PER_TASK. Using that variable saves you from having to keep the number of threads given to the program (here the -t option of spades.py) in sync with the number of allocated cores. Since that variable must only be interpreted once on the compute node, either single quotes ' or variable protection with a backslash \ are required.
To launch a bunch of sbatch commands, one possibility is to wrap your sbatch command in a loop:
Code Block (bash)
login@maestro-submit ~ $ module load SPAdes/3.7.0
login@maestro-submit ~ $ for f in *1.fastq; do
> bn=`basename $f 1.fastq`
> sbatch --mail-user=<your login> --mail-type=begin,end,fail -e ${bn}.err -o ${bn}.out -c 4 --wrap="spades.py -t \$SLURM_CPUS_PER_TASK -k 21,29,37 --pe1-1 $f --pe1-2 ${bn}2.fastq --pe1-fr -o output_dir_${bn}"
> done
Note that inside the --wrap double quotes ", the variable bn is not protected because it must be interpreted on the submit node before submission, whereas $SLURM_CPUS_PER_TASK must be protected so that it is interpreted on the compute node only.
sbatch example with script#
If we take our previous real-life example, the sbatch command would look like:
Code Block (bash)
login@maestro-submit ~ $ sbatch -c 4 --mail-user=<yourlogin> --mail-type=begin,end,fail -e spades.err -o spade.out /path/to/script.sh
with script.sh containing
Code Block (bash)
#!/bin/sh
source /local/gensoft2/adm/etc/profile.d/modules.sh
module purge
module load SPAdes/3.7.0
srun spades.py -t $SLURM_CPUS_PER_TASK -k 21,29,37 -e error_filename -o output_filename --pe1-1 read1.fastq --pe1-2 read2.fastq --pe1-fr -o output_directory_name
Note that SLURM variables such as $SLURM_CPUS_PER_TASK don't need to be protected inside a script, since the script is only executed on the compute node. srun inherits all the options passed to sbatch, such as the value of -c; you don't need to repeat them.
Environment variables frequently used in scripts#
- $SLURM_JOB_NAME: value of the -J/--job-name option, or the name of the executable if not given
- $SLURM_JOB_ID: the job ID
- $SLURM_ARRAY_JOB_ID: job ID of the array job
- $SLURM_ARRAY_TASK_ID: index of the current array task, a value between the min and max given with --array=min-max
- $SLURM_CPUS_PER_TASK: value of the -c/--cpus-per-task option
- $SLURM_NTASKS or $SLURM_NPROCS: value of the -n/--ntasks option
- $SLURM_PROCID: rank of the current MPI process
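A minimal sketch showing some of these variables in an array-job script (the program name and input file naming are only illustrative):
Code Block (bash)
#!/bin/bash
#SBATCH -J myarray
#SBATCH --array=1-10
#SBATCH -c 2
# printed in the slurm-<jobid>_<taskid>.out file of each array task
echo "array job ${SLURM_ARRAY_JOB_ID}, task ${SLURM_ARRAY_TASK_ID}, ${SLURM_CPUS_PER_TASK} cpus"
srun <my_program> --threads ${SLURM_CPUS_PER_TASK} input_${SLURM_ARRAY_TASK_ID}.fastq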
Common mistakes#
Typical thread error#
-c forgotten
To run on more than 1 core, you must use the -c or --cpus-per-task option.
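For example (blastn and its -num_threads option are used here only as an illustration of a multithreaded program):
Code Block (bash)
# Wrong: 4 threads requested from the program, but only 1 core allocated
login@maestro-submit ~ $ srun blastn -num_threads 4 -query reads.fa -db nt -out result.txt
# Right: the allocation matches the number of threads
login@maestro-submit ~ $ srun -c 4 blastn -num_threads 4 -query reads.fa -db nt -out result.txt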
Using srun or not in a batch script#
Use srun to improve reporting#
The usage of srun allows a better reporting of resource usage. Indeed, the sstat command provides real-time resource usage for processes started with srun, and each step (each call of srun) is reported individually in the accounting.
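For instance, a minimal sketch (the exact format fields depend on your SLURM version):
Code Block (bash)
login@maestro-submit ~ $ sstat -j <jobid> --format=JobID,AveCPU,MaxRSS,MaxDiskWrite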
Use srun to benefit from the micro-scheduling#
If you have a lot of small jobs, rather than making an allocation for each of them, you can make one allocation of a bunch of cores and then run the small jobs inside that allocation (as steps) using srun. Example:
Code Block (bash)
$ cat myscript.sh
#!/usr/bin/env bash
#SBATCH --ntasks=20
#SBATCH --cpus-per-task=1
for i in $(seq 1 100); do
srun -n 1 -c 1 -Q </path/to/my/program> <arg1> <arg2> <arg3> $i &
done
wait
exit 0
In the above script, we say that we will run 20 tasks with 1 core each, but we actually use these 20 tasks to run 100 steps.
Note that we have to put -n 1 on the srun line. By default, the number of tasks is inherited from the sbatch option, so it would be 20, whereas we want each instance of the program launched through srun to use only 1 task with 1 core. That's why we put -n 1.
In the above example, we use a loop to launch all the srun commands at once, but we must also avoid waiting for each srun to give the prompt back; that's why we put & at the end of the srun line. We add wait after the end of the for loop so that all the steps end before the sbatch script terminates. A kind of micro-scheduling takes place inside the allocation, so when all the cores are already running a step, by default the following message appears in the out/error file
Code Block (text)
srun: Job <jobid> step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Job <jobid> step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Job <jobid> step creation temporarily disabled, retrying (Requested nodes are busy)
until all the steps are able to start. To avoid this message, which can appear many times, add the -Q option to srun as above.
Note that in the above example, the first steps to start aren't necessarily the ones with the smallest $i, precisely because of that micro-scheduling.
Use srun to access all resources of the allocation#
If you just want to run a multithreaded script in batch mode, srun is recommended in the batch script, as said before, but not mandatory since the script will be able to run on the unique allocated node. So you can give the script directly to sbatch.
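A minimal sketch, with a placeholder program name and threading option:
Code Block (bash)
$ cat multithreaded.sh
#!/bin/bash
#SBATCH -c 8
# runs directly on the single allocated node; srun is optional here
<my_multithreaded_program> --threads $SLURM_CPUS_PER_TASK <input>
$ sbatch multithreaded.sh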
But if you want to use more than one task (with option -n), that is, to run more than one instance of a program at a time, you need to write a batch script and prepend srun to the call of the program. If you forget it, only 1 task will be executed. In other words, multi-task jobs require srun in their batch scripts.
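A minimal sketch of such a multi-task batch script, again with a placeholder program name:
Code Block (bash)
$ cat multitask.sh
#!/bin/bash
#SBATCH -n 4
# without srun, only one instance would run; with srun, the 4 tasks each run one instance
srun <my_program> <arg1> <arg2>
$ sbatch multitask.sh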
Invoking sbatch in a batch script#
It is perfectly possible to invoke sbatch in a batch script. In this case, the second sbatch is completely independent from the first one: it allocates its own resources, outside of the allocation created by the first one. The partition, the QoS, the memory and so on can therefore be totally different from the ones of the first sbatch.
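A minimal sketch, with placeholder script names; the inner sbatch requests its own, independent resources:
Code Block (bash)
$ cat script1.sh
#!/bin/bash
#SBATCH -p common
#SBATCH --qos=normal
#SBATCH -c 1
# this sbatch creates a new allocation, independent from the current one
sbatch -p dedicated --qos=fast -c 8 /path/to/script2.sh
$ sbatch script1.sh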
Example with Nextflow#
script1 can also be a workflow/dataflow management system such as SnakeMake or NextFlow.
Imagine that you wish to submit a pipeline with NextFlow. You can:
- first, submit nextflow itself so that it will run on 1 core on a batch host,
- then submit the steps of the workflow with the required number of cores, partition and QoS.
You would then do
Code Block (bash)
login@maestro-submit ~ $ sbatch --qos normal -p common nextflow run -c my_workflow.config my_workflow.nf
with the NextFlow config file my_workflow.config containing the submission information for its steps:
Code Block (bash)
process {
    executor = 'slurm'
    queue = 'common,dedicated'
    clusterOptions = '--qos=fast'
    $mapping {
        cpus = 4
    }
}