How to Run MPI Programs
mpirun instead of srun#
As explained in the SPOC textbook, before executing an MPI program you must have already obtained a container with the required resources using salloc or sbatch. The integration of MPI in SLURM allows the mpirun command to inherit the number of allocated cores, just as srun does. The mpirun command acts as a kind of "master" in charge of scattering "slave" processes over the allocated cores, so:
- it is executed on the submit host
- it must not be submitted through srun, because the number n of allocated cores would then be used twice:
  - once by srun to submit n mpirun commands,
  - once by mpirun to submit n MPI processes.
For MPI programs, mpirun replaces srun, but contrary to the latter, mpirun cannot allocate resources such as cores or memory. That is why it must be launched inside a salloc or sbatch that creates the allocation.
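For instance, a minimal pattern could look like the following sketch (./my_mpi_program and the task count are placeholders, not taken from this documentation):

mpirun inside an allocation (bash)
login@maestro-submit ~ $ module load gcc/9.2.0 openmpi/4.0.5
login@maestro-submit ~ $ salloc -n 4 mpirun ./my_mpi_program

Note that mpirun is not prefixed with srun: it inherits the 4 allocated cores from salloc and spreads the MPI processes over them.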
Prefer sbatch to salloc to submit MPI programs#
With some MPI programs, the mpirun command can take a whole core for itself. But we have seen that the command passed to salloc runs directly on the submit node:
salloc without srun (bash)
login@maestro-submit ~ $ salloc --mem=100M hostname
salloc: job 5714201 has been allocated resources
salloc: Granted job allocation 5714201
salloc: Waiting for resource configuration
salloc: Nodes maestro-1027 are ready for job
maestro-submit
salloc: Relinquishing job allocation 5714201
As a consequence, if many users run mpirun inside a salloc, the submit node ends up overloaded.
Thus, to avoid loading maestro.pasteur.fr, use sbatch instead of salloc so that mpirun runs on one of the nodes of the container (the BatchHost). To keep the one-line-command style, use the --wrap option of sbatch:
sbatch mpi oneliner (bash)
login@maestro-submit ~ $ module load gcc/9.2.0 openmpi/4.0.5
login@maestro-submit ~ $ sbatch -n 3 --tasks-per-node 1 --wrap="mpirun hostname"
Submitted batch job 5612375
login@maestro-submit ~ $ cat slurm-5612375.out
maestro-1003
maestro-1010
maestro-1015
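The same submission can also be written as a batch script instead of a one-liner; the sketch below assumes a hypothetical script name mpi_job.sh and program name my_mpi_program:

sbatch mpi script (bash)
login@maestro-submit ~ $ cat mpi_job.sh
#!/bin/bash
#SBATCH -n 3
#SBATCH --tasks-per-node=1
module load gcc/9.2.0 openmpi/4.0.5
mpirun ./my_mpi_program
login@maestro-submit ~ $ sbatch mpi_job.sh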
Beware of quoting with double quotes ": variables escaped with \$ (such as \$SLURM_CPUS_PER_TASK) are expanded inside the job, while unescaped ones (such as $i) are expanded by the submitting shell. This matters in particular if you need to use loop variables as in:
use of double quotes with loop variables (bash)
login@maestro-submit ~ $ for i in *.fastq; do sbatch -c 4 --wrap="pigz -p \$SLURM_CPUS_PER_TASK $i"; done
Bioinformatics MPI programs that need neither srun nor mpirun#
Some programs available through module use MPI. This is especially the case for programs from the ptools package:
* mGenomeAnalysisTK
* mblastall
* mbowtie2
* mbwa
* mcutadapt
* mpbayes
* mtaxoptimizer
These programs are MPI wrappers that already contain an invocation of mpirun. Thus, they just need to be launched inside a container using salloc or sbatch, as above.
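As a hypothetical illustration (the module name and the program arguments below are assumptions following the classic blastall interface, not taken from the ptools documentation), such a wrapper could be submitted like this:

ptools wrapper submission (bash)
login@maestro-submit ~ $ module load ptools
login@maestro-submit ~ $ sbatch -n 8 --wrap="mblastall -p blastn -d nt -i queries.fasta -o results.out"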