Nextflow
Nextflow is a powerful pipeline design system with its own Domain-Specific Language (DSL). The full documentation is provided here.
While a Nextflow module exists on Maestro, we recommend using your own installation as follows:
Install NextFlow (bash)
(0)-(kpetrov@maestro-submit:/pasteur/helix/projects/hpc/kpetrov)->module load Python/3.10.13 graalvm/ apptainer/
(0)-(kpetrov@maestro-submit:/pasteur/helix/projects/hpc/kpetrov)->virtualenv NextFlow
created virtual environment CPython3.10.13.final.0-64 in 2665ms
creator CPython3Posix(dest=/pasteur/helix/projects/hpc/kpetrov/NextFlow, clear=False, no_vcs_ignore=False, global=False)
<snip>
(0)-(kpetrov@maestro-submit:~)->source NextFlow/bin/activate
(NextFlow)-(kpetrov@maestro-submit:~)->pip install nextflow
<snip>
Successfully installed nextflow-24.10.3
(NextFlow)-(kpetrov@maestro-submit:~)->nextflow -v
nextflow version 24.10.3.5933
Note:
If another Nextflow installation is already present in your PATH, bash's command-path cache can cause that pre-existing version to be called instead of the newly pip-installed one. In that case, either do not run nextflow between activating the virtual env and installing Nextflow with pip, or run hash -r (in bash) after activating the virtual env. This holds true in general, not only for Nextflow.
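The caching behaviour can be reproduced with two throw-away scripts; the /tmp paths and the tool name below are made up purely for this illustration:

```shell
# Illustration of the shell's PATH hash cache (paths and 'tool' are hypothetical)
mkdir -p /tmp/hashdemo_old /tmp/hashdemo_new
printf '#!/bin/sh\necho old\n' > /tmp/hashdemo_old/tool
printf '#!/bin/sh\necho new\n' > /tmp/hashdemo_new/tool
chmod +x /tmp/hashdemo_old/tool /tmp/hashdemo_new/tool

PATH=/tmp/hashdemo_old:$PATH
tool            # first lookup: the shell resolves and caches this path
PATH=/tmp/hashdemo_new:$PATH
tool            # in bash the stale cached path can still win here
hash -r         # drop the cache
tool            # the lookup is redone and finds the new location
```

This is exactly what happens when an old nextflow binary is hashed before the pip-installed one appears in the virtual env's bin directory.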
Also, we load the apptainer module, as Apptainer replaces Singularity on Maestro.
To make your Nextflow installation more efficient, add these lines to your .profile file. The first line is very important, as it forces jobs to be submitted through Slurm. The second line sets memory limits for Java, and the last two point the container cache and the work directory at scratch, so that container images and intermediate files are not duplicated in each of your working directories.
Code Block (bash)
export NXF_EXECUTOR=slurm
export NXF_OPTS="-Xms500M -Xmx4G"
export NXF_SINGULARITY_CACHEDIR="$APPASCRATCH/$USER/work/nxf_singularity_cache"
export NXF_WORK="$APPASCRATCH/$USER/work/"
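After sourcing your .profile, a quick sanity check like the following (just a sketch) confirms that the variables are visible to your shell:

```shell
# Sketch: report which NXF_* variables are set in the current environment
for v in NXF_EXECUTOR NXF_OPTS NXF_SINGULARITY_CACHEDIR NXF_WORK; do
    eval "val=\${$v:-}"          # indirect lookup of the variable named in $v
    if [ -n "$val" ]; then
        echo "$v=$val"
    else
        echo "WARNING: $v is not set"
    fi
done
```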
Unfortunately, Nextflow fails to authenticate with our Internet proxy service, so if you want it to connect to GitHub (for nf-core pipelines, for example) you have to run the following on maestro-submit:
Code Block (bash)
unset HTTP_PROXY https_proxy http_proxy HTTPS_PROXY
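If you prefer not to clear the proxy variables for your whole session, env -u (supported by GNU coreutils env) drops them for a single command only; the proxy value below is a placeholder:

```shell
# env -u runs one command with the given variables removed from its environment
export http_proxy=http://proxy.example:3128     # placeholder proxy value
env -u http_proxy sh -c 'echo "http_proxy=${http_proxy:-unset}"'   # prints: http_proxy=unset
echo "$http_proxy"    # the parent shell keeps its value
```

The same env -u prefix can be put in front of a nextflow run command.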
Make sure that the variable NXF_EXECUTOR is set to 'slurm'
Code Block (bash)
(NextFlow) (0)-(kpetrov@maestro-submit:/pasteur/helix/projects/hpc/kpetrov/NextFlow)->echo $NXF_EXECUTOR
slurm
Now you can run a test workflow.
demo pipeline (bash)
nextflow run nf-core/demo -profile maestro --input /pasteur/appa/scratch/input.csv --outdir testflow
Note the 'maestro' profile being specified: it contains most of the information about our cluster. In general, you should download the pipeline description and verify that the parameters are set optimally; ask us if unsure. Here is what that profile sets:
maestro.config (groovy)
params {
    config_profile_description = 'Institut Pasteur Maestro cluster profile'
    config_profile_url = 'https://research.pasteur.fr/en/equipment/maestro-compute-cluster/'
    config_profile_contact = 'ask-hpc@pasteur.fr'
}
singularity {
    enabled = true
    autoMounts = true
    runOptions = '--home $HOME:/home/$USER --bind /pasteur'
}
process {
    withLabel:download {
        executor = 'local'
    }
}
profiles {
    normal {
        process {
            executor = 'slurm'
            scratch = false
            queue = 'common'
            queueSize = 20
            clusterOptions = '--qos=normal'
        }
        params {
            max_memory = 500.GB
            max_cpus = 96
            max_time = 24.h
        }
    }
    long {
        process {
            executor = 'slurm'
            scratch = false
            queue = 'long'
            clusterOptions = '--qos=long -p long'
        }
        params {
            max_memory = 500.GB
            max_cpus = 5
            max_time = 8760.h
        }
    }
}
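If a pipeline's defaults do not fit your data, per-process settings from this profile can be overridden with a local nextflow.config in your launch directory. The label and values below are illustrative only, not taken from any real pipeline:

```groovy
// Sketch of a local nextflow.config overriding the profile's process settings
process {
    // hypothetical label; use the labels actually defined by your pipeline
    withLabel: 'big_mem' {
        cpus   = 16
        memory = 64.GB
        time   = 12.h
    }
}
```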
You should avoid using the "long" queue, as it has only 5 cores available, versus dozens of 96-core servers for "normal".