HOPS Pipeline
HOPS is a Java pipeline that screens MALT data for the presence of a user-specified list of target species. It essentially exists to make it easier to use MALT and MaltExtract together. To use HOPS you need a config file, which specifies some key parameters for MALT and MaltExtract. You can keep multiple config files to quickly redo a previous analysis, or archive them as a record of the parameters you used. HOPS always creates a log that tells you which command was sent to Slurm. If you encounter problems, that log file is a very good place to start looking.
HOPS sample config file
We've set up a sample config file, adapted to our cluster and Slurm configuration, which is available at ${MALTEXTRACT_RESOURCES}/maestro_sample_config.txt.
Use it as a template for your own analysis and adapt it to your needs:
Code Block (bash)
module load java/1.8.0 R/4.1.0 malt hops
cp ${MALTEXTRACT_RESOURCES}/maestro_sample_config.txt my_conf.txt
vi my_conf.txt
Required fields
At a minimum, you must set the following fields:
- index: the path to the MALT databases you want to use
- pathToList: the path to the taxonomy file to use
- resources: the path to the directory that contains the ncbi.map and ncbi.tre files. These files contain the NCBI phylogeny and the MEGAN IDs assigned to each species.
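Before launching a run, it can be handy to confirm that these three fields are actually present in your config file. The following is a minimal sketch, not part of HOPS: the function name check_hops_config is hypothetical, and the check only looks for a non-empty key=value line (tolerating optional spaces around "="). It does not verify that the paths exist.

```shell
# Hypothetical helper (not shipped with HOPS): report which of the
# required config keys are missing from a HOPS config file.
check_hops_config() {
    conf="$1"
    missing=""
    for key in index pathToList resources; do
        # Look for "key=value" with a non-empty value; spaces around '=' are tolerated.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$conf"; then
            missing="${missing}${missing:+ }${key}"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing: $missing"
        return 1
    fi
    echo "ok"
}
```

Calling check_hops_config my_conf.txt prints ok when all three keys are set, or the names of the missing keys otherwise.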
Use Slurm?
If you are on the cluster, you must use Slurm to run MALT, MaltExtract and postprocessing on compute nodes via hops. In this case, hops will build the necessary sbatch commands for you.
To control this, set the useSlurm value:
- useSlurm=1 → hops will launch Slurm jobs for the pipeline steps
- useSlurm=0 → hops will run all the pipeline steps locally. Do not do this on the cluster unless you are already on a compute node (salloc case).
Setting the memory required by hops
By default, the maximum memory allocated to hops is set to 64G (see module show hops):
HOPS_JAVA_OPTS is set to -Xmx64G
You have two options to set the maximum memory hops may use:
- overwrite HOPS_JAVA_OPTS, see:
- set maxMemoryMalt to the required value in GB in the config file
NB: hops itself is kind enough to provide an estimate of what is needed when launched with insufficient memory settings.
See the following error message:
Code Block (text)
INFO: Set Maximum Memory for Malt to default value of 650 GB
Sep 21, 2022 1:48:11 PM Utility.ParameterProcessor generateMALTCommandLine
SEVERE: HOPS has insuffcient HeapSpace (64GB) to start Malt
Please Restart HOPS as hops -Xmx650G -i ...
If your System has not enough memory please refer to the HOPS manual on the github page and the config file section
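If you have captured this message in a file, the suggested heap size can be extracted and reused for HOPS_JAVA_OPTS. The sketch below is only an illustration: the function name suggested_heap and the file name hops_log.txt are hypothetical, and it assumes the hops wrapper provided by the module reads HOPS_JAVA_OPTS at launch, as the module settings suggest.

```shell
# Hypothetical helper: print the -Xmx value that HOPS suggests in a
# captured message such as "Please Restart HOPS as hops -Xmx650G -i ...".
# Prints nothing if no such line is found.
suggested_heap() {
    sed -n 's/.*Please Restart HOPS as hops \(-Xmx[0-9][0-9]*[A-Za-z]\).*/\1/p' "$1" | tail -n 1
}

# Usage sketch (hops_log.txt is a placeholder for wherever you saved the output):
# heap=$(suggested_heap hops_log.txt)
# [ -n "$heap" ] && export HOPS_JAVA_OPTS="$heap"
```

Only raise the heap this way if the node you run on really has that much memory; otherwise lower maxMemoryMalt in the config file instead.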
Setting the queue/partition to use
You can run either on the normal QOS with the common partition, or on your unit's dedicated servers, if it has any. The relevant variables are:
partitionPreProcessing, partitionMalt, partitionMaltEx and partitionPost
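Since the same partition is usually wanted for all four steps, the four keys can be rewritten in one pass. This is a rough sketch under stated assumptions: the function name set_hops_partition and the partition name myunit are hypothetical, and the function only rewrites partition lines that already exist in the config file (it does not add missing ones).

```shell
# Hypothetical helper: point all four HOPS partition keys at one partition.
# Usage: set_hops_partition <config file> <partition name>
set_hops_partition() {
    conf="$1"
    partition="$2"
    tmp=$(mktemp)
    # Rewrite existing partition* lines; every other line passes through unchanged.
    sed -E "s/^[[:space:]]*(partitionPreProcessing|partitionMaltEx|partitionMalt|partitionPost)[[:space:]]*=.*/\1=${partition}/" "$conf" > "$tmp" \
        && mv "$tmp" "$conf"
}

# Example: set_hops_partition my_conf.txt myunit   # "myunit" is a placeholder
```

Remember that a dedicated partition may also require a matching QOS or account on your cluster, which this sketch does not handle.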
Setting the number of threads
The HOPS config file lets you configure the number of threads that Malt and MaltExtract will use via:
- threadsMalt: by default hops will run Malt with 32 threads
- threadsMaltEx: by default hops will run MaltExtract with 20 threads
More parameters
See ${MALTEXTRACT_RESOURCES}/HOPS_Config.txt for more parameters that can be set via the hops config file.
Example config
Consider the following Slurm section of the config file, which sets a maximum memory of 650 GB and 32 threads for Malt and directs all sbatch jobs to the common partition:
Code Block (bash)
index=/opt/gensoft/tests/datas/hops/Test_Database
pathToList=/opt/gensoft/tests/datas/hops/default_list.txt
resources=/opt/gensoft/exe/hops/0.35/share
useSlurm = 1
threadsMalt=32
maxMemoryMalt=650
partitionMalt=common
wallTimeMalt=24:00:00
partitionMaltEx=common
wallTimeME=24:00:00
threadsMaltEx=1
partitionPost=common
wallTimePost=24:00:00
wallTimePreProcessing=24:00:00
partitionPreProcessing=common
Now you can run HOPS with its arguments on the demo data set we provide:
Code Block (bash)
srun hops -c hops.cfg -i /opt/gensoft/tests/datas/hops/hops_test_reads.fastq.gz -m full -o temp