RoseTTAFold2NA
RoseTTAFold2NA (RoseTTAFold2 protein/nucleic acid complex prediction) is available on the maestro cluster under the module name RoseTTAFold2NA.
The required data and databases are available on maestro through the RF2NA_DATA environment variable (/opt/gensoft/data/RoseTTAFold2NA/0.1/). Please do not duplicate these 2.5 TB of data: the installed RoseTTAFold2NA cannot use any other data source.
Setting CPU/RAM requirements#
RoseTTAFold2NA was originally built to run with 8 CPUs and at most 64 GB of memory for the first steps (hhblits and hhsearch). On the maestro cluster we set it up to request CPUs and RAM based on two environment variables:
- SLURM_CPUS_PER_TASK for the CPU requirement
- RF2NA_MEM for the RAM requirement (NB: the value must be expressed in GB)
SLURM_CPUS_PER_TASK is set by Slurm when you run an allocation (e.g. srun, sbatch). The default value is 1 unless you use, for example, the -c N or --cpus-per-task=N srun options; in that case it is set to N and is automatically used by RoseTTAFold2NA.
RF2NA_MEM is an environment variable set by the RoseTTAFold2NA modulefile. Its default value is 64 (GB). It may be changed to any value (in GB) to suit your needs.
Given these two points, RoseTTAFold2NA must be run through an allocation that matches the CPU and memory values you set.
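The relationship between the two variables and the Slurm allocation can be checked from inside a job. The following is a minimal sketch, not part of the RoseTTAFold2NA module: check_rf2na_resources is a hypothetical helper name, and it assumes that Slurm exports SLURM_MEM_PER_NODE (in MB) when --mem is used. The fallback values (1 CPU, 64 GB) are the defaults described above.

```bash
#!/bin/bash
# Sketch: report the resources RoseTTAFold2NA will pick up and compare
# RF2NA_MEM (GB) with the memory actually granted by Slurm.
# check_rf2na_resources is a hypothetical helper, not provided by the module.
check_rf2na_resources() {
    # Fall back to the documented defaults: 1 CPU, 64 GB.
    local cpus="${SLURM_CPUS_PER_TASK:-1}"
    local mem_gb="${RF2NA_MEM:-64}"
    local alloc_gb

    echo "RoseTTAFold2NA will use ${cpus} CPU(s) and ${mem_gb} GB of RAM"

    # Assumption: SLURM_MEM_PER_NODE is set (in MB) when --mem was given.
    if [ -n "${SLURM_MEM_PER_NODE:-}" ]; then
        alloc_gb=$(( SLURM_MEM_PER_NODE / 1024 ))
        if [ "${mem_gb}" -gt "${alloc_gb}" ]; then
            echo "RF2NA_MEM=${mem_gb} GB exceeds the ${alloc_gb} GB allocated by Slurm" >&2
            return 1
        fi
    fi
    return 0
}
```

Calling the function at the top of a job script, before the pipeline starts, gives an early failure instead of an out-of-memory kill partway through the run.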
Monolithic run or not#
run_RF2NA.sh is the script provided by RoseTTAFold2NA to run the pipeline. Looking at the code, one can see that it splits into 2 parts:
- first: runs HHblits, PSIPRED, hhsearch, rMSA (CPU and memory heavy)
- second: runs RoseTTAFold2NA (GPU)
As only the last step of RoseTTAFold2NA can use a GPU, we provide 2 additional tools, run_RF2NA_part1.sh and run_RF2NA_part2.sh, resulting from the split of the original pipeline into 2 parts.
We strongly recommend running this 2-part pipeline: it lets you request CPU and GPU resources separately, so that no GPU is held idle during the CPU-only steps. Remember that maestro has many more CPUs than GPUs.
Here are examples using the example data set from RoseTTAFold2NA with the split scripts.
We will request 12 CPUs and 128 GB of memory.
The first example is a non-monolithic run via srun, the second one via sbatch.
non-monolithic run via srun#
Code Block (bash)
maestro-submit:~ > module load RoseTTAFold2NA
maestro-submit:~ > echo $RF2NA_MEM
64
maestro-submit:~ > export RF2NA_MEM=128
maestro-submit:~ > srun -c 12 --mem=${RF2NA_MEM}GB run_RF2NA_part1.sh t000_ protein.fa R:RNA.fa
Then run RoseTTAFold2NA to predict the structures (remember that this step requires a GPU).
Code Block (bash)
maestro-submit:~ > module load RoseTTAFold2NA
maestro-submit:~ > srun -c 8 -p gpu --qos=gpu --gres=gpu:A100:1 run_RF2NA_part2.sh t000_ protein.fa R:RNA.fa
non-monolithic run via sbatch#
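First steps, on CPU nodes
Code Block (bash)
#!/bin/bash
#SBATCH -N 1
#SBATCH --partition=fast
#SBATCH --cpus-per-task=12
# NB: shell variables are not expanded in #SBATCH lines, so the memory is given literally
#SBATCH --mem=128G
# load the module and make RoseTTAFold2NA use the memory requested above (value in GB)
module load RoseTTAFold2NA
export RF2NA_MEM=128
INPUT_1=/pasteur/appa/scratch/public/edeveaud/protein.fa
INPUT_2=R:/pasteur/appa/scratch/public/edeveaud/RNA.FA
OUTPUT_DIR=/pasteur/appa/scratch/public/edeveaud/t000_
run_RF2NA_part1.sh ${OUTPUT_DIR} ${INPUT_1} ${INPUT_2}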
Second step, on GPU nodes
Code Block (bash)
#!/bin/bash
#SBATCH -N 1
#SBATCH --partition=gpu
# same QOS as in the srun example above
#SBATCH --qos=gpu
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1
#SBATCH --constraint='A100|P100:1'
#---- Job Name
#SBATCH -J RF2NA_example_p2
module load RoseTTAFold2NA
INPUT_1=/pasteur/appa/scratch/public/edeveaud/protein.fa
INPUT_2=R:/pasteur/appa/scratch/public/edeveaud/RNA.FA
OUTPUT_DIR=/pasteur/appa/scratch/public/edeveaud/t000_
run_RF2NA_part2.sh ${OUTPUT_DIR} ${INPUT_1} ${INPUT_2}
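The two batch scripts can also be chained so that the GPU job starts only once the CPU job has finished successfully. The following is a minimal sketch under stated assumptions: submit_rf2na_chain is a hypothetical helper, not something provided by the module, and the script file names in the usage line are placeholders. It relies on the standard sbatch options --parsable and --dependency=afterok.

```bash
#!/bin/bash
# Sketch: submit part 1, then submit part 2 so that it runs only if part 1 succeeds.
# submit_rf2na_chain is a hypothetical helper name used for illustration.
submit_rf2na_chain() {
    local part1_script="$1"
    local part2_script="$2"
    local jobid

    # --parsable makes sbatch print only "jobid" (or "jobid;cluster").
    jobid=$(sbatch --parsable "${part1_script}") || return 1
    jobid="${jobid%%;*}"   # drop the optional ";cluster" suffix

    # afterok: part 2 becomes eligible only once part 1 has exited with code 0.
    sbatch --dependency="afterok:${jobid}" "${part2_script}"
}

# Usage (file names are placeholders for the two scripts above):
# submit_rf2na_chain rf2na_part1.sbatch rf2na_part2.sbatch
```

With this approach no GPU is reserved while the CPU-only steps are still running, and the GPU job is never started on incomplete part 1 output.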
Et voilà: the predicted structures are now available in the t000_/models folder.