RoseTTAFold2NA

RoseTTAFold2NA (RoseTTAFold2 protein/nucleic acid complex prediction) is available on the maestro cluster under the module name RoseTTAFold2NA.

The required data and databases are available on maestro through the RF2NA_DATA environment variable (/opt/gensoft/data/RoseTTAFold2NA/0.1/). Please do not duplicate these 2.5 TB of data. The installed RoseTTAFold2NA cannot use any other data source.
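
Before launching a job, it can be useful to confirm that the module has set RF2NA_DATA. The following is a minimal sketch, not part of RoseTTAFold2NA: check_rf2na_data is a hypothetical helper name, and it only assumes that the variable should point to an existing directory.

```bash
# Hypothetical helper (not shipped with the module): verify that RF2NA_DATA
# is set and points to an existing directory.
check_rf2na_data() {
    if [ -z "${RF2NA_DATA:-}" ]; then
        echo "RF2NA_DATA is not set; run 'module load RoseTTAFold2NA' first" >&2
        return 1
    fi
    if [ ! -d "${RF2NA_DATA}" ]; then
        echo "RF2NA_DATA=${RF2NA_DATA} is not a directory" >&2
        return 1
    fi
    echo "Using databases in ${RF2NA_DATA}"
}
```

Calling check_rf2na_data after module load RoseTTAFold2NA should print the data directory; a non-zero return code means the environment is not ready.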

setting CPU//RAM requirements#

RoseTTAFold2NA was originally built to run with 8 CPUs and at most 64 GB of memory for the first steps (hhblits and hhsearch). On the maestro cluster, we set it up to request CPUs and RAM based on environment variables:

SLURM_CPUS_PER_TASK for the CPU requirement
RF2NA_MEM for the RAM requirement. NB: the value must be expressed in GB

SLURM_CPUS_PER_TASK is set by Slurm when you run an allocation (e.g. srun, sbatch). The default value is 1 unless you use, for example, the -c N or --cpus-per-task=N option. In that case it is set to N and is automatically used by RoseTTAFold2NA.

RF2NA_MEM is an environment variable set by the RoseTTAFold2NA modulefile. Its default value is 64 (GB). It may be changed to any value (in GB) to suit your needs.

Given these two points, RoseTTAFold2NA must be run through an allocation that matches your requirements.
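
To see which values the pipeline would pick up, the two variables can be printed before the run. This is a minimal sketch: rf2na_resources is a hypothetical helper name, and the fallback to 1 CPU and 64 GB when a variable is unset is an assumption based on the defaults described above.

```bash
# Hypothetical helper: report the CPU and memory settings described above.
# Assumption: outside a Slurm allocation SLURM_CPUS_PER_TASK may be unset,
# so we fall back to the documented defaults (1 CPU, 64 GB).
rf2na_resources() {
    local cpus="${SLURM_CPUS_PER_TASK:-1}"
    local mem_gb="${RF2NA_MEM:-64}"
    echo "CPUs: ${cpus}, memory: ${mem_gb} GB"
}
```

Calling it inside a job script, just before the pipeline starts, shows the values the job actually received.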

Monolithic run or not#

run_RF2NA.sh is the script provided by RoseTTAFold2NA to run the pipeline. Looking at the code, one can see that it can be decomposed into 2 parts:

first: runs HHblits, PSIPRED, hhsearch, rMSA (CPUs and memory heavy)
second: runs RoseTTAFold2NA (GPU)

As only the last step of RoseTTAFold2NA can use a GPU, we provide 2 additional tools, run_RF2NA_part1.sh and run_RF2NA_part2.sh, resulting from splitting the original pipeline into 2 parts.
We strongly recommend running this 2-part pipeline.

This allows you to optimize your analysis. Remember that maestro has many more CPUs than GPUs.

Here are examples using the example data set from RoseTTAFold2NA with the split scripts.
We will request 12 CPUs and 128 GB of memory.
The first example is a non-monolithic run via srun, the second one via sbatch.

non-monolithic run via srun#

Code Block (bash)

maestro-submit:~ > module load RoseTTAFold2NA
maestro-submit:~ > echo $RF2NA_MEM
64
maestro-submit:~ > export RF2NA_MEM=128
maestro-submit:~ > srun -c 12 --mem=${RF2NA_MEM}GB run_RF2NA_part1.sh t000_ protein.fa R:RNA.fa

Then run RoseTTAFold2NA to predict structures (remember that this step requires GPU capabilities).

Code Block (bash)

maestro-submit:~ > module load RoseTTAFold2NA
maestro-submit:~ > srun -c 8 -p gpu --qos=gpu --gres=gpu:A100:1 run_RF2NA_part2.sh t000_ protein.fa R:RNA.fa

non-monolithic run via sbatch#

first steps on CPU nodes

Code Block (bash)

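#!/bin/bash

#SBATCH -N 1
#SBATCH --partition=fast
#SBATCH --cpus-per-task=12
# Shell variables are not expanded in #SBATCH directives, so the memory is given literally
#SBATCH --mem=128GB

# Load the module, then tell RoseTTAFold2NA to use the memory requested above (in GB)
module load RoseTTAFold2NA
export RF2NA_MEM=128

INPUT_1=/pasteur/appa/scratch/public/edeveaud/protein.fa
INPUT_2=R:/pasteur/appa/scratch/public/edeveaud/RNA.FA
OUTPUT_DIR=/pasteur/appa/scratch/public/edeveaud/t000_

run_RF2NA_part1.sh ${OUTPUT_DIR} ${INPUT_1} ${INPUT_2}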

second steps on GPU nodes

Code Block (bash)

#!/bin/bash

#SBATCH -N 1
#SBATCH --partition=gpu
#SBATCH --cpus-per-task=8 
#SBATCH --gres=gpu:1 
#SBATCH --constraint='A100|P100:1'

#---- Job Name
#SBATCH -J RF2NA_example_p2

INPUT_1=/pasteur/appa/scratch/public/edeveaud/protein.fa
INPUT_2=R:/pasteur/appa/scratch/public/edeveaud/RNA.FA 
OUTPUT_DIR=/pasteur/appa/scratch/public/edeveaud/t000_

run_RF2NA_part2.sh ${OUTPUT_DIR} ${INPUT_1} ${INPUT_2}

Et voilà: the predicted structures are now available in the t000_/models folder.
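
The two sbatch scripts above can also be submitted in one go, so that the GPU job starts only if the CPU job succeeds. The following is a sketch under assumptions: submit_rf2na_chain is a hypothetical helper, and part1.sh and part2.sh stand for wherever you saved the two scripts. It relies on the standard sbatch options --parsable and --dependency=afterok.

```bash
# Hypothetical helper: submit part 1, then submit part 2 so that it only
# starts once part 1 has finished successfully.
submit_rf2na_chain() {
    local part1_script=$1 part2_script=$2 jobid
    # --parsable makes sbatch print only the job id (optionally "jobid;cluster")
    jobid=$(sbatch --parsable "${part1_script}") || return 1
    jobid=${jobid%%;*}
    # afterok: run the GPU step only if the CPU step exited with code 0
    sbatch --dependency=afterok:"${jobid}" "${part2_script}"
}
```

It would be used as submit_rf2na_chain part1.sh part2.sh. If the first job fails, Slurm does not start the second one, and no GPU is reserved while the CPU steps are running.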