
RoseTTAFold

!!!!! This page is a work in progress; it will evolve over the next few days. !!!!!

!!!!! RoseTTAFold is currently available for testing purposes !!!!!

!!!!! ONLY monomer prediction has been tested so far !!!!!

!!!!! Complex structure prediction may (or may not) work !!!!!

RoseTTAFold is available on the Maestro cluster under the module name RoseTTAFold/1.0.0.
We provide the reference data downloaded as described in the RoseTTAFold README (point 5: Download sequence and structure databases).

environment description#

Once RoseTTAFold/1.0.0 is loaded you can access the RoseTTAFold data through the ROSETTAFOLD_DATA environment variable.

As always, use the module show RoseTTAFold/1.0.0 command to display the environment changes made by loading the module.

Code Block (bash)

maestro-submit:~ > module show test/RoseTTAFold/1.0.0 
-------------------------------------------------------------------
/opt/gensoft/modules/test/RoseTTAFold/1.0.0:

module-whatis   {Set environnement for RoseTTAFold (1.0.0)}
prepend-path    PATH /opt/gensoft/exe/RoseTTAFold/1.0.0/bin
setenv          ROSETTAFOLD_DATA /opt/gensoft/data/RoseTTAFold/1.0.0
setenv          HHTOOLS_MEMORY 64
-------------------------------------------------------------------
  • ROSETTAFOLD_DATA: location of the RoseTTAFold reference data
  • HHTOOLS_MEMORY: requested memory in GB for the hhsuite tools used by RoseTTAFold (hhblits, hhsearch)
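
A quick sanity check that the environment is set as expected (the value is the one defined by the modulefile above):

Code Block (bash)

maestro-submit:~ > module load test/RoseTTAFold/1.0.0
maestro-submit:~ > echo ${ROSETTAFOLD_DATA}
/opt/gensoft/data/RoseTTAFold/1.0.0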

You can change the HHTOOLS_MEMORY environment variable to suit your needs, BUT keep in mind that on Maestro it should comply with the memory requested for your slurm allocation via the --mem slurm option (run_e2e_ver.sh, run_pyrosetta_ver.sh and the corresponding CPU versions use 64 GB as the default value).
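
For example, a minimal sketch of lowering the value and keeping the slurm memory request in sync (the 32 GB figure is purely illustrative):

Code Block (bash)

maestro-submit:~ > module load test/RoseTTAFold
maestro-submit:~ > export HHTOOLS_MEMORY=32   # override the 64 GB default
maestro-submit:~ > srun --cpus-per-task=8 --mem=${HHTOOLS_MEMORY}G run_pyrosetta_ver_cpu.sh input.fa output_dir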

On Maestro, RoseTTAFold can be used in GPU or CPU mode. The mode is selected by the name of the tool: for example, the '*_cpu.sh' tools run in CPU-only mode.

In order to use the standard GPU mode you must request an allocation on the GPU-capable queue/nodes. See below.

| GPU mode | CPU mode | description |
|---|---|---|
| run_e2e_ver.sh | run_e2e_ver_cpu.sh | monomer structure prediction with a single PDB output having estimated residue-wise CA-lddt in the B-factor column |
| run_pyrosetta_ver.sh | run_pyrosetta_ver_cpu.sh | monomer structure prediction with five final models having estimated CA rms error in the B-factor column |

provided tools description#

To know which tools are provided by the RoseTTAFold/1.0.0 module, use the module help command.

Code Block (text)

maestro-submit:~ > module help test/RoseTTAFold/1.0.0
-------------------------------------------------------------------
Module Specific Help for /opt/gensoft/modules/test/RoseTTAFold/1.0.0:

This modulefile defines the requisite environement
needed to use package: RoseTTAFold version (1.0.0)

official implementation of RoseTTAFold - Accurate prediction of protein structures and interactions using a 3-track network.
URL: https://github.com/RosettaCommons/RoseTTAFold


package provides following commands:
        hhfilter
        RosettaTR.py
        make_joint_MSA_bacterial.py
        make_msa.sh
        make_ss.sh
        predict_complex.py
        predict_e2e.py
        predict_pyRosetta.py
        run_e2e_ver.sh
        run_e2e_ver_cpu.sh
        run_pyrosetta_ver.sh
        run_pyrosetta_ver_cpu.sh

-------------------------------------------------------------------

We provide both RoseTTAFold prediction tools for convenience, plus 2 in-house tools. The main tools to run are:

  • run_pyrosetta_ver.sh and run_pyrosetta_ver_cpu.sh: monomer structure prediction with five final models having estimated CA rms error in the B-factor column (GPU and CPU versions respectively)
  • run_e2e_ver.sh and run_e2e_ver_cpu.sh: monomer structure prediction with a single PDB output having estimated residue-wise CA-lddt in the B-factor column (GPU and CPU versions respectively)
  • make_joint_MSA_bacterial.py: make paired alignments for complex protein predictions (see below)
  • hhfilter: filter the paired alignment, used for complex protein predictions (see below)
  • predict_complex.py: run complex structure prediction

NB: our installation is run through a singularity container.
By default the container bind mounts the following paths:

  • /pasteur: giving access to the pasteur tree, e.g. projects, scratch, homes, etc.
  • $HOME
  • /local/databases: giving access to data banks
  • /local/scratch: bind mounted to /tmp in the container, to respect the temporary files location policy used on Maestro
  • ${ROSETTAFOLD_DATA}: of course the RoseTTAFold data are bind mounted in the container

NB: you can bind mount other volumes in the container using the SINGULARITY_BINDPATH environment variable.
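
For instance, a minimal sketch of binding an extra directory before launching a prediction (the path is purely illustrative):

Code Block (bash)

maestro-submit:~ > export SINGULARITY_BINDPATH="/path/to/my/extra/data"
maestro-submit:~ > srun --cpus-per-task=8 --mem=${HHTOOLS_MEMORY}G run_pyrosetta_ver_cpu.sh input.fa output_dir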

IMPORTANT NOTES

  • RoseTTAFold uses only 1 GPU, so be sure to ask for only 1 (ONE) GPU in your job allocations.
  • Remember to request memory (in GB) according to the HHTOOLS_MEMORY value, otherwise your job may be killed by slurm, as the default allocation grants 40GB/CPU.
  • RoseTTAFold runs some threaded jobs (hhblits, hhsearch) or runs jobs in parallel (8 threads / parallel jobs by default). On Maestro this number of threads / jobs is automatically set to SLURM_CPUS_PER_TASK (i.e. the number of CPUs requested in your allocation through the -c/--cpus-per-task slurm option).
  • RoseTTAFold MUST be run through a slurm allocation (sbatch/salloc or srun).

Running monomer RoseTTAFold prediction via srun#

For this example we will run the RoseTTAFold monomer prediction that provides 5 models. One can choose to run the prediction providing a single model instead by replacing run_pyrosetta_ver.sh with run_e2e_ver.sh in the following command lines.

  • running on GPU mode:#

Code Block (bash)

maestro-submit:~ > module load test/RoseTTAFold
maestro-submit:~ > srun -p gpu --qos=gpu --gres=gpu:A100:1 --cpus-per-task=8 --mem=${HHTOOLS_MEMORY}G run_pyrosetta_ver.sh input.fa output_dir
  • running on CPU mode#

Code Block (bash)

maestro-submit:~ > module load test/RoseTTAFold
maestro-submit:~ > srun --cpus-per-task=8 --mem=${HHTOOLS_MEMORY}G run_pyrosetta_ver.sh input.fa output_dir

Running monomer RoseTTAFold prediction via sbatch#

  • running on GPU mode:#

Code Block (bash)

#!/bin/bash

#SBATCH --partition=gpu  --qos=gpu
#SBATCH --gres=gpu:1               # remember: 1 GPU
#SBATCH --constraint='A100:1'      # target gpu card, choose between A100 or V100
#SBATCH --cpus-per-task=8          # hhblits, hhsearch default requirement
#SBATCH --mem=64G                  # required memory in GB (HHTOOLS_MEMORY default; #SBATCH lines are not expanded by the shell)

#---- Job Name
#SBATCH -J 1STU_rf_job

# load the module inside the job, in case it was not loaded in the submitting shell
module load test/RoseTTAFold/1.0.0

INPUT_FASTA=/pasteur/appa/scratch/public/edeveaud/1STU.fasta
OUTPUT_DIR=/pasteur/appa/scratch/public/edeveaud/1STU_out
PREDICTION=run_pyrosetta_ver.sh # kind of prediction to run, choose between run_pyrosetta_ver.sh or run_e2e_ver.sh

#---- do the job
${PREDICTION} ${INPUT_FASTA} ${OUTPUT_DIR}
  • running on CPU mode:#

Code Block (bash)

#!/bin/bash

#SBATCH --cpus-per-task=8         # hhblits, hhsearch default requirement
#SBATCH --mem=64G                 # required memory in GB (HHTOOLS_MEMORY default; #SBATCH lines are not expanded by the shell)

#---- Job Name
#SBATCH -J 1STU_rf_job

# load the module inside the job, in case it was not loaded in the submitting shell
module load test/RoseTTAFold/1.0.0

INPUT_FASTA=/pasteur/appa/scratch/public/edeveaud/1STU.fasta
OUTPUT_DIR=/pasteur/appa/scratch/public/edeveaud/1STU_out
PREDICTION=run_pyrosetta_ver_cpu.sh # kind of prediction to run, choose between run_pyrosetta_ver_cpu.sh or run_e2e_ver_cpu.sh

#---- do the job
${PREDICTION} ${INPUT_FASTA} ${OUTPUT_DIR}
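
Either script can then be saved to a file and submitted with sbatch (the file name below is arbitrary):

Code Block (bash)

maestro-submit:~ > sbatch 1STU_rf_job.sbatch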


Running complex protein RoseTTAFold prediction#

How to prepare inputs for complex modeling

  1. Generate multiple sequence alignments for each subunit.
  2. Make paired alignments.
       • It is important to generate "good" paired sequence alignments to extract the coevolution signal properly.
       • For bacterial proteins, you may use the make_joint_MSA_bacterial.py script to generate paired alignments. It pairs the sequences having similar UniProt accession codes.
       • For eukaryotes, there is no easy way to generate paired alignments. Good luck!
       • NB: this step runs in CPU mode only, so no need to request a GPU node.
  3. Filter the paired alignment using hhfilter (use module load hhsuite/3.3.0 to access the hhfilter command).
       • hhfilter -i paired.a3m -o filtered.a3m -id 90 (or 95) -cov 75 (or 50)
       • NB: this step runs in CPU mode only, so no need to request a GPU node.
  4. If you want to incorporate complex template information, please make an npz file that contains the input template features for RoseTTAFold. It should contain the following keys ("xyz_t", "t1d", "t0d").
       • xyz_t: N, CA, C coordinates of complex templates (# of templ, # of residue, 3 (for N, Ca, C), 3 (xyz coord)). For the unaligned region, it should be NaN.
       • t1d: 1-D features from HHsearch results (score, SS, probab columns from the atab file) (T, L, 3). For the unaligned region, it should be zeros.
       • t0d: 0-D features from HHsearch (Probability/100.0, Identities/100.0, Similarity from the hhr file) (T, 3).
  5. Run complex structure prediction (see the sketch after this list).
       • predict_complex.py -i filtered.a3m -o complex -Ls 218 310
       • Set the numeric parameters after the -Ls argument to the lengths of the individual subunits, in the order in which they were paired.
       • In this example, the aa length of subunit1 was 218 and subunit2 was 310.
       • NB: this script can be run in CPU-only mode using the --cpu command line option.
  6. You may want to run Rosetta fastrelax w/ coordinate restraints to add sidechains.
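
A rough end-to-end sketch of the steps above, chained in a batch-style script. The per-subunit MSA generation and the make_joint_MSA_bacterial.py invocation are left as placeholders because their exact arguments are not documented on this page; the hhfilter and predict_complex.py lines reuse the example values given above.

Code Block (bash)

#!/bin/bash
# Sketch only: adapt file names, subunit lengths and the pairing step to your data.

module load test/RoseTTAFold/1.0.0

# 1-2. generate per-subunit MSAs and pair them (bacterial case);
#      the exact make_joint_MSA_bacterial.py arguments are not shown here, check its --help
# make_joint_MSA_bacterial.py ... > paired.a3m

# 3. filter the paired alignment (id/cov values from the example above)
hhfilter -i paired.a3m -o filtered.a3m -id 90 -cov 75

# 4. (optional) build a template .npz file with the "xyz_t", "t1d", "t0d" keys
#    if you want to incorporate complex template information

# 5. run the complex structure prediction; subunit lengths 218 and 310 as in the
#    example above, in the order in which they were paired (CPU mode via --cpu)
predict_complex.py -i filtered.a3m -o complex -Ls 218 310 --cpu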

some numerical values#

Here are the timings of the different jobs.

run on Tsp1.fa#
|  | run_pyrosetta GPU mode | run_pyrosetta CPU mode | run_e2e GPU mode | run_e2e CPU mode |
|---|---|---|---|---|
| HHTOOLS_MEMORY=64 SLURM_CPUS_PER_TASK=8 | real 80m40.917s, user 322m47.901s, sys 1m57.070s | real 78m35.863s, user 355m1.903s, sys 13m13.849s | real 28m22.496s, user 161m23.120s, sys 1m9.866s | real 34m11.120s, user 509m5.386s, sys 72m14.860s |
| HHTOOLS_MEMORY=64 SLURM_CPUS_PER_TASK=95 | real 80m10.435s, user 1760m30.599s, sys 8m33.578s | real 77m53.783s, user 1662m48.939s, sys 18m15.265s | real 36m51.639s, user 1608m41.954s, sys 7m18.744s | real 39m33.383s, user 1920m18.416s, sys 68m7.570s |
run on S-layer.fa#
|  | run_pyrosetta GPU mode | run_pyrosetta CPU mode | run_e2e GPU mode | run_e2e CPU mode |
|---|---|---|---|---|
| HHTOOLS_MEMORY=64 SLURM_CPUS_PER_TASK=8 | real 856m20.154s, user 4942m42.875s, sys 15m32.150s | real 1164m27.392s, user 25830m2.842s, sys 5418m47.984s | CUDA out of memory: Tried to allocate 8.89 GiB (GPU 0; 19.62 GiB total capacity; 10.48 GiB already allocated; 5.66 GiB free; 12.53 GiB reserved in total by PyTorch) | to run |
| HHTOOLS_MEMORY=64 SLURM_CPUS_PER_TASK=95 | real 514m36.122s, user 6138m27.414s, sys 25m24.294s | to run | CUDA out of memory: Tried to allocate 8.89 GiB (GPU 0; 19.62 GiB total capacity; 10.48 GiB already allocated; 5.67 GiB free; 12.53 GiB reserved in total by PyTorch) | to run |