fasterq-dump

Problem#

If you run fasterq-dump directly on a node, you will get an error message like this:

Error message from fasterq-dump (shell)

$ fasterq-dump XXXXX
fasterq-dump.2.10.2 err: no error - Proxy '<your login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
fasterq-dump.2.10.2 err: no error - Proxy '<your login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
fasterq-dump.2.10.2 err: no error - Proxy '<your login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
fasterq-dump.2.10.2 err: no error - Proxy '<your login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
fasterq-dump.2.10.2 err: invalid accession 'XXXXX'

This happens because sra-tools does not support authenticated proxies (cf. the issue on GitHub), and an authenticated proxy is what we use to moderate internet access from the nodes of the cluster (see details on this page).

Solution#

To work around the problem, split the work into two steps:

  1. download the data with the prefetch or prefetch-orig command, on the submit node only (anywhere else, you will get the same message as above),
  2. run the analysis itself with the fasterq-dump or fasterq-dump-orig command on the compute nodes.

Don't download the data into your home directory. When using prefetch on maestro.pasteur.fr, use the -O option and give the path of a directory:

  • either under one of your entity project spaces on Helix or Zeus,
  • or under /pasteur/appa/scratch.
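As an illustration of the advice above, here is a minimal sketch of a prefetch call that writes under scratch. The per-user subdirectory and the `sra` directory name are assumptions made for the example, not the documented layout; adapt them to your own space. The command is first printed as a dry run so that the path can be checked before anything is downloaded.

```shell
# Placeholder accession, as in the rest of this page.
ACCESSION="XXXXX"
# Assumed layout: a personal directory under scratch (adapt to your own space).
OUTDIR="/pasteur/appa/scratch/${USER:-login}/sra"

# Dry run: print the command to check the target path.
echo prefetch -O "${OUTDIR}" "${ACCESSION}"

# On the submit node, once the path looks right:
#   mkdir -p "${OUTDIR}"
#   prefetch -O "${OUTDIR}" "${ACCESSION}"
```

With -O, prefetch stores the accession in a subdirectory of the given path instead of its default location.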

Note that you will still see the

Downloading with prefetch from the submit node (shell)

[login@maestro-submit ~]$ prefetch-orig XXXXX
prefetch-orig.3.2.0 err: libs/kfs/unix/sysdir.c:2305:KSysDirOpenDirRead_v1:  no error - Proxy '<login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
prefetch-orig.3.2.0 err: libs/kns/proxy.c:558:KNSProxiesVSetHTTPProxyPath:  no error - Proxy '<login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
prefetch-orig.3.2.0 err: libs/kns/proxy.c:558:KNSProxiesVSetHTTPProxyPath:  no error - Proxy '<login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
prefetch-orig.3.2.0 err: libs/kns/proxy.c:558:KNSProxiesVSetHTTPProxyPath:  no error - Proxy '<login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored

proxy-related messages but, this time, the data will be downloaded nonetheless

Downloading with prefetch from the submit node (shell)

prefetch-orig.3.2.0: 1) Resolving 'XXXXX'...
prefetch-orig.3.2.0: Current preference is set to retrieve SRA Normalized Format files with full base quality scores
prefetch-orig.3.2.0: 1) Downloading 'XXXXX'...
prefetch-orig.3.2.0:  SRA Normalized Format file is being retrieved
prefetch-orig.3.2.0:  Downloading via HTTPS...

since the proxy can be bypassed from the submit node (see details on this page).

As explained on the sra-tools dedicated page, launch vdb-config -i and uncheck Enable Remote Access to be sure that fasterq-dump won't try to download the data again:

  1. launch the vdb-config -i command,
  2. type M to reach the MAIN tab,
  3. then type E to uncheck Enable Remote Access if necessary,
  4. then type s and press Enter to save your change,
  5. then type x to leave the interactive window.

Once done, you can:

  • either launch salloc and, inside that allocation:
      • go to the data directory,
      • launch fasterq-dump or fasterq-dump-orig on these data,
  • or submit an sbatch script in which:
      • you first use cd to go to the directory where the data are located,
      • and then launch fasterq-dump or fasterq-dump-orig on them.
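The sbatch variant could look like the sketch below. It writes a small batch script to a file named fasterq_job.sh, a name chosen for the example. The data path, job name, CPU count and output directory are all hypothetical placeholders, and the script assumes that fasterq-dump is already available in the job environment.

```shell
# Write a hypothetical batch script; every path and resource value is a placeholder.
cat > fasterq_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=fasterq-dump
#SBATCH --cpus-per-task=4

# Go to the directory that prefetch filled from the submit node (assumed path).
cd /pasteur/appa/scratch/login/sra || exit 1

# Convert the local copy; XXXXX is the subdirectory created by prefetch.
fasterq-dump --threads "${SLURM_CPUS_PER_TASK}" --outdir fastq XXXXX
EOF

# Then submit it from the submit node:
#   sbatch fasterq_job.sh
```

Running fasterq-dump from the directory that contains the prefetched accession lets it use the local copy, which is what matters once remote access has been disabled.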
