fasterq-dump
Problem#
If you try to run fasterq-dump directly on a node, you will get an error message like this:
Error message from fasterq-dump (shell)
$ fasterq-dump XXXXX
fasterq-dump.2.10.2 err: no error - Proxy '<your login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
fasterq-dump.2.10.2 err: no error - Proxy '<your login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
fasterq-dump.2.10.2 err: no error - Proxy '<your login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
fasterq-dump.2.10.2 err: no error - Proxy '<your login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
fasterq-dump.2.10.2 err: invalid accession 'XXXXX'
This happens because sra-tools does not support authenticated proxies (cf. the issue on GitHub), whereas an authenticated proxy is what we use to control internet access from the nodes of the cluster (see details on this page).
Solution#
To work around this problem, you have to separate:
- the download of the data, which you can do with the prefetch or prefetch-orig command, on the submit node only (otherwise you will get the same messages as above),
- from the analysis itself, which must be done with the fasterq-dump or fasterq-dump-orig command on the compute nodes.
Don't download the data into your home directory. To that end, when using prefetch on maestro.pasteur.fr, please don't forget to use the -O option and give it the path of a directory:
- either under one of your entity project spaces on Helix or Zeus,
- or under /pasteur/appa/scratch.
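As an illustration of the advice above, here is a minimal sketch of a wrapper that refuses to download into the home directory and otherwise passes the target directory to prefetch with -O. The function name safe_prefetch, the PREFETCH override variable and the example path are hypothetical and not part of sra-tools or of this page; adapt them to your own project space.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by sra-tools): run prefetch with -O,
# but refuse any output directory located in or under $HOME.

safe_prefetch() {
    local accession="$1"
    local outdir="$2"

    case "$outdir" in
        "$HOME"|"$HOME"/*)
            echo "Refusing to download into the home directory: $outdir" >&2
            return 1
            ;;
    esac

    # Create the target directory if needed
    mkdir -p "$outdir" || return 1

    # -O (--output-directory) tells prefetch where to store the download.
    # PREFETCH is an assumed override, e.g. PREFETCH=prefetch-orig
    "${PREFETCH:-prefetch}" -O "$outdir" "$accession"
}
```

On the submit node, it could be called as `safe_prefetch XXXXX /pasteur/appa/scratch/<login>/sra`, where the `<login>/sra` subdirectory is only an example layout.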
Note that you will still have the
Downloading with prefetch from the submit node (shell)
[login@maestro-submit ~]$ prefetch-orig XXXXX
prefetch-orig.3.2.0 err: libs/kfs/unix/sysdir.c:2305:KSysDirOpenDirRead_v1: no error - Proxy '<login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
prefetch-orig.3.2.0 err: libs/kns/proxy.c:558:KNSProxiesVSetHTTPProxyPath: no error - Proxy '<login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
prefetch-orig.3.2.0 err: libs/kns/proxy.c:558:KNSProxiesVSetHTTPProxyPath: no error - Proxy '<login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
prefetch-orig.3.2.0 err: libs/kns/proxy.c:558:KNSProxiesVSetHTTPProxyPath: no error - Proxy '<login>:<string>=@maestro-squid.maestro.pasteur.fr:3128' was ignored
proxy-related messages but, this time, the data will nonetheless be downloaded
Downloading with prefetch from the submit node (shell)
prefetch-orig.3.2.0: 1) Resolving 'XXXXX'...
prefetch-orig.3.2.0: Current preference is set to retrieve SRA Normalized Format files with full base quality scores
prefetch-orig.3.2.0: 1) Downloading 'XXXXX'...
prefetch-orig.3.2.0: SRA Normalized Format file is being retrieved
prefetch-orig.3.2.0: Downloading via HTTPS...
since the proxy can be bypassed from the submit node (see details on this page).
As explained on the sra-tools dedicated page, launch vdb-config -i and uncheck Enable Remote Access to make sure that fasterq-dump won't try to download the data again:
- launch the vdb-config -i command,
- type M to reach the MAIN tab,
- then type E to uncheck Enable Remote Access if necessary,
- then type s and then press Enter to save your change,
- then type x to leave the interactive window.
Once done, you can:
- either launch salloc and, inside that allocation:
  - go to the data directory,
  - launch fasterq-dump or fasterq-dump-orig on these data,
- or submit an sbatch script in which:
  - you first use cd to go to the directory where the data are located,
  - and then launch fasterq-dump or fasterq-dump-orig on them.
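The sbatch variant could look like the sketch below, which writes a small batch script to a file. The file name fasterq_job.sh, the DATA_DIR variable, the fastq output directory and the resource values are assumptions chosen for illustration; XXXXX stands for your accession, as elsewhere on this page.

```shell
#!/usr/bin/env bash
# Sketch: generate a Slurm batch script that runs fasterq-dump on data
# already downloaded with prefetch. All names below are placeholders.

cat > fasterq_job.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=fasterq-dump
#SBATCH --cpus-per-task=4

set -euo pipefail

# Assumed variable: the directory that was given to "prefetch -O"
DATA_DIR="${DATA_DIR:?set DATA_DIR to the prefetch output directory}"

# First go to the directory where the data are located
cd "$DATA_DIR"

# Then convert the local copy of the accession to FASTQ.
# Remote access is assumed to be disabled through vdb-config.
fasterq-dump --threads "${SLURM_CPUS_PER_TASK:-1}" --outdir fastq XXXXX
EOF

# Submit it from the submit node, for example:
#   DATA_DIR=/path/to/your/prefetch/output sbatch fasterq_job.sh
```

Running fasterq-dump from the directory that contains the prefetched accession lets it pick up the local copy instead of trying to reach the network. Replace fasterq-dump with fasterq-dump-orig in the script if that is the command you use.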