  • SRA toolkit

    Fetching SRA data with SRAdbV2

    Install the SRAdbV2 package
    install.packages('BiocManager')
    BiocManager::install('seandavi/SRAdbV2')

    To use SRAdbV2, first create an instance of the R6 class Omicidx

    library(SRAdbV2)
    oidx = Omicidx$new()

    Once the Omicidx instance exists, use oidx$search() to query the data
    query=paste(
      paste0('sample_taxon_id:', 10116),
      'AND experiment_library_strategy:"rna seq"',
      'AND experiment_library_source:transcriptomic',
      'AND experiment_platform:illumina')
    z = oidx$search(q=query,entity='full',size=100L)

    Here the entity parameter is the SRA entity type available through the API, and the size parameter is the number of records the query returns

    Because the result set can sometimes be very large, we can use a Scroller to page through and refine the results
    s = z$scroll()
    s
    s$count

    s$count gives a quick look at how many records were returned

    Error in curl::curl_fetch_memory(url, handle = handle) :
      Could not resolve host: api-omicidx.cancerdatasci.org

    1.1 The Scroller provides two ways to access the data
    The first loads all query results into R's memory, but this can be slow
    res = s$collate(limit = 1000)
    head(res)
    Then use reset() to reset the Scroller

    s$reset()
    s

    The second uses the yield method to fetch the data iteratively
    j = 0
    ## fetch only 500 records, but
    ## `yield` will return NULL
    ## after ALL records have been fetched
    while(s$fetched < 500) {
        res = s$yield()
        # stop early if the result set holds fewer than 500 records
        if (is.null(res)) break
        # do something interesting with `res` here if you like
        j = j + 1
        message(sprintf('total of %d fetched records, loop iteration # %d', s$fetched, j))
    }

    If the full data set has not yet been fetched, the Scroller object's has_next() method returns TRUE
    Use reset() to move the cursor back to the start of the data set

    2. Query syntax
    See here
    https://bioconductor.github.io/BiocWorkshops/public-data-resources-and-bioconductor.html#query-syntax

    3. Using the raw API without R/Bioconductor
    The data can also be fetched through the raw API, without R/Bioconductor
    SRAdbV2 wraps a web API, so its data can be accessed through that web API directly

    sra_browse_API()

    The web-based API provides a useful interface for querying experiment data; the URL of its JSON description is returned by

    sra_get_swagger_json_url()
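When calling the web API from the shell instead of R, the query string that the R example builds with paste() has to be assembled and URL-encoded by hand. The following is only a sketch of that step: build_query and urlencode are hypothetical helper names, only ASCII input is handled, and no endpoint path is assumed here (look it up in the Swagger JSON).

```shell
# Join query terms with " AND ", mirroring the paste() call in the R example.
# build_query is a hypothetical helper, not part of SRAdbV2.
build_query() {
    q=$1
    shift
    for term in "$@"; do
        q="$q AND $term"
    done
    printf '%s\n' "$q"
}

# Percent-encode a string for use in a URL query parameter.
# Assumption: the input is plain ASCII.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}
```

For example, urlencode "$(build_query 'sample_taxon_id:10116' 'experiment_platform:illumina')" produces a string that can be passed as the q parameter of a request.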

     ===========================================

    Installing the SRA toolkit

     https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

    wget --output-document sratoolkit.tar.gz http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz

    tar -vxzf sratoolkit.tar.gz

    export PATH=$PATH:$PWD/sratoolkit.2.10.9-centos_linux64/bin
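The export line above hard-codes version 2.10.9, but the download URL always fetches the current release, so the extracted directory name may differ. A minimal sketch that reads the directory name from the archive instead, assuming the archive unpacks into a single top-level directory; sra_bin_dir is a hypothetical helper name:

```shell
# Print the absolute path of the bin/ directory for a toolkit tarball that
# was extracted in the current directory.
# Assumption: the first archive entry is "<top-level-dir>/...".
sra_bin_dir() {
    tarball=$1
    # Keep only the leading path component of the first archive entry.
    top=$(tar -tzf "$tarball" | sed -n '1s|/.*||p')
    printf '%s/%s/bin\n' "$PWD" "$top"
}
```

After extracting, the PATH can then be set with: export PATH=$PATH:$(sra_bin_dir sratoolkit.tar.gz)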

     ===========================================

    https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

    Tool: prefetch

    Usage:
    prefetch [options] <path/SRA file | path/kart file> [<path/file> ...]
    prefetch [options] <SRA accession>
    prefetch [options] --list <kart_file>
    Frequently Used Options:
    General:
    -h | --help Displays ALL options, general usage, and version information.
    -V | --version Display the version of the program.
    Data transfer:
    -f | --force <value> Force object download. One of: no, yes, all. no [default]: Skip download if the object is found and complete; yes: Download it even if it is found and is complete; all: Ignore lock files (stale locks or if it is currently being downloaded: use at your own risk!).
        --transport <value> Value one of: ascp (only), http (only), both (first try ascp, fallback to http). Default: both.
    -l | --list List the contents of a kart file.
    -s | --list-sizes List the content of kart file with target file sizes.
    -N | --min-size <size> Minimum file size to download in KB (inclusive).
    -X | --max-size <size> Maximum file size to download in KB (exclusive). Default: 20G.
    -o | --order <value> Kart prefetch order. One of: kart (in kart order), size (by file size: smallest first). default: size.
    -a | --ascp-path <ascp-binary|private-key-file> Path to ascp program and private key file (asperaweb_id_dsa.openssh).
    -p | --progress <value> Time period in minutes to display download progress (0: no progress). Default: 1.
        --option-file <file> Read more options and parameters from the file.
    Use examples:
    prefetch cart_0.krt
    Download the files listed in the kart file.
     
    prefetch -l cart_0.krt
    Lists the contents of the kart file.
     
    prefetch -X 200G cart_0.krt
    Sets the maximum download file size to 200GB and downloads the files listed in the kart.
     
    prefetch -o kart cart_0.krt
    Downloads the contents in the order listed in the kart. Preferred for large run sets (example: 100+) where calculating the download sizes may cause a delay to the start of downloads.
     
    prefetch -a "/opt/aspera/bin/ascp|/opt/aspera/etc/asperaweb_id_dsa.openssh" SRR390728
    When the toolkit is unable to locate an installed version of Aspera, the location of ascp and ssh key (-a "/opt/aspera/bin/ascp|/opt/aspera/etc/asperaweb_id_dsa.openssh") can be provided.
     
    prefetch -t ascp -a "/opt/aspera/bin/ascp|/opt/aspera/bin/asperaweb_id_dsa.openssh" --option-file file.txt
    Will force download to be only through aspera (-t ascp) and will prevent http download, default operation is to attempt ascp first and use http if Aspera is not found or fails. Will sequentially download the SRA data files and references required for a list of accessions in "file.txt". The format for "file.txt" is a newline-separated list of accessions: SRR# SRR# SRR# …
     
    prefetch ~/Downloads/SRR390728.sra
    If you have already downloaded an SRA datafile (example here: SRR390728.sra, present in the "~/Downloads" directory), this command will retrieve all of the reference sequences required to extract the data. This command is useful for resolving errors of the type "name not found while resolving tree" - meaning that a reference(s) is required, but cannot be located.
     
    prefetch -c SRR390728
    This command will check the availability of all needed reference sequences (-c) for a given accession.
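The --option-file example above expects a file with one accession per line. A small sketch for producing such a file while dropping entries that do not look like run accessions; write_accession_file is a hypothetical helper, and the SRR/ERR/DRR-plus-digits check is an assumption about which identifiers are being used:

```shell
# Write one run accession per line to the given file.
# Arguments that are not SRR/ERR/DRR followed by digits are skipped
# with a warning on stderr.
write_accession_file() {
    out=$1
    shift
    : > "$out"                      # create or truncate the output file
    for acc in "$@"; do
        digits=${acc#[SED]RR}       # strip the prefix, if present
        case $digits in
            "$acc"|''|*[!0-9]*)
                echo "skipping invalid accession: $acc" >&2
                continue ;;
        esac
        printf '%s\n' "$acc" >> "$out"
    done
}
```

For example, write_accession_file file.txt SRR390728 writes a file that can then be passed to prefetch with --option-file file.txt.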

    ===========================================

    A non-R solution is to use the SRA toolkit prefetch command on a list of SRA identifiers.

    First you need the file list, which you can batch download. In your case, go to https://www.ncbi.nlm.nih.gov/sra?term=SRP026197. At the top right, click "Send to", then "File", then "Accession List".

    Once you have it saved in a file (default is SraAccList.txt) you can use the command (tested in SRA toolkit 2.9.0):

    prefetch $(<SraAccList.txt)
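Passing the whole list in one call makes it hard to see which accession failed. One possible alternative is a loop that calls prefetch once per line and reports failures. This is a sketch, not a tested recipe: prefetch_list is a hypothetical helper, and the PREFETCH variable is an assumption added only so the loop can be dry-run (for example with PREFETCH=echo).

```shell
# Run prefetch once per accession in a list file.
# Blank lines are skipped; stray whitespace and carriage returns are removed.
# Returns non-zero if any download failed.
prefetch_list() {
    list=$1
    failed=0
    while IFS= read -r acc || [ -n "$acc" ]; do
        acc=$(printf '%s' "$acc" | tr -d '[:space:]')
        [ -z "$acc" ] && continue
        # </dev/null keeps the command from consuming the list on stdin.
        if ! "${PREFETCH:-prefetch}" "$acc" </dev/null; then
            echo "failed: $acc" >&2
            failed=$((failed + 1))
        fi
    done < "$list"
    [ "$failed" -eq 0 ]
}
```

Usage: prefetch_list SraAccList.txt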

    ===========================================

    prefetch cannot display progress or speed;

    wget displays progress and speed;

    Xunlei (Thunder) displays progress and speed;

    ===========================================

    REF

    https://www.biostars.org/p/93494/

    https://blog.csdn.net/candle_light/article/details/92806204

    https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=prefetch

    https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-Toolkit

    https://bioconductor.github.io/BiocWorkshops/public-data-resources-and-bioconductor.html#usage-1

  • Original article: https://www.cnblogs.com/emanlee/p/14502070.html