  • 8. Transcriptome Assembly

    Created by Benjamin M Goetz, last modified on Jun 29, 2015

    Assembly of RNA-seq short reads into a transcriptome. 

    1. Quality Assessment

    Data quality is assessed with FastQC.

    • Deliverables
      • Reports generated by FastQC.
    • Tools Used
      • FastQC (Andrews 2010) is used to generate quality summaries of the data:
        • Per base sequence quality report: useful for deciding whether trimming is necessary.
        • Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high-coverage RNA-seq data.
        • Overrepresented sequences: evaluation of adapter contamination.
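
    As a rough illustration of how these reports might be screened, the sketch below reads the tab-separated `summary.txt` that FastQC writes for each input file (status, module name, file name) and lists the three modules named above that did not pass. The function names and the sample file name are hypothetical and are not part of the service described here.

```python
from typing import Dict, List

# Module labels as FastQC writes them to summary.txt; these are the three
# reports discussed in the list above.
MODULES_OF_INTEREST = (
    "Per base sequence quality",
    "Sequence Duplication Levels",
    "Overrepresented sequences",
)


def parse_fastqc_summary(text: str) -> Dict[str, str]:
    """Map each FastQC module name to its status (PASS, WARN, or FAIL)."""
    statuses: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status, module = fields[0], fields[1]
        statuses[module] = status
    return statuses


def modules_needing_review(statuses: Dict[str, str]) -> List[str]:
    """Return the modules of interest whose status is anything but PASS."""
    return [
        module
        for module in MODULES_OF_INTEREST
        if statuses.get(module, "MISSING") != "PASS"
    ]


if __name__ == "__main__":
    # Hypothetical summary.txt content for one FASTQ file.
    example = (
        "PASS\tBasic Statistics\tsample_R1.fastq.gz\n"
        "FAIL\tPer base sequence quality\tsample_R1.fastq.gz\n"
        "WARN\tSequence Duplication Levels\tsample_R1.fastq.gz\n"
        "PASS\tOverrepresented sequences\tsample_R1.fastq.gz\n"
    )
    for module in modules_needing_review(parse_fastqc_summary(example)):
        print(module)
```

    A WARN on duplication levels is not necessarily a problem for RNA-seq data, as noted above, so a listing like this is only a prompt to look at the full report.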

    2. Assembly

    We use Trinity to generate a de novo assembly. Assembly is computationally intensive and may not finish within the time limits imposed on compute jobs at TACC, especially for large data sets. To increase the chance of obtaining an assembly, we run two assemblies: one on the original data, and one in which the reads are first normalized in silico to 50x coverage. If the run on the non-normalized data does not complete, the run on the normalized data still may.
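
    The two runs could be set up along the lines of the sketch below, which only builds the command lines and does not launch them. It assumes paired-end FASTQ input and the option names used by Trinity 2.x releases from around 2015 (`--normalize_reads`, `--normalize_max_read_cov`, `--max_memory`, `--CPU`). Option names differ between releases, so they should be checked against `Trinity --help` for the installed version. The file names, memory, and CPU values are placeholders, not the settings actually used at TACC.

```python
from typing import List


def trinity_command(
    left: str,
    right: str,
    output_dir: str,
    normalize: bool,
    max_memory: str = "50G",   # placeholder value
    cpus: int = 8,             # placeholder value
    max_read_cov: int = 50,    # the 50x target mentioned in the text
) -> List[str]:
    """Build a Trinity command line for paired-end FASTQ reads."""
    cmd = [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--max_memory", max_memory,
        "--CPU", str(cpus),
        "--output", output_dir,
    ]
    if normalize:
        # In silico normalization to the requested maximum read coverage.
        cmd += ["--normalize_reads", "--normalize_max_read_cov", str(max_read_cov)]
    return cmd


if __name__ == "__main__":
    # Hypothetical input files; one run on the full data, one normalized.
    full = trinity_command("reads_R1.fq", "reads_R2.fq", "trinity_full", normalize=False)
    norm = trinity_command("reads_R1.fq", "reads_R2.fq", "trinity_norm50", normalize=True)
    print(" ".join(full))
    print(" ".join(norm))
```

    Keeping the two runs in separate output directories lets either one finish (or fail) without affecting the other.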

    • Deliverables
      • FASTA file of assembly from full data (if it finishes).

      • FASTA file of assembly with in silico normalization to 50x coverage (if it finishes).

      • If neither assembly run finishes, no charge.

    • Tools Used
      • Trinity (Grabherr et al. 2011) is a widely used de novo transcriptome assembler.

    3. Optional: Homology Against Standard Databases

    For an additional charge, we can search a completed assembly against UniProt with BLAST or against Pfam with HMMER. These homology searches give some indication of what the assembled transcripts represent.

    • Deliverables
      • Table of BLAST hits against UniProt, with the option of appending the best hit to each FASTA header.

      • Table of HMMER hits against Pfam, with the option of appending the best hit to each FASTA header.

    • Tools Used
      • BLASTX (Altschul et al. 1997) for nucleotide-to-protein homology search against the UniProt protein database.
      • hmmscan (Eddy 1998) for HMM-based homology search against the Pfam database of proteins and protein domains.
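
    One way the best hits might be appended to the FASTA headers is sketched below. It assumes the BLAST results are in the standard 12-column tabular format (`-outfmt 6`, with the bit score in the last column) and takes the highest bit score as the best hit for each transcript. The `best_hit=` tag and the function names are hypothetical; the source does not specify the format actually delivered.

```python
from typing import Dict, Tuple


def best_hits(blast_tsv: str) -> Dict[str, Tuple[str, float]]:
    """Return {query_id: (subject_id, bitscore)} keeping the top bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def annotate_fasta(fasta: str, hits: Dict[str, Tuple[str, float]]) -> str:
    """Append ' best_hit=<subject>' to headers of sequences that have a hit."""
    out = []
    for line in fasta.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""  # ID is the first header token
            if seq_id in hits:
                line = f"{line} best_hit={hits[seq_id][0]}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    blast = (
        "tx1\tsp|P00001|EXAMPLE_A\t45.0\t100\t55\t0\t1\t300\t1\t100\t1e-20\t90.5\n"
        "tx1\tsp|P00002|EXAMPLE_B\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-50\t180.0\n"
    )
    fasta = ">tx1 len=300\nACGT\n>tx2 len=200\nGGCC\n"
    print(annotate_fasta(fasta, best_hits(blast)), end="")
```

    Transcripts without any hit keep their original header, so the annotated file has the same sequences in the same order as the assembly.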
     
  • Original source: https://www.cnblogs.com/renping/p/7045353.html