  • 6. RNA-Seq Analysis Pipeline

    Created by Dhivya Arasappan, last modified by Dennis C Wylie on Nov 08, 2015

    This pipeline uses an annotated genome to identify differentially expressed genes/transcripts. 10 hour minimum ($470 internal, $600 external) per project.

    1. Quality Assessment

    Quality of data assessed by FastQC; results of quality assessment will be evaluated prior to downstream analysis.

    • Deliverables:
      • reports generated by FastQC
    • Tools used:
      • FastQC: (Andrews 2010) used to generate quality summaries of data:
        • Per base sequence quality report: useful for deciding if trimming necessary.
        • Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high-coverage RNA-Seq data.
        • Overrepresented sequences: evaluation of adapter contamination.
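
    To make the per base sequence quality report more concrete, the sketch below computes the mean Phred score at each read position from FASTQ records. It is only an illustration of the statistic, not FastQC's implementation; the function names, the Phred+33 encoding and the in-memory example reads are assumptions.

```python
from io import StringIO


def parse_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def per_base_mean_quality(records, offset=33):
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    totals, counts = [], []
    for _name, _seq, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up reads; the second one loses quality towards its 3' end.
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nI5#\n")
means = per_base_mean_quality(parse_fastq(example))
print(means)
```

    A steady drop in the mean towards the 3' end of the reads is the typical pattern that suggests quality trimming.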

    2. Fastq Preprocessing

    The quality assessment is used to decide whether the raw data require preprocessing; if so, preprocessing is performed.

    • Deliverables
      • Trimmed/filtered fastq files.
    • Tools Used:
      • Fastx-toolkit: Used to preprocess fastq files.
        • Fastq quality trimmer: Trimming reads based on quality.
        • Fastq quality filter: Filtering reads based on quality.
      • Cutadapt: Used to remove adapter sequences from reads.
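
    The three preprocessing operations listed above can be sketched in plain Python as follows. This is a simplified illustration, not the code of Fastx-toolkit or Cutadapt: the function names, thresholds and adapter string are assumptions, and the adapter step only handles an exact, full-length match, whereas Cutadapt also tolerates mismatches and partial adapters.

```python
def phred_scores(quality, offset=33):
    """Convert a Phred+33 quality string to a list of integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, qual, threshold=20, min_length=3):
    """Drop trailing bases below `threshold`; return None if the read gets too short."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_length:
        return None
    return seq[:end], qual[:end]


def passes_filter(qual, min_quality=20, min_percent=80):
    """Keep a read only if at least `min_percent` of its bases reach `min_quality`."""
    scores = phred_scores(qual)
    if not scores:
        return False
    good = sum(s >= min_quality for s in scores)
    return 100 * good / len(scores) >= min_percent


def remove_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of `adapter` (no mismatches)."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


# Hypothetical reads and adapter, for illustration only.
trimmed = trim_3prime("ACGTACGT", "IIIII5##")
kept = passes_filter("IIIII5")
dropped = passes_filter("I###")
clipped = remove_adapter("ACGTAGATCGG", "I" * 11, "AGATCGG")
print(trimmed, kept, dropped, clipped)
```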
     

    3. Mapping

    Mapping to genome reference performed using BWA-mem or Tophat.

    • Deliverables
      • Mapping results, as bam files and mapping statistics.
    • Tools Used:
      • BWA-mem: (Li 2013) primary aligner used to generate read alignments.
      • Tophat: (Kim 2011) aligner used to generate read alignments in a splice-aware manner and identify novel junctions.
      • Samtools: (Li 2009) used to generate mapping statistics.
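
    As an illustration of what basic mapping statistics summarize, the sketch below counts mapped and unmapped reads from the FLAG field of SAM records (bit 0x4 marks an unmapped read; 0x100 and 0x800 mark secondary and supplementary alignments). It is not a replacement for Samtools; the function name and the example records are assumptions.

```python
def mapping_stats(sam_lines):
    """Count primary records and how many of them are mapped, from SAM text lines."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    rate = mapped / total if total else 0.0
    return {"reads": total, "mapped": mapped, "mapping_rate": rate}


# Made-up SAM records: two mapped reads, one unmapped, one secondary alignment.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t272\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
stats = mapping_stats(sam)
print(stats)
```

    For paired-end data each mate is a separate record, so the totals here count mates rather than fragments.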

    4. Gene/Transcript Counting

    Counting the number of reads mapping to annotated intervals to obtain abundance of genes/transcripts.

    • Deliverables
      • Raw gene/transcript counts
    • Tools Used:
      • HTSeq-count: (Anders 2014) used to count reads overlapping gene intervals.
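
    The counting step can be sketched as an interval-overlap problem. The example below assigns a read to a gene only when it overlaps exactly one gene, and otherwise records it as ambiguous or as having no feature, loosely in the spirit of HTSeq-count's special counters. It is a simplification under stated assumptions: it ignores strand, spliced alignments and exon structure, and the gene coordinates and reads are hypothetical.

```python
from collections import Counter


def count_reads(genes, reads):
    """Count reads per gene.

    genes: {gene_id: (chrom, start, end)}, 1-based inclusive coordinates.
    reads: iterable of (chrom, start, end) for each aligned read.
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    for chrom, start, end in reads:
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1  # overlaps more than one gene
        else:
            counts["__no_feature"] += 1  # overlaps no annotated gene
    return counts


# Hypothetical annotation with two overlapping genes on chr1.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [
    ("chr1", 110, 150),  # geneA only
    ("chr1", 190, 220),  # geneA and geneB
    ("chr1", 250, 280),  # geneB only
    ("chr2", 100, 150),  # no annotated gene
    ("chr1", 120, 160),  # geneA only
]
gene_counts = count_reads(genes, reads)
print(dict(gene_counts))
```

    The linear scan over all genes is fine for a toy example; a real annotation would need an indexed interval structure.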

    5. DEG Identification

    Normalization and statistical testing to identify differentially expressed genes.

    • Deliverables
      • DEG summary, master file containing fold changes and p-values for every gene, and MA plots.
    • Tools Used:
      • DESeq2: (Love 2014) used to perform normalization and test for differential expression using the negative binomial distribution.
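
    The normalization step can be illustrated with the median-of-ratios size factors that DESeq2 is known for. DESeq2 itself is an R package; the Python below is only a sketch of the idea on a made-up count table and does not cover dispersion estimation or the negative binomial test. The function names and the example counts are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: {gene_id: [raw count in each sample]}.
    Returns one size factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # a zero count gives a zero geometric mean; skip the gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each raw count by the size factor of its sample."""
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


# Made-up table: sample 2 was sequenced about twice as deeply as sample 1.
raw = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
factors = size_factors(raw)
normalized = normalize(raw, factors)
print(factors)
print(normalized["g1"])
```

    After normalization, genes whose counts differ only because of sequencing depth (g1 to g3 here) end up with equal values in both samples, so remaining differences can be tested for differential expression.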
  • Original article: https://www.cnblogs.com/renping/p/7045333.html