  • 7. purge_haplotigs: removing redundancy from a genome assembly

    1. Download and install: https://bitbucket.org/mroachawri/purge_haplotigs/wiki/Install

    1. Dependencies (in no particular order)

    bedtools

    $ sudo apt install bedtools
    $ bedtools --version
    bedtools v2.26.0
    

    samtools

    $ sudo apt install samtools
    $ samtools --version
    samtools 1.7
    Using htslib 1.7-2
    Copyright (C) 2018 Genome Research Ltd.
    

    Rscript

    $ sudo apt install r-base r-base-dev
    
    # on a new install we won't have the required R library 'ggplot2' installed
    $ sudo su - -c "R -e \"install.packages('ggplot2', repos='http://cran.rstudio.com/')\""
    

    Minimap2

    # download the latest release from https://github.com/lh3/minimap2/releases (currently v2.13)
    $ wget https://github.com/lh3/minimap2/releases/download/v2.13/minimap2-2.13_x64-linux.tar.bz2
    $ tar xf minimap2-2.13_x64-linux.tar.bz2
    
    # we'll add a bin directory to the home folder and add to the PATH, then install there
    $ mkdir ~/bin
    $ printf '\nexport PATH=$PATH:$HOME/bin\n' >> ~/.bashrc
    $ source ~/.bashrc
    $ cp minimap2-2.13_x64-linux/minimap2 ~/bin/
    
    $ minimap2 -V
    2.13-r850
    

    MUMmer

    # download the latest release from https://github.com/mummer4/mummer/releases (currently 4.0.0.beta2)
    $ wget https://github.com/mummer4/mummer/releases/download/v4.0.0beta2/mummer-4.0.0beta2.tar.gz
    $ tar xf mummer-4.0.0beta2.tar.gz
    
    # compile
    $ cd mummer-4.0.0beta2
    $ ./configure
    $ make
    $ cd ../
    
    # install: just softlink into ~/bin (these paths assume MUMmer was unpacked and built in the home directory)
    $ ln -s ~/mummer-4.0.0beta2/delta-filter ~/bin/delta-filter
    $ ln -s ~/mummer-4.0.0beta2/nucmer ~/bin/nucmer
    $ ln -s ~/mummer-4.0.0beta2/show-coords ~/bin/show-coords
    
    $ nucmer -V
    4.0.0beta2
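
Before moving on, it can help to confirm that every dependency is actually reachable from the PATH. The following is a minimal sketch, not part of Purge Haplotigs: `check_deps` is a hypothetical helper name, and the tool names in the usage comment are simply the ones installed above.

```shell
# check_deps: report which of the named programs are missing from PATH.
# Prints one "missing: <name>" line per absent tool and returns non-zero
# if at least one tool was not found.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (tool names taken from the dependency list above):
# check_deps bedtools samtools Rscript minimap2 nucmer delta-filter show-coords
```

If the function prints nothing and returns 0, all listed tools were found.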

    2. Install Purge Haplotigs

    No compiling is needed: clone into the user's home directory and make the purge_haplotigs script available on the PATH (here via a softlink in ~/bin).

    # clone the git
    $ git clone https://bitbucket.org/mroachawri/purge_haplotigs.git
    
    # create a softlink to ~/bin
    $ ln -s ~/purge_haplotigs/bin/purge_haplotigs ~/bin/purge_haplotigs
    
    # test Purge Haplotigs
    $ purge_haplotigs
    
    USAGE:
    purge_haplotigs  <command>  [options]
    
    COMMANDS:
    -- Purge Haplotigs pipeline:
        readhist        First step, generate a read-depth histogram for the genome
        contigcov       Second step, get contig coverage stats and flag 'suspect' contigs
        purge           Third step, identify and reassign haplotigs
    
    -- Other scripts
        ncbiplace       Generate a placement file for submission to NCBI
        test            Test everything!
    
    
    # test the pipeline
    $ purge_haplotigs test
        # <lots of jargon>
    ALL TESTS PASSED

    3. Running Purge Haplotigs (https://www.jianshu.com/p/8ed5b494b131)

    PREPARATION

    minimap2 -t 4 -ax map-pb genome.fasta subreads.fasta.gz --secondary=no \
        | samtools sort -@ 8 -m 1G -o aligned.bam -T tmp.ali
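
The file names, thread count and read preset in the alignment command are placeholders for your own data. As a rough sketch, the hypothetical helper below (`print_align_cmd` is not part of any of the tools) only prints the pipeline for a given set of inputs so it can be reviewed before running. It assumes the same thread count for minimap2 and samtools, and defaults to the `map-pb` preset; `map-ont` is minimap2's preset for Oxford Nanopore reads.

```shell
# print_align_cmd: echo (do not run) the alignment pipeline for the given inputs.
# Arguments: genome FASTA, reads file, thread count (default 4),
# minimap2 preset (default map-pb).
print_align_cmd() {
    genome=$1
    reads=$2
    threads=${3:-4}
    preset=${4:-map-pb}
    printf 'minimap2 -t %s -ax %s %s %s --secondary=no | samtools sort -@ %s -m 1G -o aligned.bam -T tmp.ali\n' \
        "$threads" "$preset" "$genome" "$reads" "$threads"
}

# Usage:
# print_align_cmd genome.fasta subreads.fasta.gz 8
```

Printing rather than executing keeps the sketch free of side effects; the printed line can be pasted into the shell once it looks right.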

    STEP 1

    Generate a coverage histogram by running the first script. It produces a PNG histogram for you to inspect and a BEDTools 'genomecov' output file that STEP 2 needs.

    purge_haplotigs  hist  -b aligned.bam  -g genome.fasta  [ -t threads ]
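
The PNG is the primary way to read the histogram, but the genomecov file can also be inspected as text. The sketch below assumes the file is in the default BEDTools genomecov histogram layout (tab-separated: sequence name, depth, bases at that depth, sequence size, fraction) and that it contains the genome-wide summary rows labelled `genome`. Both function names are hypothetical. `modal_depth` only points at the tallest peak; a diploid assembly with haplotigs typically shows two peaks, so it is a rough aid, not a way to pick cutoffs automatically.

```shell
# genome_hist: print "depth<TAB>bases" for the genome-wide rows of a
# bedtools genomecov histogram (rows whose first column is "genome").
genome_hist() {
    awk -F '\t' '$1 == "genome" { print $2 "\t" $3 }' "$1"
}

# modal_depth: print the depth (>= min_depth, default 1) that covers the
# most bases, i.e. the position of the tallest histogram peak.
modal_depth() {
    genome_hist "$1" | awk -F '\t' -v min="${2:-1}" '
        $1 >= min && $2 > best { best = $2; depth = $1 }
        END { if (depth != "") print depth }'
}

# Usage (file name as produced in STEP 1 is an assumption):
# modal_depth aligned.bam.gencov 5
```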

    STEP 2

    Choose low, mid and high read-depth cutoffs from the histogram produced in STEP 1 (passed as -l, -m and -h), then run the second script to analyse the coverage contig by contig. It produces a contig coverage stats CSV file in which suspect contigs are flagged for further analysis or removal.

    purge_haplotigs  cov  -i aligned.bam.gencov  -l <integer>  -m <integer>  -h <integer>  
                [-o coverage_stats.csv -j 80  -s 80 ]
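
Since the three cutoffs are typed in by hand, a quick sanity check before launching the run can catch swapped or mistyped values. This is a minimal sketch with a hypothetical helper name; it assumes only that the cutoffs are non-negative integers with low < mid < high, and the numbers in the usage comment are made up for illustration.

```shell
# valid_cutoffs: succeed only if low, mid and high are non-negative integers
# in strictly increasing order (low < mid < high).
valid_cutoffs() {
    for value in "$1" "$2" "$3"; do
        case $value in
            ''|*[!0-9]*) echo "not an integer: '$value'" >&2; return 1 ;;
        esac
    done
    if [ "$1" -lt "$2" ] && [ "$2" -lt "$3" ]; then
        return 0
    fi
    echo "expected low < mid < high, got $1 $2 $3" >&2
    return 1
}

# Usage (illustrative values only):
# valid_cutoffs 20 75 190 && echo "cutoffs look consistent"
```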

    STEP 3

    Run the purging pipeline. This script will automatically run a BEDTools windowed coverage analysis (if generating dotplots) and minimap2 alignments to assess which contigs to reassign and which to keep. The pipeline makes several iterations of purging. Optionally, pass repeats in BED format with -r for improved handling of repetitive regions.

    purge_haplotigs  purge  -g genome.fasta  -c coverage_stats.csv

    This produces five output files:

    • <prefix>.fasta: These are the curated primary contigs
    • <prefix>.haplotigs.fasta: These are all the haplotigs identified in the initial input assembly.
    • <prefix>.artefacts.fasta: These are the very low/high coverage contigs (identified in STEP 2). NOTE: you'll probably have mitochondrial/chloroplast/etc. contigs in here with the assembly junk.
    • <prefix>.reassignments.tsv: These are all the reassignments that were made, as well as the suspect contigs that weren't reassigned.
    • <prefix>.contig_associations.log: This shows the contig "associations" e.g
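
After the run, the file list above can be checked mechanically. The sketch below uses a hypothetical helper, `check_outputs`, which takes the output prefix as its argument and reports any of the five files that does not exist. The prefix in the usage comment is only an illustration; use whatever prefix your run produced.

```shell
# check_outputs: report which of the five expected output files are missing
# for a given prefix; returns non-zero if at least one is absent.
check_outputs() {
    prefix=$1
    missing=0
    for suffix in fasta haplotigs.fasta artefacts.fasta reassignments.tsv contig_associations.log; do
        if [ ! -e "$prefix.$suffix" ]; then
            echo "missing: $prefix.$suffix"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (prefix is illustrative):
# check_outputs curated
```

The check only tests for existence, because a file such as the artefacts FASTA may legitimately be empty.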
     
    
    

       

  • Original article: https://www.cnblogs.com/renping/p/11310702.html