  • Creating the fasta sequence dictionary file, the fasta index file and the vcf index file, including "bwa index a.fasta"

    Why these steps are necessary

    The GATK uses two files to access the reference and to sanity-check that access: a .dict dictionary of the contig names and sizes, and a .fai fasta index file that allows efficient random access to the reference bases. You must generate both files before you can use a fasta file as a reference.

    NOTE: Picard and samtools treat spaces in contig names differently. We recommend that you avoid using spaces in contig names.
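
A minimal sketch of a pre-flight check for the two companion files described above. The function name `check_reference` is hypothetical, and the naming convention is an assumption: `ref.fasta` is expected to sit next to `ref.dict` (extension replaced) and `ref.fasta.fai` (suffix appended). It also flags header lines that contain spaces, since that is where the NOTE says the tools can disagree.

```shell
# Sketch: report which companion files are missing for a reference fasta.
# Assumed naming: ref.fasta -> ref.dict and ref.fasta.fai (hypothetical helper).
check_reference() (
    fasta=$1
    dict="${fasta%.*}.dict"   # extension replaced by .dict
    fai="$fasta.fai"          # .fai appended to the full file name
    missing=0
    for f in "$fasta" "$dict" "$fai"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    # Header lines with spaces are where Picard and samtools may differ.
    if [ -s "$fasta" ] && grep -q '^>.* ' "$fasta"; then
        echo "warning: contig header lines contain spaces"
    fi
    exit "$missing"           # exits only the function's subshell
)

# Usage: check_reference Homo_sapiens_assembly18.fasta
```

The function body runs in a subshell, so its variables do not leak into the calling shell.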

    Creating the fasta sequence dictionary file

    We use CreateSequenceDictionary.jar from Picard to create a .dict file from a fasta file.

    > java -jar CreateSequenceDictionary.jar R= Homo_sapiens_assembly18.fasta O= Homo_sapiens_assembly18.dict

    [Fri Jun 19 14:09:11 EDT 2009] net.sf.picard.sam.CreateSequenceDictionary R= Homo_sapiens_assembly18.fasta O= Homo_sapiens_assembly18.dict
    [Fri Jun 19 14:09:58 EDT 2009] net.sf.picard.sam.CreateSequenceDictionary done.
    Runtime.totalMemory()=2112487424
    44.922u 2.308s 0:47.09 100.2%   0+0k 0+0io 2pf+0w
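
To inspect the result, the sketch below lists the contig names and sizes stored in a .dict file. It assumes the .dict file is a tab-separated SAM-style header whose `@SQ` lines carry `SN:` (name) and `LN:` (length) fields. The function name `dict_contigs` is hypothetical.

```shell
# Sketch: print "name<TAB>length" for every @SQ line of a .dict file.
# Assumes a SAM-style header with SN: and LN: fields (hypothetical helper).
dict_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)   # text after "SN:"
            if ($i ~ /^LN:/) len  = substr($i, 4)   # text after "LN:"
        }
        print name "\t" len
    }' "$1"
}

# Usage: dict_contigs Homo_sapiens_assembly18.dict
```

The output can be compared against the first two columns of the matching .fai file (for example with `cut -f1,2`) to confirm that both files describe the same contigs.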

    Creating the fasta index file

    We use the faidx command in samtools to prepare the fasta index file. This file describes byte offsets in the fasta file for each contig, allowing us to compute exactly where a particular reference base at contig:pos is in the fasta file.

    > samtools faidx Homo_sapiens_assembly18.fasta
    108.446u 3.384s 2:44.61 67.9%   0+0k 0+0io 0pf+0w
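
The offset arithmetic described above can be sketched as follows. It assumes the five tab-separated .fai columns are contig name, contig length, byte offset of the first base, bases per line, and bytes per line. The function name `fai_byte_offset` is hypothetical, and positions are taken to be 1-based.

```shell
# Sketch: byte offset in the fasta file of a 1-based position on a contig.
# Assumed .fai columns: name, length, offset, bases per line, bytes per line.
fai_byte_offset() (
    fai=$1 contig=$2 pos=$3
    tab=$(printf '\t')
    while IFS=$tab read -r name length offset linebases linewidth rest; do
        if [ "$name" = "$contig" ]; then
            i=$(( pos - 1 ))   # 0-based index into the contig
            # Full lines before the base, plus the position within its line.
            echo $(( offset + (i / linebases) * linewidth + (i % linebases) ))
            exit 0
        fi
    done < "$fai"
    exit 1                     # contig not found in the index
)

# Usage: fai_byte_offset Homo_sapiens_assembly18.fasta.fai chr1 1000000
```

The sketch does no bounds checking: a position beyond the contig length still yields a number, so callers should compare the position against the length column first.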

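    Sorting the vcf file and creating the vcf index file

    We use SortVcf from Picard to sort the vcf records in the contig order given by the sequence dictionary.

    > java -jar picard.jar SortVcf I=test.vcf O=test.sorted.vcf SEQUENCE_DICTIONARY=ucsc.hg19.dict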
  • Original article: https://www.cnblogs.com/xiaofeiIDO/p/7997543.html