
kallisto: Near-optimal RNA-Seq quantification

Near-optimal RNA-Seq quantification https://pachterlab.github.io/kallisto

Input and output file descriptions: http://bio.math.berkeley.edu/eXpress/manual.html

Paper title:

Pseudoalignment for metagenomic read assignment

Abstract:

We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data. In particular, we show that the recent idea of pseudoalignment introduced in the RNA-Seq context is suitable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state of the art software.

Paper URL:

https://arxiv.org/abs/1510.07371v2

Source code:

https://pachterlab.github.io/kallisto/about

Installation:

wget https://github.com/pachterlab/kallisto/releases/download/v0.43.0/kallisto_linux-v0.43.0.tar.gz

Test run:

[biostack@localhost.localdomain test]$ /project/metagenomics_benchmark/kallisto_linux-v0.43.0/kallisto index -i transcripts.idx transcripts.fasta  # transcripts.idx is a placeholder name for the index file

[biostack@localhost.localdomain test]$ /project/metagenomics_benchmark/kallisto_linux-v0.43.0/kallisto quant -i transcripts.idx -o output reads_1.fastq reads_2.fastq  # the two FASTQ files are the input

[biostack@localhost.localdomain output]$ more abundance.tsv

target_id      length   eff_length   est_counts    tpm
NM_001168316   2283     2105.9       160.606       12581
NM_174914      2385     2207.9       1500.72       112128
NR_031764      1853     1675.9       102.671       10106.2
NM_004503      1681     1503.9       331.118       36320.7
NM_006897      1541     1363.9       664           80311.3
NM_014212      2037     1859.9       55            4878.25
NM_014620      2300     2122.9       591.166       45937.9
NM_017409      1959     1781.9       47            4351.17
NM_017410      2396     2218.9       42            3122.5
NM_018953      1612     1434.9       227.999       26212.1
NM_022658      2288     2110.9       4881          381446
NM_153633      1666     1488.9       361.044       40002.4
NM_153693      2072     1894.9       73.6719       6413.67
NM_173860      849      671.903      962           236189
NR_003084      1640     1462.9       0.00164208    0.18517
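The tpm column in this listing can be re-derived from the other columns: each target's rate est_counts / eff_length is divided by the sum of all rates and scaled to one million. The sketch below checks this against the rows shown here. The helper names `read_abundance` and `recompute_tpm` are made up for illustration, and the parser assumes whitespace-separated columns.

```python
# Rows copied from the abundance.tsv listing (whitespace separated here).
ABUNDANCE = """\
target_id length eff_length est_counts tpm
NM_001168316 2283 2105.9 160.606 12581
NM_174914 2385 2207.9 1500.72 112128
NR_031764 1853 1675.9 102.671 10106.2
NM_004503 1681 1503.9 331.118 36320.7
NM_006897 1541 1363.9 664 80311.3
NM_014212 2037 1859.9 55 4878.25
NM_014620 2300 2122.9 591.166 45937.9
NM_017409 1959 1781.9 47 4351.17
NM_017410 2396 2218.9 42 3122.5
NM_018953 1612 1434.9 227.999 26212.1
NM_022658 2288 2110.9 4881 381446
NM_153633 1666 1488.9 361.044 40002.4
NM_153693 2072 1894.9 73.6719 6413.67
NM_173860 849 671.903 962 236189
NR_003084 1640 1462.9 0.00164208 0.18517
"""


def read_abundance(text):
    """Parse the table into a list of dicts with numeric fields as floats."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        for key in ("length", "eff_length", "est_counts", "tpm"):
            record[key] = float(record[key])
        rows.append(record)
    return rows


def recompute_tpm(rows):
    """TPM_i = (est_counts_i / eff_length_i) / sum_j(est_counts_j / eff_length_j) * 1e6."""
    rates = [r["est_counts"] / r["eff_length"] for r in rows]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


rows = read_abundance(ABUNDANCE)
print(f'{"target_id":<14}{"reported":>12}{"recomputed":>12}')
for row, tpm in zip(rows, recompute_tpm(rows)):
    print(f'{row["target_id"]:<14}{row["tpm"]:>12.2f}{tpm:>12.2f}')
```

The recomputed values agree with the reported column up to the rounding of the printed figures, and the column sums to about one million.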

Usage notes:

kallisto

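kallisto is a program for quantifying transcript abundances from RNA-Seq data or, more generally, abundances of target sequences from high-throughput sequencing reads. It is based on the novel idea of pseudoalignment, which rapidly determines which targets a read is compatible with, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can build an index in less than 10 minutes on a Mac and quantify (that is, classify) 30 million human reads in less than 3 minutes. Pseudoalignment of reads preserves the key information needed for quantification, so kallisto is not only fast but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, kallisto significantly outperforms existing tools in many benchmarks.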
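The idea of pseudoalignment can be illustrated with a toy k-mer index: record which targets contain each k-mer, then intersect those sets over the k-mers of a read. This is only a sketch of the concept, not kallisto's implementation. It ignores reverse complements, uses a plain dictionary, and the sequences and function names are invented for the example.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(targets, k):
    """Map each k-mer to the set of target names that contain it."""
    index = defaultdict(set)
    for name, seq in targets.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the targets compatible with every indexed k-mer of the read."""
    compatible = None
    for kmer in kmers(read, k):
        hits = index.get(kmer)
        if hits is None:
            continue  # k-mer absent from all targets, e.g. a sequencing error
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Hypothetical toy targets that share a common prefix.
targets = {"t1": "ACGTACGTTGCA", "t2": "ACGTACGAAGCT"}
index = build_index(targets, k=5)

print(pseudoalign("ACGTACG", index, 5))   # shared prefix: compatible with both
print(pseudoalign("TACGTTGC", index, 5))  # only found in t1
print(pseudoalign("TACGTTGA", index, 5))  # last k-mer has an error and is skipped
```

No base-level alignment is computed at any point. The result for each read is just a set of compatible targets, which is the information the quantification step needs.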

RNA-Seq data quantified with kallisto can be analyzed with sleuth.

Running kallisto with no arguments prints a list of usage options:

kallisto 0.43.0

Usage: kallisto <CMD> [arguments] ..

Where <CMD> can be one of:

    index         Builds a kallisto index
    quant         Runs the quantification algorithm
    pseudo        Runs the pseudoalignment step
    h5dump        Converts HDF5-formatted results to plaintext
    version       Prints version information
    cite          Prints citation information

Running kallisto <CMD> without arguments prints usage information for <CMD>

The commands are described below:

index:

kallisto index builds an index from a FASTA-formatted file of target sequences. The arguments for the index command are:

kallisto 0.43.0
Builds a kallisto index

Usage: kallisto index [arguments] FASTA-files

Required argument:
-i, --index=STRING          Filename for the kallisto index to be constructed

Optional argument:
-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
    --make-unique           Replace repeated target names with unique names

The input files are in FASTA format and may be compressed.

quant:

kallisto quant runs the quantification algorithm. The arguments for the quant command are:

kallisto 0.43.0
Computes equivalence classes for reads and quantifies abundances

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              quantification
-o, --output-dir=STRING       Directory to write output to

Optional arguments:
    --bias                    Perform sequence based bias correction
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5
    --single                  Quantify single-end reads
    --fr-stranded             Strand specific reads, first read forward
    --rf-stranded             Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE  Estimated average fragment length
-s, --sd=DOUBLE               Estimated standard deviation of fragment length
                              (default: value is estimated from the input data)
-t, --threads=INT             Number of threads to use (default: 1)
    --pseudobam               Output pseudoalignments in SAM format to stdout

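kallisto can process either single-end or paired-end reads. The default mode is paired-end, and the input is given as FASTQ files listed in pairs: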

kallisto quant -i index -o output pairA_1.fastq pairA_2.fastq pairB_1.fastq pairB_2.fastq

For single-end reads, supply the --single flag together with the -l and -s options, then list the input FASTQ files:

kallisto quant -i index -o output --single -l 200 -s 20 file1.fastq.gz file2.fastq.gz file3.fastq.gz
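When these commands are generated from a script, it helps to validate the arguments before launching kallisto. The sketch below builds the argument list for either mode using only flags that appear in the help text above. The function name `quant_command` is hypothetical, and the checks assume, as the kallisto manual states, that paired-end mode takes an even number of FASTQ files and that single-end mode needs both -l and -s.

```python
import shlex


def quant_command(index, out_dir, fastqs, single=False,
                  fragment_length=None, sd=None, threads=1, bootstraps=0):
    """Return the argv list for a kallisto quant run."""
    fastqs = list(fastqs)
    if not fastqs:
        raise ValueError("at least one FASTQ file is required")
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir,
           "-t", str(threads), "-b", str(bootstraps)]
    if single:
        if fragment_length is None or sd is None:
            raise ValueError("single-end mode needs both -l and -s")
        cmd += ["--single", "-l", str(fragment_length), "-s", str(sd)]
    elif len(fastqs) % 2:
        raise ValueError("paired-end mode needs an even number of FASTQ files")
    return cmd + fastqs


paired = quant_command("index", "output", ["pairA_1.fastq", "pairA_2.fastq"])
single = quant_command("index", "output", ["file1.fastq.gz"],
                       single=True, fragment_length=200, sd=20)
print(shlex.join(paired))
print(shlex.join(single))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.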

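kallisto quant produces three output files by default:

abundance.h5: an HDF5 binary file containing run information, abundance estimates, bootstrap estimates, and so on. This file can be read by sleuth.
abundance.tsv: a plaintext file of the abundance estimates.
run_info.json: a JSON file containing information about the run.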
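A pipeline can check that a run finished by looking for these three files before moving on. The sketch below assumes the file names abundance.h5, abundance.tsv and run_info.json (matching the abundance.tsv listing from the test run) and a default run without --plaintext. The helper names are hypothetical, and no particular keys are assumed inside run_info.json.

```python
import json
from pathlib import Path

# Files expected from a default kallisto quant run.
EXPECTED_OUTPUTS = ("abundance.h5", "abundance.tsv", "run_info.json")


def missing_outputs(out_dir):
    """Return the expected output files that are absent from out_dir."""
    out = Path(out_dir)
    return [name for name in EXPECTED_OUTPUTS if not (out / name).is_file()]


def load_run_info(out_dir):
    """Load run_info.json as a plain dict, whatever keys it contains."""
    with open(Path(out_dir) / "run_info.json") as handle:
        return json.load(handle)


# "output" is the directory passed to -o in the examples.
missing = missing_outputs("output")
if missing:
    print("missing:", ", ".join(missing))
else:
    for key, value in load_run_info("output").items():
        print(f"{key}: {value}")
```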

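Notes on optional arguments:

Pseudobam:
--pseudobam writes all pseudoalignments in SAM format to stdout. The output can be redirected to a file or converted to BAM with samtools.

For example: kallisto quant -i index -o out --pseudobam r1.fastq r2.fastq > out.sam

Or with samtools: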

kallisto quant -i index -o out --pseudobam r1.fastq r2.fastq | samtools view -Sb - > out.bam 
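Once the pseudoalignments are in SAM form, a quick sanity check is to count records per target. The sketch below relies only on the standard SAM layout (tab-separated fields, FLAG in column 2, RNAME in column 3, bit 0x4 meaning unmapped). The sample lines are invented for illustration, and the function counts SAM records, which is not the same as counting reads when a read is reported for several targets or as two mates.

```python
from collections import Counter


def count_targets(sam_lines):
    """Count mapped SAM records per reference name (RNAME)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # unmapped record
        counts[rname] += 1
    return counts


# Hypothetical records; real input would come from out.sam.
sample = [
    "@SQ\tSN:NM_174914\tLN:2385",
    "r1\t99\tNM_174914\t10\t255\t50M\t=\t200\t240\t*\t*",
    "r1\t147\tNM_174914\t200\t255\t50M\t=\t10\t-240\t*\t*",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
for target, n in count_targets(sample).most_common():
    print(f"{target}\t{n}")
```

To use it on a real file, pass an open file handle, for example `count_targets(open("out.sam"))`.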


pseudo:

kallisto pseudo runs only the pseudoalignment step and is intended for use with single-cell RNA-Seq. The options for the pseudo command are:

kallisto 0.43.0
Computes equivalence classes for reads and quantifies abundances

Usage: kallisto pseudo [arguments] FASTQ-files

Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              pseudoalignment
-o, --output-dir=STRING       Directory to write output to

Optional arguments:
-u  --umi                     First file in pair is a UMI file
-b  --batch=FILE              Process files listed in FILE
    --single                  Quantify single-end reads
-l, --fragment-length=DOUBLE  Estimated average fragment length
-s, --sd=DOUBLE               Estimated standard deviation of fragment length
                              (default: value is estimated from the input data)
-t, --threads=INT             Number of threads to use (default: 1)
    --pseudobam               Output pseudoalignments in SAM format to stdout

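The form of the command and the meaning of the parameters are the same as for the quant command. However, pseudo does not run the EM algorithm to quantify abundances. In addition, the pseudo command has an option to specify many cells in a batch file, for example: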

kallisto pseudo -i index -o output -b batch.txt
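The batch file itself is not shown here. As an assumption to be checked against the kallisto manual, the sketch below writes one line per cell containing an id followed by that cell's FASTQ files, preceded by a comment header. The cell names, file names and helper names are all hypothetical.

```python
def batch_lines(cells):
    """Build batch-file lines: an id followed by the FASTQ files for each cell.

    The layout (comment header, whitespace-separated fields) is an assumption
    and should be verified against the kallisto manual.
    """
    lines = ["#id file1 file2"]
    for cell_id, files in cells.items():
        lines.append(" ".join([cell_id, *files]))
    return lines


def write_batch(path, cells):
    """Write the batch file to path, one cell per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(batch_lines(cells)) + "\n")


# Hypothetical cells with paired-end FASTQ files.
cells = {
    "cell1": ["cell1_1.fastq.gz", "cell1_2.fastq.gz"],
    "cell2": ["cell2_1.fastq.gz", "cell2_2.fastq.gz"],
}
write_batch("batch.txt", cells)
print("\n".join(batch_lines(cells)))
```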

h5dump:

kallisto h5dump converts HDF5-formatted results to plaintext. The arguments for the h5dump command are:

kallisto 0.43.0
Converts HDF5-formatted results to plaintext

Usage:  kallisto h5dump [arguments] abundance.h5

Required argument:
-o, --output-dir=STRING       Directory to write output to


Original post: https://www.cnblogs.com/ylHe/p/6080439.html