prefixspan python

Source: https://github.com/chuanconggao/PrefixSpan-py

API Usage

Alternatively, you can use the algorithms via the API.
```
from prefixspan import PrefixSpan

db = [
    [0, 1, 2, 3, 4],
    [1, 1, 1, 3, 4],
    [2, 1, 2, 2, 0],
    [1, 1, 1, 2, 2],
]

ps = PrefixSpan(db)
```
For details of each parameter, please refer to the PrefixSpan class in prefixspan/api.py.

To set length limits on the mined patterns:
```
ps = PrefixSpan(db)
ps.minlen = 3
ps.maxlen = 5
```
To mine frequent and top-k patterns, call `frequent` and `topk`. The outputs shown in the comments include patterns of length 1 and 2, so they correspond to a PrefixSpan object without the length limits above:
```
ps = PrefixSpan(db)  # fresh object, no length limits

print(ps.frequent(2))
# [(2, [0]),
#  (4, [1]),
#  (3, [1, 2]),
#  (2, [1, 2, 2]),
#  (2, [1, 3]),
#  (2, [1, 3, 4]),
#  (2, [1, 4]),
#  (2, [1, 1]),
#  (2, [1, 1, 1]),
#  (3, [2]),
#  (2, [2, 2]),
#  (2, [3]),
#  (2, [3, 4]),
#  (2, [4])]

print(ps.topk(5))
# [(4, [1]),
#  (3, [2]),
#  (3, [1, 2]),
#  (2, [1, 3]),
#  (2, [1, 3, 4])]

print(ps.frequent(2, closed=True))

print(ps.topk(5, closed=True))

print(ps.frequent(2, generator=True))

print(ps.topk(5, generator=True))
```
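
To make clearer what `frequent(2)` computes, here is a minimal sketch of the prefix-projection idea behind PrefixSpan. It is not the library's implementation; the function name `prefixspan_sketch` and its internals are assumptions made for illustration. On the example database it should reproduce the 14 patterns listed above.

```python
from collections import defaultdict

db = [
    [0, 1, 2, 3, 4],
    [1, 1, 1, 3, 4],
    [2, 1, 2, 2, 0],
    [1, 1, 1, 2, 2],
]


def prefixspan_sketch(db, minsup):
    """Return (support, pattern) pairs for all patterns with support >= minsup.

    Hypothetical helper for illustration, not part of PrefixSpan-py.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffix of
        # each sequence that follows the first occurrence of the prefix.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seen = set()
            for pos in range(start, len(db[seq_id])):
                item = db[seq_id][pos]
                if item not in seen:  # count each sequence once per item
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        for item, new_projected in occurrences.items():
            if len(new_projected) >= minsup:
                pattern = prefix + [item]
                results.append((len(new_projected), pattern))
                # Grow the pattern further inside its projected database.
                mine(pattern, new_projected)

    mine([], [(i, 0) for i in range(len(db))])
    return results


patterns = prefixspan_sketch(db, 2)
for sup, pattern in patterns:
    print(sup, pattern)
```

Each recursive call only scans the suffixes that follow the current prefix, which is why the method never generates candidates that do not occur in the data.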
Closed Patterns and Generator Patterns

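A frequent sequential pattern is a pattern that appears in at least minsup sequences of a sequence database, where the minimum support minsup is a parameter set by the user.

A frequent closed sequential pattern is a frequent sequential pattern that is not included in another sequential pattern having exactly the same support.

Algorithms such as PrefixSpan find frequent sequential patterns. Algorithms such as BIDE+ find frequent closed sequential patterns. BIDE+ is usually much faster than PrefixSpan because it uses pruning techniques to avoid generating all sequential patterns. In addition, the set of closed patterns is usually much smaller than the set of all sequential patterns, so BIDE+ is also more memory-efficient.

Another important point is that closed sequential patterns are a compact and lossless representation of all sequential patterns. The set of closed sequential patterns is usually much smaller, yet it allows the entire set of sequential patterns, with their supports, to be recovered without any loss of information, which is very convenient.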
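
The lossless property can be sketched in a few lines: every frequent pattern is a sub-pattern of some closed pattern, and its support is the highest support among the closed patterns that contain it. The example below starts from the closed patterns that this article lists for the PrefixSpan-py example database (minsup = 2); the helper names `recover_support` and `recover_all` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Closed patterns and their supports for the example database (minsup = 2).
closed = {
    (0,): 2,
    (1,): 4,
    (1, 2): 3,
    (1, 2, 2): 2,
    (1, 3, 4): 2,
    (1, 1, 1): 2,
}


def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def recover_support(pattern, closed):
    """Highest support among closed patterns containing the pattern.

    Returns 0 when no closed pattern contains it, meaning the pattern is not
    frequent (its true support is below minsup and cannot be recovered).
    """
    return max(
        (sup for c, sup in closed.items() if is_subsequence(pattern, c)),
        default=0,
    )


def recover_all(closed):
    """Rebuild every frequent pattern and its support from the closed set."""
    patterns = {
        tuple(c[i] for i in idx)
        for c in closed
        for n in range(1, len(c) + 1)
        for idx in combinations(range(len(c)), n)
    }
    return {p: recover_support(p, closed) for p in patterns}


frequent = recover_all(closed)
for pattern in sorted(frequent):
    print(pattern, frequent[pattern])
```

Six closed patterns are enough here to rebuild all 14 frequent patterns returned by `ps.frequent(2)`.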

Here is a simple example.

Consider four sequences:
```
a  b  c  d  e
a  b  d
b  e  a  
b  c  d  e
```
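Let minsup = 2.

b c is a frequent sequential pattern because it appears in two sequences (its support is 2). b c is not a closed sequential pattern, because it is contained in a larger sequential pattern, b c d, that has the same support.

b c d also has a support of 2. It is not a closed sequential pattern either, because it is contained in a larger sequential pattern, b c d e, that has the same support. b c d e is a closed sequential pattern because it is not contained in any other sequential pattern with the same support.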
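
The reasoning above can be checked with a brute-force sketch. It enumerates every subsequence of the four sequences as a candidate, which is only practical for a toy database like this one; all function names are assumptions made for illustration.

```python
from itertools import combinations

db = [
    ["a", "b", "c", "d", "e"],
    ["a", "b", "d"],
    ["b", "e", "a"],
    ["b", "c", "d", "e"],
]


def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def support(pattern, db):
    """Number of sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in db)


def frequent_patterns(db, minsup):
    """Brute force: every subsequence of every sequence is a candidate."""
    candidates = {
        tuple(seq[i] for i in idx)
        for seq in db
        for n in range(1, len(seq) + 1)
        for idx in combinations(range(len(seq)), n)
    }
    supports = {p: support(p, db) for p in candidates}
    return {p: s for p, s in supports.items() if s >= minsup}


def closed_patterns(frequent):
    """Keep patterns with no longer super-pattern of equal support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
    }


frequent = frequent_patterns(db, minsup=2)
closed = closed_patterns(frequent)
for name in ["bc", "bcd", "bcde"]:
    p = tuple(name)
    print(name, "support:", frequent[p], "closed:", p in closed)
# bc support: 2 closed: False
# bcd support: 2 closed: False
# bcde support: 2 closed: True
```

Searching only among frequent patterns is sufficient, because a super-pattern with the same support as a frequent pattern is itself frequent.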

The closed patterns are much more compact due to the smaller number.
- A pattern is closed if there is no super-pattern with the same frequency.
```
prefixspan-cli frequent 2 --closed test.dat

0 : 2
1 : 4
1 2 : 3
1 2 2 : 2
1 3 4 : 2
1 1 1 : 2
```
The generator patterns are even more compact due to both the smaller number and the shorter lengths.
- A pattern is a generator if there is no sub-pattern with the same frequency.
- Due to the high compactness, generator patterns are useful as features for classification, etc.
```
prefixspan-cli frequent 2 --generator test.dat

0 : 2
1 1 : 2
2 : 3
2 2 : 2
3 : 2
4 : 2
```
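
The closed and generator definitions can be checked directly by brute force. The sketch below assumes that `test.dat` holds the same four sequences as the `db` in the API example (the file's contents are not shown in this article), and all function names are hypothetical. Note that the empty pattern counts as a sub-pattern with support equal to the number of sequences, which is why `1 : 4` is closed but not a generator.

```python
from itertools import combinations

db = [
    [0, 1, 2, 3, 4],
    [1, 1, 1, 3, 4],
    [2, 1, 2, 2, 0],
    [1, 1, 1, 2, 2],
]


def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def support(pattern, db):
    """Number of sequences containing the pattern; the empty pattern is in all."""
    return sum(is_subsequence(pattern, seq) for seq in db)


def frequent_patterns(db, minsup):
    """Brute force: every subsequence of every sequence is a candidate."""
    candidates = {
        tuple(seq[i] for i in idx)
        for seq in db
        for n in range(1, len(seq) + 1)
        for idx in combinations(range(len(seq)), n)
    }
    supports = {p: support(p, db) for p in candidates}
    return {p: s for p, s in supports.items() if s >= minsup}


def is_closed(pattern, frequent):
    """No longer frequent pattern contains it with the same support."""
    sup = frequent[pattern]
    return not any(
        len(q) > len(pattern) and t == sup and is_subsequence(pattern, q)
        for q, t in frequent.items()
    )


def is_generator(pattern, db):
    """No sub-pattern has the same support.

    Support never increases when a pattern grows, so it is enough to test
    the sub-patterns obtained by deleting a single item.
    """
    sup = support(pattern, db)
    return all(
        support(pattern[:i] + pattern[i + 1:], db) != sup
        for i in range(len(pattern))
    )


frequent = frequent_patterns(db, minsup=2)
closed = {p for p in frequent if is_closed(p, frequent)}
generators = {p for p in frequent if is_generator(p, db)}
print("closed:", sorted(closed))
print("generators:", sorted(generators))
print("both:", sorted(closed & generators))
```

Under that assumption, the three printed sets should agree with the `--closed`, `--generator`, and combined outputs of `prefixspan-cli` shown in this section.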
There are patterns that are both closed and generator.
```
prefixspan-cli frequent 2 --closed --generator test.dat

0 : 2
```

Note: there are many algorithms for pattern mining.
SPMF offers implementations of the following data mining algorithms.

Sequential Pattern Mining

These algorithms discover sequential patterns in a set of sequences. For a good overview of sequential pattern mining algorithms, please read this survey paper.
- algorithms for mining sequential patterns in a sequence database
  
  the CM-SPADE algorithm (Fournier-Viger et al, 2014, powerpoint)
  
  the CM-SPAM algorithm (Fournier-Viger et al, 2014, powerpoint)
  
  the FAST algorithm (Salvemini et al, 2011)
  
  the GSP algorithm (Srikant et al., 1996)
  
  the LAPIN (aka LAPIN-SPAM) algorithm (Yang et al., 2005)
  
  the PrefixSpan algorithm (Pei et al., 2004)
  
  the SPADE algorithm (Zaki et al., 2001)
  
  the SPAM algorithm (Ayres et al., 2002)
- algorithms for mining closed sequential patterns in a sequence database
  
  the ClaSP algorithm (Gomariz et al., 2013)
  
  the CM-ClaSP algorithm (Fournier-Viger et al, 2014, powerpoint)
  
  the CloFAST algorithm (Fumarola et al, 2016)
  
  the CloSpan algorithm (Yan et al., 2003)
  
  the BIDE+ algorithm(Wang et al., 2007)
- algorithms for mining maximal sequential patterns in a sequence database
  
  the VMSP algorithm (Fournier-Viger et al, 2014, powerpoint)
  
  the MaxSP algorithm (Fournier-Viger et al., 2013, powerpoint).
- algorithms for mining the top-k sequential patterns in a sequence database
  
  the TKS algorithm (Fournier-Viger et al., 2013, powerpoint).
  
  the TSP algorithm (Tzvetkov et al., 2003).
  
  the Skopus algorithm for mining the top-k sequential patterns using leverage and significance (Petitjean et al., 2016)
- algorithms for mining sequential generator patterns in a sequence database
  
  the VGEN algorithm (Fournier-Viger et al, 2014)
  
  the FEAT algorithm (Gao et al., 2008).
  
  the FSGP algorithm (Yi et al., 2011).
- algorithms for mining compressing sequential patterns
  
  the GoKrimp and SeqKrimp algorithms (Lam et al., 2012; Lam et al., 2014)
- algorithms for mining multidimensional sequential patterns in a multidimensional sequence database
  
  the SeqDIM algorithm for mining frequent multidimensional sequential patterns in a multi-dimensional sequence database (Pinto et al., 2001)
  
  the Songram et al. algorithm for mining frequent closed multidimensional sequential patterns in a multi-dimensional sequence database (Songram et al. 2006)
- the Fournier-Viger et al. algorithm, a sequential pattern mining algorithm that combines several features from well-known sequential pattern mining algorithms and also proposes some original features (Fournier-Viger et al., 2008):
  
  mining sequences with minimum support by database-projection (based on PrefixSpan, Pei et al., 2004)
  
  mining sequences with min/max time interval between events and min/max time length of a sequence (based on Hirate-Yamana, 2006)
  
  mining closed sequences (based on the BIDE+ algorithm by Wang et al. 2007)
  
  mining multi-dimensional sequences (based on Pinto et al. 2001)
  
  mining closed multi-dimensional sequences (based on Songram et al. 2006 and Pasquier et al., 1999)
  
  mining sequences with items having integer values and performing automatic clustering of these values (original extension described in Fournier-Viger et al., 2008)
- algorithm for mining high-utility sequential patterns in a sequence database
  
  the USPAN algorithm (Yin et al. 2012)
- algorithm for mining high-utility probability sequential patterns in a sequence database
  
  the PHUSPM algorithm (Zhang et al. 2018)
  
  the UHUSPM algorithm (Zhang et al. 2018)
- algorithm for progressive sequential pattern mining with convergence guarantees
  
  the ProSecCo algorithm (Servan-Schreiber et al. 2018)
- the Occur algorithm for finding all occurrences of some sequential patterns in sequences by post-processing.
Sequential Rule Mining

These algorithms discover sequential rules in a set of sequences.
- algorithms for mining sequential rules in a sequence database
  
  the ERMiner algorithm (Fournier-Viger et al., 2014)
  
  the RuleGrowth algorithm (Fournier-Viger et al., 2011, Fournier-Viger et al., 2015, powerpoint, video)
  
  the CMRules algorithm (Fournier-Viger et al., 2010, powerpoint)
  
  the CMDeo algorithm (Fournier-Viger et al., 2010)
  
  the RuleGen algorithm (Zaki et al, 2001)
- algorithms for mining sequential rules in a sequence database with the window size constraint
  
  the TRuleGrowth algorithm (Fournier-Viger, 2012a, Fournier-Viger et al., 2015)
- algorithms for mining top-k sequential rules in a sequence database
  
  the TopSeqRules algorithm for mining the top-k sequential rules (Fournier-Viger et al., 2011, powerpoint)
  
  the TopSeqClassRules algorithm for mining the top-k class sequential rules (a variation of Fournier-Viger et al., 2011)
  
  the TNS algorithm for mining the top-k non-redundant sequential rules (Fournier-Viger 2013)
- algorithm for mining high-utility sequential rules in a sequence database
  
  the HUSRM algorithm (Zida et al., 2015)
Sequence Prediction

These algorithms predict the next symbol(s) of a sequence based on a set of training sequences
- algorithms for predicting the next symbol of a sequence based on a set of training sequences
  
  the Compact Prediction Tree+ (CPT+) algorithm (Gueniche et al., 2015, powerpoint)
  
  the Compact Prediction Tree (CPT) algorithm (Gueniche et al., 2013)
  
  the First order Markov Chains (PPM - order 1) (Cleary et al, 1984)
  
  the Dependency Graph (DG) (Padmanabhan, 1996)
  
  the All-k-Order Markov Chains (AKOM) (Pitkow, 1999)
  
  the TDAG (Laird & Saul, 1994)
  
  the LZ78 (Ziv, 1978)
Itemset Mining

These algorithms discover interesting itemsets (sets of values) that appear in a transaction database (database records containing symbolic data). For a good overview of itemset mining, please read this survey paper.
- algorithms for discovering frequent itemsets in a transaction database.
  
  the Apriori algorithm (Agrawal & Srikant, 1994)
  
  the AprioriTID algorithm (Agrawal & Srikant, 1994)
  
  the FP-Growth algorithm (Han et al., 2004)
  
  the Eclat algorithm (Zaki, 2000)
  
  the dEclat algorithm (Zaki and Gouda, 2001, 2003)
  
  the Relim algorithm (Borgelt, 2005)
  
  the H-Mine algorithm (Pei et al., 2007)
  
  the LCMFreq algorithm (Uno et al., 2004)
  
  the PrePost and PrePost+ algorithms (Deng et al., 2012, Deng & Lv, 2015)
  
  the FIN algorithm (Deng et al., 2014)
  
  the DFIN algorithm (Deng et al., 2016)
  
  the NegFIN algorithm (Aryabarzan et al., 2018)
- algorithms for discovering frequent closed itemsets in a transaction database.
  
  the FPClose algorithm (Grahne and Zhu, 2005)
  
  the Charm algorithm (Zaki and Hsiao, 2002)
  
  the dCharm algorithm (Zaki and Gouda, 2001)
  
  the DCI_Closed algorithm (Lucchese et al, 2004)
  
  the LCM algorithm (Uno et al., 2004)
  
  the AprioriClose aka Close algorithm (Pasquier et al., 1999)
  
  the AprioriTID Close algorithm (Pasquier et al., 1999, Agrawal & Srikant, 1994)
- algorithms for recovering all frequent itemsets from frequent closed itemsets:
  
  the LevelWise algorithm (Pasquier et al., 1999)
  
  the DFI-Growth algorithm (___ et al., 2018)
- algorithms for discovering frequent maximal itemsets in a transaction database.
  
  the FPMax algorithm (Grahne and Zhu, 2003)
  
  the Charm-MFI algorithm for discovering frequent closed itemsets and maximal frequent itemsets by post-processing in a transaction database (Szathmary et al. 2006)
- algorithms for mining frequent itemsets with multiple minimum supports
  
  the MSApriori algorithm (Liu et al, 1999)
  
  the CFPGrowth++ algorithm (Uday & Reddy, 2011, Hu & Chen, 2006)
- algorithms for mining generator itemsets in a transaction database
  
  the DefMe algorithm for mining frequent generator itemsets in a transaction database (Soulet & Rioult, 2014)
  
  the Pascal algorithm for mining frequent itemsets, and identifying at the same time which ones are generators (Bastide et al., 2002)
  
  the Zart algorithm for discovering frequent closed itemsets and their generators in a transaction database (Szathmary et al. 2007)
- algorithms for mining rare itemsets and/or correlated itemsets in a transaction database
  
  the AprioriInverse algorithm for mining perfectly rare itemsets (Koh & Rountree, 2005)
  
  the AprioriRare algorithm for mining minimal rare itemsets and frequent itemsets (Szathmary et al. 2007b)
  
  the CORI algorithm for mining minimal rare correlated itemsets using the support and bond measures (Bouasker et al. 2015)
  
  the RP-Growth algorithm for mining rare itemsets (Tsang et al., 2011)
- algorithms for performing targeted and dynamic queries about association rules and frequent itemsets.
  
  the Itemset-Tree, a data structure that can be updated incrementally, and algorithms for querying it. (Kubat et al, 2003)
  
  the Memory-Efficient Itemset-Tree, a data structure that can be updated incrementally, and algorithms for querying it. (Fournier-Viger, 2013, powerpoint)
- algorithms to discover frequent itemsets in a stream
  
  the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
  
  the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
  
  the CloStream algorithm for mining frequent closed itemsets in a data stream (Yen et al, 2009)
- the U-Apriori algorithm for mining frequent itemsets in uncertain data (Chui et al, 2007)
- the VME algorithm for mining erasable itemsets (Deng & Xu, 2010)
- algorithms to discover fuzzy frequent itemsets in a quantitative transaction database
  
  the FFI-Miner algorithm for mining fuzzy itemsets (Lin et al., 2015)
  
  the MFFI-Miner algorithm for mining multiple fuzzy itemsets (Lin et al., 2016)
Periodic Pattern Mining

These algorithms discover patterns that periodically appear in a sequence of complex events (also called a transaction database)
- the PFPM algorithm (Fournier-Viger et al, 2016a, powerpoint, video) for mining frequent periodic patterns in a sequence of transactions (a transaction database)
- the PHM algorithm (Fournier-Viger et al, 2016b, powerpoint) for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
Episode Mining

These algorithms discover episodes that appear in a single sequence of complex events.
- the TUP algorithm (Rathore et al., 2016) for mining the top-k high utility episodes in a sequence of complex events (a transaction database) with utility information
- the UP-SPAN algorithm (Wu et al., 2013) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
High-Utility Pattern Mining

These algorithms discover patterns having a high utility (importance) in different kinds of data. For a good overview of high utility itemset mining, you may read this survey paper, and the high utility-pattern mining book.
- algorithms for mining high-utility itemsets in a transaction database having profit information
  
  the EFIM algorithm (Zida et al. 2016, Zida et al., 2015, powerpoint)
  
  the FHM algorithm (Fournier-Viger et al., 2014, powerpoint)
  
  the HUI-Miner algorithm (Liu & Qu, 2012)
  
  the HUP-Miner algorithm (Krishnamoorthy, 2014)
  
  the mHUIMiner algorithm (Peng et al., 2017)
  
  the HMiner algorithm (Krishnamoorthy, 2017)
  
  the ULB-Miner algorithm (Duong et al, 2018)
  
  the UFH algorithm (Dawar et al, 2017)
  
  the IHUP algorithm (Ahmed et al., 2009)
  
  the Two-Phase algorithm (Liu et al., 2005)
  
  the UP-Growth algorithm (Tseng et al., 2011)
  
  the UP-Growth+ algorithm (Tseng et al., 2013)
  
  the UP-Hist algorithm (Dawar et al., 2015)
  
  the d2HUP algorithm (Liu et al, 2012)
- algorithm for efficiently mining high-utility itemsets with length constraints in a transaction database
  
  the FHM+ algorithm (Fournier-Viger et al, 2016, powerpoint)
- algorithm for mining correlated high-utility itemsets in a transaction database
  
  the FCHM_bond algorithm, to use the bond measure (Fournier-Viger et al, 2016, powerpoint, Fournier-Viger et al. 2018, to appear, video)
  
  the FCHM_allconfidence algorithm, to use the all-confidence measure (Fournier-Viger et al, 2016, powerpoint, Fournier-Viger et al. 2018, to appear)
- algorithm for mining high-utility itemsets in a transaction database containing negative unit profit values
  
  the FHN algorithm (Fournier-Viger et al., 2014, powerpoint)
  
  the HUINIV-Mine algorithm (Chu et al., 2009)
- algorithm for mining frequent high-utility itemsets in a transaction database
  
  the FHMFreq algorithm, a variation of the FHM algorithm (Fournier-Viger et al., 2014)
- algorithm for mining on-shelf high-utility itemsets in a transaction database containing information about time periods of items
  
  the FOSHU algorithm (Fournier-Viger et al., 2015, powerpoint)
  
  the TS-HOUN algorithm (Lan et al., 2014)
- algorithm for incremental high-utility itemset mining in a transaction database
  
  the EIHI algorithm (Fournier-Viger et al., 2015, powerpoint)
  
  the HUI-LIST-INS algorithm (Lin et al., 2014)
- algorithm for mining concise representations of high-utility itemsets in a transaction database
  
  the HUG-Miner algorithm (Fournier-Viger et al., 2014, powerpoint) for mining high-utility generators
  
  the GHUI-Miner algorithm (Fournier-Viger et al., 2014, powerpoint) for mining generators of high-utility itemsets
  
  the MinFHM algorithm (Fournier-Viger et al., 2016, powerpoint, video ) for mining minimal high-utility itemsets
  
  the EFIM-Closed algorithm (Fournier-Viger et al., 2016, powerpoint) for mining closed high-utility itemsets
  
  the CHUI-Miner algorithm (Wu et al., 2015) for mining closed high-utility itemsets
  
  the CHUD algorithm for mining closed high-utility itemsets (Tseng et al., 2011/2015)
  
  the CHUI-Miner(Max) algorithm for mining maximal high utility itemsets (Wu et al., 2019).
- algorithm for mining the skyline high-utility itemsets in a transaction database
  
  the SkyMine algorithm (Goyal et al., 2015)
- algorithm for mining the top-k high-utility itemsets in a transaction database
  
  the TKU algorithm (Tseng et al., 2015), obtained from UP-Miner under GPL license
  
  the TKO-Basic algorithm (Tseng et al., 2015)
- algorithms for mining the top-k high utility itemsets from a data stream with a window
  
  the FHMDS and FHMDS-Naive algorithms (Dawar et al. 2017)
- algorithm for mining frequent skyline utility patterns in a transaction database
  
  the SFUPMinerUemax algorithm (Lin et al, 2016)
- algorithm for mining quantitative high utility itemsets in a transaction database:
  
  the VHUQI algorithm (Wu et al., 2014)
- algorithm for mining high-utility sequential rules in a sequence database
  
  the HUSRM algorithm (Zida et al., 2015)
- algorithm for mining high-utility sequential patterns in a sequence database
  
  the USPAN algorithm (Yin et al. 2012)
- algorithm for mining high-utility probability sequential patterns in a sequence database
  
  the PHUSPM algorithm (Zhang et al. 2018)
  
  the UHUSPM algorithm (Zhang et al. 2018)
- algorithm for mining high-utility itemsets in a transaction database using evolutionary algorithms
  
  the HUIM-GA algorithm (Kannimuthu et al., 2014)
  
  the HUIM-BPSO algorithm (Lin et al, 2016)
  
  the HUIM-GA-tree algorithm (Lin et al, 2016)
  
  the HUIM-BPSO-tree algorithm (Lin et al, 2016)
  
  the HUIF-PSO algorithm (Song et al., 2018)
  
  the HUIF-GA algorithm (Song et al., 2018)
  
  the HUIF-BA algorithm (Song et al., 2018)
- algorithm for mining high average-utility itemsets in a transaction database
  
  the HAUI-Miner algorithm for mining high average-utility itemsets (Lin et al, 2016)
  
  the EHAUPM algorithm for mining high average-utility itemsets (Lin et al, 2017)
  
  the HAUI-MMAU algorithm for mining high average-utility itemsets with multiple thresholds (Lin et al, 2016)
  
  the MEMU algorithm for mining high average-utility itemsets with multiple thresholds (Lin et al, 2018)
- algorithms for mining high utility episodes in a sequence of complex events (a transaction database)
  
  the TUP algorithm (Rathore et al., 2016) for mining the top-k high utility episodes in a sequence of complex events (a transaction database) with utility information
  
  the UP-SPAN algorithm (Wu et al., 2013) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
- algorithms for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
  
  the PHM algorithm (Fournier-Viger et al, 2016b, powerpoint)
- algorithms for discovering irregular high utility itemsets (non periodic patterns) in a transaction database with utility information
  
  the PHM_irregular algorithm, which is a simple variation of the PHM algorithm
- algorithm for discovering local high utility itemsets in a database with utility information and timestamps
  
  the LHUI-Miner algorithm (Fournier-Viger et al., 2019)
- algorithm for discovering peak high utility itemsets in a database with utility information and timestamps
  
  the PHUI-Miner algorithm (Fournier-Viger et al., 2019)
Association Rule Mining

These algorithms discover interesting associations between symbols (values) in a transaction database (database records with binary attributes).
- an algorithm for mining all association rules in a transaction database (Agrawal & Srikant, 1994)
- an algorithm for mining all association rules with the lift measure in a transaction database (adapted from Agrawal & Srikant, 1994)
- an algorithm for mining the IGB informative and generic basis of association rules in a transaction database (Gasmi et al., 2005)
- an algorithm for mining perfectly sporadic association rules (Koh & Rountree, 2005)
- an algorithm for mining closed association rules (Szathmary et al. 2006).
- an algorithm for mining minimal non redundant association rules (Kryszkiewicz, 1998)
- the Indirect algorithm for mining indirect association rules (Tan et al. 2000; Tan et al. 2006)
- the FHSAR algorithm for hiding sensitive association rules (Weng et al. 2008)
- the TopKRules algorithm for mining the top-k association rules (Fournier-Viger, 2012b, powerpoint)
- the TopKClassRules algorithm for mining the top-k class association rules (a variation of TopKRules. This latter is described in Fournier-Viger, 2012b, powerpoint)
- the TNR algorithm for mining top-k non-redundant association rules (Fournier-Viger 2012d, powerpoint)
Stream pattern mining

These algorithms discover various kinds of patterns in a stream (an infinite sequence of database records, i.e. transactions)
- the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
- the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
- the CloStream algorithm for mining frequent closed itemsets in a data stream (Yen et al, 2009)
- algorithms for mining the top-k high utility itemsets from a data stream with a window
  
  the FHMDS and FHMDS-Naive algorithms (Dawar et al. 2017)
Clustering

These algorithms automatically find clusters in different kinds of data
- the original K-Means algorithm (MacQueen, 1967)
- the Bisecting K-Means algorithm (Steinbach et al, 2000)
- algorithms for density-based clustering
  
  the DBScan algorithm (Ester et al., 1996)
  
  the Optics algorithm to extract a cluster ordering of points, which can then be used to generate DBScan style clusters and more (Ankerst et al, 1999)
- a hierarchical clustering algorithm
- a tool called Cluster Viewer for visualizing clusters
- a tool called Instance Viewer for visualizing the input of clustering algorithms
Time series mining

These algorithms perform various tasks to analyze time series data

Original article: https://www.cnblogs.com/bonelee/p/10696521.html
