Static dictionary methods of text compression

This post introduces a way to compress text. When we have a large amount of data that shares a similar structure, we can exploit that structure to improve compression. In most cases, the compression method is chosen to match the structural features of the data.

Dictionary methods fall into two categories: static dictionary methods and adaptive (dynamic) dictionary methods.

I will briefly describe the first kind with a small worked example.

If we know a good deal about the structure of a text in advance, a static dictionary method is a suitable choice. It can be implemented in many ways depending on the application, but one variant popular with programmers is double-letter coding, usually called digram coding, in which the dictionary holds single letters plus a few frequent pairs of letters.
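One way to turn prior knowledge into dictionary entries is to count adjacent letter pairs in a representative sample. The sketch below is only an illustration of that idea: the function name `most_common_digrams` and the use of a sample string are assumptions of mine, not part of the original routine, and the pairs it picks for a short sample need not match the dictionary used later in this post.

```python
from collections import Counter


def most_common_digrams(sample, count):
    """Return the `count` most frequent adjacent letter pairs in `sample`."""
    # Slide a two-letter window over the sample and tally every pair.
    pairs = Counter(sample[i:i + 2] for i in range(len(sample) - 1))
    return [pair for pair, _ in pairs.most_common(count)]


if __name__ == '__main__':
    # Hypothetical sample text; a real dictionary would be trained on far more data.
    print(most_common_digrams('abracadabra', 3))
```

In practice the sample would be much larger than the message being compressed, since a static dictionary is fixed before encoding starts.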

To make this concrete, consider a simple example.

Suppose the source is made up of five letters: 'a', 'b', 'c', 'd' and 'r'. Based on what we know about the source, we build the following dictionary:

code    letter(s)
000     a
001     b
010     c
011     d
100     r
101     ab
110     ac
111     ad
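The table has eight entries, so every entry fits in a fixed 3-bit code. As a minimal sketch, assuming codes are simply assigned in list order, the same dictionary can be generated rather than typed by hand. The helper name `build_code_dict` is hypothetical.

```python
def build_code_dict(entries):
    """Assign each entry a fixed-width binary code, in list order."""
    # Number of bits needed to index every entry (3 bits for 8 entries).
    width = max(1, (len(entries) - 1).bit_length())
    return {entry: format(index, '0{}b'.format(width))
            for index, entry in enumerate(entries)}


# Five single letters followed by the three letter pairs from the table.
entries = ['a', 'b', 'c', 'd', 'r', 'ab', 'ac', 'ad']
code_dict = build_code_dict(entries)

if __name__ == '__main__':
    for entry, code in code_dict.items():
        print(code, entry)
```

Because five letters already need 3 bits each, the three spare 3-bit patterns can be given to letter pairs at no extra cost per code.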

Now let us encode the sequence 'abracadabra'.

The encoder first reads the first two letters, 'ab', and checks whether this pair is in the dictionary. If it is, the encoder outputs the pair's code and reads the next two letters. Otherwise it outputs the code of the first letter alone and reads one more letter, which is paired with the second letter. In this example the encoder finds 'ab' in the dictionary and outputs '101'. It then reads 'ra', which is not in the dictionary, so it outputs the code for 'r', which is '100', and reads the 'c' that follows 'a' to form the new pair 'ac'. For 'ac' the encoder outputs '110'. Next it reads 'ad' and outputs '111', and so on until the whole sequence has been consumed.

The full output is the concatenation of the codes for 'ab', 'r', 'ac', 'ad', 'ab', 'r' and 'a': '101100110111101100000'.
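To see what the letter pairs buy us, we can compare the length of this output with a plain fixed-length encoding. The sketch below assumes the baseline spends 3 bits on every letter, which is the minimum for a fixed-length code over five letters. The variable names are illustrative only.

```python
message = 'abracadabra'
encoded = '101100110111101100000'

# Baseline assumption: a fixed-length code with 3 bits for each single letter.
bits_per_letter = 3
plain_bits = len(message) * bits_per_letter
encoded_bits = len(encoded)

if __name__ == '__main__':
    print(encoded_bits, plain_bits)  # prints "21 33"
```

For this message the dictionary with pairs needs 21 bits instead of 33, because four of the seven emitted codes cover two letters each.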

The routine, written in Python, is as follows.
```python
def getCodeDict():
    codeDict = {}
    codeDict['a'] = '000'
    codeDict['b'] = '001'
    codeDict['c'] = '010'
    codeDict['d'] = '011'
    codeDict['r'] = '100'
    codeDict['ab'] = '101'
    codeDict['ac'] = '110'
    codeDict['ad'] = '111'
    return codeDict

def compress(code):
    print('start to compress')
    result = ''
    codeDict = getCodeDict()
    offset = 2
    unCodedCode = code
    while unCodedCode != '':
        # take the next two letters (only one if a single letter remains)
        targetCode = unCodedCode[0 : 2]
        if targetCode in codeDict:
            # found in the dictionary: emit its code and move two steps
            result = result + codeDict[targetCode]
            offset = 2
        else:
            # pair not found: emit the first letter's code and move one step
            result = result + codeDict[targetCode[0]]
            offset = 1
        unCodedCode = unCodedCode[offset : ]
    print('complete to compress')
    return result

if __name__ == '__main__':
    signals = 'abracadabra'
    result = compress(signals)
    print(result)
```
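The post only shows the encoder. Since every code is exactly 3 bits long, decoding can be sketched as reading the bit string three characters at a time and looking each chunk up in the reversed dictionary. The function `decompress` and the constant `CODE_DICT` below are my own hypothetical names, not part of the original routine.

```python
# Same dictionary as in this post, repeated so the sketch is self-contained.
CODE_DICT = {
    'a': '000', 'b': '001', 'c': '010', 'd': '011',
    'r': '100', 'ab': '101', 'ac': '110', 'ad': '111',
}


def decompress(bits, code_dict=CODE_DICT, width=3):
    """Decode a string of fixed-width codes back into letters."""
    if len(bits) % width != 0:
        raise ValueError('bit string length must be a multiple of the code width')
    # Invert the dictionary: code -> letter or letter pair.
    reverse = {code: entry for entry, code in code_dict.items()}
    chunks = (bits[i:i + width] for i in range(0, len(bits), width))
    return ''.join(reverse[chunk] for chunk in chunks)


if __name__ == '__main__':
    print(decompress('101100110111101100000'))  # prints "abracadabra"
```

The decoder needs the same dictionary as the encoder, which is why a static dictionary must be agreed on before any data is exchanged.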
Original article: https://www.cnblogs.com/junyuhuang/p/3970155.html
