  • Static dictionary methods for text compression

      This post introduces one way to compress text. When we face a large amount of data that shares a similar structure, we can exploit that structure to compress it better. In most cases, the compression method can be chosen to match the structure of the data.

      Dictionary methods fall into two categories: static dictionary methods and adaptive (dynamic) dictionary methods.

    Here I briefly describe the first kind with a small example program.

      If we know a good deal about the structure of a text in advance, a static dictionary method is a reasonable choice. It can be implemented in many ways depending on the situation, but one scheme popular with programmers is the double-letter code, also known as digram coding.

      To make this clearer, I will explain the method with a simple example.

      Suppose a source uses an alphabet of five letters: 'a', 'b', 'c', 'd' and 'r'. Based on what we know about the source, we build the following dictionary.

    code letter
    000 a
    001 b
    010 c
    011 d
    100 r
    101 ab
    110 ac
    111 ad
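
      The table has eight entries, so three bits per code are enough (2^3 = 8). As a minimal sketch, the same table can be generated by numbering the entries in order. The names build_dictionary and ENTRIES are my own and only for illustration.

```python
def build_dictionary(entries):
    """Assign consecutive fixed-length binary codes to the entries, in order."""
    # Smallest width that gives every entry a distinct code: 8 entries -> 3 bits.
    width = max(1, (len(entries) - 1).bit_length())
    return {entry: format(index, f"0{width}b") for index, entry in enumerate(entries)}


# Five single letters first, then the three letter pairs from the table.
ENTRIES = ["a", "b", "c", "d", "r", "ab", "ac", "ad"]
code_dict = build_dictionary(ENTRIES)
print(code_dict)
```

      Every single letter keeps its own code, so any sequence over the five letters can still be encoded even when no pair matches.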

      Now let us encode the sequence 'abracadabra'.

      First, the coder reads the first two letters, 'ab', and checks whether that pair is in the dictionary. If it is, the coder outputs the code for the pair and reads the next two letters. Otherwise, it outputs the code for the first letter alone and forms the next pair starting from the second letter. In this example the coder finds 'ab' in the dictionary and outputs '101'. It then reads 'ra', which is not in the dictionary, so it outputs the code for 'r', which is '100', and pairs the 'a' with the following 'c' to form 'ac', for which it outputs '110'. Then it reads 'ad' and outputs '111'. ...
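
      To see every step of this walk through the sequence, here is a small sketch that records which dictionary entry the coder picks at each position. The function name parse and the constant CODE_DICT are hypothetical helpers, not part of the routine in this post.

```python
CODE_DICT = {
    "a": "000", "b": "001", "c": "010", "d": "011", "r": "100",
    "ab": "101", "ac": "110", "ad": "111",
}


def parse(text, code_dict=CODE_DICT):
    """Return the list of (entry, code) steps the greedy coder takes."""
    steps = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        # Prefer a two-letter entry; otherwise fall back to the single letter.
        entry = pair if len(pair) == 2 and pair in code_dict else text[i]
        steps.append((entry, code_dict[entry]))
        i += len(entry)
    return steps


for entry, code in parse("abracadabra"):
    print(f"{entry:>2} -> {code}")
```

      The coder is greedy: it always takes a pair when one matches and never looks further ahead.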

      The output is '101100110111101100000'.
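
      As a rough check of what the pair entries buy us, we can compare the length of this output with a plain three-bits-per-letter code. That baseline is my own assumption for the comparison; the variable names are only illustrative.

```python
encoded = "101100110111101100000"  # output of the example above
text = "abracadabra"

bits_with_pairs = len(encoded)     # 7 dictionary entries * 3 bits each
bits_letters_only = 3 * len(text)  # 11 letters * 3 bits each, no pair entries
print(bits_with_pairs, bits_letters_only)
```

      The saving comes entirely from the four places where a pair matched, each of which replaces two three-bit codes with one.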

      A Python routine that implements this is as follows.

    def getCodeDict():
        # Static dictionary: five single letters plus three letter pairs.
        codeDict = {}
        codeDict['a'] = '000'
        codeDict['b'] = '001'
        codeDict['c'] = '010'
        codeDict['d'] = '011'
        codeDict['r'] = '100'
        codeDict['ab'] = '101'
        codeDict['ac'] = '110'
        codeDict['ad'] = '111'
        return codeDict

    def compress(code):
        print('start to compress')
        result = ''
        codeDict = getCodeDict()
        unCodedCode = code
        while unCodedCode != '':
            # Take the next two letters (only one if a single letter is left).
            targetCode = unCodedCode[0:2]
            if targetCode in codeDict:
                # Entry found: emit its code and skip past it.
                result = result + codeDict[targetCode]
                offset = 2
            else:
                # Pair not in the dictionary: emit the first letter's code and move one step.
                result = result + codeDict[targetCode[0]]
                offset = 1
            unCodedCode = unCodedCode[offset:]
        print('complete to compress')
        return result

    if __name__ == '__main__':
        signals = 'abracadabra'
        result = compress(signals)
        print(result)
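
      The post only covers the coder. Because every code is exactly three bits long, decoding can be sketched as cutting the bit string into three-bit chunks and looking each one up. The function decompress below is my own illustrative addition under that assumption.

```python
CODE_DICT = {
    "a": "000", "b": "001", "c": "010", "d": "011", "r": "100",
    "ab": "101", "ac": "110", "ad": "111",
}


def decompress(bits, code_dict=CODE_DICT, width=3):
    """Decode a string of fixed-width codes back into text."""
    if len(bits) % width != 0:
        raise ValueError("bit string length must be a multiple of the code width")
    # Invert the dictionary so that codes map back to letters or letter pairs.
    decode_dict = {code: entry for entry, code in code_dict.items()}
    chunks = (bits[i:i + width] for i in range(0, len(bits), width))
    return "".join(decode_dict[chunk] for chunk in chunks)


print(decompress("101100110111101100000"))
```

      The decoder needs the same dictionary as the coder, which is what makes the method static: both sides agree on the table before any data is sent.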
  • Original article: https://www.cnblogs.com/junyuhuang/p/3970155.html