  • The Road to Mathematics - Python Computing in Practice (4) - Lempel-Ziv Compression (2)

    Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The 'Standard size' column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.

    All content on this blog is original; if you repost it, please credit the source.

    http://blog.csdn.net/myhaspl/


    Format  C Type              Python type         Standard size  Notes
    x       pad byte            no value
    c       char                string of length 1  1
    b       signed char         integer             1              (3)
    B       unsigned char       integer             1              (3)
    ?       _Bool               bool                1              (1)
    h       short               integer             2              (3)
    H       unsigned short      integer             2              (3)
    i       int                 integer             4              (3)
    I       unsigned int        integer             4              (3)
    l       long                integer             4              (3)
    L       unsigned long       integer             4              (3)
    q       long long           integer             8              (2), (3)
    Q       unsigned long long  integer             8              (2), (3)
    f       float               float               4              (4)
    d       double              float               8              (4)
    s       char[]              string
    p       char[]              string
    P       void *              integer                            (5), (3)
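
    As a quick check of the 'Standard size' column, the sketch below asks struct.calcsize for the size of each format character under the '<' prefix, which selects standard sizes. It is written for Python 3, whereas the listing later in this post is Python 2, and the helper name standard_size is only illustrative.

```python
import struct

def standard_size(code):
    """Size in bytes of one format character under standard sizing ('<' prefix)."""
    return struct.calcsize('<' + code)

# Format characters that have a standard size in the table above.
for code in 'cbB?hHiIlLqQfd':
    print(code, standard_size(code))

# The native size of 'l' depends on the platform (commonly 4 or 8 bytes).
print('native l:', struct.calcsize('l'))
```

    With a prefix such as '<', 'l' is always 4 bytes; without one, the value printed on the last line may differ between machines.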

    struct.pack(fmt, v1, v2, ...)

    Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.

    struct.unpack(fmt, string)

    Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
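
    A minimal round-trip sketch of pack and unpack follows. It targets Python 3, where pack returns bytes rather than a str. The '<IB' record layout (a code word index plus one character code) and the helper names are assumptions made for illustration; the post does not show the layout of its compressed file.

```python
import struct

# Hypothetical record layout (an assumption, not taken from the post):
# a 4-byte unsigned code word index followed by a 1-byte character code.
RECORD_FMT = '<IB'

def pack_record(index, char_code):
    """Pack one (index, character code) pair into bytes."""
    return struct.pack(RECORD_FMT, index, char_code)

def unpack_record(data):
    """Unpack bytes produced by pack_record; the result is always a tuple."""
    return struct.unpack(RECORD_FMT, data)

packed = pack_record(9938, ord('P'))
# The packed length equals calcsize(fmt), as unpack requires.
print(len(packed), struct.calcsize(RECORD_FMT))
print(unpack_record(packed))

# A format with a single value still unpacks to a one-item tuple.
print(struct.unpack('<H', struct.pack('<H', 7)))
```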

    The program reads a text file, compresses it, and then decompresses it. Part of the code is shown below:

      

    # -*- coding: utf-8 -*-
    # Lempel-Ziv algorithm
    # code:myhaspl@myhaspl.com
    import struct
    mystr=""
    print "\nReading source file"
    mytextfile= open('test2.txt','r')
    try:
         mystr=mytextfile.read( )
    finally:
         mytextfile.close()
    my_str=mystr
    # code word table
    codeword_dictionary={}
    # length of the text to compress
    str_len=len(my_str)
    # maximum code word length
    dict_maxlen=1
    # position of the text segment to parse next (start of the next parse)
    now_index=0
    # largest index in the code word table
    max_index=0

    # compressed data
    print "\nGenerating compressed data"
    compresseddata=[]
    while (now_index<str_len):
        # number of characters to advance
        mystep=0
        # current match length
        now_len=dict_maxlen
        if now_len>str_len-now_index:
            now_len=str_len-now_index
        # index of the matching code word; 0 means no match
        cw_addr=0
        while (now_len>0):
            cw_index=codeword_dictionary.get(my_str[now_index:now_index+now_len])
            if cw_index is not None:
                # code word found
                cw_addr=cw_index
                mystep=now_len
                break
            now_len-=1
        if cw_addr==0:
            # no code word found: add a new one-character code word
            max_index+=1
            mystep=1
            codeword_dictionary[my_str[now_index:now_index+mystep]]=max_index
            print "don't find the Code word,add Code word:%s index:%d"%(my_str[now_index:now_index+mystep],max_index)
        else:
            # code word found: add a new code word one character longer
            max_index+=1
            if now_index+mystep+1<=str_len:
                codeword_dictionary[my_str[now_index:now_index+mystep+1]]=max_index
                if mystep+1>dict_maxlen:
                    dict_maxlen=mystep+1
            print "find the Code word:%s  add Code word:%s index:%d"%(my_str[now_index:now_index+now_len],my_str[now_index:now_index+mystep+1],max_index)
    .......
    ......
            my_codeword_dictionary[my_maxindex]=my_codeword_dictionary[cwkey]+cwlaster
            uncompressdata.append(my_codeword_dictionary[cwkey])
            uncompressdata.append(cwlaster)
        print ".",
    uncompress_str=uncompress_str.join(uncompressdata)
    uncompressstr=uncompress_str
    print "\nWriting the decompressed result to file..\n"
    uncompress_file= open('uncompress.txt','w')
    try:
        uncompress_file.write(uncompressstr)
        print "\nDecompression succeeded; output written to uncompress.txt!\n"
    finally:
        uncompress_file.close()
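
    Because the listing above is only partial, here is a self-contained sketch of the textbook LZ78 scheme, which the (prefix index, next character) handling in the decompression tail appears to resemble. It is not the author's elided code: the function names and the in-memory list of pairs are assumptions, and it is written for Python 3.

```python
def lz78_compress(text):
    """Return a list of (index, char) pairs; index 0 means 'no prefix'."""
    dictionary = {}              # phrase -> index, indices start at 1
    pairs = []
    phrase = ''
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate   # keep extending the current match
        else:
            # Emit the longest known prefix plus the new character,
            # then register the extended phrase as a new code word.
            pairs.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ''
    if phrase:
        # Input ended inside a known phrase: emit it with an empty character.
        pairs.append((dictionary[phrase], ''))
    return pairs

def lz78_decompress(pairs):
    """Rebuild the text from the (index, char) pairs."""
    phrases = ['']               # index 0 is the empty prefix
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)   # mirrors the compressor's new code word
    return ''.join(out)

sample = 'abababab python python'
pairs = lz78_compress(sample)
print(pairs)
print(lz78_decompress(pairs) == sample)
```

    The decompressor never needs the code word table to be stored: it rebuilds the same table from the pairs in the order the compressor created it.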

    Below, the text of the Chinese Wikipedia article on Python is compressed:


    Running the program first compresses the input into a compressed file, then opens that file and decompresses it.

    $ pypy lempel-ziv-compress.py python.txt python.lzv

    ………………..

    find the Code word: C  add Code word: CP index:9938

    find the Code word:ython  add Code word:ython index:9939

    find the Code word:

    ^  add Code word:

    ^ h index:9940

    find the Code word:ttp  add Code word:ttp: index:9941

    find the Code word://  add Code word://e index:9942

    find the Code word:dit  add Code word:ditr index:9943

    find the Code word:a.  add Code word:a.o index:9944

    Generating compressed data header

    Writing compressed data to the compressed file

    …………….

    . . . . . . . . . . . . . . . . . . . . . . …………….

    Writing the decompressed result to file..

    Decompression succeeded; output written to uncompress.txt!

    Check the compression result:

    $ ls -l -h

    …………….

    -rw-rw-r-- 1 deep deep 5.0K Jul  1 20:55 lempel-ziv-compress.py

    -rw-rw-r-- 1 deep deep  30K Jul 1 20:55 python.lzv

    -rw-rw-r-- 1 deep deep  36K Jul 1 20:57 python.txt

    -rw-rw-r-- 1 deep deep  36K Jul 1 20:55 uncompress.txt

    As the listing above shows, the file is 36K before compression and 30K after.

    Compressing the entire source code of sqlite 3.8.5:

    $ pypy lempel-ziv-compress.py sqlitesrc.txt sqlitesrc.lzv

    Check the compression result:

    $ ls -l -h

    …………….

    -rw-rw-r-- 1 deep deep 3.2M Jul  1 21:18 sqlitesrc.lzv

    -rw-rw-r-- 1 deep deep 5.2M Jul  1 21:16 sqlitesrc.txt

    -rw-rw-r-- 1 deep deep 5.2M Jul  1 21:18 uncompress.txt

    The file is 5.2M before compression and 3.2M after.
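
    To put the two results side by side, the short sketch below computes the compressed size as a fraction of the original. The sizes are the rounded values printed by ls -l -h, so the ratios are approximate (roughly 83% for python.txt and 62% for sqlitesrc.txt); the function name is only illustrative.

```python
def compression_ratio(original_size, compressed_size):
    """Compressed size as a fraction of the original size."""
    return compressed_size / original_size

# Sizes as rounded by `ls -l -h`, so the ratios are approximate.
python_ratio = compression_ratio(36.0, 30.0)   # python.txt, sizes in K
sqlite_ratio = compression_ratio(5.2, 3.2)     # sqlitesrc.txt, sizes in M

print('python.txt:    %.0f%% of original' % (python_ratio * 100))
print('sqlitesrc.txt: %.0f%% of original' % (sqlite_ratio * 100))
```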

     


  • Original article: https://www.cnblogs.com/mthoutai/p/6923250.html