  • [Python Cookbook] Accessing substrings in Python

    The simplest way to access a substring is slicing:

    afield = theline[3:8]

    but a slice extracts only one substring at a time.
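A tiny Python 3 illustration of plain slicing, using a made-up sample line: each slice expression yields exactly one substring, so several fields need several slices. A stop index past the end is clamped rather than raising an error.

```python
theline = "abcdefghij"        # hypothetical sample line

afield = theline[3:8]         # characters at indices 3 through 7
print(afield)                 # defgh

# Several fields need one slice each
first, second = theline[0:3], theline[3:8]
print(first, second)          # abc defgh

# A stop index beyond the end is clamped, not an error
print(theline[8:100])         # ij
```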

    If you also need to account for field lengths, struct.unpack may be more appropriate:

    import struct
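    # Get a 5-byte string, skip 3 bytes, get two 8-byte strings, then the rest

    baseformat = "5s 3x 8s 8s"
    # By how many bytes theline exceeds the length implied by baseformat
    numremain = len(theline) - struct.calcsize(baseformat)
    # Complete the format with the appropriate "s" or "x" field, then unpack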
    format = "%s %ds" % (baseformat,numremain)
    l,s1,s2,t = struct.unpack(format,theline)
    #test

    >>> theline = "numremain = len(theline) - struct.calcsize(baseformat)"
    >>> numremain = len(theline) - struct.calcsize(baseformat)
    >>> format = "%s %ds" % (baseformat,numremain)
    >>> format
    '5s 3x 8s 8s 30s'
    >>> l,s1,s2,t = struct.unpack(format,theline)
    >>> l
    'numre'
    >>> s1
    'n = len('
    >>> s2
    'theline)'
    >>> t
    ' - struct.calcsize(baseformat)'
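The session above is Python 2, where struct.unpack accepts a str. In Python 3 struct works on bytes, so the text has to be encoded first. A sketch of the same test, assuming ASCII text (so byte offsets equal character offsets); the helper name unpack_with_rest is hypothetical.

```python
import struct

def unpack_with_rest(baseformat, data):
    """Unpack the fields in baseformat, plus all remaining bytes as a last field."""
    # Python 3: struct operates on bytes, not str
    numremain = len(data) - struct.calcsize(baseformat)
    fmt = "%s %ds" % (baseformat, numremain)
    return struct.unpack(fmt, data)

theline = "numremain = len(theline) - struct.calcsize(baseformat)"
# Encode the text first; ASCII keeps one byte per character
raw = unpack_with_rest("5s 3x 8s 8s", theline.encode("ascii"))
# Decode each field back to str
l, s1, s2, t = (f.decode("ascii") for f in raw)
print(l, s1, s2, t, sep="|")
```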

    To get the data in fixed-length pieces, use slicing inside a list comprehension (LC):

    pieces = [theline[k:k+n] for k in xrange(0,len(theline),n)]
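The same list comprehension in Python 3, where range replaces xrange. The sample line and piece length are made up; note that the last piece is simply shorter when the length is not a multiple of n.

```python
theline = "abcdefghij"   # hypothetical sample line
n = 3                    # hypothetical piece length

# Start offsets 0, n, 2n, ... ; each slice takes up to n characters
pieces = [theline[k:k + n] for k in range(0, len(theline), n)]
print(pieces)            # ['abc', 'def', 'ghi', 'j']
```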

    To cut the data into columns at given positions, slicing inside an LC is just as easy:

    cuts = [8,14,20,26,30]
    pieces = [theline[i:j] for i,j in zip([0]+cuts,cuts+[None])]

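    Inside the LC, zip returns a list of (start, end) pairs; apart from the ends, each pair has the form (cuts[k], cuts[k+1]).

    The first and last pairs are (0, cuts[0]) and (cuts[len(cuts)-1], None), so the slices cover the whole line.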
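A Python 3 check of those pairs, using the cuts list from above and a made-up 40-character line. list() is needed only for printing, because zip returns an iterator in Python 3.

```python
cuts = [8, 14, 20, 26, 30]
# Pair each start offset with the next end offset
bounds = list(zip([0] + cuts, cuts + [None]))
print(bounds)
# [(0, 8), (8, 14), (14, 20), (20, 26), (26, 30), (30, None)]

theline = "0123456789" * 4   # hypothetical 40-character line
pieces = [theline[i:j] for i, j in bounds]
print(pieces)
# ['01234567', '890123', '456789', '012345', '6789', '0123456789']
```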

     

    Wrapping the snippets above into functions:

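    import struct

    def fields(baseformat,theline,lastfield=False):
        # By how many bytes theline exceeds the length implied by baseformat
        # (struct.calcsize computes that length)
        numremain = len(theline)-struct.calcsize(baseformat)

        # Complete the format with an "s" field (keep the rest) or an "x" field (skip it), then unpack
        format = "%s %d%s" % (baseformat,numremain,lastfield and "s" or "x")
        return struct.unpack(format,theline)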
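A Python 3 port of fields, as a sketch: theline must be bytes, and a conditional expression replaces the older `lastfield and "s" or "x"` idiom. The 28-byte record is made up for illustration.

```python
import struct

def fields(baseformat, theline, lastfield=False):
    # Python 3: theline must be bytes
    numremain = len(theline) - struct.calcsize(baseformat)
    # "s" keeps the leftover bytes as a final field, "x" skips them
    fmt = "%s %d%s" % (baseformat, numremain, "s" if lastfield else "x")
    return struct.unpack(fmt, theline)

record = b"ABCDEfooHIJKLMNOPQRSTUVWrest"   # hypothetical 28-byte record
print(fields("5s 3x 8s 8s", record))
# (b'ABCDE', b'HIJKLMNO', b'PQRSTUVW')
print(fields("5s 3x 8s 8s", record, lastfield=True))
# (b'ABCDE', b'HIJKLMNO', b'PQRSTUVW', b'rest')
```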

    Below is a version that uses memoization:

    def fields(baseformat,theline,lastfield=False,_cache={}):
        # _cache is a deliberately shared mutable default: it persists across calls
        # Build the cache key and try to fetch a cached format string
        key = baseformat,len(theline),lastfield
        format = _cache.get(key)
        if format is None:
            # No cached format string yet: build it and cache it
            numremain = len(theline) - struct.calcsize(baseformat)
            _cache[key] = format = "%s %d%s" % (
                baseformat,numremain,lastfield and "s" or "x")
        return struct.unpack(format,theline)

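    According to the Cookbook, this version is 30% to 40% faster than the unoptimized one. Unless this code is a bottleneck, though, the technique is not worth the extra complexity.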

    Slicing functions built on LCs:

    def split_by(theline,n,lastfield=False):
        # Cut theline into all the length-n pieces
        pieces = [theline[k:k+n] for k in xrange(0,len(theline),n)]
        # Drop the last piece if it is too short and not wanted
        if not lastfield and pieces and len(pieces[-1]) < n:
            pieces.pop()
        return pieces

    def split_at(theline,cuts,lastfield=False):
        # Cut theline at the given positions
        pieces = [theline[i:j] for i,j in zip([0]+cuts,cuts+[None])]
        # Drop the piece after the last cut if it is not wanted
        if not lastfield:
            pieces.pop()
        return pieces

    Generator-based versions:

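    def split_at(the_line,cuts,lastfield=False):
        # Yield the piece between each pair of successive cut positions
        last = 0
        for cut in cuts:
            yield the_line[last:cut]
            last = cut
        # Optionally yield whatever follows the last cut
        if lastfield:
            yield the_line[last:]

    def split_by(the_line,n,lastfield=False):
        # Cut positions n, 2n, ... below len(the_line)
        return split_at(the_line,xrange(n,len(the_line),n),lastfield)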
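A Python 3 port of the generator versions (range replaces xrange). Generators are lazy, so list() is used to see the pieces. One behavioral detail worth checking: with lastfield=False, this split_by drops the final piece even when it is a full n characters long, unlike the list-comprehension split_by. The sample line is made up.

```python
def split_at(the_line, cuts, lastfield=False):
    # Yield the piece between each pair of successive cut positions
    last = 0
    for cut in cuts:
        yield the_line[last:cut]
        last = cut
    # Optionally yield whatever follows the last cut
    if lastfield:
        yield the_line[last:]

def split_by(the_line, n, lastfield=False):
    # Cut positions n, 2n, ... strictly below len(the_line)
    return split_at(the_line, range(n, len(the_line), n), lastfield)

print(list(split_by("abcdefghi", 3)))        # ['abc', 'def']
print(list(split_by("abcdefghi", 3, True)))  # ['abc', 'def', 'ghi']
```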

    How zip() works

    zip([iterable...])

    This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence. When there are multiple arguments which are all of the same length, zip() is similar to map() with an initial argument of None. With a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list.

    The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).
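The `zip(*[iter(s)]*n)` grouping idiom from the paragraph above, checked in Python 3 on a made-up string. The list holds the same iterator n times, so each tuple zip builds pulls n consecutive items. An incomplete final group is dropped; itertools.zip_longest keeps it by padding with a fill value.

```python
from itertools import zip_longest

s = "abcdefghij"   # hypothetical data series
n = 3

# The same iterator repeated n times: zip pulls n consecutive items per tuple
groups = ["".join(t) for t in zip(*[iter(s)] * n)]
print(groups)      # ['abc', 'def', 'ghi']  (the incomplete 'j' group is dropped)

# zip_longest pads the last group instead of dropping it
padded = ["".join(t) for t in zip_longest(*[iter(s)] * n, fillvalue="")]
print(padded)      # ['abc', 'def', 'ghi', 'j']
```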

    zip() in conjunction with the * operator can be used to unzip a list:

    >>> x = [1, 2, 3]
    >>> y = [4, 5, 6]
    >>> zipped = zip(x, y)
    >>> zipped
    [(1, 4), (2, 5), (3, 6)]
    >>> x2, y2 = zip(*zipped)
    >>> x == list(x2) and y == list(y2)
    True

    >>> x2
    (1, 2, 3)
    >>> y2
    (4, 5, 6)
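The documentation quoted above is for Python 2, where zip() returns a list. In Python 3 zip() returns a single-use iterator, so the same example needs list() to display the pairs, and a second pass over the same zip object yields nothing.

```python
x = [1, 2, 3]
y = [4, 5, 6]

zipped = list(zip(x, y))   # materialize the iterator
print(zipped)              # [(1, 4), (2, 5), (3, 6)]

x2, y2 = zip(*zipped)      # unzip: each result is a tuple
print(x2, y2)              # (1, 2, 3) (4, 5, 6)

it = zip(x, y)
first_pass = list(it)
second_pass = list(it)     # the iterator is already exhausted
print(first_pass, second_pass)  # [(1, 4), (2, 5), (3, 6)] []
```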

     

    For how generators work, see this blog post: http://www.cnblogs.com/cacique/archive/2012/02/24/2367183.html

  • Original post: https://www.cnblogs.com/cacique/p/2603640.html