  • Project 1: Instant Markup

    This was the first Python project I worked on. Honestly, it was not easy, and since I was still unfamiliar with Python, it took me quite a while.

    The input text file, test.txt, contains the following:

        Welcome to World Wide Spam, Inc


        These are the corporate web pages of *World Wide Spam*, Inc. We hope you find your stay enjoyable, and that you will sample many of our products

        A short history of the company

        World Wide Spam was started in the summer of 2000. The business concept was to ride the dot-com wave and to make money both through bulk email and by selling canned meat online

        After receiving several complaints from customers who weren't satisfied by their bulk email, World Wide Spam altered their profile, and focused 100% on canned goods. Today they rank as the world's 13,892nd online supplier of SPAM

        Destinations

        From this page you may visit several of our interesting web pages:

            -What is SPAM?(http://www.baidu.com)

            -How do they make it?(http://www.baidu.com)

            -Why should I eat it?(http://www.baidu.com)

        How to get in touch with us

        You can get in touch with us in *many* ways: By phone(555-1234), by email(wwspam@wwspam.fu) or by visiting our customer feedback page(http://wwspam.fu/feedback).

    ① Text block generator (util.py)

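        def lines(file):
            # Yield every line of the file, then one extra blank line so that
            # the last block is always terminated
            for line in file:
                yield line
            yield '\n'


        def blocks(file):
            # Collect consecutive non-blank lines into one block; a blank line
            # (empty after strip()) ends the current block
            block = []
            for line in lines(file):
                if line.strip():
                    block.append(line)
                elif block:
                    yield ''.join(block).strip()
                    block = []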

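    At first I didn't really understand this code; you need to know how yield works. A function containing yield is a generator: each yield hands one value back to the caller and freezes the function, and the next time a value is requested, execution resumes right where it left off. The strip() method removes the given characters from both ends of a string (by default, whitespace, including the trailing newline), so a line that is empty after strip() is a blank line. A blank line means a new block is about to start, so the previous block is complete and can be yielded. The job of lines() is to append one blank line after the end of the text; without it, the last block would never be yielded.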
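    A small sketch to check both points. count_up and blocks_no_sentinel are hypothetical helpers made up for illustration; lines and blocks are copied from util.py. The generator records when it pauses and when it resumes, and the comparison shows that without the extra blank line from lines(), the final block is silently dropped.

```python
import io


def count_up(limit, log):
    """Hypothetical toy generator that records when it pauses and resumes."""
    n = 0
    while n < limit:
        log.append(f"yield {n}")
        yield n                           # pause here and hand n to the caller
        log.append(f"resume after {n}")   # continue here on the next next()
        n += 1


def blocks_no_sentinel(file):
    """Hypothetical variant of blocks() that skips lines(), for comparison."""
    block = []
    for line in file:
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


def lines(file):
    for line in file:
        yield line
    yield '\n'                            # sentinel blank line


def blocks(file):
    block = []
    for line in lines(file):
        if line.strip():                  # '' after strip() means a blank line
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


if __name__ == "__main__":
    log = []
    gen = count_up(2, log)
    print(next(gen), log)   # 0 ['yield 0']
    print(next(gen), log)   # 1 ['yield 0', 'resume after 0', 'yield 1']

    text = "First block\n\nLast block, no trailing blank line"
    print(list(blocks_no_sentinel(io.StringIO(text))))  # ['First block']
    print(list(blocks(io.StringIO(text))))
    # ['First block', 'Last block, no trailing blank line']
```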

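    I tested this code by printing the contents of the first block.

    If there is no blank line between the first and second paragraphs, blocks() merges them into a single block, because only a blank line ends a block.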
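    A minimal way to repeat both tests yourself, using io.StringIO in place of a real file. The two input strings are abridged, hypothetical excerpts of test.txt, and lines/blocks are repeated from util.py so the snippet runs on its own.

```python
import io


# Same lines()/blocks() as util.py, repeated so this snippet is self-contained
def lines(file):
    for line in file:
        yield line
    yield '\n'


def blocks(file):
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


# Hypothetical, abridged excerpts of test.txt
with_gap = "Welcome to World Wide Spam, Inc\n\n\nThese are the corporate web pages.\n"
no_gap = "Welcome to World Wide Spam, Inc\nThese are the corporate web pages.\n"

print(next(blocks(io.StringIO(with_gap))))
# Welcome to World Wide Spam, Inc
print(next(blocks(io.StringIO(no_gap))))
# Welcome to World Wide Spam, Inc
# These are the corporate web pages.
```

    Because the merged block contains a newline, HeadingRule in rules.py would no longer treat the first line as a title or heading.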

    ② Handlers (handlers.py)

        class Handler:

            # Look up a method named prefix+name; if the class has it and it is
            # callable, call it with any extra arguments and return the result
            def callback(self, prefix, name, *args):
                method = getattr(self, prefix + name, None)
                if callable(method):
                    return method(*args)

            # Helper for callback with the prefix 'start_'; only the name is needed
            def start(self, name):
                self.callback('start_', name)

            # Helper for callback with the prefix 'end_'
            def end(self, name):
                self.callback('end_', name)

            # Return the nested function substitution, usable as the
            # replacement argument of re.sub
            def sub(self, name):
                def substitution(match):
                    result = self.callback('sub_', name, match)
                    # No sub_<name> method: keep the matched text unchanged
                    if result is None:
                        result = match.group(0)
                    return result
                return substitution


        class HTMLRenderer(Handler):
            def start_document(self):
                print('<html><head><title>title</title></head><body>')
            def end_document(self):
                print('</body></html>')
            def start_paragraph(self):
                print('<p>')
            def end_paragraph(self):
                print('</p>')
            def start_heading(self):
                print('<h2>')
            def end_heading(self):
                print('</h2>')
            def start_list(self):
                print('<ul>')
            def end_list(self):
                print('</ul>')
            def start_listitem(self):
                print('<li>')
            def end_listitem(self):
                print('</li>')
            def start_title(self):
                print('<h1>')
            def end_title(self):
                print('</h1>')

            # Replacement functions used by the regex filters (via Handler.sub)
            def sub_emphasis(self, match):
                return '<em>%s</em>' % match.group(1)
            def sub_url(self, match):
                return '<a href="%s">%s</a>' % (match.group(1), match.group(1))
            def sub_mail(self, match):
                return '<a href="mailto:%s">%s</a>' % (match.group(1), match.group(1))

            # Output the (already filtered) text of a block
            def feed(self, data):
                print(data)

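    This code is a bit tricky. Start with callback: getattr(self, prefix+name, None) looks up an attribute named prefix+name on the object. If it exists, getattr returns it (for a method, that is a bound method object, not a memory address); otherwise it returns the default value, None. callable is a built-in function that checks whether an object can be called; if it can, callback calls it with the extra arguments and returns the result.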
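    To see getattr and callable in isolation, here is a sketch with a hypothetical Demo class whose callback has the same body as Handler.callback:

```python
class Demo:
    """Hypothetical class illustrating getattr() + callable()."""
    label = 'not a method'   # an attribute that exists but cannot be called

    def start_paragraph(self):
        return '<p>'

    def callback(self, prefix, name, *args):
        # Same body as Handler.callback
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)
        # otherwise falls through and returns None


d = Demo()
print(getattr(d, 'start_paragraph'))      # a bound method object
print(callable(getattr(d, 'label')))      # False
print(d.callback('start_', 'paragraph'))  # <p>
print(d.callback('start_', 'heading'))    # None (no such attribute)
print(d.callback('', 'label'))            # None (exists, but is a str)
```

    This is why HTMLRenderer only needs to define the methods it cares about: a call such as handler.start('document') on a handler without start_document simply does nothing.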

    The hardest part is the sub method. The book gives this example:

    >>> handler.sub('emphasis')
    <function substitution at 0x168cf8>

    In other words, sub returns a substitution function. The key point is the following example:

    >>> import re
    >>> re.sub(r'\*(.+?)\*', handler.sub('emphasis'), 'This *is* a test')
    'This <em>is</em> a test'

    handler.sub('emphasis') in the middle returns the substitution() function, but that function takes a match parameter. So who supplies the argument here?

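    re.sub does. When the replacement argument is a function, re.sub calls it once for every match it finds and passes in the corresponding match object. Here the regular expression matches *is*: match.group(0) is the whole match '*is*', and match.group(1), the part captured by the parentheses, is 'is'. sub_emphasis wraps group(1) in <em> tags, and re.sub puts the returned string in place of the match. That should make it clear.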
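    The whole chain can be checked with a stripped-down handler. MiniHandler is a hypothetical stand-in for Handler plus HTMLRenderer that keeps only callback, sub and sub_emphasis:

```python
import re


class MiniHandler:
    """Hypothetical stand-in for Handler/HTMLRenderer, for illustration only."""

    def callback(self, prefix, name, *args):
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def sub(self, name):
        def substitution(match):
            # re.sub calls this once per match, passing a Match object
            result = self.callback('sub_', name, match)
            if result is None:   # no sub_<name> method: keep the matched text
                result = match.group(0)
            return result
        return substitution

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)


handler = MiniHandler()
pattern = r'\*(.+?)\*'

m = re.search(pattern, 'This *is* a test')
print(m.group(0), m.group(1))   # *is* is
print(re.sub(pattern, handler.sub('emphasis'), 'This *is* a test'))
# This <em>is</em> a test
print(re.sub(pattern, handler.sub('bold'), 'This *is* a test'))
# This *is* a test   (no sub_bold method, so the match is left unchanged)
```

    The asterisks have to be escaped in the pattern: an unescaped * at the start of a regular expression has nothing to repeat, and re raises an error.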

    ③ Rules (rules.py)

        class Rule:
            # Default action: wrap the block in start/end calls for self.type.
            # Returning True tells the parser to stop trying other rules
            def action(self, block, handler):
                handler.start(self.type)
                handler.feed(block)
                handler.end(self.type)
                return True


        class HeadingRule(Rule):
            type = 'heading'

            # A heading contains no newline (i.e. it is a single line), is at
            # most 70 characters long, and does not end with a colon
            def condition(self, block):
                return '\n' not in block and len(block) <= 70 and block[-1] != ':'


        class TitleRule(HeadingRule):
            type = 'title'
            # Works only once, on the first block: after the first call, first
            # is set to False, so every later call returns False
            first = True

            def condition(self, block):
                if not self.first:
                    return False
                self.first = False
                return HeadingRule.condition(self, block)


        class ListItemRule(Rule):
            type = 'listitem'

            # A list item starts with a hyphen
            def condition(self, block):
                return block[0] == '-'

            # Like Rule.action, but drop the leading hyphen first
            def action(self, block, handler):
                handler.start(self.type)
                handler.feed(block[1:].strip())
                handler.end(self.type)
                return True


        class ListRule(ListItemRule):
            type = 'list'
            inside = False

            # Checked for every block
            def condition(self, block):
                return True

            # Open the list at the first list item, close it at the first block
            # after the list; return False so the other rules still run
            def action(self, block, handler):
                if not self.inside and ListItemRule.condition(self, block):
                    handler.start(self.type)
                    self.inside = True
                elif self.inside and not ListItemRule.condition(self, block):
                    handler.end(self.type)
                    self.inside = False
                return False


        class ParagraphRule(Rule):
            type = 'paragraph'

            # Fallback: any block no earlier rule handled is a paragraph
            def condition(self, block):
                return True

    This part is fairly easy to follow; the explanation in the book is enough.
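    One detail worth checking is how ListRule cooperates with ListItemRule: its action always returns False, so the parser keeps looking for another rule for the same block. The sketch below copies the list rules and ParagraphRule from rules.py, and assumes a hypothetical RecordingHandler that logs calls instead of printing HTML plus a hypothetical run function that repeats the rule loop of Parser.parse (filters and the title/heading rules are left out for brevity).

```python
class RecordingHandler:
    """Hypothetical handler: logs calls instead of printing HTML."""

    def __init__(self):
        self.events = []

    def start(self, name):
        self.events.append('start ' + name)

    def end(self, name):
        self.events.append('end ' + name)

    def feed(self, data):
        self.events.append('feed ' + data)


class Rule:
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        return True


class ListItemRule(Rule):
    type = 'listitem'

    def condition(self, block):
        return block[0] == '-'

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())
        handler.end(self.type)
        return True


class ListRule(ListItemRule):
    type = 'list'
    inside = False

    def condition(self, block):
        return True

    def action(self, block, handler):
        if not self.inside and ListItemRule.condition(self, block):
            handler.start(self.type)   # first list item: open the list
            self.inside = True
        elif self.inside and not ListItemRule.condition(self, block):
            handler.end(self.type)     # first non-item after the list: close it
            self.inside = False
        return False                   # never claims the block itself


class ParagraphRule(Rule):
    type = 'paragraph'

    def condition(self, block):
        return True


def run(rules, blocks, handler):
    """Hypothetical helper: the rule loop from Parser.parse, without filters."""
    for block in blocks:
        for rule in rules:
            if rule.condition(block) and rule.action(block, handler):
                break


sample_blocks = ['-What is SPAM?', '-How do they make it?', 'A plain paragraph.']
h = RecordingHandler()
run([ListRule(), ListItemRule(), ParagraphRule()], sample_blocks, h)
for event in h.events:
    print(event)
```

    One consequence of this design: the list is only closed when a later non-list block arrives, so if the file ended with a list item, end('list') would never be called.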

    ④ Main program (markup.py)

        import sys, re
        from handlers import *
        from util import *
        from rules import *


        class Parser:
            def __init__(self, handler):
                self.handler = handler
                self.rules = []
                self.filters = []

            # Add a rule to the rule list
            def addRule(self, rule):
                self.rules.append(rule)

            # Add a filter to the filter list
            def addFilter(self, pattern, name):
                # Create the filter: it replaces every match of pattern in the
                # block using the substitution function from handler.sub(name)
                def filter(block, handler):
                    return re.sub(pattern, handler.sub(name), block)
                self.filters.append(filter)

            # Process the whole file
            def parse(self, file):
                self.handler.start('document')
                # For each text block, run all filters first, then the rules
                for block in blocks(file):
                    for filter in self.filters:
                        block = filter(block, self.handler)
                    for rule in self.rules:
                        # If the block matches the rule, run the rule's action;
                        # stop trying rules once an action returns True
                        if rule.condition(block):
                            last = rule.action(block, self.handler)
                            if last:
                                break
                self.handler.end('document')


        class BasicTextParser(Parser):
            def __init__(self, handler):
                Parser.__init__(self, handler)
                self.addRule(ListRule())
                self.addRule(ListItemRule())
                self.addRule(TitleRule())
                self.addRule(HeadingRule())
                self.addRule(ParagraphRule())

                self.addFilter(r'\*(.+?)\*', 'emphasis')
                self.addFilter(r'(http://[.a-zA-Z/]+)', 'url')
                self.addFilter(r'([.a-zA-Z]+@[.a-zA-Z]+[a-zA-Z]+)', 'mail')


        handler = HTMLRenderer()
        parser = BasicTextParser(handler)

        parser.parse(sys.stdin)

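    addFilter adds a filter to the filter list. It first defines the filter function, which remembers pattern and name: handler.sub(name) returns a substitution function, and re.sub uses it to replace every match of pattern in the block. The filter is then appended to the list, and parse runs every filter over each block before the rules are applied.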
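    A sketch of the filter pipeline on its own, assuming a hypothetical MiniParser that keeps only addFilter plus an apply_filters method (the filter loop from parse), and a minimal hypothetical handler with the same sub dispatch as Handler:

```python
import re


class MiniHandler:
    """Hypothetical handler with the same sub() dispatch as Handler."""

    def sub(self, name):
        def substitution(match):
            method = getattr(self, 'sub_' + name, None)
            return method(match) if callable(method) else match.group(0)
        return substitution

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)

    def sub_url(self, match):
        return '<a href="%s">%s</a>' % (match.group(1), match.group(1))


class MiniParser:
    """Hypothetical parser keeping only the filter part of Parser."""

    def __init__(self):
        self.filters = []

    def addFilter(self, pattern, name):
        # Each call builds a new closure that remembers its own pattern and name
        def filter(block, handler):
            return re.sub(pattern, handler.sub(name), block)
        self.filters.append(filter)

    def apply_filters(self, block, handler):
        # Same loop as in Parser.parse: filters run in the order they were added
        for f in self.filters:
            block = f(block, handler)
        return block


parser = MiniParser()
parser.addFilter(r'\*(.+?)\*', 'emphasis')
parser.addFilter(r'(http://[.a-zA-Z/]+)', 'url')
print(parser.apply_filters('Visit *our* page: http://wwspam.fu/feedback', MiniHandler()))
# Visit <em>our</em> page: <a href="http://wwspam.fu/feedback">http://wwspam.fu/feedback</a>
```

    The handler is passed in when the filter runs rather than captured when it is created, so the same filter list works with any handler that provides sub.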

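    Finally, run the program. Since markup.py reads from sys.stdin, redirect the text file into it, e.g. python markup.py < test.txt, and the generated HTML is printed to standard output.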

  • Original article: https://www.cnblogs.com/zyb993963526/p/8287392.html