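1. Word(token)
Matches a word made up of characters from an allowed set. A common mistake is to write Word("expr") expecting it to match the literal string "expr"; it actually matches any run of the characters e, x, p and r. Use Literal or Keyword to match a fixed string. pyparsing provides these predefined character sets: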
- alphas: letters
- nums: digits
- alphanums: letters and digits
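The difference between a character set and a fixed string can be sketched as follows. This is a minimal illustration, assuming pyparsing is installed; the input strings are made up for the example.

```python
import pyparsing as pp

# Word("expr") treats "expr" as a SET of allowed characters: e, x, p, r.
loose = pp.Word("expr")
# Literal("expr") matches exactly the string "expr".
exact = pp.Literal("expr")

loose_result = loose.parseString("perpex").asList()
exact_result = exact.parseString("expr").asList()

# Literal rejects input that does not start with the exact string.
try:
    exact.parseString("perpex")
    literal_rejected = False
except pp.ParseException:
    literal_rejected = True

# The predefined sets work the same way: Word(nums) stops at the first non-digit.
digits_result = pp.Word(pp.nums).parseString("2019abc").asList()

print(loose_result)      # ['perpex']
print(exact_result)      # ['expr']
print(literal_rejected)  # True
print(digits_result)     # ['2019']
```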
2. Suppress
Matches the wrapped expression but leaves its tokens out of the results.
    import pyparsing as pp

    source = "a , b, c, d"
    wd = pp.Word(pp.alphas)

    # Without Suppress, the commas are kept in the results.
    wd_list = wd + pp.ZeroOrMore("," + wd)
    print(wd_list.parseString(source))
    # ['a', ',', 'b', ',', 'c', ',', 'd']

    # With Suppress, the commas are matched but dropped.
    wd_list = wd + pp.ZeroOrMore(pp.Suppress(",") + wd)
    print(wd_list.parseString(source))
    # ['a', 'b', 'c', 'd']
3. Group
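Group wraps the tokens matched by an expression in a nested list instead of leaving them flat in the results. It does not join them into one string; use Combine for that.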
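Without Group, the result is a flat list:

    from pyparsing import Word, Literal, OneOrMore, oneOf, alphas

    wd = Word(alphas)
    comma = Literal(",")
    greetee = OneOrMore(wd)
    end = oneOf("! ?")
    greeting = wd + comma + greetee + end
    print(greeting.parseString("Hello,World!"))
    # ['Hello', ',', 'World', '!']

With Group around each word, every word becomes its own sub-list:

    from pyparsing import Word, Literal, OneOrMore, Group, oneOf, alphas

    wd = Group(Word(alphas))
    comma = Literal(",")
    greetee = OneOrMore(wd)
    end = oneOf("! ?")
    greeting = wd + comma + greetee + end
    print(greeting.parseString("Hello,World!"))
    # [['Hello'], ',', ['World'], '!']

For comparison, calling suppress() on the punctuation removes it from the results:

    from pyparsing import Word, Literal, OneOrMore, oneOf, alphas

    wd = Word(alphas)
    comma = Literal(",").suppress()
    greetee = OneOrMore(wd)
    end = oneOf("! ?").suppress()
    greeting = wd + comma + greetee + end
    print(greeting.parseString("Hello,World!"))
    # ['Hello', 'World']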
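Group is most useful when a pattern repeats and each repetition holds several tokens. The sketch below is an illustration only; the key=value grammar and the input string are assumptions made up for this example, not part of the original notes.

```python
from pyparsing import Word, Suppress, Group, OneOrMore, alphas, nums

# One key=value pair; the "=" is matched but dropped.
pair = Word(alphas) + Suppress("=") + Word(nums)

# Without Group, all pairs run together in one flat list.
flat = OneOrMore(pair).parseString("a=1 b=2").asList()

# With Group, each pair stays together as its own sub-list.
grouped = OneOrMore(Group(pair)).parseString("a=1 b=2").asList()

print(flat)     # ['a', '1', 'b', '2']
print(grouped)  # [['a', '1'], ['b', '2']]
```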
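4. setResultsName: give each matched token a readable name
Naming a matched token lets you retrieve it from the resulting ParseResults object by attribute or by key, as you would with a dictionary.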
    from pyparsing import Word, nums

    integer = Word(nums)
    # integer("year") is shorthand for integer.setResultsName("year")
    date_str = integer("year") + "/" + integer("month") + "/" + integer("day")
    data = date_str.parseString("2019/04/17")
    print("year,type:%s,value:%s" % (type(data.year), data.year))
    # year,type:<class 'str'>,value:2019
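The dictionary-style access mentioned above can be sketched like this, reusing the same date grammar. The "hour" key is a made-up name that the grammar never defines; it is only there to show the default value.

```python
from pyparsing import Word, nums

integer = Word(nums)
date_str = integer("year") + "/" + integer("month") + "/" + integer("day")
data = date_str.parseString("2019/04/17")

year_by_key = data["year"]          # dictionary-style lookup
month_by_attr = data.month          # attribute-style lookup
missing = data.get("hour", "n/a")   # like dict.get, with a default
as_dict = data.asDict()             # plain dict holding only the named tokens

print(year_by_key)    # 2019
print(month_by_attr)  # 04
print(missing)        # n/a
print(as_dict)
```

The unnamed "/" tokens still appear in the token list but not in the dictionary view.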
5. setParseAction: post-process each parsed token
The handler is a function you define yourself. It can take up to three arguments:
- s: the original string being parsed
- loc: the location of the matching substring
- toks: a list of the matched tokens
For example, to convert the date fields from the previous example to int, define a parse action like this:
    from pyparsing import Word, nums

    # Convert each matched run of digits to an int.
    integer = Word(nums).setParseAction(lambda s, loc, tokens: int(tokens[0]))
    date_str = integer("year") + "/" + integer("month") + "/" + integer("day")
    data = date_str.parseString("2019/04/17")
    print("year,type:%s,value:%s" % (type(data.year), data.year))
    # year,type:<class 'int'>,value:2019
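To show what the loc argument carries, here is a small sketch that records where each number was found while still converting it to int. The helper name to_int and the positions list are hypothetical names chosen for this example.

```python
from pyparsing import Word, nums

positions = []  # start offset of every number the action sees

def to_int(s, loc, toks):
    # s is the full input string, loc the offset where this token starts,
    # toks the matched tokens (exactly one for a Word).
    positions.append(loc)
    return int(toks[0])

integer = Word(nums).setParseAction(to_int)
date_str = integer("year") + "/" + integer("month") + "/" + integer("day")
data = date_str.parseString("2019/04/17")

print(data.year, data.month, data.day)  # 2019 4 17
print(positions)                        # [0, 5, 8]
```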
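6. parseString: parse the input string
instring: the first argument, the string to parse.
parseAll: the second argument (default False), which controls whether the grammar must match the whole string. When it is True, the grammar has to consume the entire input, otherwise a ParseException is raised. When it is False, parsing stops where the grammar stops matching and any trailing text is ignored. A note on tokens: the parse action above uses tokens[0] because a Word yields exactly one token. A parse action attached to a compound expression receives every token that expression matched, so tokens may hold several entries.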
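The effect of parseAll can be sketched with the date grammar from earlier and an input that has trailing text. The last part attaches a parse action to the whole date expression to show that it receives all five tokens at once; the seen list is a hypothetical helper for this example.

```python
from pyparsing import Word, nums, ParseException

integer = Word(nums)
date_str = integer("year") + "/" + integer("month") + "/" + integer("day")

text = "2019/04/17 extra"

# Default (parseAll=False): the trailing " extra" is silently ignored.
partial = date_str.parseString(text).asList()

# parseAll=True: the leftover text makes the parse fail.
try:
    date_str.parseString(text, parseAll=True)
    full_match_ok = True
except ParseException:
    full_match_ok = False

# A parse action on the compound expression sees every token it matched.
seen = []
dated = date_str.copy().setParseAction(lambda s, loc, toks: seen.append(len(toks)))
dated.parseString("2019/04/17", parseAll=True)

print(partial)        # ['2019', '/', '04', '/', '17']
print(full_match_ok)  # False
print(seen)           # [5]
```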
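7. delimitedList: match a delimiter-separated list of one expression
You pass a single expression, and delimitedList matches any number of repetitions of it (Word, Word, ...). The delimiter defaults to a comma.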
    from pyparsing import (Word, alphas, alphanums, Combine, oneOf, Optional,
                           delimitedList, Group, Keyword)

    testdata = """
    int func1(float *vec, int len, double arg1);
    int func2(float **arr, float *vec, int len, double arg1, double arg2);
    """

    # An identifier starts with a letter, followed by letters, digits or "_".
    ident = Word(alphas, alphanums + "_")
    # A variable type is a base type optionally followed by one or more "*".
    # adjacent=False lets whitespace separate the type from the "*".
    vartype = Combine(oneOf("float double int char") + Optional(Word("*")),
                      adjacent=False)
    # Each argument is a (type, name) pair; arguments are comma-separated.
    arglist = delimitedList(Group(vartype("type") + ident("name")))
    functionCall = Keyword("int") + ident("name") + "(" + arglist("args") + ")" + ";"

    for fn, s, e in functionCall.scanString(testdata):
        print(fn.name)
        for a in fn.args:
            print(" - %(name)s (%(type)s)" % a)

    # output:
    # func1
    #  - vec (float*)
    #  - len (int)
    #  - arg1 (double)
    # func2
    #  - arr (float**)
    #  - vec (float*)
    #  - len (int)
    #  - arg1 (double)
    #  - arg2 (double)
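A smaller sketch of delimitedList on its own may make the default behaviour easier to see. The inputs are made up; the second call assumes only the documented delim parameter to switch the separator.

```python
from pyparsing import Word, alphas, delimitedList

wd = Word(alphas)

# Default delimiter is a comma; the delimiters are dropped from the results.
comma_list = delimitedList(wd).parseString("a , b, c, d").asList()

# A different delimiter can be passed with delim.
semi_list = delimitedList(wd, delim=";").parseString("x; y;z").asList()

print(comma_list)  # ['a', 'b', 'c', 'd']
print(semi_list)   # ['x', 'y', 'z']
```

This gives the same result as the wd + ZeroOrMore(Suppress(",") + wd) pattern from the Suppress section, in a single call.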