  • python – Parsing a PCFG syntax tree to extract its grammar rules (Probabilistic Context-Free Grammar Parser)


    # A sample parse tree in bracketed (Penn Treebank-style) notation
    string = '''
    (S
        (NP (NN Carnac) (DT the) (NN Magnificent))
        (VP (VBD gave) (NP (DT a) (NN talk)))
    )
    '''

    def is_symbol_char(character):
        '''
        Predicate to test if a character is valid
        for use in a symbol; extend as needed.
        '''
        return character.isalpha() or character in '-=$!?.'

    def tokenize(characters):
        '''
        Process characters into a nested structure. The original string
        '(DT the)' is passed in as ['(', 'D', 'T', ' ', 't', 'h', 'e', ')']
        '''
        tokens = []
        while characters:
            character = characters.pop(0)
            if character.isspace():
                pass  # nothing to do, ignore it
            elif character == '(':  # signals start of recursive analysis (push)
                characters, result = tokenize(characters)
                tokens.append(result)
            elif character == ')':  # signals end of recursive analysis (pop)
                break
            elif is_symbol_char(character):
                # if it looks like a symbol, collect all
                # subsequent symbol characters (peek first, so a symbol at
                # the very end of the input cannot pop from an empty list)
                symbol = character
                while characters and is_symbol_char(characters[0]):
                    symbol += characters.pop(0)
                tokens.append(symbol)
            # any other character (digits, commas, ...) is silently skipped
        # Return whatever tokens we collected and any characters left over
        return characters, tokens
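
    def extract_rules(tokens):
        ''' Recursively walk tokenized data extracting rules. '''
        head, *tail = tokens
        # Print "LHS --> RHS": a child subtree contributes its label (x[0]),
        # a bare word contributes itself
        print(head, '-->', *[x[0] if isinstance(x, list) else x for x in tail])
        for token in tail:  # recurse
            if isinstance(token, list):
                extract_rules(token)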

    characters, tokens = tokenize(list(string))
    # After a successful tokenization, all the characters should be consumed
    assert not characters, "Didn't consume all the input!"
    print('Tokens:', tokens[0], 'Rules:', sep='\n\n', end='\n\n')
    extract_rules(tokens[0])
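
Because `is_symbol_char` only accepts letters and `-=$!?.`, the character-by-character `tokenize` mangles or drops leaves such as `3.5` or `,`. A minimal alternative sketch, assuming a regular-expression tokenizer and an explicit stack in place of the recursion (a swapped-in technique, not the author's code), reads the same tree into nested lists and returns the rules as `(lhs, rhs)` tuples instead of printing them, so they can be counted or stored. `read_tree`, `collect_rules`, `TOKEN_RE` and `SAMPLE` are hypothetical names.

```python
import re

# Hypothetical helpers for illustration; not part of the original post.
# A token is "(", ")" or any run of characters that are neither whitespace nor parentheses.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def read_tree(text):
    """Read one bracketed tree, e.g. '(DT the)' -> ['DT', 'the']."""
    stack = [[]]                    # stack[-1] is the subtree currently being filled
    for tok in TOKEN_RE.findall(text):
        if tok == "(":
            stack.append([])        # open a new subtree
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            subtree = stack.pop()   # close it and attach it to its parent
            stack[-1].append(subtree)
        else:
            stack[-1].append(tok)   # a label or a word
    if len(stack) != 1 or len(stack[0]) != 1 or not isinstance(stack[0][0], list):
        raise ValueError("expected exactly one balanced, bracketed tree")
    return stack[0][0]


def collect_rules(tree):
    """Return (lhs, rhs) pairs in pre-order, e.g. ('S', ('NP', 'VP'))."""
    head, *children = tree
    # A child subtree contributes its label (child[0]); a bare word contributes itself.
    rhs = tuple(child[0] if isinstance(child, list) else child for child in children)
    rules = [(head, rhs)]
    for child in children:
        if isinstance(child, list):
            rules.extend(collect_rules(child))
    return rules


SAMPLE = """
(S
    (NP (NN Carnac) (DT the) (NN Magnificent))
    (VP (VBD gave) (NP (DT a) (NN talk)))
)
"""

for lhs, rhs in collect_rules(read_tree(SAMPLE)):
    print(lhs, "-->", *rhs)
```

Returning a list instead of printing is the main design change; the stack-based reader also reports unbalanced brackets with a `ValueError` rather than relying on a trailing assertion.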


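The title refers to a probabilistic CFG, while the code only lists rules. One standard way to attach probabilities, given here as an assumption rather than the author's method, is relative-frequency (maximum-likelihood) estimation over a collection of parse trees: P(A → β) = count(A → β) / count(A), where count(A) is the number of times any rule with left-hand side A occurs. The sketch below takes the rules of the sample sentence "Carnac the Magnificent gave a talk", written out by hand as `(lhs, rhs)` pairs; `estimate_pcfg` and `SAMPLE_RULES` are hypothetical names. `Fraction` keeps the results exact, so each `NN` rule gets 1/3 and each `DT` rule 1/2.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical data for illustration: the rules of the sample tree, typed in by hand.
# In practice these would be pooled from many parsed trees.
SAMPLE_RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("NN", "DT", "NN")),
    ("NN", ("Carnac",)),
    ("DT", ("the",)),
    ("NN", ("Magnificent",)),
    ("VP", ("VBD", "NP")),
    ("VBD", ("gave",)),
    ("NP", ("DT", "NN")),
    ("DT", ("a",)),
    ("NN", ("talk",)),
]


def estimate_pcfg(rules):
    """Hypothetical helper: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(rules)
    lhs_counts = Counter()
    for (lhs, _rhs), n in rule_counts.items():
        lhs_counts[lhs] += n            # total expansions of each left-hand side
    return {rule: Fraction(n, lhs_counts[rule[0]]) for rule, n in rule_counts.items()}


for (lhs, rhs), p in sorted(estimate_pcfg(SAMPLE_RULES).items()):
    print(f"{lhs} --> {' '.join(rhs)}  [{p}]")
```

With a single sentence the estimates are trivial; pooling the rules of many trees is what makes the probabilities informative.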
  • Original article: https://www.cnblogs.com/cupleo/p/13908416.html