zoukankan      html  css  js  c++  java
  • 命名实体识别,使用pyltp提取文本中的地址

    首先安装pyltp

    pytlp项目首页

    单例类(第一次调用时加载模型)

    class Singleton(object):
        def __new__(cls, *args, **kwargs):
            if not hasattr(cls, '_the_instance'):
                cls._the_instance = object.__new__(cls, *args, **kwargs)
            return cls._the_instance
    

    使用pyltp提取地址

    import os
    from pyltp import Segmentor, Postagger, NamedEntityRecognizer
    from main.models.Singleton import Singleton
    
    class address_extract_model(Singleton):
        print('load ltp model start...')
    
        pwd = os.getcwd()
        project_path = os.path.abspath(os.path.dirname(pwd) + os.path.sep + ".")
    
        LTP_DATA_DIR = project_path + 'AlarmClassificationmainltpmodel'  # ltp模型目录的路径
        cws_model_path = os.path.join(LTP_DATA_DIR, 'cws.model')
        pos_model_path = os.path.join(LTP_DATA_DIR, 'pos.model')  # 词性标注模型路径,模型名称为`pos.model`
        ner_model_path = os.path.join(LTP_DATA_DIR, 'ner.model')  # 命名实体识别模型路径,模型名称为`ner.model`
    
        print('path' + cws_model_path)
    
        segmentor = Segmentor()  # 初始化实例
        segmentor.load(cws_model_path)  # 加载模型
    
        postagger = Postagger() # 初始化实例
        postagger.load(pos_model_path)  # 加载模型
    
        recognizer = NamedEntityRecognizer() # 初始化实例
        recognizer.load(ner_model_path)  # 加载模型
    
    
        def get_model(self):
            return self.segmentor, self.postagger, self.recognizer
    
    
    def get_address_prediction(alarm_content):
        model = address_extract_model()
        segmentor, postagger, recognizer = model.get_model()
    
        words = segmentor.segment(alarm_content)  # 分词
        postags = postagger.postag(words)  # 词性标注
        netags = recognizer.recognize(words, postags)  # 命名实体识别
    
        result = ''
        for i in range(0, len(netags)):
            print(words[i] + ': ' + netags[i])
            # 地名标签为 ns
            if 's' in netags[i]:
                result += words[i] + ','
        if len(result) < 1:
            result = 'No address!'
        print(result)
        return result
    
    
    def get_address(alarm_content):
        print("start get_address...")
        result = "Exception"
        try:
            result = get_address_prediction(alarm_content)
        except Exception as ex:
            print(ex)
    
        print("Output is " + result)
        return result
    
    # segmentor.release()  # 释放模型
    # postagger.release()
    # recognizer.release()
    

    运行效果

    项目源码 ( 命名实体提取代码位于main/ltp, 模型文件需要到pyltp下载 )

    https://github.com/haibincoder/AlarmClassification

  • 相关阅读:
    用户交互
    python简介
    maven阿里云镜像setting
    apache虚拟主机的ip用法 包括iis
    apache的虚拟主机配置和指定目录的访问描述(
    apache重定向301 配置,根域名转到www
    前端学习
    一步一步写jQuery插件
    json 和 table控件
    下载相关
  • 原文地址:https://www.cnblogs.com/bincoding/p/9180553.html
Copyright © 2011-2022 走看看