  • Building a Knowledge Graph: A Beginner's Introduction

    This post is based on the following Medium article:

    A Knowledge Graph understanding and implementation tutorial for beginners[1]

    What is a knowledge graph?

    The content of a knowledge graph is usually stored as triples of the form Subject-Predicate-Object (SPO).

    For example:

    Leonard Nimoy was an actor who played the character Spock in the science-fiction movie Star Trek

    The following triples can be extracted from the sentence above:

    [Figure: table of the extracted triples]
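
    As a rough illustration, one plausible set of SPO triples for this sentence can be written as plain Python tuples. The predicate names (`is_a`, `played`, `character_in`) are assumptions chosen for this sketch and may differ from those shown in the figure.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative triples; the predicate names are assumptions for this sketch.
triples = [
    Triple("Leonard Nimoy", "is_a", "actor"),
    Triple("Leonard Nimoy", "played", "Spock"),
    Triple("Spock", "character_in", "Star Trek"),
    Triple("Star Trek", "is_a", "science-fiction movie"),
]

# Print each triple in a simple arrow notation.
for t in triples:
    print(f"({t.subject}) -[{t.predicate}]-> ({t.obj})")
```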

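    Represented as a knowledge graph, this looks as follows:

    [Figure: the triples drawn as a graph]

    The graph above, made up of nodes and relationships, is a simple knowledge graph.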
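
    To make the "nodes and relationships" idea concrete, here is a minimal sketch that stores triples in an adjacency map using only the standard library. The triples and predicate names are illustrative assumptions, and a real system would typically use a graph database instead.

```python
from collections import defaultdict

# Illustrative triples for the Leonard Nimoy sentence (assumed for this sketch).
triples = [
    ("Leonard Nimoy", "is_a", "actor"),
    ("Leonard Nimoy", "played", "Spock"),
    ("Spock", "character_in", "Star Trek"),
    ("Star Trek", "is_a", "science-fiction movie"),
]

# Adjacency map: node -> list of (relationship, neighbour) pairs.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def neighbours(node):
    """Return the outgoing (relationship, target) pairs of a node."""
    return graph.get(node, [])


print(neighbours("Leonard Nimoy"))
# → [('is_a', 'actor'), ('played', 'Spock')]
```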

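    How do you build a simple knowledge graph?

    The process can be divided into two major steps:

    • Knowledge extraction
      • Information extraction to obtain triples
      • Entity recognition, entity linking, entity disambiguation, and entity resolution
    • Graph construction
      • Storage
      • Querying

    Knowledge extraction is the key step in building a knowledge graph; the triples can be obtained through dependency parsing.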
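
    The idea of reading triples off a dependency parse can be sketched without any NLP library. The parse below is hand-annotated, the label names (`nsubj`, `dobj`, `compound`, `ROOT`) are assumed to follow spaCy-style conventions, and the extraction rule is a deliberately simplified assumption; in practice a parser such as spaCy would supply the parse, and the tutorial's own extraction logic may differ.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label (spaCy-style names assumed)
    head: int  # index of the head token; the root points at itself


# Hand-annotated parse of "Bill Gates supports entrepreneurship."
tokens = [
    Token("Bill", "compound", 1),
    Token("Gates", "nsubj", 2),
    Token("supports", "ROOT", 2),
    Token("entrepreneurship", "dobj", 2),
    Token(".", "punct", 2),
]


def phrase(tokens, i):
    """Return token i prefixed by any preceding 'compound' modifiers."""
    mods = [t.text for j, t in enumerate(tokens)
            if t.head == i and t.dep == "compound" and j < i]
    return " ".join(mods + [tokens[i].text])


def extract_triples(tokens):
    """Pair each subject of the root verb with each of its direct objects."""
    triples = []
    for v, verb in enumerate(tokens):
        if verb.dep != "ROOT":
            continue
        subjects = [i for i, t in enumerate(tokens) if t.head == v and t.dep == "nsubj"]
        objects = [i for i, t in enumerate(tokens) if t.head == v and t.dep == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), verb.text, phrase(tokens, o)))
    return triples


print(extract_triples(tokens))
# → [('Bill Gates', 'supports', 'entrepreneurship')]
```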

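    Hands-on: building a simple knowledge graph

    Only the code execution steps and their results are shown here; for the complete code, see GitHub [2].

    1. Triple extraction

    Using spaCy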

    inputText = 'Startup companies create jobs and innovation. Bill Gates supports entrepreneurship.'

    # Step 1: Knowledge Extraction. Output: SOP triples
    # KnowledgeExtraction is assumed to be defined in the full code on GitHub [2].
    knowledgeExtractionObj = KnowledgeExtraction()
    sop_list = knowledgeExtractionObj.retrieveKnowledge(inputText)

    # Convert each extracted triple into a list of plain strings.
    sop_list_strings = [[sop[0].text, sop[1].text, sop[2].text] for sop in sop_list]

    print(sop_list_strings)
    

    Result

    [Figure: printed list of extracted triples]

    2. Entity linking

    # Step 2: Entity recognition and linking.
    # EntityRecognitionLinking is assumed to be defined in the full code on GitHub [2].
    entityRecognitionLinkingObj = EntityRecognitionLinking()
    entityRelJson = entityRecognitionLinkingObj.entityRecogLink(inputText)

    # For each triple, replace every element that matches a recognized
    # surface form with the linked URI; unmatched elements stay empty.
    entityLinkTriples = []
    for sop in sop_list_strings:
        tempTriple = ['', '', '']
        # .get() guards against a response that contains no 'Resources' key.
        for resource in entityRelJson.get('Resources', []):
            for i in range(3):
                if resource['@surfaceForm'] == sop[i]:
                    tempTriple[i] = resource['@URI']
        entityLinkTriples.append(tempTriple)
    print(entityLinkTriples)
    
    

    Result

    [Figure: printed list of entity-linked triples]
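
    The matching logic of this step can be tried without calling any linking service. The sketch below uses a hand-written response that only mimics the keys used in the code (`Resources`, `@surfaceForm`, `@URI`); the response content, the helper name `link_triples`, and the example URIs are assumptions for illustration, not output from the tutorial's code.

```python
# Hypothetical response, hand-written in the shape of the keys used in step 2.
entity_json = {
    "Resources": [
        {"@surfaceForm": "Bill Gates",
         "@URI": "http://dbpedia.org/resource/Bill_Gates"},
        {"@surfaceForm": "entrepreneurship",
         "@URI": "http://dbpedia.org/resource/Entrepreneurship"},
    ]
}

triples = [["Bill Gates", "supports", "entrepreneurship"]]


def link_triples(triples, entity_json):
    """Replace each triple element with its URI, or '' when nothing matches."""
    # Build the surface-form lookup once instead of rescanning per element.
    uri_by_surface = {
        r["@surfaceForm"]: r["@URI"] for r in entity_json.get("Resources", [])
    }
    return [[uri_by_surface.get(part, "") for part in triple] for triple in triples]


linked = link_triples(triples, entity_json)
print(linked)
```

    Note that the predicate ("supports") has no matching surface form here, so its slot stays empty; this mirrors how the loop in step 2 leaves unmatched elements as empty strings.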

    3. Graph construction

    Using Neo4j

    # Step 3: Knowledge Graph creation.
    graphPopulationObj = GraphPopulation()
    graphPopulationObj = graphPopulationObj.popGraph(
        sop_list_strings, entityLinkTriples)
    

    [Figure: screenshot from the graph construction step]

    The resulting graph looks like this:

    [Figure: the resulting graph in Neo4j]
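
    One way to picture what populating the graph involves is to turn each triple into a parameterised Cypher `MERGE` statement. This is only a sketch: the node label `Entity`, the relationship type `RELATED`, and the helper name `triple_to_cypher` are hypothetical, and the tutorial's `GraphPopulation` class may populate Neo4j differently. The code only builds query text and parameters; it does not connect to a database.

```python
def triple_to_cypher(subject, predicate, obj):
    """Build a parameterised Cypher MERGE statement for one triple.

    Relationship types cannot be passed as Cypher parameters, so this sketch
    uses one generic type and stores the predicate as a property.
    """
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        "MERGE (s)-[:RELATED {name: $predicate}]->(o)"
    )
    params = {"subject": subject, "predicate": predicate, "object": obj}
    return query, params


triples = [
    ["companies", "create", "jobs"],
    ["Bill Gates", "supports", "entrepreneurship"],
]

for triple in triples:
    query, params = triple_to_cypher(*triple)
    print(query)
    print(params)
```

    Using `MERGE` rather than `CREATE` means rerunning the statements does not duplicate nodes or relationships that already exist.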

    Problems you may run into

    • Q1
    AuthError: The client is unauthorized due to authentication failure.
    

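    Solution:

    Make sure the password in the database connection settings matches the one configured in Neo4j (the configuration below means user: neo4j, password: neo4j).

    config.DATABASE_URL = 'bolt://neo4j:neo4j@localhost:7687'  # default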
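
    When debugging an authentication failure, it can help to check which credentials the URL actually carries. The sketch below uses only the standard library to split the URL into its parts; the variable name `database_url` is just for illustration.

```python
from urllib.parse import urlsplit

database_url = "bolt://neo4j:neo4j@localhost:7687"

# urlsplit exposes the user, password, host, and port embedded in the URL.
parts = urlsplit(database_url)
print(parts.username, parts.password, parts.hostname, parts.port)
# → neo4j neo4j localhost 7687
```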
    
    • Q2
    ServiceUnavailable: Failed to establish connection to ('127.0.0.1', 7687) (reason [WinError 10061] No connection could be made because the target machine actively refused it.)
    

    Solution:

    Make sure Neo4j is running before you execute the graph creation code.
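
    A quick way to check this from Python is to test whether anything accepts TCP connections on the Bolt port. This is a minimal sketch using the standard library; the helper name `is_port_open` is hypothetical, and an open port only suggests, rather than proves, that Neo4j itself is the process listening.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if is_port_open("localhost", 7687):
    print("Something is listening on port 7687; Neo4j is probably running.")
else:
    print("Nothing is listening on port 7687; start Neo4j first.")
```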

    If you have questions, feel free to leave a comment so we can discuss them.

    [1] https://medium.com/analytics-vidhya/a-knowledge-graph-implementation-tutorial-for-beginners-3c53e8802377

    [2] https://github.com/kramankishore/Knowledge-Graph-Intro

    [3] https://neomodel.readthedocs.io/en/latest/getting_started.html#connecting

    [4] https://www.analyticsvidhya.com/blog/2019/10/how-to-build-knowledge-graph-text-using-spacy/

  • Original post: https://www.cnblogs.com/gongyanzh/p/12909845.html