  • [Natural Language Processing] Chatbots: From the Basics to Applications

    I. Introduction

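    On Wikipedia, a bot is a computer program or script (together with the account it logs in with) that mainly helps editors carry out large volumes of automated, high-speed, or mechanical and tedious editing work.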

    II. In Practice

    1. The simplest kind is a rule-based chatbot.

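    In other words, the question and answer sentences of the corpus are designed in advance. It works at an elementary-school level: the bot can only answer exactly the questions it was prepared for.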

    import random

    # Greetings the bot recognizes (and also uses as replies)
    greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']

    # Questions equivalent to "How are you?"
    question = ['How are you?', 'How are you doing?']
    # Replies meaning "I'm fine"
    responses = ['Okay', "I'm fine"]

    # Run the bot
    while True:
        userInput = input(">>> ")
        if userInput in greetings:
            # Pick a fresh random greeting on every turn
            print(random.choice(greetings))
        elif userInput in question:
            # Pick a fresh random reply on every turn
            print(random.choice(responses))
        # ...unless you say "bye"
        elif userInput == 'bye':
            break
        else:
            print("I did not understand what you said")

    Output:

    >>> hi
    hey
    >>> how are u
    I did not understand what you said
    >>> how are you
    I did not understand what you said
    >>> how are you?
    I did not understand what you said
    >>> How are you?
    I'm fine
    >>> bye
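    The transcript shows how brittle exact matching is: "how are you" fails only because of case and the missing question mark. A minimal sketch, assuming we normalize the input (lowercase, strip punctuation, collapse spaces) before looking it up; the helper names `normalize` and `reply` are hypothetical, and the rule tables are stored in normalized form:

```python
import random
import string

# Rule tables stored in normalized form (lowercase, no punctuation)
greetings = ['hola', 'hello', 'hi', 'hey']
questions = ['how are you', 'how are you doing']
responses = ['Okay', "I'm fine"]

def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    return ' '.join(text.split())

def reply(user_input):
    """Return a canned reply for a normalized exact match, else a fallback."""
    cleaned = normalize(user_input)
    if cleaned in greetings:
        return random.choice(greetings)
    if cleaned in questions:
        return random.choice(responses)
    return "I did not understand what you said"
```

    Calling `reply(userInput)` inside the `while` loop replaces the `if` chain. Note that "how are u" still fails: normalization only removes formatting differences, not different wordings.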

    2. Upgrade I:

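    Clearly, rules like these are far too dumb. We need somewhat better "precise matching", for example using keywords to determine what a sentence is trying to do (its intent).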

    from nltk import word_tokenize
    import random

    # word_tokenize needs NLTK's Punkt models: run nltk.download('punkt') once
    # (recent NLTK versions may ask for 'punkt_tab' instead)

    # Greetings the bot recognizes (and also uses as replies)
    greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']

    # Keywords for the "holiday" topic
    question = ['break', 'holiday', 'vacation', 'weekend']
    # Replies on the holiday topic
    responses = ['It was nice! I went to Paris', "Sadly, I just stayed at home"]

    # Run the bot
    while True:
        userInput = input(">>> ")
        # Tokenize the input to see which words it contains
        cleaned_input = word_tokenize(userInput)
        # Compare keywords to decide which intent the input belongs to
        if not set(cleaned_input).isdisjoint(greetings):
            print(random.choice(greetings))
        elif not set(cleaned_input).isdisjoint(question):
            print(random.choice(responses))
        # ...unless you say "bye"
        elif userInput == 'bye':
            break
        else:
            print("I did not understand what you said")

    Output:
    >>> hi
    hey
    >>> how was your holiday?
    It was nice! I went to Paris
    >>> wow, amazing!
    I did not understand what you said
    >>> bye
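    With more than two topics, a chain of `if`/`elif` blocks quickly gets unwieldy. A minimal sketch, assuming the intents are kept in one table that maps each intent to its keywords and candidate replies; the table layout and the helpers `tokenize`, `detect_intent` and `reply` are hypothetical, and a lowercase regex split stands in for `word_tokenize` so the example has no NLTK dependency:

```python
import random
import re

# Hypothetical intent table: trigger keywords plus candidate replies per intent
INTENTS = {
    "greeting": {
        "keywords": {"hola", "hello", "hi", "hey"},
        "responses": ["hello", "hi", "hey"],
    },
    "holiday": {
        "keywords": {"break", "holiday", "vacation", "weekend"},
        "responses": ["It was nice! I went to Paris", "Sadly, I just stayed at home"],
    },
}

def tokenize(text):
    """Lowercase and split into word tokens (a simple stand-in for word_tokenize)."""
    return re.findall(r"[a-z']+", text.lower())

def detect_intent(text):
    """Return the first intent whose keywords overlap the input tokens, or None."""
    tokens = set(tokenize(text))
    for name, intent in INTENTS.items():
        if not tokens.isdisjoint(intent["keywords"]):
            return name
    return None

def reply(text):
    """Pick a random reply for the detected intent, or fall back."""
    intent = detect_intent(text)
    if intent is None:
        return "I did not understand what you said"
    return random.choice(INTENTS[intent]["responses"])
```

    Adding a topic now means adding one entry to `INTENTS` rather than another branch. Because the first overlapping intent wins, the order of the table matters when keywords overlap.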

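    You may notice that this is still "exact matching" at the level of surface text. The mainstream research direction today is matching at the level of meaning. For example, "I'm so hungry" and "it's mealtime" should both be understood as "it's time to eat". At this level you need embedding methods such as word vectors, which will be covered in a later lesson.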
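    To give a feel for what matching on meaning looks like, here is a minimal sketch, assuming tiny hand-made 3-dimensional vectors in place of real trained embeddings (a real system would load pretrained vectors such as word2vec or GloVe). A sentence is represented by the average of its word vectors, and the intent whose example words are closest by cosine similarity wins. All vectors, intent examples and the 0.8 threshold are invented for illustration:

```python
import math

# Toy word vectors, made up by hand for illustration only.
# Dimension 0 roughly means "food", dimension 1 "travel", dimension 2 "other".
WORD_VECTORS = {
    "hungry":   [0.9, 0.1, 0.0],
    "mealtime": [0.8, 0.2, 0.0],
    "eat":      [1.0, 0.0, 0.0],
    "food":     [0.9, 0.0, 0.1],
    "holiday":  [0.0, 0.9, 0.1],
    "vacation": [0.1, 1.0, 0.0],
}

# Each intent is described by a few example words (hypothetical)
INTENTS = {
    "eat": ["eat", "food"],
    "holiday": ["holiday", "vacation"],
}

def sentence_vector(words):
    """Average the vectors of known words; return None if no word is known."""
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(text, threshold=0.8):
    """Return the most similar intent, or None if nothing is similar enough."""
    vec = sentence_vector(text.lower().split())
    if vec is None:
        return None
    best_intent, best_score = None, -1.0
    for intent, examples in INTENTS.items():
        score = cosine(vec, sentence_vector(examples))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None

print(classify("I am so hungry"))  # eat
print(classify("it is mealtime"))  # eat
print(classify("hello there"))     # None
```

    Neither "hungry" nor "mealtime" appears in the examples for the "eat" intent; they match only because their vectors point in a similar direction, which is exactly what keyword matching cannot do.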

    3. Upgrade II:

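    Being able to chat is not enough: to solve users' problems, the bot needs a body of knowledge. We can use all kinds of databases to build such a knowledge system and then look up answers by searching it. The simplest example is to build a "map" as a graph stored in a plain Python dict. With this map we can find a route from one place to another and return it to the user as the answer.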

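    # Build a knowledge base for the target domain.
    # Here the "database" is simply a graph stored in a plain Python dict
    # (an adjacency list): 上海 Shanghai, 苏州 Suzhou, 常州 Changzhou,
    # 镇江 Zhenjiang, 盐城 Yancheng, 南通 Nantong
    graph = {'上海': ['苏州', '常州'],
             '苏州': ['常州', '镇江'],
             '常州': ['镇江'],
             '镇江': ['常州'],
             '盐城': ['南通'],
             '南通': ['常州']}

    # Find a path from start to end with depth-first search.
    # Returns the first path found (not necessarily the shortest), or None.
    def find_path(start, end, path=None):
        path = (path or []) + [start]
        if start == end:
            return path
        if start not in graph:
            return None
        for node in graph[start]:
            # Skip nodes already on the path to avoid cycles (e.g. 常州 <-> 镇江)
            if node not in path:
                newpath = find_path(node, end, path)
                if newpath:
                    return newpath
        return None

    print(find_path('上海', '镇江'))

    Output: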
    ['上海', '苏州', '常州', '镇江']
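    The depth-first search above returns the first route it stumbles on, here four stops, even though 上海 → 苏州 → 镇江 needs only three. If the bot should answer with the route that has the fewest hops, breadth-first search is the usual alternative. A minimal sketch over the same `graph` dict (the function name `find_shortest_path` is hypothetical):

```python
from collections import deque

# Same map as above: 上海 Shanghai, 苏州 Suzhou, 常州 Changzhou,
# 镇江 Zhenjiang, 盐城 Yancheng, 南通 Nantong
graph = {'上海': ['苏州', '常州'],
         '苏州': ['常州', '镇江'],
         '常州': ['镇江'],
         '镇江': ['常州'],
         '盐城': ['南通'],
         '南通': ['常州']}

def find_shortest_path(start, end):
    """Breadth-first search: return a path with the fewest hops, or None."""
    if start == end:
        return [start]
    visited = {start}
    queue = deque([[start]])  # each queue entry is a whole path
    while queue:
        path = queue.popleft()
        for node in graph.get(path[-1], []):
            if node in visited:
                continue
            new_path = path + [node]
            if node == end:
                return new_path
            visited.add(node)
            queue.append(new_path)
    return None

print(find_shortest_path('上海', '镇江'))  # ['上海', '苏州', '镇江']
```

    Because BFS explores all routes of length k before any of length k + 1, the first path that reaches the destination is guaranteed to have the fewest hops.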
    

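    The same knowledge-graph approach can also be built with logic programming, for example Prolog, which anyone who studied AI last century will have learned, or PyKE, a Prolog-style knowledge engine for Python. These tools let you build complex networks of logical facts and rules from which information can be extracted conveniently, without hand-coding every piece of it. For example, family relationships can be stated as facts: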

    son_of(bruce, thomas, norma)
    son_of(fred_a, thomas, norma)
    son_of(tim, thomas, norma)
    daughter_of(vicki, thomas, norma)
    daughter_of(jill, thomas, norma)
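    PyKE's own rule syntax and API are not reproduced here. As a plain-Python illustration of what a logic engine does with such facts, the sketch below stores the same facts as tuples and derives new relations (siblings, sisters) from them with hand-written rules; the helper names are hypothetical. The point of Prolog or PyKE is that you would only declare these rules and let the engine do the searching.

```python
# The facts above, stored as (relation, child, father, mother) tuples
FACTS = [
    ("son_of", "bruce", "thomas", "norma"),
    ("son_of", "fred_a", "thomas", "norma"),
    ("son_of", "tim", "thomas", "norma"),
    ("daughter_of", "vicki", "thomas", "norma"),
    ("daughter_of", "jill", "thomas", "norma"),
]

def parents_of(person):
    """Return (father, mother) for person, or None if no fact mentions them."""
    for _, child, father, mother in FACTS:
        if child == person:
            return father, mother
    return None

def siblings(person):
    """Rule: two different people with the same father and mother are siblings."""
    parents = parents_of(person)
    if parents is None:
        return []
    return [child for _, child, father, mother in FACTS
            if (father, mother) == parents and child != person]

def sisters(person):
    """Rule: a sister is a sibling who appears in a daughter_of fact."""
    daughters = {child for rel, child, _, _ in FACTS if rel == "daughter_of"}
    return [s for s in siblings(person) if s in daughters]

print(siblings("bruce"))  # ['fred_a', 'tim', 'vicki', 'jill']
print(sisters("tim"))     # ['vicki', 'jill']
```

    None of the printed relations was stored explicitly; each one is inferred from the five facts, which is what lets a knowledge base answer questions nobody coded in by hand.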

    4. Upgrade III:

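    Every field has a front end and a back end, and AI is no exception. The algorithms covered so far all run on the back end, but many projects also need a simple, easy-to-use, reliable front end. As an example, we use Google's services to write a voice assistant like Jarvis, Tony Stark's assistant in Iron Man. Let's start with the simplest version, one that only talks: the gTTS library (Google Text-to-Speech) converts text into audio.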

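    from gtts import gTTS
    import os

    # Text: "Hello, I am your personal assistant; my name is Pepper."
    # lang selects the voice's language/accent; accepted codes depend on your gTTS version
    tts = gTTS(text='您好,我是您的私人助手,我叫小辣椒', lang='zh-tw')
    tts.save("hello.mp3")
    # mpg321 is a command-line MP3 player; substitute any player you have installed
    os.system("mpg321 hello.mp3")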

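    Likewise, building on text-to-speech, we can use Google's speech recognition to understand what you say and have Jarvis read its replies aloud: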

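    (Note: your machine needs the SpeechRecognition, PyAudio, and gTTS libraries installed, plus the mpg321 player used below; PyAudio is what sr.Microphone uses to access the microphone.)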

     
    import os
    import time
    import webbrowser
    from time import ctime
    from urllib.parse import quote

    import speech_recognition as sr
    from gtts import gTTS

    # Speak the AI's reply out loud
    def speak(audioString):
        print(audioString)
        tts = gTTS(text=audioString, lang='en')
        tts.save("audio.mp3")
        # mpg321 is a command-line MP3 player; substitute any player you have installed
        os.system("mpg321 audio.mp3")

    # Record what you say
    def recordAudio():
        # Capture your speech from the microphone (requires PyAudio)
        r = sr.Recognizer()
        with sr.Microphone() as source:
            audio = r.listen(source)

        # Transcribe the audio with Google's speech recognition service
        data = ""
        try:
            data = r.recognize_google(audio)
            print("You said: " + data)
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))

        return data

    # Built-in conversation skills (rules)
    def jarvis():
        while True:
            data = recordAudio()

            if "how are you" in data:
                speak("I am fine")

            if "what time is it" in data:
                speak(ctime())

            if "where is" in data:
                # Everything after "where is" is the place name, so "New York" stays whole
                location = data.split("where is", 1)[1].strip()
                speak("Hold on Tony, I will show you where " + location + " is.")
                # Open Google Maps in the default browser
                webbrowser.open("https://www.google.com/maps/place/" + quote(location))

            if "bye" in data:
                speak("bye bye")
                break

    # Initialize: give the audio device a moment, then greet
    time.sleep(2)
    speak("Hi Tony, what can I do for you?")

    # Run it
    jarvis()

    Output:
    Hi Tony, what can I do for you?
    You said: how are you
    I am fine
    You said: what time is it now
    Fri Apr  7 18:16:54 2017
    You said: where is London
    Hold on Tony, I will show you where London is.
    You said: ok bye bye
    bye bye
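    In `jarvis()` the rules are tangled up with the microphone and the speaker, which makes them hard to test or reuse behind another front end. As a sketch (the helper name `jarvis_reply` is hypothetical), the rules can be pulled out into a pure function that takes the recognized text and returns the reply plus a flag saying whether to stop. Unlike the original, which checks every rule independently, this version stops at the first matching rule, and opening the map is left to the caller.

```python
from time import ctime

def jarvis_reply(data):
    """Map one recognized utterance to (reply, should_stop); reply is None if no rule fires."""
    if "how are you" in data:
        return "I am fine", False
    if "what time is it" in data:
        return ctime(), False
    if "where is" in data:
        # Everything after "where is" is treated as the place name
        location = data.split("where is", 1)[1].strip()
        return "Hold on Tony, I will show you where " + location + " is.", False
    if "bye" in data:
        return "bye bye", True
    return None, False
```

    Inside the loop you would then write `reply, should_stop = jarvis_reply(recordAudio())`, call `speak(reply)` when `reply` is not None, and `break` when `should_stop` is true. The same function could serve a text front end unchanged.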

    Voice is not the only front end: messaging platforms such as WeChat, Slack, and Facebook Messenger can all integrate our chatbot as well.

  • Original article: https://www.cnblogs.com/LHWorldBlog/p/9278918.html