  • [Natural Language Processing] ChatterBot Chatbots

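    I. Introduction

    ChatterBot is a machine-learning-based chatbot engine built on Python. Its main feature is that it can "learn" from existing conversations (in practice, it memorizes them and matches against them).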

    II. Details

    1. Installation

    Installation is simple: just use pip.

    pip install chatterbot

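    2. Workflow

    As you may already know, ChatterBot's conversation logic, input and output, and storage are each governed by adapters. Let's look at the flowchart first, then walk through some examples to see how they are used.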
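
    As a rough sketch of that flow (an illustration only, not ChatterBot's internals): the input is normalized, every logic adapter proposes a response with a confidence, the most confident proposal is chosen, the exchange is stored, and the response is returned. The adapter functions, `get_response`, and the plain list used as storage are hypothetical names invented for this example, and "highest confidence wins" is an assumption of the sketch.

```python
def time_adapter(text):
    """Toy logic adapter: confident only when the input mentions time."""
    confidence = 1.0 if "time" in text.lower() else 0.0
    return confidence, "The current time is unknown to this sketch."


def echo_adapter(text):
    """Toy logic adapter: always answers, but with low confidence."""
    return 0.1, "You said: " + text


def get_response(raw_input, logic_adapters, storage):
    """Run one input through the input -> logic -> storage -> output steps."""
    text = raw_input.strip()  # input step: normalize the raw text
    candidates = [adapter(text) for adapter in logic_adapters]
    # Assumption of this sketch: the most confident proposal is selected.
    confidence, response = max(candidates, key=lambda pair: pair[0])
    storage.append((text, response))  # storage step: record the exchange
    return response  # output step: plain text


history = []
adapters = [time_adapter, echo_adapter]
print(get_response("  What time is it? ", adapters, history))
print(get_response("hello", adapters, history))
```

    Swapping an adapter function changes one stage without touching the others, which is the point of the adapter design.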

     

     

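    3. Each part has its own set of adapters.

    Bot response logic => Logic Adapters
    Closest Match Adapter: fuzzy string matching (edit distance)
    Closest Meaning Adapter: synonym-based scoring using NLTK's WordNet
    Time Logic Adapter: handles questions about time
    Mathematical Evaluation Adapter: handles questions involving arithmetic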
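
    The idea behind closest-match logic can be sketched in plain Python: compute the edit distance between the input and every known statement, then pick the nearest one. `edit_distance` and `closest_match` are hypothetical helpers written for this post, not ChatterBot functions, and lowercasing before comparison is an assumption of the sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def closest_match(text, known_statements):
    """Return the known statement with the smallest edit distance to text."""
    return min(
        known_statements,
        key=lambda known: edit_distance(text.lower(), known.lower()),
    )


known = ["How can I help you?", "Have you read the documentation?", "No, I have not"]
print(closest_match("how can i help u?", known))
```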

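    Storage backend => Storage Adapters
    Read Only Mode: read-only mode; the database does not change when ChatterBot receives input
    Json Database Adapter: an interface for storing conversation data, kept in JSON format
    Mongo Database Adapter: stores conversation data in a MongoDB database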
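
    To make the read-only idea concrete, here is a minimal sketch of a JSON-file store that ignores writes when it is read-only. `JsonStorageSketch` and its file layout (statement mapped to a list of responses) are assumptions made up for this illustration; ChatterBot's real storage adapters and their on-disk format may differ.

```python
import json
import os
import tempfile


class JsonStorageSketch:
    """Hypothetical stand-in for a storage adapter; not ChatterBot's class."""

    def __init__(self, path, read_only=False):
        self.path = path
        self.read_only = read_only

    def load(self):
        """Return the stored statement -> responses mapping (empty if no file)."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as handle:
            return json.load(handle)

    def update(self, statement, response):
        """Record a response unless the storage is read-only."""
        if self.read_only:
            return False  # the input can still be answered, but nothing is written
        data = self.load()
        data.setdefault(statement, []).append(response)
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False, indent=2)
        return True


path = os.path.join(tempfile.mkdtemp(), "database.json")
writable = JsonStorageSketch(path)
writable.update("Hello", "Hi there")

frozen = JsonStorageSketch(path, read_only=True)
frozen.update("Hello", "Go away")  # ignored: the file is left unchanged
print(frozen.load())
```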

    Input => Input Adapters
    Variable input type adapter: lets ChatterBot accept different input types, such as strings, dictionaries, and Statements
    Terminal adapter: lets you talk to ChatterBot through the terminal
    HipChat Adapter: lets ChatterBot read input from a HipChat room, so you can talk to ChatterBot through HipChat
    Speech recognition: speech input; see chatterbot-voice
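
    Accepting several input types usually comes down to normalizing them into one internal object. The sketch below shows one way that could look; the `Statement` dataclass and `to_statement` are simplified, hypothetical stand-ins, and the convention that a dictionary carries its message under a `text` key is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical, simplified statement object (not ChatterBot's class)."""
    text: str
    extra_data: dict = field(default_factory=dict)


def to_statement(raw):
    """Accept a string, a dict with a 'text' key, or a Statement."""
    if isinstance(raw, Statement):
        return raw
    if isinstance(raw, str):
        return Statement(text=raw)
    if isinstance(raw, dict):
        if "text" not in raw:
            raise ValueError("dict input needs a 'text' key")
        # Everything besides the text is kept as extra data.
        extra = {key: value for key, value in raw.items() if key != "text"}
        return Statement(text=raw["text"], extra_data=extra)
    raise TypeError("unsupported input type: {}".format(type(raw).__name__))


print(to_statement("Hello"))
print(to_statement({"text": "Hello", "user": "alice"}))
```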

    Output => Output Adapters
    Output format adapter: supports text, json, and object output formats
    Terminal adapter
    HipChat Adapter
    Mailgun adapter: lets the chatbot send email through the Mailgun API
    Speech synthesis: the TTS (text-to-speech) part; see chatterbot-voice
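
    The three output formats mentioned above (text, json, object) can be pictured as three views of the same response. This is only a sketch: `Response` and `format_output` are hypothetical names, and the JSON field names are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Response:
    """Hypothetical response object used only for this illustration."""
    text: str
    confidence: float


def format_output(response, output_format="text"):
    """Return the response as plain text, a JSON string, or the object itself."""
    if output_format == "text":
        return response.text
    if output_format == "json":
        return json.dumps(asdict(response), ensure_ascii=False)
    if output_format == "object":
        return response
    raise ValueError("unknown output format: {}".format(output_format))


reply = Response(text="Hello", confidence=0.9)
print(format_output(reply, "text"))
print(format_output(reply, "json"))
```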

    4. Code

    Basic version

    # -*- coding: utf-8 -*-
    from chatterbot import ChatBot
    
    
    # Build the ChatBot and specify its adapters
    bot = ChatBot(
        'Default Response Example Bot',
        storage_adapter='chatterbot.storage.JsonFileStorageAdapter',  # storage adapter
        logic_adapters=[
            {
                'import_path': 'chatterbot.logic.BestMatch'  # response logic
            },
            {
                'import_path': 'chatterbot.logic.LowConfidenceAdapter',  # fallback logic
                'threshold': 0.65,  # below this confidence, give the default response
                'default_response': 'I am sorry, but I do not understand.'
            }
        ],
        trainer='chatterbot.trainers.ListTrainer'  # the training corpus is a list
    )
    
    # Provide a small hand-written corpus for training
    bot.train([
        'How can I help you?',
        'I want to create a chat bot',
        'Have you read the documentation?',
        'No, I have not',
        'This should help get you started: http://chatterbot.rtfd.org/en/latest/quickstart.html'
    ])
    
    # Ask a question and fetch the response
    question = 'How do I make an omelette?'
    print(question)
    response = bot.get_response(question)
    print(response)
    
    print("\n")
    question = 'how to make a chat bot?'
    print(question)
    response = bot.get_response(question)
    print(response)

     

    Result:

    How do I make an omelette?
    I am sorry, but I do not understand.
    
    
    how to make a chat bot?
    Have you read the documentation?
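
    The result above follows from two ideas: list training treats each item as the reply to the one before it, and a confidence threshold decides between the best match and the default response. The sketch below imitates both with the standard library. It is not how `BestMatch` or `LowConfidenceAdapter` are implemented; the `difflib` similarity score, `train_from_list`, and `respond` are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.65
DEFAULT_RESPONSE = "I am sorry, but I do not understand."

conversation = [
    "How can I help you?",
    "I want to create a chat bot",
    "Have you read the documentation?",
    "No, I have not",
    "This should help get you started: http://chatterbot.rtfd.org/en/latest/quickstart.html",
]


def train_from_list(statements):
    """Pair each statement with the one that follows it."""
    return dict(zip(statements, statements[1:]))


def respond(question, pairs):
    """Reply with the answer to the most similar known statement, or fall back."""
    def similarity(known):
        return SequenceMatcher(None, question.lower(), known.lower()).ratio()

    best = max(pairs, key=similarity)
    if similarity(best) < THRESHOLD:
        return DEFAULT_RESPONSE  # nothing is similar enough
    return pairs[best]


pairs = train_from_list(conversation)
print(respond("How do I make an omelette?", pairs))
print(respond("how to make a chat bot?", pairs))
```

    Raising the threshold makes the bot fall back to the default response more often; lowering it makes it answer more boldly, and more often wrongly.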

     

    Adapters for time and math questions

    # -*- coding: utf-8 -*-
    from chatterbot import ChatBot
    
    
    bot = ChatBot(
        "Math & Time Bot",
        logic_adapters=[
            "chatterbot.logic.MathematicalEvaluation",
            "chatterbot.logic.TimeLogicAdapter"
        ],
        input_adapter="chatterbot.input.VariableInputTypeAdapter",
        output_adapter="chatterbot.output.OutputAdapter"
    )
    
    # Do some math
    question = "What is 4 + 9?"
    print(question)
    response = bot.get_response(question)
    print(response)
    
    print("\n")
    
    # Answer a time-related question
    question = "What time is it?"
    print(question)
    response = bot.get_response(question)
    print(response)

     

    Result:

    What is 4 + 9?
    ( 4 + 9 ) = 13
    
    What time is it?
    The current time is 05:08 PM
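
    For intuition, a math-and-time responder can be approximated in a few lines: pull an arithmetic expression out of the question and evaluate it safely, or report the clock time. This is a sketch under stated assumptions, not the code behind `MathematicalEvaluation` or `TimeLogicAdapter`; the regular expression, `evaluate`, and `answer` are invented here, and the output format is not identical to the result shown above.

```python
import ast
import operator
import re
from datetime import datetime

OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("unsupported expression")


def answer(question, now=None):
    """Route a question to the math handler or the time handler."""
    match = re.search(r"\d[\d\s.+\-*/()]*", question)
    if match:
        expression = match.group().strip()
        try:
            result = evaluate(ast.parse(expression, mode="eval"))
            return "{} = {}".format(expression, result)
        except (SyntaxError, ValueError, ZeroDivisionError):
            pass  # not a usable expression; try the time handler instead
    if "time" in question.lower():
        now = now or datetime.now()
        return "The current time is " + now.strftime("%I:%M %p")
    return None


print(answer("What is 4 + 9?"))
print(answer("What time is it?"))
```

    Walking the syntax tree instead of calling `eval()` keeps user input from running arbitrary code.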

    Exporting the corpus to a JSON file

    # -*- coding: utf-8 -*-
    from chatterbot import ChatBot
    
    '''
    If you have an already-trained chatbot and want to extract its corpus
    to build another chatbot, you can do it like this.
    '''
    
    chatbot = ChatBot(
        'Export Example Bot',
        trainer='chatterbot.trainers.ChatterBotCorpusTrainer'
    )
    
    # Train it first
    chatbot.train('chatterbot.corpus.english')
    
    # Export the corpus to a JSON file
    chatbot.trainer.export_for_training('./my_export.json')

    A chatbot that learns from feedback

    # -*- coding: utf-8 -*-
    from chatterbot import ChatBot
    import logging
    
    """
    A feedback-driven chatbot that learns from your feedback
    """
    
    # Uncomment the following line to write some information to the log
    # logging.basicConfig(level=logging.INFO)
    
    # Create a chatbot
    bot = ChatBot(
        'Feedback Learning Bot',
        storage_adapter='chatterbot.storage.JsonFileStorageAdapter',
        logic_adapters=[
            'chatterbot.logic.BestMatch'
        ],
        input_adapter='chatterbot.input.TerminalAdapter',  # read input from the terminal
        output_adapter='chatterbot.output.TerminalAdapter'
    )
    
    DEFAULT_SESSION_ID = bot.default_session.id
    
    
    def get_feedback():
        from chatterbot.utils import input_function
    
        text = input_function()
    
        if 'Yes' in text:
            return True
        elif 'No' in text:
            return False
        else:
            print('Please type either "Yes" or "No"')
            return get_feedback()
    
    
    print('Type something to begin...')
    
    # This loop runs each time the user enters some input
    while True:
        try:
            input_statement = bot.input.process_input_statement()
            statement, response = bot.generate_response(input_statement, DEFAULT_SESSION_ID)
    
            print('\n Is "{}" a coherent response to "{}"? \n'.format(response, input_statement))
    
            if get_feedback():
                bot.learn_response(response, input_statement)
    
            bot.output.process_response(response)
    
            # Update the chatbot's conversation history
            bot.conversation_sessions.update(
                bot.default_session.id_string,
                (statement, response, )
            )
    
        # Exit only on ctrl-c or ctrl-d
        except (KeyboardInterrupt, EOFError, SystemExit):
            break
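
    Stripped of the ChatterBot plumbing, the feedback loop above does one thing: a proposed reply is stored only when the user confirms it. Here is a minimal, non-interactive sketch of that rule; `parse_feedback`, `feedback_round`, and the dictionary used as memory are hypothetical names for this illustration. Like `get_feedback` above, it uses case-sensitive substring checks, so an answer such as "Nothing" counts as "No".

```python
def parse_feedback(text):
    """Map a typed answer to True or False, or None when it is neither."""
    if "Yes" in text:
        return True
    if "No" in text:
        return False
    return None  # the caller should ask again


def feedback_round(memory, statement, proposed_response, feedback_text):
    """Store the proposed response only if the feedback confirms it."""
    verdict = parse_feedback(feedback_text)
    if verdict:
        memory.setdefault(statement, []).append(proposed_response)
    return verdict


memory = {}
feedback_round(memory, "Hello", "Hi, how are you?", "Yes")
feedback_round(memory, "Hello", "Goodbye", "No")
print(memory)
```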

    Building a chatbot with the Ubuntu dataset

    from chatterbot import ChatBot
    import logging
    
    
    '''
    An example of building a chatbot with the Ubuntu corpus
    '''
    
    # Enable logging
    logging.basicConfig(level=logging.INFO)
    
    chatbot = ChatBot(
        'Example Bot',
        trainer='chatterbot.trainers.UbuntuCorpusTrainer'
    )
    
    # Start training on the Ubuntu dataset
    chatbot.train()
    
    # See how the trained bot responds
    response = chatbot.get_response('How are you doing today?')
    print(response)

    Using Microsoft's chatbot

     

    # -*- coding: utf-8 -*-
    from chatterbot import ChatBot
    from settings import Microsoft
    
    '''
    For how to obtain a Microsoft user access token, see the following documentation:
    https://docs.botframework.com/en-us/restapi/directline/
    '''
    
    chatbot = ChatBot(
        'MicrosoftBot',
        directline_host=Microsoft['directline_host'],
        direct_line_token_or_secret=Microsoft['direct_line_token_or_secret'],
        conversation_id=Microsoft['conversation_id'],
        input_adapter='chatterbot.input.Microsoft',
        output_adapter='chatterbot.output.Microsoft',
        trainer='chatterbot.trainers.ChatterBotCorpusTrainer'
    )
    
    chatbot.train('chatterbot.corpus.english')
    
    # Yes, it will keep chatting indefinitely
    while True:
        try:
            response = chatbot.get_response(None)
    
        # Exit only on ctrl-c or ctrl-d
        except (KeyboardInterrupt, EOFError, SystemExit):
            break

    HipChat chat room adapter

    # -*- coding: utf-8 -*-
    from chatterbot import ChatBot
    from settings import HIPCHAT
    
    '''
    For something fancier, you can connect to a HipChat room. You need a user token; the documentation below explains how to get one.
    https://developer.atlassian.com/hipchat/guide/hipchat-rest-api/api-access-tokens
    '''
    
    chatbot = ChatBot(
        'HipChatBot',
        hipchat_host=HIPCHAT['HOST'],
        hipchat_room=HIPCHAT['ROOM'],
        hipchat_access_token=HIPCHAT['ACCESS_TOKEN'],
        input_adapter='chatterbot.input.HipChat',
        output_adapter='chatterbot.output.HipChat',
        trainer='chatterbot.trainers.ChatterBotCorpusTrainer'
    )
    
    chatbot.train('chatterbot.corpus.english')
    
    # That's right: with while True, the conversation goes on forever
    while True:
        try:
            response = chatbot.get_response(None)
    
        # Exit only on ctrl-c or ctrl-d
        except (KeyboardInterrupt, EOFError, SystemExit):
            break

    A chat system that replies by email

    # -*- coding: utf-8 -*-
    from chatterbot import ChatBot
    from settings import MAILGUN
    
    '''
    This feature requires you to create a file named settings.py containing the following configuration:
    MAILGUN = {
        "CONSUMER_KEY": "my-mailgun-api-key",
        "API_ENDPOINT": "https://api.mailgun.net/v3/my-domain.com/messages"
    }
    '''
    
    # Change the addresses below to your own
    FROM_EMAIL = "mailgun@salvius.org"
    RECIPIENTS = ["gunthercx@gmail.com"]
    
    bot = ChatBot(
        "Mailgun Example Bot",
        mailgun_from_address=FROM_EMAIL,
        mailgun_api_key=MAILGUN["CONSUMER_KEY"],
        mailgun_api_endpoint=MAILGUN["API_ENDPOINT"],
        mailgun_recipients=RECIPIENTS,
        input_adapter="chatterbot.input.Mailgun",
        output_adapter="chatterbot.output.Mailgun",
        storage_adapter="chatterbot.storage.JsonFileStorageAdapter",
        database="../database.db"
    )
    
    # A simple email reply
    response = bot.get_response("How are you?")
    print("Check your inbox at ", RECIPIENTS)

    A Chinese-language example

    Note: when building a Chinese chatbot with ChatterBot, be sure to use Python 3.x; Python 2.7 runs into encoding problems.

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    
    # Manually provide some training data
    from chatterbot import ChatBot
    from chatterbot.trainers import ListTrainer
     
     
    Chinese_bot = ChatBot("Training demo")
    Chinese_bot.set_trainer(ListTrainer)
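    Chinese_bot.train([
        '你好',  # "Hello"
        '你好',  # "Hello"
        '有什么能帮你的?',  # "How can I help you?"
        '想买数据科学的课程',  # "I'd like to buy a data science course"
        '具体是数据科学哪块呢?',  # "Which area of data science exactly?"
        '机器学习',  # "Machine learning"
    ])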
     
    # Try it out
    question = '你好'
    print(question)
    response = Chinese_bot.get_response(question)
    print(response)
    
    print("\n")
    
    question = '请问哪里能买数据科学的课程'  # "Where can I buy a data science course?"
    print(question)
    response = Chinese_bot.get_response(question)
    print(response)

    Result:

    你好
    你好
    
    
    请问哪里能买数据科学的课程
    具体是数据科学哪块呢?
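
    Why does the second question get that reply? One plausible reading is that it overlaps most, character by character, with the trained statement '想买数据科学的课程', so the bot returns whatever followed that statement in the training list. The sketch below reproduces this with `difflib`; the character-level comparison and the `most_similar` helper are assumptions for illustration and do not describe ChatterBot's actual matching code.

```python
from difflib import SequenceMatcher

# The training statements, in order; each one is answered by the next.
conversation = [
    '你好',
    '你好',
    '有什么能帮你的?',
    '想买数据科学的课程',
    '具体是数据科学哪块呢?',
    '机器学习',
]


def most_similar(question, statements):
    """Return the statement that shares the most characters, in order, with the question."""
    return max(
        statements,
        key=lambda known: SequenceMatcher(None, question, known).ratio(),
    )


question = '请问哪里能买数据科学的课程'
# The last statement has no reply after it, so it is not a candidate.
matched = most_similar(question, conversation[:-1])
reply = conversation[conversation.index(matched) + 1]
print(matched)
print(reply)
```

    Because Chinese text has no spaces between words, comparing characters directly is the simplest option here; a word segmenter would be a natural refinement.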

    Using the small built-in Chinese corpus

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    from chatterbot import ChatBot
    from chatterbot.trainers import ChatterBotCorpusTrainer
     
    chatbot = ChatBot("ChineseChatBot")
    chatbot.set_trainer(ChatterBotCorpusTrainer)
     
    # Train it with the Chinese corpus
    chatbot.train("chatterbot.corpus.chinese")
     
    # Start the conversation
    while True:
        print(chatbot.get_response(input(">")))

     

  • Original article: https://www.cnblogs.com/LHWorldBlog/p/9292024.html