  • Hands-on with the Speech-to-Text APIs of the Major Vendors

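    I recently found that audiobooks help me fall asleep much more easily, but every episode starts with an opening announcement that I want to cut off. There are several different openings, so I need speech recognition to tell which one an episode uses before trimming it.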
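The "recognize first, then decide" step can be sketched roughly as follows. This is only a minimal illustration using the standard library's `difflib`. The names `KNOWN_INTROS` and `best_intro` and the `threshold` value are hypothetical, and the two sample phrases are placeholders that should be replaced with the real opening lines.

```python
from difflib import SequenceMatcher

# Hypothetical opening phrases, used only for illustration.
KNOWN_INTROS = {
    "intro_a": "大家好欢迎大家订阅历史未解之谜全记录",
    "intro_b": "由喜马拉雅联合大理石独家推出探秘类",
}


def best_intro(transcript, intros=KNOWN_INTROS, threshold=0.6):
    """Return (name, score) of the closest known intro.

    The name is None when no intro reaches the (assumed) threshold.
    """
    cleaned = transcript.replace(" ", "")
    best_name, best_score = None, 0.0
    for name, phrase in intros.items():
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        score = SequenceMatcher(None, cleaned, phrase).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Fuzzy matching is used instead of exact comparison because the transcripts returned by the APIs usually contain a few wrong characters.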

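    A couple of years ago I set up a recognition environment locally, but its accuracy was not good enough, so for now I am falling back on APIs; I will revisit a local setup when I have time. Below are the services offered by a few major vendors. In my own experience the ranking is iFlytek (讯飞) > Google > IBM, and for Chinese recognition accuracy in particular iFlytek is the strongest.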

    Oracle:

    Its Always Free plan won me over, but the transcription service it offers does not support Chinese, so I passed.

    IBM

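    Pros: a certain amount of ongoing free quota
    Cons: accuracy is not good enough, and the official site is somewhat slow to load
    A quick-and-dirty example: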

    #coding:utf-8
    '''
    @version: python3.8
    @author: 'eric'
    @license: Apache Licence
    @contact: steinven@qq.com
    @software: PyCharm
    @file: ibm.py
    @time: 2021/6/16 23:05
    '''
    from __future__ import print_function
    
    import traceback
    
    apikey = ''
    url = ''
    
    from watson_developer_cloud import SpeechToTextV1
    service = SpeechToTextV1(
        iam_apikey=apikey,
        url=url)
    
    import os, re
    
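    # Root directory of all the audio resources
    base_dir = r'36041981'
    
    # Subdirectory holding clips already cut to 5 s, with the .x2m suffix (Ximalaya cache files from Android).
    # I suspect they are an ordinary audio format with the extension renamed.
    cliped_dir = os.listdir(os.path.join(base_dir, 'clip'))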
    for each in cliped_dir:
        try:
            filename = re.findall(r"(.*?)\.x2m", each)  # extract the base name of files ending in .x2m
            if filename:
                filename[0] += '.x2m'
                with open(os.path.join(base_dir, 'clip', filename[0]),
                          'rb') as audio_file:
                    recognize_result = service.recognize(
                        audio=audio_file,
                        content_type='audio/mp3',
                        timestamps=False,
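                        # Chinese model; zh-CN_BroadbandModel is somewhat more accurate
                        model='zh-CN_NarrowbandModel',
                        # model='zh-CN_BroadbandModel',

                        # These two parameters are supposed to bias the output toward the supplied words,
                        # but in my tests they made no difference, for reasons I have not figured out
                        # keywords=list(set([x for x in '曲曲于山川历史为解之谜拓展人生的长度广度人生的长度广度和深度由喜马拉雅联合大理石独家推出探秘类大家好欢迎大家订阅历史未解之谜全记录'])),
                        # keywords_threshold=0.1,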
                        word_confidence=True).get_result()
                    if len(recognize_result['results']) == 0:
                        # Nothing was recognized: record a '-' placeholder and move on
                        with open('result-1.txt', 'a', encoding='utf-8') as f:
                            f.write('%s-%s\n' % (filename[0], '-'))
                        continue
                    final_result = recognize_result['results'][0]['alternatives'][0]['transcript'].replace(' ', '')
                    with open('result-1.txt', 'a', encoding='utf-8') as f:
                        f.write('%s-%s\n' % (filename[0], final_result))
        except Exception:  # keep going on per-file errors, but still allow Ctrl+C to stop the loop
            traceback.print_exc()
            print(each)
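
Once the script has written `result-1.txt`, the lines can be read back and grouped by transcript to see which clips share an opening. The sketch below is only an illustration: the helper names `parse_result_line` and `group_by_transcript` are hypothetical, and it assumes file names contain no hyphen, so the first `-` on each line is the separator.

```python
from collections import defaultdict


def parse_result_line(line):
    """Split one 'filename-transcript' line into (filename, transcript)."""
    # Assumption: the file name itself contains no '-'.
    name, _, text = line.rstrip("\n").partition("-")
    # A lone '-' is the placeholder written when nothing was recognized.
    return name, ("" if text == "-" else text)


def group_by_transcript(lines):
    """Map each transcript to the list of clip file names that produced it."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, text = parse_result_line(line)
        groups[text].append(name)
    return dict(groups)
```

For example, `group_by_transcript(open('result-1.txt', encoding='utf-8'))` returns a dictionary whose empty-string key lists the clips for which recognition returned nothing.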
    
    

    Google

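    Pros: fast recognition
    Cons: requires a proxy to access, and it is a paid service
    Docs: "Quickstart: Using client libraries". For a local audio file, do not use the code from the docs; use my example below as a reference instead.
    A quick-and-dirty example: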

    # coding:utf-8
    from os import path
    
    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "268675557.mp3")
    
    
    def transcribe_file(speech_file):
        """Transcribe the given audio file."""
        from google.cloud import speech
        import io
    
        client = speech.SpeechClient()
    
        with io.open(speech_file, "rb") as audio_file:
            content = audio_file.read()
    
        audio = speech.RecognitionAudio(content=content)
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED,
            sample_rate_hertz=16000,
            language_code="zh-CN",
        )
    
        response = client.recognize(config=config, audio=audio)
    
        # Each result is for a consecutive portion of the audio. Iterate through
        # them to get the transcripts for the entire audio file.
        for result in response.results:
            # The first alternative is the most likely one for this portion.
            print(u"Transcript: {}".format(result.alternatives[0].transcript))
    
    
    if __name__ == '__main__':
        transcribe_file(AUDIO_FILE)
    
    

    iFlytek (讯飞)

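    Pros: a time-limited free quota, fast recognition, the most accurate Chinese recognition, a domestic (Chinese) vendor, and very easy for developers to get started with
    Cons: slow recognition, paid, and fairly expensive
    I won't paste any code here; a demo is easy to find on the official site.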

  • Original post: https://www.cnblogs.com/steinven/p/14894622.html