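Real-Time Speech Recognition Service

Introduction to Real-Time Speech Recognition

Hi, and welcome to the Youdao Zhiyun (有道智云) real-time speech recognition API service.

This document is intended for engineers integrating the API and describes the technical details of the real-time speech recognition capability.

For business cooperation inquiries, contact us through the following channel:

Business email: AIcloud_Business@corp.youdao.com

If you have any questions about this document, contact us through any of the following channels:

Customer service QQ: 1906538062

Official discussion group: 652880659

Contact email: zyservice@corp.youdao.com

The interface integration guide follows. Integration testing requires an application ID and an application secret; if you do not have them yet, obtain them by following the beginner's guide.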

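Interface Description

Interface address:

wss://openapi.youdao.com/stream_asropenapi?{request parameters}

A call to the service goes through two phases: authentication and real-time communication.

1 Authentication phase

Request parameter format:
key1=value1&key2=value2&key3=value3&key4=value4

Parameter description: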

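| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| appKey | String | | ID of the application you have registered | ID |
| salt | String | | Random value | uuid |
| curtime | String | | Timestamp: number of seconds from 1970-01-01 00:00:00 to the current time | 1522292849 |
| sign | String | | Encrypted digital signature | |
| signType | String | | Digital signature type | v4 |
| langType | String | | Language selection; the interface currently supports Chinese and English | zh-CHS |
| format | String | | Audio format; wav is supported | wav |
| channel | String | | Number of channels; 1 is supported | 1 |
| version | String | | API version | v1 |
| rate | String | | Sample rate | 16000 |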

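Rule for generating sign: sha256(appKey + salt + curtime + appSecret), where + is string concatenation and appSecret is the application secret. The secret can be found under Application Management > My Applications by clicking the application name to open the application details page (only applications that use the API integration method have an application secret).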
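The signing rule above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names `make_sign` and `build_url` are hypothetical, the lowercase hexadecimal digest follows what the demos in this document send, and the fixed parameter values are the example values from the parameter table.

```python
import hashlib
import time
import uuid
from urllib.parse import urlencode


def make_sign(app_key: str, salt: str, curtime: str, app_secret: str) -> str:
    """Return sha256(appKey + salt + curtime + appSecret) as a lowercase hex string."""
    raw = app_key + salt + curtime + app_secret
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def build_url(app_key: str, app_secret: str, lang_type: str = "zh-CHS") -> str:
    """Build the connection URL with a fresh salt and the current timestamp."""
    salt = str(uuid.uuid4())          # random value
    curtime = str(int(time.time()))   # seconds since 1970-01-01
    params = {
        "appKey": app_key,
        "salt": salt,
        "curtime": curtime,
        "sign": make_sign(app_key, salt, curtime, app_secret),
        "signType": "v4",
        "langType": lang_type,
        "format": "wav",
        "channel": "1",
        "version": "v1",
        "rate": "16000",
    }
    return "wss://openapi.youdao.com/stream_asropenapi?" + urlencode(params)
```

Because salt and curtime are part of the signed string, the same values must be used both in the signature and in the query string, which is why `build_url` generates them once and reuses them.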

The server returns the authentication result as a JSON string in a text message. Examples:

Success:

{
    "result": [],
    "action": "started",
    "errorCode": "0"
}

Failure:

{
    "result": "[]",
    "action": "error",
    "errorCode": "202"
}

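Parameter description:

| Parameter | Type | Description |
| --- | --- | --- |
| errorCode | String | Status code; see the status code table |
| action | String | Status flag. started: handshake; recognition: recognition; error: error |
| result | String | Recognition result data |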
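A client would typically inspect this first text message before sending any audio. The sketch below shows one way to do that; the function name `check_handshake` is hypothetical, and treating anything other than errorCode "0" with action "started" as a failure is an assumption based on the two examples above.

```python
import json


def check_handshake(message: str) -> None:
    """Raise RuntimeError unless the message reports a successful handshake."""
    reply = json.loads(message)
    code = reply.get("errorCode")
    action = reply.get("action")
    if code != "0" or action != "started":
        raise RuntimeError(
            f"authentication failed: errorCode={code}, action={action}"
        )


# The success example from this section passes silently.
check_handshake('{"result": [], "action": "started", "errorCode": "0"}')
```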

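2 Real-time communication phase

After authentication succeeds, the connection enters the real-time communication phase. In this phase the client sends the audio stream and the end marker, and receives transcription results or errors.

2.1 Sending the audio stream

The client sends the audio stream as binary messages whose content is the binary audio data. The sending frequency affects how promptly the text results are displayed.

Sending audio data at 200 ms intervals is recommended. If the gap between sends exceeds 15 s, the server stops recognition.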
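To choose a chunk size that matches the recommended interval, one option is to send as much audio per message as is played back in one interval. That pairing is an assumption for illustration (the demos in this document use a fixed 1600-byte step instead); the sketch also assumes 16-bit mono PCM as described under audio support, and the names `chunk_size` and `split_audio` are hypothetical.

```python
def chunk_size(rate_hz: int = 16000, bytes_per_sample: int = 2,
               channels: int = 1, interval_ms: int = 200) -> int:
    """Bytes of PCM audio that correspond to one sending interval."""
    return rate_hz * bytes_per_sample * channels * interval_ms // 1000


def split_audio(pcm: bytes, size: int):
    """Yield consecutive chunks of at most `size` bytes; the last may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


print(chunk_size())               # 16 kHz, 16-bit mono, 200 ms -> 6400
print(chunk_size(rate_hz=8000))   # 8 kHz, 16-bit mono, 200 ms -> 3200
```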

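2.2 Sending the end marker

After the client has sent all audio data, it must send one special binary message to the server to mark the end of the audio stream. Its content is: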

{"end": "true"}
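The sending sequence of sections 2.1 and 2.2 can be sketched independently of any particular WebSocket library. In this sketch `send_binary` is a hypothetical callable supplied by the caller that transmits one binary message, and `stream_audio` and `END_MARKER` are illustrative names, not part of the service.

```python
import time
from typing import Callable, Iterable

# The end marker is sent as a binary message, as described above.
END_MARKER = b'{"end": "true"}'


def stream_audio(send_binary: Callable[[bytes], None],
                 chunks: Iterable[bytes],
                 interval_s: float = 0.2) -> int:
    """Send each non-empty chunk, pause between sends, then send the end marker.

    Returns the number of audio chunks sent (the end marker is not counted).
    """
    sent = 0
    for chunk in chunks:
        if not chunk:
            continue  # never send an empty audio frame
        send_binary(chunk)
        sent += 1
        time.sleep(interval_s)
    send_binary(END_MARKER)
    return sent
```

Passing the sender in as a function keeps the pacing logic testable without a network connection; with a real client the callable would wrap that client's binary send call.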

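2.3 Receiving transcription results

Throughout the exchange, the server continuously returns real-time recognition results to the client as text messages in JSON format.
Example recognition result: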

{
    "result": [{
        "st": [{
            "bg": 30,
            "ed": 480,
            "ws": [{
                "w": "Have",
                "wb": 30,
                "we": 240
            }, {
                "w": "a",
                "wb": 240,
                "we": 270
            }, {
                "w": "good",
                "wb": 270,
                "we": 480
            }, {
                "w": "day.",
                "wb": 480,
                "we": 480
            }]
        }],
        "seg_id": 0
    }],
    "errorCode": "0",
    "action": "recognition"
}

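Description of the result parameters:

| Parameter | Meaning | Notes |
| --- | --- | --- |
| bg | Sentence start time | In milliseconds (ms) |
| ed | Sentence end time | In milliseconds (ms) |
| w | Recognized word (or character) | |
| wb | Word (or character) start time | In milliseconds (ms) |
| we | Word (or character) end time | In milliseconds (ms) |
| type | Result type | 0: final result; 1: intermediate result |
| seg_id | Sentence id | Increments from 0 |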
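Given the structure above, a client can rebuild each sentence from its word list. The following is a small sketch under stated assumptions: the function name `sentences` is hypothetical, and joining words with a space suits English output such as the example; for Chinese an empty separator would probably be more appropriate.

```python
import json


def sentences(message: str, separator: str = " "):
    """Return (seg_id, bg, ed, text) tuples from one recognition message.

    Messages that are not successful recognition results yield an empty list.
    """
    reply = json.loads(message)
    if reply.get("errorCode") != "0" or reply.get("action") != "recognition":
        return []
    out = []
    for segment in reply.get("result", []):
        for sentence in segment.get("st", []):
            text = separator.join(word["w"] for word in sentence.get("ws", []))
            out.append((segment.get("seg_id"), sentence.get("bg"),
                        sentence.get("ed"), text))
    return out
```

Applied to the example message in this section, the function returns a single tuple with seg_id 0, start 30, end 480, and the text "Have a good day.".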

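Supported Languages

The interface currently supports recognition of Chinese and English.

| Language | Code |
| --- | --- |
| Chinese | zh-CHS |
| English | en |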

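Audio Support

Format: wav (uncompressed, PCM encoded)

Sample rate: 8 kHz or 16 kHz; 16 kHz is recommended.

Encoding: mono, 16-bit depth

| Format | Code |
| --- | --- |
| wav | wav |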
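Before streaming a file, it can help to verify locally that it meets these requirements. The sketch below uses Python's standard `wave` module, which only opens uncompressed PCM WAV files; the function name `check_wav` and the wording of the messages are illustrative.

```python
import wave


def check_wav(path: str) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() not in (8000, 16000):
            problems.append(f"expected 8000 or 16000 Hz, got {wav.getframerate()} Hz")
    return problems
```

A file that `wave.open` cannot read at all (for example a compressed WAV) raises `wave.Error`, which a caller may want to treat as an unsupported format.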

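Status Codes

| errorCode | Meaning |
| --- | --- |
| 0 | Success |
| 101 | Missing required parameter |
| 102 | Unsupported language type |
| 104 | Unsupported interface version |
| 105 | Unsupported signature type |
| 106 | Unsupported response format |
| 107 | Unsupported transport encryption type |
| 108 | Invalid appKey |
| 110 | No valid instance of the relevant service |
| 111 | Invalid devId |
| 112 | Invalid productId |
| 201 | Decryption failed; possibly a DES, BASE64, or URLDecode error |
| 202 | Signature verification failed |
| 203 | The client IP address is not in the list of allowed IP addresses |
| 205 | The requested interface does not match the platform type of the application |
| 206 | Signature verification failed because the timestamp is invalid |
| 207 | Replayed request |
| 303 | Other server-side exception |
| 304 | Session timed out after being idle too long |
| 401 | Account is overdue and has been suspended |
| 9001 | Unsupported audio format |
| 9002 | Unsupported audio sample rate |
| 9003 | Unsupported audio channel |
| 9004 | Unsupported audio upload type |
| 9005 | Unsupported speech recognition language type |
| 9301 | ASR recognition failed |
| 9303 | Internal server error |
| 9411 | Access rate limited (maximum number of calls exceeded) |
| 9412 | Maximum audio length exceeded |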
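For logging, a client may want to turn a returned errorCode into a readable message. The lookup below is only a sketch: it covers a subset of the codes from the table above with paraphrased descriptions, and the names `STATUS_MESSAGES` and `describe_status` are hypothetical.

```python
# A subset of the status codes from the table above (descriptions paraphrased).
STATUS_MESSAGES = {
    "0": "success",
    "101": "missing required parameter",
    "108": "invalid appKey",
    "110": "no valid service instance",
    "202": "signature verification failed",
    "206": "signature verification failed because the timestamp is invalid",
    "304": "session timed out after being idle too long",
    "401": "account overdue and suspended",
    "9412": "maximum audio length exceeded",
}


def describe_status(error_code) -> str:
    """Map an errorCode (string or int) to a short description."""
    code = str(error_code)
    return STATUS_MESSAGES.get(code, f"unknown status code {code}")
```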

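FAQ

  • Error 110 is returned

The application is not bound to a service instance. Create a new service instance and bind it to the application.

  • Error 108 is returned

The appKey is invalid. Register an account, log in to the console, create an application and an instance, and bind them; you will then obtain the application ID, the application secret, and related information. The application ID is the appKey (note: not the application secret).

  • Error 101 is returned

First make sure that all required parameters are present, then check that the parameter names are written correctly.

  • Error 202 is returned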

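If appKey and appSecret have been confirmed to be correct and 202 is still returned, the cause is usually an encoding problem. Make sure the string used to compute the signature is UTF-8 encoded.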

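Version History

| Release date | Version | Changes |
| --- | --- | --- |
| 2018.11.07 | v1.0.0 | Youdao Zhiyun real-time speech recognition API released |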

Demos in Common Languages

Java example


import javax.websocket.ContainerProvider;
import javax.websocket.Session;
import javax.websocket.WebSocketContainer;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class AsrliteDemo {

    public Session session;

    public static void main(String[] args) throws Exception {

        String filePath = "D:\\en.wav";
        String langType = "en";
        String appKey = "your application ID";
        String appSecret = "your application secret";

        asrlite(appKey,appSecret,filePath,langType);
    }

    protected void start(String uri) {
        WebSocketContainer container = ContainerProvider.getWebSocketContainer();

        try {
            session = container.connectToServer(Websocket.class, URI.create(uri));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void doAsrWebSocketClient(String filePath, String uri, Integer step) {
        AsrliteDemo asrliteWebSocketClientApp = new AsrliteDemo();
        asrliteWebSocketClientApp.start(uri);
        try (InputStream inputStream = new FileInputStream(filePath)) {
            int read;
            byte[] bytes = new byte[step];
            while ((read = inputStream.read(bytes)) != -1) {
                // Send only the bytes actually read; the last chunk may be shorter than step.
                asrliteWebSocketClientApp.session.getBasicRemote().sendBinary(ByteBuffer.wrap(bytes, 0, read));
                Thread.sleep(100);
            }
            // Tell the server that the audio stream has ended.
            byte[] closebytes = "{\"end\": \"true\"}".getBytes();
            asrliteWebSocketClientApp.session.getBasicRemote().sendBinary(ByteBuffer.wrap(closebytes));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


    private static void asrlite(String appKey,String appSecret,String filePath,String langType) throws NoSuchAlgorithmException {
        AsrliteDemo asrWebSocketClient = new AsrliteDemo();
        String nonce = UUID.randomUUID().toString();
        String curtime = String.valueOf(System.currentTimeMillis()/1000);
        String signStr = appKey + nonce +curtime + appSecret;
        String sign =  encrypt(signStr,null);

        String uri = "wss://openapi.youdao.com/stream_asropenapi?appKey="+appKey+"&salt="+nonce+"&curtime="+curtime+"&sign="+sign+"&version=v1&channel=1&format=wav&signType=v4&rate=16000&langType="+langType;
        asrWebSocketClient.doAsrWebSocketClient(filePath,uri,1600);
    }

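    /**
     * Compute the MessageDigest hash of a string and return it as hex.
     * @param strSrc the string to hash
     * @param encName the digest algorithm name; SHA-256 is used when null or empty
     * @return the digest as a lowercase hexadecimal string
     * @throws NoSuchAlgorithmException if the algorithm is not available
     */
    public static String encrypt(String strSrc, String encName) throws NoSuchAlgorithmException {
        // Encode explicitly as UTF-8 so the signature does not depend on the platform default charset.
        byte[] bt = strSrc.getBytes(java.nio.charset.StandardCharsets.UTF_8);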
        if (encName == null || "".equals(encName)) {
            encName = "SHA-256";
        }
        MessageDigest md = MessageDigest.getInstance(encName);
        md.update(bt);
        // to HexString
        return bytes2Hex(md.digest());
    }
    public static String bytes2Hex(byte[] bts) {
        String des = "";
        String tmp;
        for (int i = 0; i < bts.length; i++) {
            tmp = (Integer.toHexString(bts[i] & 0xFF));
            if (tmp.length() == 1) {
                des += "0";
            }
            des += tmp;
        }
        return des;
    }

}

import javax.websocket.*;
import java.io.IOException;

@ClientEndpoint
public class Websocket {

    private void print(Object obj) {
        System.out.println(obj.toString());
    }

    Session session;

    @OnOpen
    public void onOpen(Session session) {
        print("Connect to endpoint: " + session.getBasicRemote());
        this.session = session;
    }

    @OnMessage
    public void onMessage(String message) throws IOException {
        print(message);
        if (message.contains("\"errorCode\":\"304\"")) {
            onClose();
        }

    }

    @OnError
    public void onError(Throwable throwable) {
        throwable.printStackTrace();
    }

    @OnClose
    public void onClose() throws IOException {
        if(this.session.isOpen()){
            this.session.close();
        }
        print("session close");
        System.exit(0);
    }

}

Python example

# -*- coding: utf-8 -*-
import uuid
import time
import websocket
import hashlib

file_path = 'D:\\en.wav'
lang_type = 'en'
app_key = 'your application ID'
app_secret = 'your application secret'


def initialize():
    nonce = str(uuid.uuid1())
    curtime = str(int(time.time()))
    signStr = app_key + nonce + curtime + app_secret
    print(signStr)
    sign = encrypt(signStr)

    uri = "wss://openapi.youdao.com/stream_asropenapi?appKey=" + app_key + "&salt=" + nonce + "&curtime=" + curtime + \
          "&sign=" + sign + "&version=v1&channel=1&format=wav&signType=v4&rate=16000&langType=" + lang_type
    print(uri)
    start(uri, 1600)


def encrypt(signStr):
    hash = hashlib.sha256()
    hash.update(signStr.encode('utf-8'))
    return hash.hexdigest()



def on_message(ws, message):
    print(message)


def on_error(ws, error):
    print(error)


def on_close(ws, *args):
    # Newer websocket-client versions also pass a close status code and message.
    print("### closed ###")


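def on_open(ws):
    count = 0
    # Read the file in 1600-byte chunks and send each one as a binary message.
    with open(file_path, 'rb') as file_object:
        while True:
            chunk_data = file_object.read(1600)
            if not chunk_data:
                # End of file: stop before sending an empty frame.
                break
            ws.send(chunk_data, websocket.ABNF.OPCODE_BINARY)
            count = count + 1
    print(count)
    # Signal the end of the audio stream.
    ws.send('{"end": "true"}', websocket.ABNF.OPCODE_BINARY)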



def start(uri, step):
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp(uri,
                                on_message=on_message,
                                on_error=on_error,
                                on_close=on_close)
    ws.on_open = on_open
    ws.run_forever()


if __name__ == '__main__':
    initialize()

C# example

using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Security.Cryptography;

namespace asrliteclient
{
    public class AsrDemoWebSocketClient
    {
        public void start(String filePath, String uri, int step)
        {
            var client = new ClientWebSocket();
            client.ConnectAsync(new Uri(uri), CancellationToken.None).Wait();
            startReceiving(client);
            int bufferSize = step; // number of bytes read per chunk
            byte[] buffer = new byte[bufferSize];
            FileStream stream = null;
            try
            {
                stream = new FileStream(filePath, FileMode.Open);
                int bytesRead;
                // Read the file chunk by chunk; each Read continues where the previous one ended.
                while ((bytesRead = stream.Read(buffer, 0, bufferSize)) > 0)
                {
                    // Send only the bytes actually read, and wait for each send to finish:
                    // ClientWebSocket does not allow overlapping send operations.
                    var array = new ArraySegment<byte>(buffer, 0, bytesRead);
                    client.SendAsync(array, WebSocketMessageType.Binary, true, CancellationToken.None).Wait();
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
            finally
            {
                if (stream != null)
                    stream.Dispose();
            }

            try
            {
                // Signal the end of the audio stream and wait for the send to complete.
                byte[] closeBytes = Encoding.UTF8.GetBytes("{\"end\": \"true\"}");
                var closeArray = new ArraySegment<byte>(closeBytes);
                client.SendAsync(closeArray, WebSocketMessageType.Binary, true, CancellationToken.None).Wait();
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
                throw;
            }

            while (true)
            {
                Thread.Sleep(3000);
                break;
            }

            client.Dispose();
        }

        public async Task startReceiving(ClientWebSocket client)
        {
            while (true)
            {
                var array = new byte[4096];
                var result = await client.ReceiveAsync(new ArraySegment<byte>(array), CancellationToken.None);
                if (result.MessageType == WebSocketMessageType.Text)
                {
                    string msg = Encoding.UTF8.GetString(array, 0, result.Count);
                    Console.ForegroundColor = ConsoleColor.DarkBlue;
                    Console.WriteLine("--> {0}", msg);
                    Console.ForegroundColor = ConsoleColor.DarkGray;
                }
            }
        }

        public void doAscWebSocketClient(String filePath, String uri, int step)
        {
           start(filePath, uri, step);
        }

        public void init()
        {
            String filePath = "D:\\en.wav";
            String langType = "en";
            String appKey = "your application ID";
            String appSecret = "your application secret";
            String nonce = Guid.NewGuid().ToString();
            TimeSpan ts = (DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc));
            long millis = (long) ts.TotalMilliseconds;
            String curtime = Convert.ToString(millis / 1000);
            String signStr = appKey + nonce + curtime + appSecret;
            String sign = encrypt(signStr, null);

            String uri = "wss://openapi.youdao.com/stream_asropenapi?appKey=" + appKey + "&salt=" + nonce +
                         "&curtime=" + curtime + "&sign=" + sign +
                         "&version=v1&channel=1&format=wav&signType=v4&rate=16000&langType=" + langType;

            Console.WriteLine(uri);
            doAscWebSocketClient(filePath, uri, 1600);
        }

        // Compute the signature hash
        private String encrypt(String strSrc, String encName)
        {
            byte[] bt = Encoding.UTF8.GetBytes(strSrc);
            SHA256 sha256 = SHA256.Create();
            // Fixed to SHA-256 for now.
            // No built-in way to construct a different hash algorithm by name was found;
            // write your own replacement if needed.
            if (encName == null || "".Equals(encName))
            {
                encName = "SHA-256";
            }

            byte[] HashData = sha256.ComputeHash(bt);
            StringBuilder oSb = new StringBuilder();

            for (int x = 0; x < HashData.Length; x++)
            {
                //hexadecimal string value
                oSb.Append(HashData[x].ToString("x2"));
            }

            return oSb.ToString();
        }

        public static void Main(string[] args)
        {

            AsrDemoWebSocketClient asrDemoWebSocketClient = new AsrDemoWebSocketClient();
            asrDemoWebSocketClient.init();
        }
    }
}

PHP example

<?php
/**
 * Created by PhpStorm.
 * User: 谭上鸥
 * Date: 2018/11/2
 * Time: 11:58
 */

# Requires the Workerman library, or substitute another third-party WebSocket library

use Workerman\Worker;
use Workerman\Connection\AsyncTcpConnection;
use Workerman\Protocols\Websocket;
require_once 'C:\Users\谭上鸥\PhpstormProjects\Workerman-master\Autoloader.php';

function start($filePath,$uri,$step){
    $worker = new Worker();

    $worker->onWorkerStart = function($worker) use($uri,$filePath,$step){

        $con = new AsyncTcpConnection($uri);

        $con->websocketType = Websocket::BINARY_TYPE_ARRAYBUFFER;

        $con->onConnect = function($con) {
        };

        $con->onMessage = function($con, $data) {
            echo $data."\n";
        };

        $con->connect();

        $handle = fopen($filePath, "rb");
        // Read and send the file one chunk at a time; fread() continues from the current position
        while (!feof($handle)) {
            $content = fread($handle, $step);
            if ($content === false || $content === '') {
                break;
            }
            $con->send($content);
        }
        fclose($handle);
        // Signal the end of the audio stream
        $con->send("{\"end\": \"true\"}");
    };

    Worker::runAll();
}

//uuid generator
function create_guid(){
    $microTime = microtime();
    list($a_dec, $a_sec) = explode(" ", $microTime);
    $dec_hex = dechex($a_dec* 1000000);
    $sec_hex = dechex($a_sec);
    ensure_length($dec_hex, 5);
    ensure_length($sec_hex, 6);
    $guid = "";
    $guid .= $dec_hex;
    $guid .= create_guid_section(3);
    $guid .= '-';
    $guid .= create_guid_section(4);
    $guid .= '-';
    $guid .= create_guid_section(4);
    $guid .= '-';
    $guid .= create_guid_section(4);
    $guid .= '-';
    $guid .= $sec_hex;
    $guid .= create_guid_section(6);
    return $guid;
}
function ensure_length(&$string, $length){
    $strlen = strlen($string);
    if($strlen < $length)
    {
        $string = str_pad($string,$length,"0");
    }
    else if($strlen > $length)
    {
        $string = substr($string, 0, $length);
    }
}
function create_guid_section($characters){
    $return = "";
    for($i=0; $i<$characters; $i++)
    {
        $return .= dechex(mt_rand(0,15));
    }
    return $return;
}

function init(){
    $filePath = "D:\\en.wav";
    $langType = "en";
    $appKey = "your application ID";
    $appSecret = "your application secret";

    $nonce = create_guid();
    $curtime = strtotime("now");
    $signStr = $appKey.$nonce.$curtime.$appSecret;
    $sign = hash("sha256",$signStr);

    $uri = "wss://openapi.youdao.com/stream_asropenapi?appKey=".$appKey."&salt=".$nonce."&curtime=".$curtime."&sign=".$sign."&version=v1&channel=1&format=wav&signType=v4&rate=16000&langType=".$langType;

    echo $uri;

    start($filePath,$uri,1600);
}

init();