1/7/2023

Motrix speech to text

Speech to text, text to speech, and other speech-related things in Asterisk have traditionally been done in two ways: C modules, or external AGIs that use record and playback. This project aims to provide an easier bridge between the two worlds, delivering a better user experience while allowing developers to more easily connect to modern speech APIs. Speech functionality is going to be passed to an outside entity that handles all of the heavy lifting, allowing us to leverage official SDKs provided by more friendly languages. This page will break the process down into sections describing how Asterisk is going to accomplish this.

There are already some existing dialplan and API functions for speech to text, but the core API (and dialplan applications) for text to speech will need to be created. A module will be created which registers to these APIs and provides the functionality described in The Process, Speech to Text, and Text to Speech. The module will allow multiple external applications to be configured, and the user, in the dialplan, can select which one will be used.

There will be three entities involved: Asterisk, an external application, and the speech service. Asterisk will be responsible for providing media to the external application and accepting media from it, as well as communicating the desired functionality. From a user perspective this will be exposed via dialplan applications and functions which then call into the speech engine. From an Asterisk C developer perspective this will be done using the core APIs for speech to text and text to speech.

The connection to the external entity will be made using a Websocket from Asterisk to the external application. There will only ever be ONE Websocket connection per session, where a session represents a single speech to text or text to speech interaction. A benefit of using a single connection per session is that it provides an easy mechanism to scale out the external application if needed. The type of a session will be either speech to text or text to speech: it can't be both, and it can't change from one to the other within the same session. A new connection will need to be started if you wish to change which type of speech service you're using.

Asterisk and the external application will communicate in JSON, since it's simple and easy for everyone to use. Only one codec will be negotiated, and it will be included, along with its attributes, in the response Asterisk gets back from the external application. The external application will be responsible for taking the media Asterisk provides and sending it to the appropriate speech service. Asterisk will pass along all the necessary information in a JSON protocol via the Websocket, which will tell the application what to do with the media that follows. The media itself will also be sent over the Websocket, in the form of Websocket binary frames. Responses will be sent back to Asterisk to let us know whether a transaction succeeded or failed.
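To make the codec negotiation concrete, here is a minimal sketch of how an external application might answer Asterisk's setup message with exactly one chosen codec. The source does not publish the actual JSON schema, so the field names (`request`, `codecs`, `response`, `reason`) and the supported-codec list are illustrative assumptions only.

```python
import json

# Assumed codec names for illustration; a real application would list
# whatever its speech service actually accepts.
SUPPORTED_CODECS = {"ulaw", "alaw", "slin16"}

def handle_setup(message: str) -> str:
    """Given a JSON setup request offering several codecs, negotiate
    exactly one and build the JSON response sent back to Asterisk."""
    request = json.loads(message)
    offered = request.get("codecs", [])
    # Only one codec is ever negotiated: take the first offered codec
    # that this application supports.
    chosen = next((c for c in offered if c in SUPPORTED_CODECS), None)
    if chosen is None:
        return json.dumps({"response": "failed", "reason": "no common codec"})
    return json.dumps({"response": "succeeded", "codec": chosen})
```

For example, `handle_setup('{"request": "setup", "codecs": ["opus", "ulaw"]}')` would select `ulaw`, since only one codec may be carried in the response.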
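Since JSON control messages arrive as Websocket text frames and media arrives as binary frames, the frame type itself tells the external application how to interpret each message. A small sketch of that dispatch, with the transport and the actual speech-service call left out; the class, its fields, and the response shape are assumptions for illustration, not part of any published Asterisk protocol.

```python
import json

class SpeechSession:
    """Per-connection state for a hypothetical external speech application.

    One Websocket connection corresponds to one session, so all state
    for a single speech to text or text to speech interaction lives here.
    """

    def __init__(self):
        self.media = bytearray()  # raw audio received as binary frames

    def on_frame(self, frame):
        """Dispatch one Websocket frame by its type."""
        if isinstance(frame, (bytes, bytearray)):
            # Binary frame: buffer media to forward to the speech service.
            self.media.extend(frame)
            return None
        # Text frame: a JSON control message describing the media that follows.
        control = json.loads(frame)
        # Reply so Asterisk knows whether the transaction succeeded or failed.
        return json.dumps({"response": "succeeded",
                           "seen": control.get("request")})
```

A new `SpeechSession` would be created per Websocket connection, which also keeps scale-out simple: any instance of the external application can own a whole session.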