Dual Channel Transcription with Split Recording

As part of our Voice API offering, Nexmo lets you record parts (or all) of a call and fetch the audio once the call has completed. Today, we’re happy to announce a new enhancement to this functionality: split recording. Split recording makes common tasks such as call transcription even easier.

When split recording is enabled, the downloaded recording will contain participant A (let’s call her Alice) in the left channel, and participant B (let’s call him Bob) in the right channel. This allows you to work with the audio from a single participant easily.

In this post, we’re going to walk through a simple use case. Alice calls the bank to find out information about her account, and Bob is the customer support agent who answers the call.

Record the Call in Stereo

When Alice calls the number provided by the bank, Nexmo answers the call, plays an introductory message and connects it to the bank’s real phone number, recording all the audio in the call. To accomplish this, you’d use the following Nexmo Call Control Object (NCCO):
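A minimal sketch of such an NCCO, built in Python and printed as JSON. The greeting text and both phone numbers are placeholders invented for illustration; only the recording URL comes from this post. The record action is placed first on the assumption that it should capture the whole call.

```python
import json

# Hypothetical values, not taken from the original post.
NEXMO_NUMBER = "14155550199"  # the virtual number Alice dials
BANK_NUMBER = "14155550100"   # the bank's real phone number


def build_ncco(recording_url):
    """Return an NCCO that records the call, greets the caller, then connects."""
    return [
        # Nexmo sends the recording URL to this endpoint when the call ends.
        {"action": "record", "eventUrl": [recording_url]},
        # Introductory message played to the caller.
        {"action": "talk", "text": "Thank you for calling. Connecting you now."},
        # Forward the call to the bank's real number.
        {
            "action": "connect",
            "from": NEXMO_NUMBER,
            "endpoint": [{"type": "phone", "number": BANK_NUMBER}],
        },
    ]


ncco = build_ncco("https://example.com/recording")
print(json.dumps(ncco, indent=2))
```

The printed JSON is what your answer webhook would return to Nexmo when the call comes in.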

The important part of this NCCO is the record action, which will record the audio and send the URL to https://example.com/recording once the call is complete:

To enable dual-channel recording, we need to update this action to include "split": "conversation" like so:
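As a sketch, the change amounts to one extra key on the record action. The helper name below is hypothetical; it copies each action so the input NCCO is left untouched.

```python
import json


def enable_split_recording(ncco):
    """Return a copy of the NCCO with split recording on every record action."""
    updated = []
    for action in ncco:
        action = dict(action)  # shallow copy so the caller's NCCO is unchanged
        if action.get("action") == "record":
            action["split"] = "conversation"
        updated.append(action)
    return updated


record_only = [{"action": "record", "eventUrl": ["https://example.com/recording"]}]
split_ncco = enable_split_recording(record_only)
print(json.dumps(split_ncco, indent=2))
```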

That’s all there is to it! When you fetch the call recording from Nexmo, you’ll have Alice’s audio in the left channel and Bob’s in the right.

Call Transcription with IBM Watson

Once you have the audio file, it’s time to transcribe the text. There aren’t many providers that accept dual channel audio and transcribe each channel separately, so for this post we’ll use ffmpeg to split the track into two mono tracks and transcribe them separately using IBM’s speech-to-text API.

To split your audio file into two files, run the following command in a terminal (you may need to install ffmpeg first):
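One way to do the split is ffmpeg’s channelsplit filter, shown here as a Python sketch that builds the command and runs it with subprocess. This is not necessarily the exact command used for this post, and the file names (conversation.mp3, alice.mp3, bob.mp3) are assumptions.

```python
import shutil
import subprocess


def build_split_command(source, left_out, right_out):
    """Build an ffmpeg command that writes each stereo channel to a mono file."""
    return [
        "ffmpeg", "-i", source,
        # Split the stereo stream into two labelled mono streams.
        "-filter_complex", "[0:a]channelsplit=channel_layout=stereo[left][right]",
        "-map", "[left]", left_out,    # left channel: Alice
        "-map", "[right]", right_out,  # right channel: Bob
    ]


def split_recording(source, left_out="alice.mp3", right_out="bob.mp3"):
    """Run ffmpeg to split the recording; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    subprocess.run(build_split_command(source, left_out, right_out), check=True)


command = build_split_command("conversation.mp3", "alice.mp3", "bob.mp3")
print(" ".join(command))
```

Calling split_recording("conversation.mp3") runs the command; when pasting the printed line into a shell, wrap the filter expression in quotes.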

Now that we have two audio files we can send them to Watson and get the text back as JSON in response. You can use your language of choice to do this, but the quickest way to get things working is by using curl:
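The same request can be sketched in Python with the standard library. Several details here are assumptions rather than part of this post: the service URL is a placeholder for the one shown in your IBM Cloud credentials, authentication uses an API key sent as basic auth with the user name apikey, and the audio is MP3. The sketch only builds the request; the commented lines show how it might be sent.

```python
import base64
import urllib.request


def build_recognize_request(service_url, api_key, audio_bytes):
    """Build a POST request asking Watson for a transcript with word timestamps."""
    credentials = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    return urllib.request.Request(
        url=f"{service_url}/v1/recognize?timestamps=true",
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "audio/mp3",
        },
    )


# Placeholder URL and key; substitute the values from your own service instance.
request = build_recognize_request(
    "https://example.invalid/speech-to-text/api", "YOUR_API_KEY", b"fake audio bytes"
)
print(request.get_method(), request.full_url)

# To send it for real (once per channel):
# with open("alice.mp3", "rb") as audio:
#     request = build_recognize_request(SERVICE_URL, API_KEY, audio.read())
# with urllib.request.urlopen(request) as response, open("alice.json", "wb") as out:
#     out.write(response.read())
```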

This will give us two JSON files that look like the following:
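As an illustration of the shape only, here is a hand-written sample in the format Watson returns when timestamps are requested, loaded in Python. The words, times, and confidence value are invented; each entry in timestamps is a [word, start, end] triple with times in seconds.

```python
import json

# Invented sample data that mirrors the structure of a Watson response.
sample = json.loads("""
{
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [["hello", 0.5, 0.9], ["there", 0.9, 1.3]],
          "confidence": 0.92,
          "transcript": "hello there "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
""")


def first_alternatives(response):
    """Return the top-ranked alternative for each result in the response."""
    return [result["alternatives"][0] for result in response["results"]]


for alternative in first_alternatives(sample):
    start_seconds = alternative["timestamps"][0][1]  # start time of the first word
    print(start_seconds, alternative["transcript"].strip())
```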

Build the Conversation

As we requested timestamps, we can rebuild a timeline of the conversation as it happened. Once again, you can use your favorite language for this (I’ll be using PHP). The steps we have to follow are:

  1. Loop through the JSON and merge all the entries into a single list.
  2. Order the entries based on the start timestamp.
  3. Output the conversation in order, with the timestamp, name, and text shown.

The PHP code to do this looks like the following:
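As a language-neutral illustration of the three steps (a Python sketch rather than the PHP used for this post), assume each Watson response has already been loaded into a dictionary. The helper names and the sample dialogue are invented; in practice you would load alice.json and bob.json with json.load instead of calling fake_result.

```python
def fake_result(text, start):
    """Build one invented Watson-style result whose words are 0.3 s long."""
    words = text.split()
    timestamps = [[w, start + i * 0.3, start + (i + 1) * 0.3] for i, w in enumerate(words)]
    return {"alternatives": [{"transcript": text + " ", "timestamps": timestamps}], "final": True}


def utterances(response, speaker):
    """Yield (start_seconds, speaker, text) for each result in one response."""
    for result in response["results"]:
        best = result["alternatives"][0]
        if not best.get("timestamps"):
            continue  # nothing to order this entry by
        yield (best["timestamps"][0][1], speaker, best["transcript"].strip())


def build_conversation(responses):
    """Merge every speaker's entries, sort by start time, and format each line."""
    merged = []
    for speaker, response in responses.items():          # step 1: merge
        merged.extend(utterances(response, speaker))
    merged.sort(key=lambda entry: entry[0])              # step 2: order
    return [f"{start:6.2f}s {speaker}: {text}" for start, speaker, text in merged]  # step 3


responses = {
    "Alice": {"results": [fake_result("hi i would like to check my balance", 0.5),
                          fake_result("yes it is alice", 7.2)]},
    "Bob": {"results": [fake_result("sure can i take your name", 4.1),
                        fake_result("thank you alice", 9.8)]},
}
lines = build_conversation(responses)
print("\n".join(lines))
```

Sorting on the start time of the first word in each result is what interleaves the two speakers; results without timestamps are skipped because they cannot be placed on the timeline.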

When we run this code we see our conversation as it happened:

Transcription Made Easy with Split Recording

Nexmo’s new split recording feature allows you to record two participants, each in their own audio channel, making transcription a breeze. To enable the feature, all you have to do is add "split": "conversation" to your record action.

To learn more about split recording, you can read our product blog post on the release or take a look at the documentation.