The Speech to text module explained (STT)

Christina Dechent

In this article, we take a closer look at the Speech to text module (STT) and its different setup options.

What is the Speech to text module?

The Speech to text module stores voice or typed input given by the customer in the IVR or voicebot. There are many use cases:

  • leaving a call reason
  • agreeing to a call being recorded
  • typing in a customer ID
  • confirming a previous input, etc.

The customer's answer is stored in a variable, which you can use to route the call or send to external systems like your helpdesk. In its basic functionality, the module is very similar to the Input Reader, but it is much more powerful.

Setting up an STT module

The STT module has many settings which you can tweak to optimize both the accuracy of recognition and the customer experience. We will go through each of these settings below.

We will look at the Audio tab first. If you know all this already, you can jump directly to the "Settings" part.

The "Audio" tab:

  1. In the Audio tab, you can select either audio files or a TTS input that will be read to the customer. To do so, toggle the switch between Audio and Text to Speech and edit the details.

     

    If you need more information about TTS, have a look at this article

  2. Below the audio options, you can select the input retries. This is where you set how many times the system should ask the caller for the input again, and what should happen when no input is captured at all. The following options are available:

     

    If you select any "retry" option, you will see a window where you can add your audio/TTS. The audio file or TTS you enter here is read to the customer if they are silent (give no input). In that case, the call does not move on to the next module in the call flow. Instead, this audio is played to the customer and they get a second chance to give their input.

For all audios, you can configure the timing: you determine how many milliseconds of pause there should be before and after the audio is played.
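The retry behaviour described above can be pictured with a small sketch. This is not babelforce code: `collect_input`, `listen` and `play` are hypothetical names used purely for illustration, and the loop assumes that a retry simply replays the retry audio and listens again.

```python
from typing import Callable, Optional


def collect_input(
    listen: Callable[[], Optional[str]],  # hypothetical: returns None on silence
    play: Callable[[str], None],          # hypothetical: plays an audio or TTS prompt
    prompt: str,
    retry_prompt: str,
    max_retries: int,
) -> Optional[str]:
    """Ask once, then ask again up to max_retries times while the caller is silent."""
    play(prompt)
    for attempt in range(max_retries + 1):
        answer = listen()
        if answer:
            # Input captured: the call flow can move on to the next module.
            return answer
        if attempt < max_retries:
            # Caller stayed silent: play the retry audio and listen again.
            play(retry_prompt)
    # Still no input after all retries.
    return None
```

With one retry configured, a caller who is silent the first time hears the retry prompt once and then gets a second chance to answer.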

Great, now that we know about our audio options, let's look at the STT settings:

The "Settings" tab:

In Settings, you have the following sections. Each section lets you adjust and optimize a different part of the speech recognition:

  1. Variable name
  2. Input behaviour

 

1. Variable name - Here you define the name of the variable that stores the customer's input. You can later use it in Triggers, Switch nodes, or queue selections to route the call depending on the customer input. Have a look at this article if you want to read a use case on the topic. Use a meaningful variable name so that it is easier to find in the expression list afterwards and to remember what it is for.
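To illustrate the idea of routing on the stored input, here is a minimal sketch. It is not how Triggers or Switch nodes are configured in babelforce; the function `pick_queue`, the keywords and the queue names are all made up for this example.

```python
def pick_queue(call_reason: str) -> str:
    """Map the text stored in a (hypothetical) call reason variable to a queue name."""
    text = call_reason.lower()
    if "invoice" in text or "bill" in text:
        return "billing"
    if "cancel" in text:
        return "retention"
    # No keyword matched: fall back to a default queue.
    return "general"
```

For instance, an input such as "I have a question about my invoice" would end up in the "billing" queue in this sketch.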

2. Input behaviour

If you are a beginner, only one setting is relevant:

Language: babelforce offers 70+ languages out of the box. Select the language you expect the input in. Please note that we cannot translate voice input out of the box. So, if you choose "English, United States" as the language for instance, our platform expects English input from the caller.

There are also some advanced or beta options under "See more options". You can experiment with them, but we suggest you keep the defaults if you are not very familiar with speech modelling:

Template: babelforce has optimized certain use cases in specific languages. However, only some templates have been optimized so far. Please refer to this article to see which templates are already available for which language.

Speech module: We recommend changing the speech module only if you are an advanced user. Some modules might be optimized for a specific purpose in your language. By default, we suggest "Command and search".

If you want to dive deeper into the topic of speech recognition by Google, check out this article by Google.

DTMF input: Toggle this switch if you want to allow the customer to give the input by typing a number on their phone.

Input termination: Here you can configure whether the customer should press a certain key once they have finished their input. You can also choose "None" to not use any key.

Accept only numeric input = false: Leave this switch off if you expect text (string) input.

Pattern (regular expression): Enter a regular expression here to check the customer input. For example, you could use ([A-Za-z]{3,25}) to only allow letters (no numbers or symbols) with a length between 3 and 25 characters. This field is only available if numeric input = false.

Accept only numeric input = true: Toggle this switch if you are expecting a number input, like a customer ID consisting solely of digits, or a birth year.
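Before entering a pattern, it can help to try it against a few sample inputs. The sketch below uses Python's `re` module and assumes the pattern must match the whole input; whether the platform applies a full match or a partial match is not stated here, so treat that as an assumption. Note that a range written as `[A-z]` also covers the punctuation characters that sit between `Z` and `a` in ASCII (such as `_` and `^`), which is why the sketch spells the range out as `[A-Za-z]`.

```python
import re

# Letters only, between 3 and 25 characters.
LETTERS_ONLY = re.compile(r"[A-Za-z]{3,25}")


def matches_pattern(text: str) -> bool:
    """Return True if the whole input consists of 3 to 25 letters."""
    return LETTERS_ONLY.fullmatch(text) is not None


# The looser range [A-z] lets an underscore through:
underscore_accepted = re.fullmatch(r"[A-z]{3,25}", "a_b") is not None
```

In this sketch, "Smith" is accepted, while "ab" (too short) and "abc123" (contains digits) are rejected.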

min/max length: These fields only appear if numeric input = true. They define the expected minimum and maximum length of the input. If a customer ID is expected to be between 8 and 10 digits long, set min length = 8 and max length = 10. Once 10 digits are reached, input collection stops and the call continues to the next module.
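The effect of the two length limits can be sketched as follows. This is a simplified model, not the platform's implementation: `collect_digits` is a hypothetical helper, and treating an input that is too short as "no valid input" is an assumption made for the example.

```python
from typing import Optional


def collect_digits(stream: str, min_length: int, max_length: int) -> Optional[str]:
    """Collect digits from the caller until max_length is reached."""
    collected = ""
    for ch in stream:
        if ch not in "0123456789":
            break  # anything else ends this simplified capture
        collected += ch
        if len(collected) == max_length:
            break  # maximum reached: stop collecting and move on
    # Assumption: an input shorter than min_length counts as invalid.
    return collected if len(collected) >= min_length else None
```

With min length 8 and max length 10, a caller who keeps typing after the tenth digit still produces a 10-digit value, because collection stops as soon as the maximum is reached.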

Timeout between inputs: This number indicates how many seconds of pause are allowed between the individual parts of the customer input before the input is considered complete and the call flow moves on to the next module. For example, imagine the customer reads out a 5-digit customer ID. If the timeout between inputs is 2, the customer has two seconds after every digit they read out before the input is considered complete.

Allow to barge in = false: Audios or TTS cannot be interrupted. The customer can only give input after the audio has finished playing.
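The timeout logic can be made concrete with a small simulation. The function name `input_until_pause` and the event format (time in seconds, recognized token) are assumptions for illustration only; the real module works on live audio rather than on a prepared list.

```python
from typing import List, Tuple


def input_until_pause(events: List[Tuple[float, str]], timeout: float) -> str:
    """Join tokens until the gap between two tokens exceeds the timeout.

    events: (time in seconds, token) pairs in chronological order.
    """
    collected = []
    last_time = None
    for time_s, token in events:
        if last_time is not None and time_s - last_time > timeout:
            # Pause was too long: the input is considered complete.
            break
        collected.append(token)
        last_time = time_s
    return "".join(collected)
```

For example, with a timeout of 2 seconds and digits spoken at 0.0 s, 1.5 s, 3.0 s and 6.0 s, only the first three digits are kept, because the pause before the fourth digit is 3 seconds.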

Allow to barge in = true: Audios or TTS can be interrupted by the customer.

Barge in delay: Define after how many seconds the customer can start interrupting the audio.
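How the two barge-in settings interact can be expressed in a few lines. This is only a sketch of the rule as described above: `can_interrupt` is a hypothetical helper, and allowing the interruption exactly at the moment the delay has elapsed is an assumption.

```python
def can_interrupt(allow_barge_in: bool, barge_in_delay: float, elapsed: float) -> bool:
    """Return True if the caller may interrupt the audio at this moment.

    elapsed: seconds since the audio or TTS started playing.
    """
    return allow_barge_in and elapsed >= barge_in_delay
```

With barge in enabled and a delay of 2 seconds, an input attempt after 1 second is ignored in this sketch, while one after 3 seconds interrupts the audio.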
