It All Starts with an Accurate Automatic Transcription

Posted by Natalie Chilton on Jun 5, 2017 11:25:46 AM


Thanks to advances in technology such as deep learning, we find ourselves on the cusp of an automated revolution: seemingly everything in our lives is becoming more automatic. According to Pew, 65 percent of Americans predict that robots and computers will ‘definitely’ or ‘probably’ do much of the work currently done by humans by 2065. VoiceBase is right at the forefront of this revolution with our trailblazing speech-to-text API, allowing you to accurately and automatically transcribe conversations like never before.

Automatic transcription is at the heart of features such as automatic video captions, visual voicemail, and the richer voice insights and predictive analytics that help you provide a more seamless experience for the user. So, how does it work? Take a look at the features that make our speech recognition solution the best in the industry.



Machine Generated Transcript 

Once you upload a recording to the VoiceBase API, your audio is automatically returned as a fully time-aligned, highly accurate transcript in TXT, Word, RTF, or SRT format. From there, you can navigate the text with our easy-to-use click-n-play plugin, which lets you search for a term and jump right to the spot where that word was spoken. Recorded content is now keyword searchable.


Per Word Confidence

Per Word Confidence is a measure of the acoustic similarity between the sound in the audio recording and the word that was transcribed. The Per Word Confidence score lets a user locate specific key words in the transcript and grade how accurately each one was transcribed. This makes it possible to study content at a finer grain, which can yield deeper insights.
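
As a minimal sketch of how such scores might be used, the Python below flags words whose confidence falls under a threshold so a reviewer can check them first. The word records, the field names `word` and `confidence`, and the 0-to-1 scale are assumptions made for illustration; they are not the VoiceBase response format.

```python
from typing import Dict, List

# Hypothetical transcript words; field names and the 0-1 scale are assumed.
WORDS: List[Dict] = [
    {"word": "thanks", "confidence": 0.98},
    {"word": "for", "confidence": 0.95},
    {"word": "calling", "confidence": 0.91},
    {"word": "VoiceBase", "confidence": 0.42},
]


def low_confidence_words(words: List[Dict], threshold: float = 0.6) -> List[str]:
    """Return the words whose confidence score is below the threshold."""
    return [w["word"] for w in words if w["confidence"] < threshold]


print(low_confidence_words(WORDS))
```

A threshold like 0.6 is arbitrary here; in practice it would be tuned against recordings whose correct transcript is already known.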


Time-Stamped Words

This feature allows users to surface specific words or phrases based on when they occur in a recording. Each time stamp is combined with a source URL, and time stamps are applied to all of a user's stored recordings.
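
To illustrate the idea, the sketch below looks up every occurrence of a keyword in a list of time-stamped words and prints the offset of each hit. The record layout, the `start_ms` field, and the millisecond unit are hypothetical and chosen only for the example.

```python
from typing import Dict, List

# Hypothetical time-stamped words; start times are assumed to be milliseconds.
WORDS: List[Dict] = [
    {"word": "please", "start_ms": 0},
    {"word": "cancel", "start_ms": 450},
    {"word": "my", "start_ms": 900},
    {"word": "order", "start_ms": 1100},
    {"word": "cancel", "start_ms": 62000},
]


def find_keyword(words: List[Dict], keyword: str) -> List[int]:
    """Return the start time of every case-insensitive match of keyword."""
    target = keyword.lower()
    return [w["start_ms"] for w in words if w["word"].lower() == target]


def format_offset(ms: int) -> str:
    """Render a millisecond offset as MM:SS for display next to a player."""
    seconds = ms // 1000
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


for start in find_keyword(WORDS, "Cancel"):
    print(format_offset(start))
```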


Player SDK

Sometimes you get comfortable with the tools you already use and know well. VoiceBase’s Player SDK allows you to continue using your player of choice, with the added benefits of our UI components. Enhanced features include an interactive transcript, automated keyword and topic extraction, user-defined keyword spotting, ad-hoc search, and a transcript editor.


Stereo Speaker ID

It is now easier than ever to know who said what in a recording. With Stereo Speaker ID, you can automatically label each speaker. There are several ways you can set up speaker identification, and other features, such as keyword extraction and keyword spotting, will incorporate the speaker labels.
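
One way to picture speaker labeling on a stereo recording is sketched below. It assumes, purely for illustration, that each channel carries a single speaker and has already been transcribed into time-stamped words; the field names and the "Agent" and "Caller" labels are hypothetical, and this is not a description of how VoiceBase implements the feature.

```python
from typing import Dict, List, Tuple

# Hypothetical per-channel transcripts (one speaker per channel is assumed).
LEFT: List[Dict] = [
    {"word": "hello", "start_ms": 0},
    {"word": "there", "start_ms": 400},
    {"word": "sure", "start_ms": 2000},
]
RIGHT: List[Dict] = [
    {"word": "hi", "start_ms": 900},
    {"word": "help", "start_ms": 1300},
]


def merge_channels(
    left: List[Dict],
    right: List[Dict],
    labels: Tuple[str, str] = ("Agent", "Caller"),
) -> List[Tuple[str, str]]:
    """Interleave two channels by start time and group words into turns."""
    tagged = [(w["start_ms"], labels[0], w["word"]) for w in left]
    tagged += [(w["start_ms"], labels[1], w["word"]) for w in right]
    tagged.sort(key=lambda item: item[0])

    turns: List[Tuple[str, str]] = []
    for _, speaker, word in tagged:
        if turns and turns[-1][0] == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + word)
        else:
            turns.append((speaker, word))
    return turns


for speaker, text in merge_channels(LEFT, RIGHT):
    print(f"{speaker}: {text}")
```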


SRT Output (Captioning)

This tool can be used to create video captions from our automatic speech recognition technology within minutes. VoiceBase supports industry-standard closed-captioning formats (SRT and DFXP) that can be used with commercial video players and video delivery systems.
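
For readers unfamiliar with SRT, the sketch below shows how time-stamped words could be grouped into numbered caption cues in that format. The input records, the `start_ms` and `end_ms` fields, and the fixed words-per-cue rule are assumptions for the example; a production captioner would also consider pauses, line length, and reading speed.

```python
from typing import Dict, List


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as the HH:MM:SS,mmm timestamp used by SRT."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group words into cues of at most max_words and render them as SRT."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        start = srt_timestamp(chunk[0]["start_ms"])
        end = srt_timestamp(chunk[-1]["end_ms"])
        # Each cue: sequence number, time range, caption text.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Hypothetical time-stamped words (times assumed to be milliseconds).
SAMPLE = [
    {"word": "hello", "start_ms": 0, "end_ms": 400},
    {"word": "world", "start_ms": 500, "end_ms": 900},
]
print(words_to_srt(SAMPLE, max_words=1))
```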


Custom Vocabulary

You can improve automatic transcription accuracy by inputting custom words into the speech recognition API. Common examples include proper nouns, company names, product names, and acronyms, all of which help improve both accuracy and keyword spotting.



Accurate automatic transcription of your recordings is just the foundation for the many capabilities that our speech-to-text solution has to offer. Want to find out what else VoiceBase can do for you and your business? Contact us today!




Topics: transcription, automatic transcription, accurate transcription, machine transcription, transcription API, speech to text

Written by Natalie Chilton

As a Marketing Manager at VoiceBase, Natalie specializes in field and content marketing, fostering real connections for customers to learn how AI-Powered Speech Analytics can benefit their business. She grew up in Santa Barbara, CA, and graduated from Cal Poly San Luis Obispo. In her spare time, you can find her swimming laps at the gym, or cooking a fun new meal.
