Speech-To-Text Capabilities In Oxygen Forensic® Detective

The popularity of voice messages continues to grow, and with it the need to integrate voice content into digital forensic investigations.

The appeal of voice messages goes beyond convenience. They allow for faster communication and can be listened to at any time. They are far easier to send than text messages, which require typing on a keyboard. Voice messages also support emotional bonding by conveying not only the verbal meaning of a message but also non-verbal cues such as tone, pitch, volume, and speech rate, which better express the emotions behind it. They are also more practical for people with vision impairments and for descendants of immigrants who want to speak with older relatives in their native language.

Voice Messaging Use Statistics 

Voice messaging platforms like WhatsApp, WeChat, Facebook Messenger, and Telegram have fueled this trend. Together, these four platforms account for more than 6 billion monthly users worldwide. In 2023, WhatsApp alone reported that almost 600 million voice messages were sent on the app every day, and there is reason to believe that number has only grown. As of April 2024, WhatsApp is estimated to have nearly 3 billion monthly active users, a 7% increase from 2023. WeChat claims over 1.34 billion users, Facebook Messenger has around 1.01 billion users worldwide, and Telegram reports 950 million monthly active users.

Voice messaging popularity in the U.S.

Voice messages on these platforms are especially popular in the United States. According to a survey from April 2024, 2 in 3 Americans actively use voice messages when communicating, and 41% of them have observed an upswing in voice messages over recent years.

No generation gap in voice messaging 

There seems to be little generation gap in sending voice messages. Every age group uses them: 84% of Gen Z, 63% of millennials, 56% of Gen X, and 47% of baby boomers send voice messages.




Solving Mounting Voice Message Investigation Challenges 

In digital forensics, it has become crucial for investigators to use tools that can handle the growing volume of voice message data. Listening to every voice message is a slow, impractical way to find evidence in such a prominent source. The Oxygen Forensics solution is to convert spoken language from audio and video files into written text using our new speech-to-text engine.


How to Use Speech-To-Text Capabilities in Oxygen Forensic® Detective

Starting with Oxygen Forensic® Detective version 17.0, users can analyze audio and video files, as well as voice messages, much faster than ever before, because the software automatically transcribes them to text. The speech-to-text feature is available at no additional cost.

Users can also search the recognized speech in the Search section and export recognized data to reports. Oxygen Forensic® Detective also offers the following speech-to-text capabilities:

Speech to text in more than 50 languages 

The speech-to-text engine can recognize and transcribe more than 50 languages:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

Set speech recognition at import

Speech recognition can also be enabled at import, so that all audio files within an extraction are analyzed automatically while it is being imported.

Multiple recognition model settings  

The investigator can also select a recognition model: tiny, base, small, medium, or large. The selected model determines the speed and accuracy of the recognition: the larger the model, the slower but more accurate the analysis.

All transcribed speech can be searched in the Search section, making it easy to detect keywords of interest in audio and video files. Want to learn more about this new feature? Request a demo.
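For readers who want a feel for what keyword search over transcribed speech involves, here is a minimal Python sketch. It is not Oxygen Forensic® Detective's implementation or API. It assumes the transcripts are already available as plain text, and the `Transcript` structure, the `find_keyword_hits` helper, and the sample file names are all hypothetical, introduced only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Transcript:
    """One transcribed audio or video file (hypothetical structure)."""
    source_file: str
    text: str


def find_keyword_hits(transcripts, keywords, context=30):
    """Return (source_file, keyword, snippet) for each whole-word,
    case-insensitive keyword match in the transcribed text."""
    hits = []
    for transcript in transcripts:
        for keyword in keywords:
            # re.escape keeps punctuation in a keyword from acting as regex syntax.
            pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(transcript.text):
                # Keep a little surrounding text so the hit can be read in context.
                start = max(match.start() - context, 0)
                end = min(match.end() + context, len(transcript.text))
                hits.append((transcript.source_file, keyword, transcript.text[start:end]))
    return hits


if __name__ == "__main__":
    # Made-up sample data, not taken from any real extraction.
    sample = [
        Transcript("voice_001.opus", "Meet me at the warehouse at nine."),
        Transcript("video_002.mp4", "I never went near the Warehouse, I swear."),
        Transcript("voice_003.opus", "Happy birthday, call me later."),
    ]
    for source_file, keyword, snippet in find_keyword_hits(sample, ["warehouse", "call"]):
        print(f"{source_file}: '{keyword}' in \"{snippet}\"")
```

The sketch matches whole words only, so a keyword such as "ware" would not flag "warehouse". Whether partial or fuzzy matching is preferable depends on the investigation and on how accurate the chosen recognition model is.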
