The popularity of voice messages continues to grow rapidly, and so does the need to integrate voice content into digital forensic investigations.
The increased use of voice messages goes beyond convenience. Voice messaging allows for faster communication that can be listened to at any time. Voice messages are far easier to send than text messages, which require typing on a keyboard. Voice messages also allow for emotional bonding, sharing not only the verbal meaning of a message but also non-verbal cues such as tone, pitch, volume, and speech rate that convey the emotions behind it. Voice messages are also more practical for people with vision impairments and for descendants of immigrants who want to speak with older relatives in their native language.
Voice Messaging Use Statistics
Voice messaging platforms like WhatsApp, WeChat, Facebook Messenger, and Telegram have fueled this trend. Together, these four platforms account for more than 6 billion monthly users worldwide. In 2023, WhatsApp alone reported that almost 600 million voice messages are sent on the app every day, and there is reason to believe that number has only grown. As of April 2024, WhatsApp is estimated to have nearly 3 billion monthly active users, a 7% increase from 2023. WeChat claims over 1.34 billion users, and Facebook Messenger has amassed around 1.01 billion users worldwide. Telegram reports 950 million monthly active users.
Voice messaging popularity in the U.S.
Voice messages sent on these platforms are especially popular in the United States. According to a survey from April 2024, two in three Americans actively use voice messages when communicating, and 41% of them have observed an upswing in voice messages over recent years.
No generation gap in voice messaging
There seems to be almost no generation gap in sending voice messages. Broken down by generation, 84% of Gen Z, 63% of millennials, 56% of Gen X, and 47% of baby boomers send voice messages.
Solving Mounting Voice Message Investigation Challenges
In digital forensics terms, it has become crucial for investigators to use tools capable of handling the growing volume of voice message data. Listening to every voice message is a slow, impractical way to find evidence in such a prominent source. The Oxygen Forensics solution is to convert spoken language from audio and video files into written text using our new speech-to-text engine.
How to Use Speech-To-Text Capabilities in Oxygen Forensic® Detective
Starting with Oxygen Forensic® Detective version 17.0, users can analyze audio and video files, as well as voice messages, much faster than ever before, with the software automatically transcribing them to text. The speech-to-text feature is available at no additional cost.
Users can also search through the recognized speech in the Search section and export recognized data to reports. Oxygen Forensic® Detective also offers the following speech-to-text capabilities:
Speech to text in more than 50 languages
Speech-to-text supports recognition and transcription of more than 50 languages:
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
Set speech recognition at import
It is also possible to enable speech recognition at import, so that all audio files within an extraction are analyzed automatically while it is being imported.
Multiple recognition model settings
The investigator can also select a recognition model to their liking: tiny, base, small, medium, or large. The selected model determines the speed and accuracy of the recognition. Note that the larger the model, the slower but more accurate the analysis.
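The speed-versus-accuracy trade-off can be reasoned about with a small planning helper. The Python sketch below is purely illustrative and is not part of Oxygen Forensic® Detective: the function `pick_model`, its parameters, and the benchmark figures are all assumptions. It assumes the investigator has measured, on their own hardware, roughly how many minutes of processing each model size needs per minute of audio, and it picks the largest model that still fits a time budget.

```python
from typing import Dict, Optional

# Model sizes named in the text, ordered from fastest to most accurate.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def pick_model(audio_minutes: float,
               budget_minutes: float,
               cost_per_audio_minute: Dict[str, float]) -> Optional[str]:
    """Return the largest model whose estimated run time fits the budget.

    cost_per_audio_minute maps a model name to processing minutes per
    minute of audio, as measured by the investigator (hypothetical input).
    Returns None when no benchmarked model fits.
    """
    chosen = None
    for name in MODEL_SIZES:
        cost = cost_per_audio_minute.get(name)
        if cost is None:
            continue  # no benchmark for this size, skip it
        if audio_minutes * cost <= budget_minutes:
            chosen = name  # later entries are larger, so keep the last fit
    return chosen


# Hypothetical benchmark figures, for illustration only.
example_costs = {"tiny": 0.05, "base": 0.1, "small": 0.25,
                 "medium": 0.6, "large": 1.2}

# Two hours of audio with a one-hour processing budget.
print(pick_model(audio_minutes=120, budget_minutes=60,
                 cost_per_audio_minute=example_costs))
```

With these made-up figures, the helper settles on a mid-sized model; with a smaller backlog or a longer budget, it would move toward the larger, more accurate models.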
All transcribed speech can also be searched in the Search section, making it easy to detect keywords of interest in audio and video files. Want to learn more about this new feature? Request a demo.
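To illustrate the general idea of keyword searching over transcribed speech, here is a minimal Python sketch. It does not represent how the Search section is implemented: the function `find_keywords`, the file names, and the sample transcripts are hypothetical. It assumes the transcripts are already available as plain text keyed by file name and reports each case-insensitive, whole-word hit with a short snippet of surrounding text.

```python
import re
from typing import Dict, List, Tuple


def find_keywords(transcripts: Dict[str, str],
                  keywords: List[str],
                  context: int = 30) -> List[Tuple[str, str, str]]:
    """Return (file name, keyword, surrounding snippet) for every hit."""
    hits = []
    for file_name, text in transcripts.items():
        for keyword in keywords:
            # Whole-word, case-insensitive match; re.escape keeps the
            # keyword literal even if it contains regex characters.
            pattern = re.compile(r"\b" + re.escape(keyword) + r"\b",
                                 re.IGNORECASE)
            for match in pattern.finditer(text):
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                hits.append((file_name, keyword, text[start:end].strip()))
    return hits


# Hypothetical transcripts, for illustration only.
transcripts = {
    "voice_001.opus": "Meet me at the warehouse at nine.",
    "voice_002.opus": "I will bring the package tomorrow.",
}

hits = find_keywords(transcripts, ["warehouse", "Package"])
for file_name, keyword, snippet in hits:
    print(f"{file_name}: '{keyword}' in \"{snippet}\"")
```

The snippet of context around each hit lets a reviewer judge relevance without replaying the audio, which is the main time saving that transcription offers.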