Forensic Analysis for AI Speaker with Display: Amazon Echo Show 2nd Generation

Hello, thank you for attending our talk. My name is Min-A Youn. I’m a member of the Digital Forensic Research Center at Korea University. I’m interested in digital forensics, IoT device forensics, and incident analysis. Today, I will give a presentation on this topic: Forensic Analysis for AI Speaker with Display. We focus on the 2nd generation of the Amazon Echo Show.

This study was supported by the IITP of the Korean government. Let me introduce our research team. Our team consists of five people, all members of the DFRC at Korea University: three Master’s students, one research professor and one advising professor. We have been conducting digital forensic research on IoT devices.

Today’s agenda consists of six parts. I will explain our research motivation and the related work for this study. Then I will talk about our forensic analysis of this relatively new Echo device and what we can learn from correlation analysis. Based on this, I will introduce our framework and conclude the presentation.

My research motivation is as follows. As you know, the worldwide market for IoT devices is growing. According to a survey, it will be a $657 billion market by 2025, and among IoT devices, AI speakers are the best sellers. Around the world, about 30 million units were sold in 2017, and more than five times as many are expected to sell in 2022.

There has been a shift in the AI speaker market. Where you used to talk to a speaker with only your voice, you can now have a richer conversation through a display. Many companies have recently released AI speakers with displays. Amazon is releasing the Echo Show series and Google is releasing the Nest series. The same is true in South Korea.

The Echo Show was selected because of its high sales among display speakers in the USA. The Echo Show not only allows you to use voice commands but also has a display, so you can make video calls, take photos and play YouTube videos. Furthermore, the Echo Show, together with Alexa, forms part of a huge ecosystem.

A previous study divided this ecosystem into three components: hardware, cloud and client. We organised the existing research on the Amazon Echo family along the same lines, into three approaches: hardware-based, companion-based and cloud-based. The first is the hardware-based approach. Studies of data collection based on the hardware approach are as follows.

Studies using JTAG and a debug pad were conducted on the 1st generation Echo Show and the 1st generation Echo. We tried to apply the ADB method used for the Echo Show 1st generation in 2019, and the debug-pad collection method used for the Echo 1st generation, to the Echo Show 2nd generation, but we ran into limitations. So we chose the last method: chip-off.

The second is the companion (client)-based approach. In 2017, Chung’s research on the Alexa artefacts[1] on Android and iOS was conducted, and in 2019 detailed research on the to-do list and voice commands was conducted by Engelhardt. Finally, in the cloud-based approach, Roussev presented a collection methodology using APIs. Building on that idea, Chung developed an API automation tool for Alexa.

We collected and analysed each of the three elements that make up the Amazon Echo Show ecosystem. We used the device for about six months and collected its data with chip-off. We collected the mobile artefacts using a rooted Galaxy S6 Edge with the Alexa application installed, and analysed its cache files to find the Alexa Cloud API. Using those URLs, we collected the cloud data. The collected artefacts were all analysed in a Windows 10 environment with open tools.

We tried many ways to collect data from the device. Based on existing research we tried the ADB method and the debug pad. However, there were limitations, so we chose to disassemble the product and perform a chip-off to obtain the data. The flash memory is a 153-ball BGA package, and when we connected it to the Easy JTAG programmer we could see that the partitions were configured as shown in the picture on the right.

Among the different partitions, we could see the artefacts left by the Echo Show on the Android data partition, which uses the EXT4 file system. By rooting the Android smartphone, we could collect more artefacts. The cloud data was collected using an API, given valid account information. The API address can be determined via a proxy or from the cache files remaining in the browser. We identified it using the latter method and confirmed that it had changed since the previous study. We also found the address where we can check the photos taken with the device’s camera.
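As a sketch of the cache-file approach, the script below scans a browser cache directory for API-style URLs. The directory layout, file name and the exact URL pattern are illustrative assumptions, not the real on-device layout; a real analysis would point `cache_dir` at the cache directory carved from the data partition.

```python
import re
import tempfile
from pathlib import Path

# Regex for Alexa-cloud-style API URLs; the host and path shape are
# assumptions for illustration.
API_PATTERN = re.compile(rb"https://[\w.-]+\.amazon\.com/api/[\w/?=&-]+")

def find_api_urls(cache_dir: Path) -> set:
    """Scan every file under a browser cache directory for API-style URLs."""
    urls = set()
    for f in cache_dir.rglob("*"):
        if f.is_file():
            urls.update(API_PATTERN.findall(f.read_bytes()))
    return urls

# Demo with a synthetic cache file standing in for carved browser cache data.
with tempfile.TemporaryDirectory() as d:
    cache = Path(d)
    (cache / "f_000001").write_bytes(
        b"HTTP/1.1 200 OK ... https://pitangui.amazon.com/api/activities?size=50 ..."
    )
    found = find_api_urls(cache)
    print(sorted(found))
```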

We divided the collected artefacts into categories. Among them, it is important to identify traces of the user’s behaviour. So, we categorised the artefacts into three categories: account, system and activity.

Account contains information such as the user’s name or the nickname that the user set. System covers the logs left by the system, such as wake-up word detection and camera traces, which let us infer the behaviour of the user. Finally, activity includes the artefacts the user leaves while using the Echo Show, such as watching videos, taking pictures or videos, and using the internet.

The Echo Show’s data partition is about 8 gigabytes and runs Android 5, so we applied the same analysis methods used for smartphones. As with a smartphone, we could see the nickname and profile photo through the settings in the account database and XML. In the system log, there were traces of wake-up word detection and camera use, recurring over about a month.
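As an illustration of the account XML analysis, here is a minimal parse of an Android shared-preferences-style file. The keys and values are invented for the example; the real file name and schema on the Echo Show may differ.

```python
import xml.etree.ElementTree as ET

# Synthetic shared-preferences XML in the usual Android <map> format; the
# actual key names on the Echo Show are assumptions here.
sample = b"""<?xml version='1.0' encoding='utf-8'?>
<map>
    <string name="customer_name">Alice</string>
    <string name="device_nickname">Living Room Echo</string>
</map>"""

root = ET.fromstring(sample)
prefs = {e.get("name"): e.text for e in root.findall("string")}
print(prefs)
```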

There were three major interesting traces of direct user behaviour. The Echo Show is the first Echo device with a browser that lets you browse the internet like a tablet PC. By analysing the browser ([indecipherable] browser or Chrome browser), we can find email contents and internet browsing history. In addition, music videos from various sources can be viewed through voice commands. These remain as cache files, which can be identified through a sampling process.

Finally, there are the camera traces on the Echo Show. Unfortunately, there were no photos left on the Echo Show itself; it seems that once a photo is saved to the cloud successfully, the local copy is deleted. However, from the photo database we were able to determine the date of each photo and its identifiable photo ID in the cloud.
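To illustrate, the snippet below rebuilds a tiny stand-in for the photo database with an assumed schema (a note ID plus a millisecond timestamp) and recovers the capture date. The table and column names are assumptions for illustration, not the real Echo Show schema.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory stand-in for the Echo Show photo database (assumed schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE photos (note_id TEXT, taken_ms INTEGER)")
con.execute("INSERT INTO photos VALUES ('note-abc123', 1557300000000)")

# Recover the capture date for each photo record.
records = []
for note_id, taken_ms in con.execute("SELECT note_id, taken_ms FROM photos"):
    taken = datetime.fromtimestamp(taken_ms / 1000, tz=timezone.utc)
    records.append((note_id, taken.date().isoformat()))
print(records)
```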

The smartphone artefacts are the same as in previous studies. From the smartphone, you can learn the email address and name used to log in. You can also see a list of cache files, and the installed skills in XML. Just as in previous studies, there was a history of conversations with the device in the database. The Echo Show, with its display, also shows the user picture files related to voice commands, and these traces remain as cache files.

The cloud data was similar to the previous research from 2017. The API address of the photo function, available when a display is mounted on the AI speaker, is as follows. In addition, the new or changed addresses and additional artefacts found in the cloud are organised in the appendix of the paper.

The same type of artefact may be stored identically on several devices, or it may be distributed across them like puzzle pieces. So we gathered the information we got from the different sources, just like assembling a puzzle, and found new information. We present these as UC, UB and UI: User Credentials, User Behaviour and User Interests.

With an Echo Show, users can surf the internet in the browser or download additional skills. Naturally, Amazon account information is likely to remain on the Echo Show when the user logs in through the browser. With a simple proof-of-concept script, we were able to get the ID and password for an Amazon account that could access the Alexa Cloud.
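A minimal sketch of such a proof of concept, assuming the browser keeps saved logins in a SQLite store: the table and column names below are assumptions for illustration, not the actual on-device schema, and the credentials are fabricated.

```python
import sqlite3

# In-memory stand-in for a browser's saved-login store (assumed schema,
# fabricated credentials).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logins (origin TEXT, username TEXT, password TEXT)")
con.execute(
    "INSERT INTO logins VALUES "
    "('https://www.amazon.com', 'user@example.com', 'hunter2')"
)

# Pull out only the credentials saved for Amazon origins.
amazon_creds = [
    (user, pw)
    for origin, user, pw in con.execute("SELECT * FROM logins")
    if "amazon.com" in origin
]
print(amazon_creds)
```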

Unfortunately, there were no traces of photos left on the Echo Show, but we can collect the photos taken via the Alexa Cloud. We can build a URL that accesses a photo by using the note ID in the photo-related database. Using the note ID and the API URL, we can also see traces of deleted photos. To use an AI speaker, you need a companion device. On smartphones, devices are classified using a cardID, and this cardID can be found in the configuration file of the Echo Show.
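The note-ID-to-URL step can be sketched as a simple template fill. The URL template below is a placeholder, not the real Alexa Cloud endpoint (the actual addresses are listed in the paper’s appendix).

```python
# Placeholder template standing in for the real Alexa Cloud photo endpoint.
BASE = "https://alexa.example.com/api/notes/{note_id}/photo"

def photo_url(note_id: str) -> str:
    """Build the cloud access URL for a note ID found in the photo database."""
    return BASE.format(note_id=note_id)

print(photo_url("note-abc123"))
```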

By combining this information, we can identify the specific device connected to the smartphone. The Echo Show replies to user voice commands with pictures. The cache files for shopping items remain on the Echo Show, and the cache files for conversations remain on the smartphone. By combining these cache files with the conversation history remaining in the cloud, we can guess what command the user gave to the Echo and what information they received, as shown in the picture below.
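The correlation idea can be sketched as a timestamp join: match each cloud conversation entry with device cache files created within a short window of it. All names, timestamps and the window size here are fabricated for illustration.

```python
from datetime import datetime, timedelta

# Fabricated cloud conversation history and device cache-file listing.
conversations = [("show me headphones", datetime(2019, 5, 8, 10, 0, 0))]
cache_files = [
    ("shopping_item_001.jpg", datetime(2019, 5, 8, 10, 0, 2)),
    ("weather_card.png", datetime(2019, 5, 8, 18, 30, 0)),
]

# Pair each command with cache files created within a few seconds of it.
WINDOW = timedelta(seconds=10)
matches = [
    (command, filename)
    for command, t_cmd in conversations
    for filename, t_file in cache_files
    if abs(t_file - t_cmd) <= WINDOW
]
print(matches)
```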

This is a visualisation of the information we can obtain from multiple devices. To investigate the Echo Show, data related to the device itself, the companion device and the cloud can be collected. So we propose an integrated framework for collecting and analysing data at the hardware, software and cloud levels.

This is a general framework that can be applied when conducting research on other IoT device-centric environments. The three different sources can be used as complementary sources in digital investigations.

In particular, once the device itself is investigated, it may be possible to gain access to the cloud with the credential information stored in the speaker.

Previous studies have accessed the cloud by obtaining user information through a smartphone, PC or other route. We explored the possibility that information obtained through device analysis could be a new key to cloud access.

To conclude this presentation: IoT devices and their users are generating a lot of data, and I think this data can serve as important digital evidence. Especially in the case of an AI speaker with a display, we believe there will be even more evidence because of the mutual communication through the display.

This study conducted a digital forensic analysis of the new Echo Show device. We associated evidence from three sources to provide clues for inferring user behaviour, and we believe that the methodology used here can be applied similarly to other smart speakers. Our future work is to perform forensic analysis on other display-equipped devices, like the Google Nest Hub. Thank you for listening to my presentation.
