Tips And Tricks: Data Collection For Cloud Workplace Applications

Gail: Hi everyone. Thanks for joining today’s webinar, “Tips and Tricks: Data Collection for Workplace Cloud Applications”. I’m Gail Green and I’m the marketing specialist here with Cellebrite Enterprise Solutions. Now I’d like to introduce our speaker today, Monica Harris.

Monica is an experienced E-discovery professional and over the past 15 years has specialized in the development, implementation and training of proprietary software for companies such as KLDiscovery and Consilio. Before joining Cellebrite, she worked with the US Food and Drug Administration, where she oversaw policy and procedure creation, Enterprise Solution rollout, and training for Enterprise for agency litigation and freedom of information requests.

Monica is an active leader and mentor in the E-discovery community and currently serves as president of the Association of Certified E-Discovery Specialists (ACEDS, DC Chapter), a board member of the Master’s Conference and community chair of the DC Master’s Conference.

Monica has served as assistant director for Women in E-discovery (WIE, DC Chapter), and collaborates frequently with DCWIE, the DC Bar and the Women’s Bar Association of DC. Thanks for joining us today, Monica, and if you’re ready, I’ll hand it over to you to get started.

Monica: Thank you, Gail. And thank you to everyone that’s joined us for today’s webinar, “Tips and Tricks: Data Collection for Workplace Cloud Applications”. So, let’s get started.

For today’s webinar, we have some key takeaways: the importance and challenges of collecting from cloud workplace apps, best practices for cloud workplace app collection, and a preview of cloud workplace app collection in Cellebrite’s Endpoint Inspector.

So let’s dive right in, starting with how Cellebrite Enterprise Solutions defines workplace apps. Depending on who you’re talking to, the definition could change, but for Cellebrite Enterprise Solutions specifically, cloud workplace apps are licensed by your organization and they’re administered by IT.

So we’re talking about Office 365 (including Outlook, Teams, SharePoint and OneDrive), Box, Slack, Google Workspace, ShareFile, and Egnyte. For some of the applications that you’re looking at on the screen right now, Office 365 for example, I have a personal account. So at home I can create spreadsheets for budgets, things of that nature, maybe events that are coming up, but that’s not the Office 365 that we’re talking about when we’re talking about cloud workplace apps.

We’re talking about the Office 365 that as a new employee, you may start your first day and receive access to from IT. We’re not talking about the Slack that you may use to stay in contact with your family (that could be across the country or maybe even across the world), we’re talking about the Slack that you could use to communicate with various teams throughout your organization so that you can streamline the activities that you do every day.

This is not individual cloud workplace apps, not the ones that we own in our day-to-day lives, but the ones that are owned by the organizations that we are members of. Why is collecting from cloud workplace apps important? Collecting from cloud workplace apps is important because they are the largest source of employee data.

So, whether it is for discovery or whether it is for investigations, where you’ll find the most data is in these cloud workplace apps. For Cellebrite, we are known for collecting mobile data. With the acquisition of BlackBag Technologies, we added to that the ability to collect from computers, particularly from Macs.

Macs with Silicon and T2 encryption, but computers overall. So mobile and computers are more of the (shall we say) modern data collection. But the bulk of the data overall is found in these workplace apps, whether that’s because of the change that we have seen in the past two years in terms of how we’ve communicated, or whether that is because a majority of organizations have one (or more than one) of the applications that we’re looking at on the screen behind their firewall.

Why would you collect from someplace other than the original source of data, right? If we go back very quickly and take a look at these apps: Slack, Office 365, Google Workspace, they all allow you the ability to collect directly from the source. But why would you collect, let’s say, with Cellebrite’s Endpoint Inspector?

Well, there’s a few challenges that could present themselves when you are collecting directly from the source, when you’re collecting directly from Office 365 or when you’re collecting directly from Slack. And the first one groups those challenges together because of licensing.

When we look at the licensing for these cloud workplace apps, there are various features that come with each license. The license can vary in terms of the size of the organization. Do you have 1,000 employees? Do you have 5,000 employees? Are you larger? In addition to that, not only do we see a difference in the size in terms of how many individuals can be included on a license, but we see the difference with the features as well.

So, depending on whether you have an E3 or an E5 license, you may have different options available to you in terms of features. Whether or not you have Enterprise Grid, Enterprise Plus, you may see different features there. Whereas when you are working with a tool that’s collecting directly out of these applications, that tends to streamline the features, right? Now you’re working with the tool as opposed to the source.

Additionally as I stated in the earlier slide, most organizations have more than one Cloud Workplace app. So, it’s not that you have Office 365 alone. You may have Office 365. You may have Office 365 and Box. You may have Office 365, Box and Teams. So, the question is how much time do you have?

Everything can be done with time and resources, but the more time and resources that we have available to us, the more value that we can add to our organizations and to our customers. So, why spend that time and those resources moving from one cloud workplace application, to another cloud workplace application, to another cloud workplace application as you work to collect data from all of the sources across the cloud workplace apps that may be available? But working with tools that can work with the original source of data can give you the ability to be uniform and collect across them.

And once we start talking about collecting across cloud workplace apps, then we have to start talking about entity normalization. The fact that in your Outlook, your Teams, your SharePoint and your Slack and your Box, you may not be the same person across. I can be Monica Harris in Outlook, I could be Monica S Harris in Box, and I could be Monica H in Slack, right?

So that entity normalization, understanding that in order to see the full picture, in order to get to the smoking gun that you will need to collect across cloud workplace apps. And when you begin to collect across cloud workplace apps, you can begin to see different identities, but you’re still looking at the same person. So, entity normalization is important as well. These are things to consider when you are deciding within your organizations where you are going to collect the data from and how you’re going to collect it.
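
To make the entity normalization idea concrete, here is a minimal Python sketch. It is an illustration added to this transcript, not something shown in the webinar and not how Endpoint Inspector works internally. It assumes each app account carries an email address that can serve as the shared key; the `ACCOUNTS` records, the `example.com` addresses and the `normalize_entities` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical identity records: (app, display name, account email).
# The display names mirror the example in the talk; the emails are made up.
ACCOUNTS = [
    ("Outlook", "Monica Harris", "MHarris@example.com"),
    ("Box", "Monica S Harris", "mharris@example.com"),
    ("Slack", "Monica H", "mharris@example.com "),
    ("Slack", "Jordan Lee", "jlee@example.com"),
]


def normalize_entities(accounts):
    """Group per-app identities under one key (here, the cleaned-up email)."""
    entities = defaultdict(list)
    for app, display_name, email in accounts:
        key = email.strip().lower()  # assumption: email is the common identifier
        entities[key].append((app, display_name))
    return dict(entities)


entities = normalize_entities(ACCOUNTS)
for key, identities in entities.items():
    print(key, "->", identities)
```

In practice the shared key might come from a directory service rather than an email address; the point is only that three display names resolve to one person before review begins.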

So historically, when collecting from cloud workplace apps, we focus on email: Office 365 or Google Workspace, for example, if that’s where you’re collecting from. But in the past year (perhaps even in the past two years), the way that we have communicated within our organizations has shifted from more of a formal communication standpoint, which is what you see in email, to continuous communication, which is what we are seeing in Teams and Slack.

But Slack for instance, has been trending and is really starting to become just as relevant in your collections as your Outlook, as your Google Workspace. And thus, this is why we’re starting to see why when you’re collecting from cloud workplace apps, you’re collecting from more than one or you’re collecting from more than one organization that provides cloud workplace app capability, starting with Slack.

At the end of 2019, Away CEO Steph Korey stepped down just four days after an investigation by a publication called The Verge. The publication highlighted the company’s toxic culture. Korey, one of the luggage brand’s co-founders, was replaced with former Lululemon executive Stuart Haselden. Though Korey still continued as executive chairman, the news came after days of public backlash due to leaked information showing Korey routinely intimidating employees on public Slack channels. Away did not allow their employees to email each other, which is quite unusual, actually.

Instead, they asked that direct…and in addition to that, they asked that direct messages be kept to a minimum. The result was that almost all conversations took place on public Slack channels where executives gave harsh feedback and reprimanded people for small mistakes during Slack, or on Slack, right? So, in this particular case, we’re not necessarily talking about any litigation that may have taken place, because the former CEO Steph Korey stepped down before that could happen.

But we are talking about nefarious activity that was taking place in a cloud workplace app that was outside what we would traditionally be using for E-discovery. It just goes to show the shift, or the trend, of why most organizations, when they’re performing investigations or when they’re looking for discovery, are looking at more than one cloud workplace app.

Now, in that example, we were not talking about litigation, but there are examples of instances where litigation occurred, Slack data was not rendered, and as a result of that, sanctions were placed. And I think the most recent one would be Red Wolf Energy Trading.

There’s a lot that goes into Red Wolf Energy Trading, and I would encourage you to go take a look at that on the internet, it’s very interesting. But at a high level, in this case, which started in 2019 and wrapped up a few months ago in September, the United States District Court for the District of Massachusetts entered a default judgment against the defendant due to repeated failures to produce relevant documents from the defendant’s Slack account.

Moreover, the court found several other issues with the defendant’s production, including the defendant’s failure to offer entire threads of communication (which, of course, you’re going to need for context), the defendant’s counsel’s failure to utilize a vendor who could provide an Excel spreadsheet of messages derived from Slack terms, and the defendant’s omission of several documents that were significant to the merits of the underlying claims, one of which was described as a proverbial smoking gun.

All of that data was in Slack. So at that 50,000 foot view, in this particular case there was an ask for a Slack archive, the defendant went to a service provider who gave them an estimate for $10,000. The defendant did not want to go with that particular service provider, so they found a developer in Kazakhstan, and as a result of the data not being handled in a proper way (and some of that which we just went over), some things weren’t produced at all.

Some things were produced without context, and there were a handful of other caveats. This particular defendant, Red Wolf Energy Trading, was fined nearly $10 million. $10,000 could have saved them nearly $10 million if they had produced data from Slack. So, when we are looking to do collections, when we’re trying to understand the truth and get to the relevant facts, we’re not just looking at email anymore. Most investigations, most discovery involve collections from more than one cloud workplace app.

So, now that we have discussed what cloud workplace apps are, why it’s important that we collect from them and why it’s important that we collect from more than one, let’s talk about some best practices for collecting. So, 1) collect from all sources of cloud workplace app data. I don’t think there’s anything more relevant than the case that we just talked about, the Red Wolf Energy Trading.

It could be because that just happened. We’re still within the 90 day period when we’re talking about that case. But I think overall it’s the trend of understanding how we’re communicating and how that’s changed in the past two years. More…many of us are communicating in more than one cloud workplace app. So, when you’re collecting from these cloud workplace apps, understand that you’re collecting from more than one source.

Even if you’re looking within something like a…that’s a suite like Office 365, more than likely you’re looking at Outlook, you’re looking at Teams and you’re looking at OneDrive, SharePoint, you know, being sure or being able to make sure that you can collect from all of those, and then additionally, looking around. What other communication channels are available to these teams? Are they using Slack in addition to OneDrive? Are they also using Box? Things of that nature. Just making sure that you have information governance and that all workplace apps are accounted for during your collection.

But as soon as you begin to talk about collecting from all of the apps, understanding who an individual could be across all of those apps is important as well. And we talked a little bit about that earlier. The fact that one person, one custodian, one person of interest could be their first and last name in an email system, their first name and last initial in something that’s more of a continuous or short messaging system, and so forth. So, making sure that you understand not only where all the sources of data are across the cloud workplace apps, but you understand who an individual can be across all of the cloud workplace apps.

But also once we start talking about a variety of cloud workplace apps and these cloud workplace apps are the largest sources of employee data, making sure that we’re collecting from them in a smart manner, okay? There are often deadlines when it comes to discovery. And then in addition to that, we just want to make sure that we’re making the most of our resources when we’re pulling data out of the cloud.

So, performing some type of EDA, some type of early data assessment by, if possible, not pulling all of the data out of these cloud workplace apps, but pulling out a portion of the data to be able to do early data assessment. If you understand that for the past two months I was talking about Cellebrite and you see 2 terabytes of data, maybe we’re pulling out just the metadata, pull out a load file, take a look at what is in that, and then from there determining if perhaps there’s a way to create additional search terms or create just additional criteria of filtering over all that [inaudible] that you’re getting to more relevant data faster when you’re collecting from these larger sources of employee data.

And then of course, we’ve already talked about filtering on those metadata export. That is the idea there. Rather than go in and put in your search terms, like we may have done in the past, the data that could be in these sources is vast. It’s very vast. So, once you go in and you understand you have that 50,000 foot view of the data that you could potentially export, then going in and making sure that your filters are looking just at the data that’s relevant to you.
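
As an illustration of filtering on a metadata-only export before pulling any content, here is a minimal Python sketch added to this transcript. It is not Endpoint Inspector’s export format or API: the `ItemMetadata` fields, the sample `ITEMS` and the `filter_metadata` helper are hypothetical, and the keyword match is a simple case-insensitive substring test on the title.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ItemMetadata:
    source: str       # e.g. "Slack" or "OneDrive"
    title: str        # subject line, file name, or channel name
    modified: date
    size_bytes: int


def filter_metadata(items, start, end, keywords, max_bytes):
    """Keep items inside the date range, under the size cap, with any keyword in the title."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for item in items:
        in_range = start <= item.modified <= end
        small_enough = item.size_bytes <= max_bytes
        has_keyword = any(k in item.title.lower() for k in lowered)
        if in_range and small_enough and has_keyword:
            kept.append(item)
    return kept


# Hypothetical rows from a metadata-only export.
ITEMS = [
    ItemMetadata("Outlook", "Q3 budget review", date(2021, 9, 14), 48_000),
    ItemMetadata("Slack", "budget-planning channel export", date(2021, 10, 2), 120_000),
    ItemMetadata("OneDrive", "Holiday photos.zip", date(2021, 10, 5), 900_000_000),
    ItemMetadata("Box", "Budget archive 2018", date(2018, 1, 20), 75_000),
]

relevant = filter_metadata(
    ITEMS,
    start=date(2021, 9, 1),
    end=date(2021, 10, 31),
    keywords=["budget"],
    max_bytes=10 * 1024**3,  # 10 GB cap, chosen only for this example
)
for item in relevant:
    print(item.source, item.title)
```

Running the filter against the metadata first shows how many items each criterion removes, which is the kind of check that lets you adjust search terms before committing to a full export.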

But speaking of the data that’s relevant to you, because there is so much data in these cloud workplace app sources, also understanding what could be sensitive within that data, what could be personal identification, what could be personal health information, what could be personal credit card information, for example, so that you can identify that and put that aside.
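
One way to picture flagging sensitive content so it can be set aside is the following Python sketch, added to this transcript for illustration. It only looks for credit-card-like numbers, using a regular expression plus the Luhn checksum to reduce false positives; real sensitive-data detection covers far more (names, health information, national identifiers) and the `find_card_numbers` helper here is hypothetical.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str):
    """Return digit strings in the text that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test number, not a real card.
print(find_card_numbers("Please charge 4111 1111 1111 1111 today"))
print(find_card_numbers("Ticket 1234 5678 9012 3456"))
```

Messages with a non-empty result could be tagged and routed to a separate review queue rather than dropped.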

Also, depending on what you are searching for, particularly if you’re looking at your Teams or your Slack, think about excluding…being able to identify so that you can exclude at a later time, any zero byte files. Sometimes when we’re sending pictures, sometimes when we’re sending images, you know, that could lead to an additional message that may not have content. And so we want to accelerate the time that it takes to review these documents.

So, when you’re doing your early data assessment, when you’re pulling out the metadata for the documents that you could potentially be exporting out of these cloud workplace sources, or for the conversations that you could potentially be exporting out of these cloud workplace apps, make sure that you’re taking a look at the size of the files too, right? I know that when you’re doing analytics, for example, I’m always looking for zero byte files so that I can put those aside. They could be generated for any reason. But that’s something you wanna take a look at when you’re looking at the cloud workplace apps as well.
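
The zero byte check described above can be sketched in a few lines of Python. This is an illustration added to the transcript; the record layout (a dictionary with `name` and `size_bytes` keys) and the `split_zero_byte` helper are hypothetical.

```python
def split_zero_byte(records):
    """Partition metadata records into (reviewable, zero_byte) by reported size."""
    reviewable, zero_byte = [], []
    for record in records:
        if record["size_bytes"] == 0:
            zero_byte.append(record)   # set aside, not deleted
        else:
            reviewable.append(record)
    return reviewable, zero_byte


# Hypothetical rows from a metadata-only export.
RECORDS = [
    {"name": "standup-notes.docx", "size_bytes": 18_432},
    {"name": "image001.png", "size_bytes": 0},
    {"name": "thread-export.json", "size_bytes": 5_120},
]

reviewable, zero_byte = split_zero_byte(RECORDS)
print(len(reviewable), "to review;", len(zero_byte), "set aside")
```

Keeping the zero byte records in their own list, rather than discarding them, means the decision to exclude them can still be documented and revisited.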

Last but not least, and very possibly the most important, when you’re collecting from cloud workplace apps, having repeatable and defensible workflows that you could use for collection is important. Cloud workplace apps are the largest source of employee data. And when you’re collecting, if you want the full scope or the full picture, more than likely you’re collecting from more than one.

In addition to that, because you’re looking at so much data, you could be going in, putting in your terms, pulling out just the metadata only to give yourself some type of QC massaging in that and going back. So, after you’ve done all of that work and now you have to go in and do it for multiple custodians and perhaps massage as you go, you wanna make sure that you are storing how you’re setting up these filters, how you’re setting up which applications you’re collecting from, how you’re setting up who you’re collecting from in some type of template, or some type of documentation, something that you can refer to each time, because these are not small collections.

This is not a limited amount of criteria. They’re quite vast. So, you want to make sure that you have a repeatable process and even something that could be automated if possible, so that you could reduce that manual intervention and also reduce the human error that could occur as a result of the manual intervention.
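
A repeatable workflow can be as simple as saving the collection criteria in a file that is reloaded for each job. The Python sketch below was added to this transcript as an illustration; the `CollectionTemplate` fields and the JSON layout are hypothetical and are not Endpoint Inspector’s template format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CollectionTemplate:
    name: str
    custodians: list
    apps: list
    date_from: str                      # ISO dates kept as strings for JSON
    date_to: str
    keywords: list = field(default_factory=list)
    max_file_size_gb: int = 10
    output_format: str = "natives"      # e.g. "natives" or "metadata_only"


def save_template(template: CollectionTemplate) -> str:
    """Serialize the template to JSON so the exact criteria are on record."""
    return json.dumps(asdict(template), indent=2, sort_keys=True)


def load_template(text: str) -> CollectionTemplate:
    """Rebuild a template from saved JSON for the next collection job."""
    return CollectionTemplate(**json.loads(text))


template = CollectionTemplate(
    name="Q3 investigation",
    custodians=["mharris@example.com"],
    apps=["Outlook", "Teams", "Slack", "Box"],
    date_from="2021-09-01",
    date_to="2021-10-31",
    keywords=["budget"],
)
saved = save_template(template)
restored = load_template(saved)
print(restored == template)
```

Because the saved text is plain JSON, it can be kept with the case file as documentation of what was collected, from whom, and with which filters.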

So, we’ve talked about some best practices for cloud workplace apps. Now let’s talk about what cloud workplace app collection looks like in Cellebrite’s Endpoint Inspector. At a glance, Cellebrite’s Endpoint Inspector has the ability to collect from some of the top cloud workplace apps and you can see which ones specifically here in this chart. We talked about some of them earlier: Office, in terms of your Outlook, in terms of SharePoint, OneDrive, Google Workspace and others.

But here you see the complete list. The custodian setup through Active Directory is key. Again, we’re not talking about the Teams, or the Box that you have at home on your computer, we’re talking about your organization’s. So as you have departing employees, as you have new employees coming in, you want to make sure that you are actively connected to who may be the owners of all of that information that you could be potentially collecting. And you also wanna make sure that you understand who those individuals are across applications.

So, in the event that you find out that there are different identities, you have the ability to map that together so you can build that picture. You want to make sure that you have the ability to filter, and we’ve built that in to Endpoint Inspector as well. We have the ability to filter by date range, keyword, and file size. We can also identify that sensitive information so that it can be tagged and put aside. And we’ve given you various output formats. It’s not that from these largest sources of employee data you want to go in and pull out natives and text: you may just wanna take a look at the metadata.

In addition, you may not want to pull out something that’s ready to load to a document review tool. You may want to pull out the natives themselves so that you can process that in the tool of your choice. And then we have also added collection templates. These collections, the criteria that goes into them, the EDA behind that, all of that could be a lot of work, and all of that could have nuances to it. And we want to make sure that all of that good work is harnessed in a template so that it can be defensible, repeatable, and sustainable.

So, now that we’ve talked about features at a glance, let’s take a look at the tool itself. Endpoint Inspector is Cellebrite Enterprise Solutions’ leading solution for the remote collection of mobile and computer, in addition to cloud sources that are personal, like WhatsApp (not cloud workplace apps). And then lastly, we are introducing cloud workplace app collection.

So, what you see on the screen is our web interface that allows the secure authentication and management of users. On the right hand side of the screen, you can see we have the ability to do that remote computer collection, the remote mobile collection. And by the way, when we say remote collection here, we’re not talking about the type of remote collection that would mean you sent out hardware, you shipped a laptop that had software on it that the individual would then have to engage with, the custodian would have to engage with for collection.

We’re not talking about sending an individual out with hardware who can then collect from the custodian. We’re talking about, for mobile device collection specifically, an email that can be sent to a custodian with three to five steps on it, a less than 1 MB install on the custodian’s laptop, and then the custodian can use the power cord for their phone (their iOS or their Android), plug it into the laptop, spin up the application, which will run them through a wizard with less than 10 steps and send collected data from their phone back to the examiner: true remote collection. Same for computer.

Whereas we have an agent running on the computer that has the ability to send back data to the examiner, for cloud, we are utilizing QR codes and for workplace apps, let’s take a look.

For workplace apps, you can filter by date, date range, keywords and file size. You can see here that it is just as simple as creating a name for your collection, or using a previously saved template. You can choose your custodians and in this particular screenshot we’ve done six, but this install is connected to Active Directory, so you can collect from the relevant custodians.

We’ve chosen four of the cloud workplace applications that we have available to choose from. And in this case we’re pulling those documents out as natives. We’ve already done our metadata-only pull so that we could take a look at what we’re going to get. We have verified that our search terms are bringing us back the data that we’re expecting. We’re seeing the right date ranges, we’re seeing the right conversations with the subject lines and also with the texts and the chats. So we’re gonna pull out the natives so that we can process them at a later time.

We’re also going to make sure that we’re looking at minimum, because again, we’ve looked at the metadata, so we’ve been able to make some decisions there and we don’t want anything larger than 10 GB; we’ll preserve that for a later time. We believe we have the most important information. This is what collecting from cloud workplace apps in Cellebrite’s Endpoint Inspector can provide.

And although we’re talking about cloud workplace app collection in Endpoint Inspector, it’s just the thought (as I had said before) that Cellebrite Endpoint Inspector allows you to collect endpoint data anywhere it can be found.

So, we have cloud workplace app collection that’s unified with computer collection, with mobile collection, with personal cloud collection, or chat collaboration cloud collection like WhatsApp, all in one tool, which can be set up all in one job. You can set up your mobile, your computer, your cloud workplace apps for multiple custodians, or you can do any variation of. Here you see collecting from all sources, here you see collecting from mobile and workplace, and here you see collecting from mobile and cloud. All of this can be found in one solution.

So, what are our key takeaways from this webinar? We started with key takeaways at a high level, but let’s deep dive a little bit. Now that we’ve had a conversation about what cloud workplace apps are, why they’re important and how you can collect with Endpoint Inspector, what do we want you to take away from this webinar? 1) Collecting from multiple cloud workplace apps is important for organizations. Not just one, not just two, but multiple cloud workplace apps. 2) Early data assessment and filtering data prior to collection accelerates the review of data.

Understanding what you are going to receive before you pull it out of the cloud, and 3) Cellebrite’s Endpoint Inspector is the industry leading solution for remote collection, true remote collection. No hardware is being shipped, it’s by email only, and data is sent back to the examiner for mobile and computer, and it now includes cloud workplace apps. Thank you for joining me today for this webinar. Are there any questions?

Gail: Yes, Monica, we do have some questions that were submitted during this conversation. The first one is: how are modern attachments handled during cloud workplace collection from the Endpoint Inspector tool?

Monica: That is a great question. You know, I was just at Relativity Fest last week, which was absolutely phenomenal. Relativity did a fantastic job for our first time at Fest in two years. I had the great fortune to be on a panel with Susan Stone, with Jerry Bowie and with Nikolai P (I’m not going to destroy your last name, I apologize).

But we were talking about normalizing modern data. So, when we were talking about modern data, it was all of the sources in which Endpoint Inspector can collect from, and there was about a 15 to 20 minute heated debate about modern attachments. And so someone said something during that conversation that I’m gonna share with this audience and I’m gonna use to answer this question: why are we calling them attachments?

Why are we calling them attachments? So, I assume, for this question (and let me know if I’m not correct), that when we’re talking about modern attachments, we’re talking about the links that you can find in your email, in your Slack or even your Teams. So, it is not the traditional attachment, the parent email with the child attachment, which is actually in the email, and so during processing you can pull one out of the other.

There is no family relationship there because the attachment is actually not present. What’s present is a link to the attachment. And that link could be in OneDrive, that link could be in SharePoint, that link could be in Box. Where that link is not is in the parent document or the parent attachment. So, my first question is why are we calling them attachments? I think in E-discovery, particularly in collections and processing, and now that there is more of a forensics angle when we’re looking at gathering this data, the integrity of the data is important.

So, there is a question overall that I’m very interested to see the industry tackle about how we preserve the integrity of the data, because when we pull that attachment back from the collaborative storage location and put it in the message or put it in the chat, are we not disrupting the integrity of the data? Because that document wasn’t there before, a reference to it was.

Why are we calling them attachments? That is my question for the industry and if you have comments on that, please feel free to leave them in the comment box. But for Cellebrite Endpoint Inspector collection of cloud workplace apps specifically, what is collected for a modern attachment is the reference and the metadata. Great question.
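
To picture what “the reference and the metadata” could look like for a linked document, here is a small Python sketch added to this transcript. It is not Endpoint Inspector’s output format: the record fields, the `extract_link_references` helper and the sample URL are hypothetical. The sketch records where each link points without fetching the linked file, so the message itself is left as it was sent.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")


def extract_link_references(message_id, body):
    """Record each link in a message as a reference; the target is never downloaded."""
    references = []
    for raw in URL_PATTERN.findall(body):
        url = raw.rstrip(".,;)")          # drop trailing sentence punctuation
        parsed = urlparse(url)
        references.append(
            {
                "message_id": message_id,
                "url": url,
                "host": parsed.netloc,
                "file_name": parsed.path.rsplit("/", 1)[-1],
            }
        )
    return references


# Hypothetical chat message containing a link rather than an embedded file.
body = "Draft is here: https://contoso.sharepoint.com/sites/legal/Budget.xlsx."
for ref in extract_link_references("msg-001", body):
    print(ref)
```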

Gail: The next question, it also relates to what we were just talking about. How are documents from collaborative storage locations like SharePoint and Box handled when they’re not attachments?

Monica: Hmm, that is a great question. I think this is something that Jerry Buoy had also brought up during our conversation. So, the idea here is that when you are pulling, say, from the SharePoint or from a Box, very much like the question is posing, that there could be collaborative spaces.

So, within SharePoint, within Box we can set up personal spaces and so therefore we are the custodian, but then there’s also, let’s say, team or department locations. And so it’s not that an individual has access to that document, it’s that a group of people have access to that document, and therefore we need to determine who the custodian is. Like, right now, we are looking at that the same way the remainder of the industry is looking at that in that we are not identifying custodians because there is no true custodian, it is a collaborative document.

Instead what we’re doing is we are identifying the location of the document so that we can understand that not an individual person owns it, but that several people that have access to a particular location had access (or could potentially have access) to that document. So, when it comes to collaborative storage locations, we’re not looking at custodianship, we are looking at access or we’re providing insight into who may have access by providing the location of the document. That’s a great question.

Gail: And next question, just to shift back to workplace apps for a moment: how are cloud workplace apps licensed in Endpoint Inspector?

Monica: That’s a good question. How are cloud workplace apps licensed in Endpoint Inspector? I’m going to take my own spin on this question. So, I know that we talked about Outlook, we talked about SharePoint, we talked about ShareFile, and a few others.

So, I think I’m being asked whether or not you are looking at these connectors individually. But at Cellebrite we take a holistic approach or view of cloud workplace app collection. So, when you are looking to use these connectors in the application, all of the connectors are available to you. It is not tiered in terms of, you know, you can collect from Office, or you can collect from Google Workspace. You can collect from any or all of the connectors that I shared with this audience today.

Gail: Great, that’s…I think that clarified the question. If we did not, please let us know in the comment. Next question we have is: how many connectors can be collected from in one collection job?

Monica: Oh, that’s a great question. Well I’ll give you an E-discovery answer, which is, it depends! It depends on the bandwidth that you have and how much time you have for the collection. So, as I shared before, all of the connectors that we shared with you today in this webinar are available if you are working with cloud workplace app collection in Endpoint Inspector.

So, in that case we are looking at upwards of five, six connectors that could be going at any one time. Theoretically you could run all of them, but whether or not you would want to collect that much data at that time is the question! So, I think it’s really something to be determined based on the resources of your IT’s infrastructure, and then also the amount of time that you have to get to the data.

Gail: Thanks for answering that. That’s all the questions we have time for today. So, I’m gonna thank Monica for your insightful and useful information. I think you’ve given us a lot to think about in regards to workplace apps and how to gather that information and data. We’ve had a few more questions come in that we didn’t have time to go through or answer throughout the presentation, but we’ll reach out to each of you individually after the webinar to answer those.

To learn about how you can get started with any of our solutions, please reach out to us via the “contact us” button on your console. After the webinar today, you’ll see a prompt asking for some feedback on what topics you’d like to see in our future webinars. If you have a minute, please fill that out and help us decide what you’d like to see covered in a future webinar. Thanks again, Monica, for joining us today. Have a great day!
