Benjamin Fung, Associate Professor, McGill University

Benjamin, you're an Associate Professor of Information Studies at McGill University – can you tell us more about the role and how you entered academia?

Certainly. As you say, I am currently an Associate Professor of Information Studies at McGill University and previously was an Associate Professor of Information Systems Engineering at Concordia University. I am particularly interested in developing new, scalable data mining methods for privacy protection and crime investigation.

In 2003, after working in the software industry for four years, I noticed there was a need for scalable data mining methods. As a result, I resigned from my job at SAP Business Objects and studied a Ph.D. in computing science, specializing in data mining, at Simon Fraser University. Recently, there is a hot research topic called “big data”, but data miners have been working on “big data” for more than 20 years already.Your research focuses on designing intelligent systems for the purpose of crime investigation. How did you become interested in these topics?

After joining the Computer Security team at Concordia in 2007, I had a lot of opportunities to interact with different law enforcement units in Canada. In the meetings, I found that there is a big gap between the state-of-the-art data mining methods in the literature and the current software tools used by law enforcement officers. A lot of important evidence can be collected from the suspects’ digital devices, from laptops to smart phones. The challenge is how to efficiently retrieve the relevant information from such a large volume of (unstructured) textual data.

Your paper "Subject-based semantic document clustering for digital forensic investigations" has recently been published in Data & Knowledge Engineering. What prompted this line of research?


Get The Latest DFIR News

Join the Forensic Focus newsletter for the best DFIR articles in your inbox every month.


Unsubscribe any time. We respect your privacy - read our privacy policy.

We had a collaborative research project with Sret du Qubec, the Quebec Provincial Police. In the meeting, we found that the law enforcement officers were using a traditional keyword-based approach to search the documents from the confiscated computers. The quality of the retrieved results were pretty much based on the experience of the investigator. We were confident that we could do a much better job than that.

Can you give us a brief overview of the paper and the conclusions you reach?

We assume that the investigator has access to a large volume of textual data collected from a confiscated computer. The textual data can be the entire hard drive in a computer, e-mails, chat log messages, etc. The traditional approach is to build an index of the documents, and let the user enter some key terms for searching. However, this approach does not work well for crime investigation. For example, a drug dealer will never use the word “drugs” in his e-mails. When an investigator searches for “drugs”, she is indeed searching for the documents (emails) that are related to the topic “drugs”. An e-mail may be related to drugs even though the word “drugs” may not appear in it. This is the motivation of developing a subject-based search engine for crime investigation.

The scientific contribution we made in this paper is to capture the vocabularies of the suspect, and then use the suspect’s vocabularies to search against his own documents. Experiments show that this approach is much more effective than the traditional keyword-based approach, in which the search terms were soley provided by the the investigator.

It seems like your semantic document clustering technique may also be useful in artificial intelligence, particularly the word sense disambiguation algorithm. Is this something you've considered as a future application of your research?

Actually, it is the other way around! We are utilizing word sense disambiguation algorithms in our method. We probably won’t make contributions in word sense disambiguation in our future research, but we will utilize the algorithms from this area.

You've developed a Cyber Forensic Search Engine for indexing, clustering and search. Have you seen any cases of it being used in investigations so far?

We conducted our experiments on real-life criminal data, and the results are encouraging. Recently, we shared our software program and source code with law enforcement units, but I am not sure if they have utilized our tools in any case yet.

What do you think the next major developments will be in digital forensic examination? Which areas need to be explored further?

Social media, such as Twitter and Facebook, will be a valuable source of information for crime investigation. Yet, the general public has raised serious concerns in the privacy aspect. We need to seek a balance between privacy protection and effective data mining. This brings us to my another research direction on privacy-preserving data mining, which aims at achieving effective data mining without compromising individual privacy. I have demonstrated some success in privacy-preserving data mining in the healthcare sector. It will be interesting if similar techniques can be applied to crime investigation.

What do you do in your spare time?

I have been interviewed many times. You are the first one asking this question. 🙂

My time is consumed by research, teaching, and my two kids. If I still have space time, I enjoy reading about astrophysics and the history of different religions.

Dr. Benjamin Fung is an Associate Professor of Information Studies (SIS) at McGill University, an Affiliate Associate Professor of Information Systems Engineering (CIISE) at Concordia University, and a Research Scientist of the National Cyber-Forensics and Training Alliance Canada (NCFTA Canada). He received a Ph.D. degree in computing science from Simon Fraser University in 2007.

Leave a Comment

Latest Videos

In this episode of the Forensic Focus podcast, Desi and Si discuss different online programming courses and what they think about the popular platform, Udemy. They also talk about Flipper, Dev boards, and Raspberry Pi, and delve into the fascinating phenomenon of running the classic game Doom on unlikely devices.

Throughout the episode, Desi and Si share their digital forensics expertise, referencing some of the cases they have been working on and highlighting particular methodologies and technologies that have an impact on cybersecurity.

Show Notes:

100 Days of Code: The Complete Python Pro Bootcamp for 2023 - https://www.udemy.com/course/100-days-of-code/

Domestika - https://www.domestika.org/en

MIT OpenCourseWare - https://www.youtube.com/@mitocw 

MasterClass - https://www.masterclass.com/

Raspberry Pi 400 Complete Kit - https://core-electronics.com.au/raspberry-pi-400-kit.html

Flipper Discord - https://discord.com/invite/flipper

Flipper Zero - https://flipperzero.one/

This Programmer Figured Out How to Play Doom on a Pregnancy Test - https://www.popularmechanics.com/science/a33957256/this-programmer-figured-out-how-to-play-doom-on-a-pregnancy-test/

Here’s a dude playing Doom Eternal on his fridge - https://www.polygon.com/2020/10/13/21514933/doom-eternal-refrigerator-door-samsung-smart-refrigerator-xbox-game-pass-richard-mallard

Doom hacker gets Doom running in Doom - https://www.pcgamer.com/doom-hacker-gets-doom-running-in-doom/

Doom Running On A Calculator Powered By Old Potatoes - https://kotaku.com/doom-running-on-a-calculator-powered-by-old-potatoes-1845374069

GoldenEra - https://www.imdb.com/title/tt11753760/

Racing the Beam - https://en.wikipedia.org/wiki/Racing_the_Beam

High Score (TV series) - https://en.wikipedia.org/wiki/High_Score_(TV_series)

Microcontroller Courses (Udemy) - https://www.udemy.com/topic/microcontroller/

The story of Final Fantasy XIV’s renegade do-good modders - https://www.pcgamesn.com/final-fantasy-xiv/ffxiv-modders-renegade-do-gooders

Logical fallacies - https://yourlogicalfallacyis.com/

In this episode of the Forensic Focus podcast, Desi and Si discuss different online programming courses and what they think about the popular platform, Udemy. They also talk about Flipper, Dev boards, and Raspberry Pi, and delve into the fascinating phenomenon of running the classic game Doom on unlikely devices.

Throughout the episode, Desi and Si share their digital forensics expertise, referencing some of the cases they have been working on and highlighting particular methodologies and technologies that have an impact on cybersecurity.

Show Notes:

100 Days of Code: The Complete Python Pro Bootcamp for 2023 - https://www.udemy.com/course/100-days-of-code/

Domestika - https://www.domestika.org/en

MIT OpenCourseWare - https://www.youtube.com/@mitocw

MasterClass - https://www.masterclass.com/

Raspberry Pi 400 Complete Kit - https://core-electronics.com.au/raspberry-pi-400-kit.html

Flipper Discord - https://discord.com/invite/flipper

Flipper Zero - https://flipperzero.one/

This Programmer Figured Out How to Play Doom on a Pregnancy Test - https://www.popularmechanics.com/science/a33957256/this-programmer-figured-out-how-to-play-doom-on-a-pregnancy-test/

Here’s a dude playing Doom Eternal on his fridge - https://www.polygon.com/2020/10/13/21514933/doom-eternal-refrigerator-door-samsung-smart-refrigerator-xbox-game-pass-richard-mallard

Doom hacker gets Doom running in Doom - https://www.pcgamer.com/doom-hacker-gets-doom-running-in-doom/

Doom Running On A Calculator Powered By Old Potatoes - https://kotaku.com/doom-running-on-a-calculator-powered-by-old-potatoes-1845374069

GoldenEra - https://www.imdb.com/title/tt11753760/

Racing the Beam - https://en.wikipedia.org/wiki/Racing_the_Beam

High Score (TV series) - https://en.wikipedia.org/wiki/High_Score_(TV_series)

Microcontroller Courses (Udemy) - https://www.udemy.com/topic/microcontroller/

The story of Final Fantasy XIV’s renegade do-good modders - https://www.pcgamesn.com/final-fantasy-xiv/ffxiv-modders-renegade-do-gooders

Logical fallacies - https://yourlogicalfallacyis.com/

YouTube Video UCQajlJPesqmyWJDN52AZI4Q_5f72B6DD5wk

Programming Languages, Flipper And Gaming

Forensic Focus 24th May 2023 11:43 am

In this episode of the Forensic Focus podcast, Si and Desi talk to Mackenzie Jackson, Developer Advocate at Git Guardian. 

Mackenzie discusses the problem of hard-coded and leaked credentials in Git repositories, the task of scanning Git repositories for leaked credentials, and how that’s helped by the setup of GitHub and Git. 

He also looks at some public and private cases of security breaches through Git repositories and recommends tools you can use to combat attackers on Git. 

Show Notes:

Toyota Suffered a Data Breach by Accidentally Exposing A Secret Key Publicly On GitHub (GitGuardian) - https://blog.gitguardian.com/toyota-accidently-exposed-a-secret-key-publicly-on-github-for-five-years/

GitHub.com rotates its exposed private SSH key (Bleeping Computer) - https://www.bleepingcomputer.com/news/security/githubcom-rotates-its-exposed-private-ssh-key/

Conpago - https://www.conpago.com.au/

Source Code as a Vulnerability - A Deep Dive into the Real Security Threats From the Twitch Leak (GitGuardian) - https://blog.gitguardian.com/security-threats-from-the-twitch-leak/

Teenagers Leveraging Insider Threats: Lapsus$ Hacker Group (Forbes) - https://www.forbes.com/sites/emilsayegh/2023/03/15/teenagers-leveraging-insider-threats-lapsus-hacker-group

Lapsus$: Oxford teen accused of being multi-millionaire cyber-criminal (BBC) - https://www.bbc.co.uk/news/technology-60864283

Dynamic Secrets (HashiCorp) - https://developer.hashicorp.com/vault/tutorials/getting-started/getting-started-dynamic-secrets

Crappy code, crappy Copilot. GitHub Copilot is writing vulnerable code and it could be your fault (GitGuardian) - https://blog.gitguardian.com/crappy-code-crappy-copilot/

trufflesecurity/trufflehog (GitHub) - https://github.com/trufflesecurity/trufflehog

gitleaks/gitleaks (GitHub) - https://github.com/gitleaks/gitleaks

Git (Wikipedia) - https://en.wikipedia.org/wiki/Git

awslabs/git-secrets (GitHub) - https://github.com/awslabs/git-secrets

In this episode of the Forensic Focus podcast, Si and Desi talk to Mackenzie Jackson, Developer Advocate at Git Guardian.

Mackenzie discusses the problem of hard-coded and leaked credentials in Git repositories, the task of scanning Git repositories for leaked credentials, and how that’s helped by the setup of GitHub and Git.

He also looks at some public and private cases of security breaches through Git repositories and recommends tools you can use to combat attackers on Git.

Show Notes:

Toyota Suffered a Data Breach by Accidentally Exposing A Secret Key Publicly On GitHub (GitGuardian) - https://blog.gitguardian.com/toyota-accidently-exposed-a-secret-key-publicly-on-github-for-five-years/

GitHub.com rotates its exposed private SSH key (Bleeping Computer) - https://www.bleepingcomputer.com/news/security/githubcom-rotates-its-exposed-private-ssh-key/

Conpago - https://www.conpago.com.au/

Source Code as a Vulnerability - A Deep Dive into the Real Security Threats From the Twitch Leak (GitGuardian) - https://blog.gitguardian.com/security-threats-from-the-twitch-leak/

Teenagers Leveraging Insider Threats: Lapsus$ Hacker Group (Forbes) - https://www.forbes.com/sites/emilsayegh/2023/03/15/teenagers-leveraging-insider-threats-lapsus-hacker-group

Lapsus$: Oxford teen accused of being multi-millionaire cyber-criminal (BBC) - https://www.bbc.co.uk/news/technology-60864283

Dynamic Secrets (HashiCorp) - https://developer.hashicorp.com/vault/tutorials/getting-started/getting-started-dynamic-secrets

Crappy code, crappy Copilot. GitHub Copilot is writing vulnerable code and it could be your fault (GitGuardian) - https://blog.gitguardian.com/crappy-code-crappy-copilot/

trufflesecurity/trufflehog (GitHub) - https://github.com/trufflesecurity/trufflehog

gitleaks/gitleaks (GitHub) - https://github.com/gitleaks/gitleaks

Git (Wikipedia) - https://en.wikipedia.org/wiki/Git

awslabs/git-secrets (GitHub) - https://github.com/awslabs/git-secrets

YouTube Video UCQajlJPesqmyWJDN52AZI4Q_BX15Z_xF8mA

Preventing Data Leaks With Git Guardian

Forensic Focus 3rd May 2023 11:07 am

This error message is only visible to WordPress admins

Important: No API Key Entered.

Many features are not available without adding an API Key. Please go to the YouTube Feed settings page to add an API key after following these instructions.

Latest Articles

Share to...