Website Capture Applications
I was in a discussion today with a colleague about website capture applications. Six years have passed since we last captured a website for evidence. We would be interested to know which website capture applications people use and which they consider the best.
Thank you for your assistance.
Good question. I would also be interested to know whether there are any 'verifiable' and time-efficient tools for such purposes.
As an aside: in the past, the time needed to download an entire site remotely (9.5 GB on one recent website) has sometimes been an issue, so to document the contents and appearance of a website I have used Camtasia, taking great care to show and prove that it is the actual website in question and to open and browse all files and directories. This has helped persuade non-technical people about what was found, how it was accessed, and that the contents are 'as is'.
For capturing websites you can use wget recursively and log the sessions with Wireshark. That way you have both the website content and the server messages, which are useful for timestamping and for non-repudiation of the source of the content.
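A rough sketch of that workflow, assuming tshark (the command-line tool that ships with Wireshark) is installed and eth0 is your capture interface; note that for HTTPS sites the captured payloads will be encrypted, so this works best for plain HTTP or alongside other evidence:

```python
import subprocess
import time

# Rough sketch: the target URL and capture interface are assumptions,
# adjust them to your case. Running tshark usually needs capture privileges.
capture_url = "https://example.com/"
pcap_file = "session.pcap"

# Start a packet capture in the background (tshark ships with Wireshark).
tshark = subprocess.Popen(["tshark", "-i", "eth0", "-w", pcap_file])
time.sleep(2)  # give the capture a moment to start before traffic begins

try:
    # Recursive mirror: -r recurse, -p fetch page requisites (images/CSS),
    # -k rewrite links for local viewing, -E add .html extensions,
    # --no-parent stay below the starting directory.
    subprocess.run(
        ["wget", "-r", "-p", "-k", "-E", "--no-parent", capture_url],
        check=True,
    )
finally:
    tshark.terminate()  # stop the packet capture once the mirror finishes
    tshark.wait()
```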
- wget and HTTrack for mirroring entire sites.
- Adobe Acrobat Pro for turning entire sites into PDFs.
- Screen/video capture for recording sites that use a lot of Java/Flash.
The first two approaches do not handle Java/Flash/user interaction well, hence the third option.
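If you don't have Acrobat, one open-source stand-in for the PDF approach is wkhtmltopdf via the pdfkit Python wrapper. A minimal sketch, assuming you already have a list of page URLs from a crawl or sitemap (the list below is just a placeholder):

```python
import pdfkit  # pip install pdfkit; also needs the wkhtmltopdf binary installed

# For a whole site you would loop over URLs gathered from a crawl or a
# sitemap; this two-entry list is just a placeholder.
pages = ["https://example.com/", "https://example.com/about"]
for i, url in enumerate(pages):
    pdfkit.from_url(url, f"capture_{i:03d}.pdf")  # one PDF per page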
As well as using HTTrack for live websites, I have found Warrick useful for capturing websites that have been removed, by mirroring the archive.org, Google, and Yahoo caches.
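Warrick automates the cache mirroring; to illustrate the underlying idea, here is a small sketch against the Wayback Machine's public availability API (the target URL is just a placeholder):

```python
import json
import urllib.parse
import urllib.request

# Ask the Wayback Machine for the closest archived snapshot of a
# (possibly removed) page. The target URL is just a placeholder.
target = "http://example.com/"
api = "https://archive.org/wayback/available?url=" + urllib.parse.quote(target)

with urllib.request.urlopen(api) as resp:
    data = json.load(resp)

closest = data.get("archived_snapshots", {}).get("closest")
if closest and closest.get("available"):
    print("Snapshot:", closest["url"], "captured at", closest["timestamp"])
else:
    print("No archived snapshot found for", target)
```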
Website Ripper/Copier is another helpful tool I have used.
There are quite a few tools that can do this. One good online option is GrabzIt's Web Scraper, which can convert every web page in a website to PDF. They have a useful tutorial on how to get started converting an entire website to PDF.
They also provide the ability to add a timestamp watermark to the PDF, which may be useful for some use cases.
If you need a tool that preserves the integrity of extracted websites with hash values, you can use Hunchly. It saves every page you visit as you browse in Google Chrome, and you can export the visited pages in MHTML or PDF format. As mentioned, the tool keeps a hash value for every page consulted and every exported page. After the investigation, Hunchly will build a report of the investigative work you have done. It also lets you view the metadata of images you find on the web and on social networks.
This is a professional tool used by many law enforcement teams, and the creator, Justin Seitz, is easy to reach and talk to.
You should give it a try; there is a trial version on their website.
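For anyone who wants the same hash-based integrity idea without Hunchly, a minimal manual sketch using Python's hashlib (the capture directory path is an assumption):

```python
import hashlib
from pathlib import Path

# Walk a directory of captured pages and record a SHA-256 digest for each
# file in a manifest. The capture directory path is just a placeholder.
capture_dir = Path("mirror/example.com")

with open("hashes.txt", "w") as manifest:
    for f in sorted(capture_dir.rglob("*")):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            manifest.write(f"{digest}  {f}\n")
```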
A tool from the United States that I have used in the past to address these issues is made by X1: https://www.x1.com/
A little pricey, but its website capture feature is pretty good, and the online artifacts from the tool can be successfully offered as evidence in the US legal system.
I hope this information will be helpful to you.