
Acquisition of web site content

liguoroa
(@liguoroa)
Junior Member

Dear All,
I need to acquire the content of a web site and verify whether it contains words related to my client.

To perform this task I would use wget, and then search the downloaded pages for a set of words using grep and rgrep.
Which tools do you suggest for analyzing the metadata of pictures, PDFs and other documents?

Any suggestions will be appreciated…

Best Regards
Andrea Liguoro

Topic starter Posted : 09/11/2013 12:26 am
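For anyone following the wget + grep approach, here is a rough Python sketch of the search step that also records per-file hit counts, which can be handy for notes or a report. It assumes the site has already been mirrored into a local folder (for example with `wget --mirror`); `search_mirror` is a hypothetical helper name, and matching is a plain case-insensitive substring search on text decoded as UTF-8, so pages in other encodings, or keywords inside PDFs and images, will not be found this way.

```python
import os
import re


def search_mirror(root, keywords):
    """Walk a mirrored site under `root` and count case-insensitive keyword hits per file.

    Returns {relative_path: {keyword: count}} for files with at least one hit.
    Hypothetical helper, for illustration only.
    """
    # re.escape makes each keyword literal, so characters like "." or "+" are not regex syntax.
    patterns = {kw: re.compile(re.escape(kw), re.IGNORECASE) for kw in keywords}
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Read raw bytes and decode leniently: binary files yield noise, not exceptions.
            with open(path, "rb") as fh:
                text = fh.read().decode("utf-8", errors="replace")
            hits = {kw: len(p.findall(text)) for kw, p in patterns.items()}
            hits = {kw: n for kw, n in hits.items() if n}
            if hits:
                rel = os.path.relpath(path, root).replace(os.sep, "/")
                results[rel] = hits
    return results
```

Calling `search_mirror("www.example.com", ["client name", "brand"])` on a wget mirror folder returns a dict mapping each matching file to its keyword counts.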
keydet89
(@keydet89)
Community Legend

liguoroa wrote:
> Which tools do you suggest for analyzing the metadata of pictures, PDFs and other documents?

It depends on the file in question… ExifTool is a good option, and it is also scriptable.

Posted : 09/11/2013 12:54 am
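Since ExifTool is scriptable, one way to collect metadata for many files at once is to ask it for JSON output (`-json`) and load that into a script. A minimal Python sketch, assuming `exiftool` is installed and on the PATH; the helper names are hypothetical, and the tags you get back depend on the file type (a JPEG and a PDF report quite different fields).

```python
import json
import shutil
import subprocess


def parse_exiftool_json(raw):
    """`exiftool -json` prints a JSON array with one object per file; key it by SourceFile."""
    return {entry["SourceFile"]: entry for entry in json.loads(raw)}


def exiftool_metadata(paths):
    """Run ExifTool on `paths` and return {path: {tag: value}} (hypothetical helper).

    Assumes the `exiftool` executable is installed and on PATH.
    """
    exe = shutil.which("exiftool")
    if exe is None:
        raise FileNotFoundError("exiftool not found on PATH")
    # A non-zero exit status is not treated as fatal here: per-file problems go to
    # stderr, and whatever JSON ExifTool did print on stdout is still parsed.
    # Output is decoded as UTF-8 (assumed to be ExifTool's output charset).
    proc = subprocess.run(
        [exe, "-json", *paths],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return parse_exiftool_json(proc.stdout) if proc.stdout.strip() else {}
```

For example, `exiftool_metadata(["photo.jpg", "report.pdf"])["report.pdf"].get("Author")` would return the PDF's Author tag if it has one.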
questnz
(@questnz)
Junior Member

You can copy a web site using HTTrack.

Posted : 09/11/2013 6:43 pm
EricZimmerman
(@ericzimmerman)
Active Member

Another vote for HTTrack.

Posted : 11/11/2013 2:48 am
tmlambert13
(@tmlambert13)
New Member

From what I've seen, IrfanView does pretty well with image files. I think that to view EXIF data you have to download a plugin from their site to go with the software.

Posted : 11/11/2013 9:11 am
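IrfanView and ExifTool will show the tags themselves; if you first want to triage which downloaded JPEGs carry an EXIF block at all, a small standard-library check is enough. This sketch assumes ordinary JPEG files: it walks the marker segments up to Start of Scan and returns the payload of the first APP1 segment that begins with the `Exif\0\0` identifier. It does not parse the tags; hand the files it flags to one of the tools above. `find_exif_segment` is a hypothetical name.

```python
def find_exif_segment(data):
    """Return the raw EXIF payload (starting at the TIFF header) of a JPEG, or None.

    Walks JPEG marker segments until Start of Scan, looking for an APP1 segment
    whose payload begins with b"Exif\\x00\\x00". Hypothetical helper.
    """
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # lost sync with the marker structure; give up
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: no more header segments follow
            return None
        # Segment length is big-endian and counts its own two length bytes.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]  # TIFF header ("II" or "MM") onwards
        pos += 2 + length
    return None
```

Usage: `with open("photo.jpg", "rb") as f: has_exif = find_exif_segment(f.read()) is not None`.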
Belkasoft
(@belkasoft)
Active Member

Our tool (see my signature) supports full-text search across all acquired evidence. Regular expressions are also supported.

Posted : 11/11/2013 3:35 pm
Zavattari
(@zavattari)
New Member

As already said, you can copy a web site using HTTrack, but for forensic purposes it is useless: HTTrack modifies the source content of the page.

If you find copied content, you should use FAW (http://www.fawproject.com/en/default.aspx).
FAW is the first browser conceived to acquire web pages for forensic purposes from any web site available on the internet.

Matteo Zavattari

Posted : 16/11/2013 1:56 pm
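Whichever tool is used, and whether or not it alters the pages, a common complement is to hash every acquired file immediately after the acquisition, so any later change to the copy can be detected and the state of the evidence documented. A minimal Python sketch (helper names are hypothetical) that builds a SHA-256 manifest of a download folder and writes it in the `digest  path` layout used by `sha256sum`. Note that this only fixes the state of the local copy; it says nothing about whether the copy faithfully reflects what the server delivered.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map every file under `root` (relative path, '/' separators) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = sha256_file(full)
    return manifest


def write_manifest(manifest, out_path):
    """Write sorted 'digest  path' lines, the same layout sha256sum prints."""
    with open(out_path, "w", encoding="utf-8") as out:
        for rel in sorted(manifest):
            out.write(f"{manifest[rel]}  {rel}\n")
```

Writing the manifest outside the acquisition folder (or building it before saving the manifest there) keeps the manifest file itself out of the hash list.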
jaclaz
(@jaclaz)
Community Legend

Zavattari wrote:
> As already said, you can copy a web site using HTTrack, but for forensic purposes it is useless: HTTrack modifies the source content of the page.
>
> If you find copied content, you should use FAW (http://www.fawproject.com/en/default.aspx).
> FAW is the first browser conceived to acquire web pages for forensic purposes from any web site available on the internet.

Interesting software. :)

If I may, providing its license in English might widen the range of interested users.

The current Italian one also has this, IMHO "queer", limitation:

- divulgare gli esiti di qualsiasi prova comparativa del software a terzi senza l’approvazione scritta dei PROPRIETARI;

(Rough translation: "you cannot divulge the results of any comparative test of this software to third parties without the written approval of the PROPRIETORS".)
Basically, if I try it and find it better (or faster, or whatever) than another piece of software, I cannot talk about it? 😯

jaclaz

Posted : 16/11/2013 4:18 pm
jlindmar
(@jlindmar)
Junior Member

liguoroa,

I would acquire the website using a few different options (e.g. wget, HTTrack, FAW) and then compare the results to determine which one gives you the most complete and accurate copy. Assuming you do not have access to any traditional digital forensic analysis tools (e.g. X-Ways Forensics, EnCase, FTK), I would take a look at Nuix's Proof Finder:

http://www.prooffinder.com/

This tool will allow you to accomplish everything you noted in your post.

Regards,

Jesse

Posted : 18/11/2013 8:37 pm
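The comparison step suggested above can be partly automated. A rough Python sketch, assuming each tool's output has been saved to its own local folder; `compare_acquisitions` is a hypothetical helper that lists files present in only one copy and files whose content differs. Tools may name, lay out or rewrite files differently (for example when converting links), so expect some differences that are purely presentational, and point each root at the folder that actually holds the site.

```python
import hashlib
import os


def file_digests(root):
    """Relative path ('/' separators) -> SHA-256 digest for every file under root."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:  # whole file in memory; fine for typical web pages
                digest = hashlib.sha256(fh.read()).hexdigest()
            digests[os.path.relpath(full, root).replace(os.sep, "/")] = digest
    return digests


def compare_acquisitions(root_a, root_b):
    """Summarise how two local copies of the same site differ, by path and by content."""
    a, b = file_digests(root_a), file_digests(root_b)
    common = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "different": sorted(p for p in common if a[p] != b[p]),
        "identical": sorted(p for p in common if a[p] == b[p]),
    }
```

A long `only_in_a` list for one tool is a quick hint that it fetched resources the other missed, which is one input into judging completeness.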
Jonathan
(@jonathan)
Senior Member

Zavattari wrote:
> As already said, you can copy a web site using HTTrack, but for forensic purposes it is useless: HTTrack modifies the source content of the page.

I disagree: modification of original evidence, while not ideal and something to be avoided where possible, does not make the evidence forensically useless. See Principle 2 of the ACPO Good Practice Guide for Digital Evidence.

> If you find copied content, you should use FAW (http://www.fawproject.com/en/default.aspx).
> FAW is the first browser conceived to acquire web pages for forensic purposes from any web site available on the internet.

If you're behind this project, it would be a courtesy to the OP and other readers to state this.

Alongside HTTrack (and FAW) there is another free tool, Web Page Saver, from Magnet Forensics.

Posted : 18/11/2013 9:28 pm
mclarengtr
(@mclarengtr)
New Member

Hi guys, I need advice on acquiring web sites in Chinese… I tried using FAW, but some of the pages were not displayed properly after acquisition. I guess it only supports Unicode, so for sites using Simplified Chinese it was a mess. Any advice?

Posted : 25/05/2015 8:42 am
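On the encoding question: Simplified Chinese sites have often been served as GB2312 or GBK rather than UTF-8, and text decoded with the wrong character set comes out garbled. If the acquisition tool cannot handle this, one workaround is to keep the raw bytes and decode them yourself using the charset the page declares. A minimal Python sketch of that idea, under these assumptions: the charset comes from the HTTP `Content-Type` header if you have it, otherwise from a `<meta>` tag near the top of the page, otherwise UTF-8; GB2312/GBK labels are decoded with Python's `gb18030` codec, which is largely a superset of both. `decode_page` is a hypothetical helper, and this says nothing about how FAW itself handles encodings.

```python
import re

# Matches <meta charset="..."> and the charset= parameter inside
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def decode_page(raw, header_charset=None):
    """Decode raw HTML bytes to str using the declared charset (hypothetical helper).

    Order: HTTP header charset, then a <meta> declaration in the first 4 KB,
    then UTF-8. GB2312/GBK labels are decoded with the broader gb18030 codec.
    """
    charset = header_charset
    if not charset:
        m = _META_CHARSET.search(raw[:4096])
        if m:
            charset = m.group(1).decode("ascii")
    charset = (charset or "utf-8").lower()
    if charset in ("gb2312", "gbk"):
        charset = "gb18030"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # charset name Python does not know: fall back to UTF-8
        return raw.decode("utf-8", errors="replace")
```

Saving the decoded text re-encoded as UTF-8 (alongside the untouched original bytes) gives a copy that Unicode-only viewers can display.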