
Acquisition of web site content

11 Posts
11 Users
0 Likes
1,473 Views
(@liguoroa)
Posts: 43
Estimable Member
Topic starter
 

Dear All,
I need to acquire the content of a web site and verify whether it contains
words related to my client.

To perform this task I would use wget, and then search the downloaded pages for a set of words using grep and rgrep.
Which tools do you suggest to analyze the metadata of pictures, PDFs and other documents?

Any suggestions will be appreciated…

Best Regards
Andrea Liguoro
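
As a rough illustration of the search step described in this post, the sketch below walks a directory that has already been mirrored locally (for example with wget) and reports every line containing one of the client's terms, much as a recursive, case-insensitive grep would. It is only a sketch under stated assumptions: the function names `build_pattern` and `search_tree` are made up for this example, and files are decoded leniently as UTF-8, which may miss terms in other encodings or in binary formats such as PDF.

```python
import re
from pathlib import Path


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any term literally."""
    if not terms:
        raise ValueError("at least one search term is required")
    return re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)


def search_tree(root, terms):
    """Return (relative path, line number, line) for each matching line under root."""
    pattern = build_pattern(terms)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Decode leniently so binary or oddly encoded files do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

For instance, `search_tree("mirror", ["Acme Corp", "acme.example"])` would return a list of such tuples; the directory name and the terms here are placeholders, not values from this thread.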

 
Posted : 09/11/2013 12:26 am
keydet89
(@keydet89)
Posts: 3568
Famed Member
 

Which tools do you suggest to analyze the metadata of pictures, PDFs and other documents?

Depends on the file in question… ExifTool is a good option that is also scriptable.
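
To give an idea of what "scriptable" can look like, here is a minimal sketch that calls ExifTool from Python and collects the metadata of every file under a directory. It assumes the `exiftool` executable is installed and on the PATH, and it relies on ExifTool's `-json` output option and `-r` recursion option. The helper names (`exiftool_command`, `parse_exiftool_json`, `read_metadata`) are hypothetical and exist only for this example.

```python
import json
import subprocess


def exiftool_command(target):
    """Build the command line: -json requests JSON output, -r recurses into subdirectories."""
    return ["exiftool", "-json", "-r", str(target)]


def parse_exiftool_json(output):
    """Map each record's SourceFile path to the rest of its metadata tags."""
    records = json.loads(output)
    return {
        record["SourceFile"]: {k: v for k, v in record.items() if k != "SourceFile"}
        for record in records
    }


def read_metadata(target):
    """Run ExifTool on a file or directory and return the parsed metadata."""
    # check=False: do not raise on a non-zero exit status; parse whatever JSON was printed.
    result = subprocess.run(
        exiftool_command(target), capture_output=True, text=True, check=False
    )
    return parse_exiftool_json(result.stdout) if result.stdout.strip() else {}
```

Calling `read_metadata("mirror")` on a mirrored site (the directory name is a placeholder) would then give one dictionary of tags per picture, PDF or other document that ExifTool can read.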

 
Posted : 09/11/2013 12:54 am
(@questnz)
Posts: 34
Eminent Member
 

You can copy the web site using HTTrack.

 
Posted : 09/11/2013 6:43 pm
EricZimmerman
(@ericzimmerman)
Posts: 222
Estimable Member
 

Another vote for HTTrack.

 
Posted : 11/11/2013 2:48 am
(@tmlambert13)
Posts: 2
New Member
 

From what I've seen, IrfanView does pretty decently with image files. I think that to view EXIF data you have to download a plugin from their site to go with the software.

 
Posted : 11/11/2013 9:11 am
(@belkasoft)
Posts: 169
Estimable Member
 

Our tool (see my signature) supports full-text search among all acquired evidence. Regular expressions are also supported.

 
Posted : 11/11/2013 3:35 pm
(@zavattari)
Posts: 1
New Member
 

As already said, you can copy the web site using HTTrack, but for forensic purposes it is useless.
HTTrack modifies the source content of the page.

If you find copied content, you should use FAW (http://www.fawproject.com/en/default.aspx)
FAW is the first browser conceived to acquire web pages for forensic purposes from any web site available on the internet.

Matteo Zavattari

 
Posted : 16/11/2013 1:56 pm
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

Interesting software. :)

If I may, providing its license in English might widen the pool of interested users.

The current Italian one also has this IMHO "queer" limitation:

- divulgare gli esiti di qualsiasi prova comparativa del software a terzi senza l’approvazione scritta dei PROPRIETARI;

(rough translation: "you cannot divulge the results of comparative tests of this software to third parties without written approval of the PROPRIETORS")
Basically, if I try it and find it better (or faster, or whatever) than other software, I cannot talk about it? 😯

jaclaz

 
Posted : 16/11/2013 4:18 pm
(@jlindmar)
Posts: 30
Eminent Member
 

liguoroa,

I would acquire the website using a few different options, e.g. wget, HTTrack, FAW, etc., and then compare the results to determine which one gives you the most complete and accurate results. Assuming you do not have access to any traditional digital forensic analysis tools, e.g. X-Ways Forensics, EnCase, FTK, etc., I would take a look at Nuix's Proof Finder:

http://www.prooffinder.com/

This tool will allow you to accomplish everything you noted in your post.

Regards,

Jesse
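
The comparison of acquisitions suggested in this post could be sketched as follows: hash every file in two mirror directories with SHA-256, then list the files present in only one of them and the files whose content differs. This is only an illustration under stated assumptions. The function names `hash_tree` and `compare_trees` are invented for the example, and it assumes both tools saved files under comparable relative paths, which may not hold without normalizing the directory layout first.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file's path relative to root to the SHA-256 digest of its bytes."""
    digests = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(root_a, root_b):
    """Report files found in only one tree and files whose content differs."""
    a, b = hash_tree(root_a), hash_tree(root_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "different": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }
```

A file listed under "different" is a candidate for having been rewritten during acquisition, so it is worth inspecting by hand before deciding which copy to rely on. Keeping the digests alongside the acquired files also gives a record of what was captured.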

 
Posted : 18/11/2013 8:37 pm
(@jonathan)
Posts: 878
Prominent Member
 

As already said, you can copy the web site using HTTrack, but for forensic purposes it is useless.
HTTrack modifies the source content of the page.

If you find copied content, you should use FAW (http://www.fawproject.com/en/default.aspx)
FAW is the first browser conceived to acquire web pages for forensic purposes from any web site available on the internet.

Matteo Zavattari

I disagree; modification of original evidence, while not ideal and to be avoided if possible, does not make the evidence forensically useless. See Principle 2 of the ACPO Good Practice Guide for Digital Evidence.

If you find copied content, you should use FAW (http://www.fawproject.com/en/default.aspx)
FAW is the first browser conceived to acquire web pages for forensic purposes from any web site available on the internet.

If you're behind this project, it would be a courtesy to the OP and other readers to state this.

Alongside HTTrack (and FAW) there is another free tool, Web Page Saver, from Magnet Forensics.

 
Posted : 18/11/2013 9:28 pm