
Website imaging

16 Posts
9 Users
0 Reactions
1,977 Views
(@armresl)
Noble Member
Joined: 21 years ago
Posts: 1011
Topic starter  

Hello,

Looking for your opinions on a good way to get a copy (as best can be had) of a website.

The site contains a lot of JavaScript, Flash, and a few movie files.
There is no physical access to the computer or server; everything is done over the web, and I can't FTP into it either. To sum it up: NO physical access to the box, NO remote access to the box, and NO help from the owners of the actual website.

HTT doesn't seem to grab the entire site, or even most of it, even after various option tweaks (maybe someone knows a good set of options for grabbing a site heavy in JavaScript and Flash).

SurfOffline Pro doesn't capture more than maybe a third of the site.

Adobe (walking the site) also yields only about a quarter to a third of the site.

A good example site would be something along the lines of snickers.com, where there are links off the main page as well as movies and such.
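
As I understand it, part of the problem is that these crawlers only see URLs that appear in the static HTML; anything the Flash or JavaScript requests at runtime is invisible to them. A minimal sketch of what a static crawl can actually discover (Python standard library only, with example.com as a placeholder):

```python
# Minimal sketch (Python standard library only; example.com is a placeholder).
# It fetches one page and lists the .js/.swf/.flv assets referenced in the
# static HTML. Anything the scripts or Flash request at runtime never appears
# here, which is why HTML-driven crawlers miss so much of such sites.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class AssetFinder(HTMLParser):
    """Collect src/href/data attributes that point at .js/.swf/.flv files."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href", "data") and value:
                if value.lower().split("?")[0].endswith((".js", ".swf", ".flv")):
                    self.assets.add(urljoin(self.base_url, value))

url = "http://example.com/"  # placeholder target
page = urlopen(url).read().decode("utf-8", errors="replace")
finder = AssetFinder(url)
finder.feed(page)
for asset in sorted(finder.assets):
    print(asset)  # the only assets a purely static crawl can discover
```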


   
lucpel
(@lucpel)
Trusted Member
Joined: 14 years ago
Posts: 55
 

In my experience copying websites, HTTrack Website Copier is great for getting files like .js and .swf. I did it a few days ago and got almost 100% of the JavaScript and Flash files.
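
If it helps, this is roughly the kind of run I mean, sketched here as a Python subprocess call. I'm assuming the httrack binary is on your PATH and using www.example.com as a placeholder; double-check the scan-rule syntax against the HTTrack documentation.

```python
# A sketch of the kind of HTTrack run I mean, driven from Python via
# subprocess. Assumptions: the `httrack` binary is on PATH and
# www.example.com stands in for the real target; verify the scan-rule
# syntax against the HTTrack documentation.
import subprocess

cmd = [
    "httrack", "http://www.example.com/",
    "-O", "./mirror",              # output directory for the copy
    "+*.js", "+*.swf", "+*.flv",   # scan rules: always keep JS/Flash/video files
    "-n",                          # also fetch non-HTML files "near" a page
    "-r6",                         # recursion depth of 6
    "-v",                          # verbose progress
]
subprocess.run(cmd, check=True)
```

Even with scan rules like these, anything the SWF itself requests at runtime (configuration XML, streamed video) won't be fetched, which may be why we all end up with partial copies.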


   
(@armresl)
Noble Member
Joined: 21 years ago
Posts: 1011
Topic starter  

The 'HTT' in my post was referring to HTTrack.

What settings did you use to get the site you're talking about?



   
binarybod
(@binarybod)
Reputable Member
Joined: 17 years ago
Posts: 272
 

I would be really interested in something that can do this. I've had the same results as armresl with wget, HTTrack, gnome-web-photo (via shutter), etc.

In the few cases where I've had to take a snapshot of, say, a YouTube page, I've ended up creating a PDF from either a screen grab or by printing directly from the web browser. Thankfully, to date I've only had to do one or two pages; capturing a whole site might be a problem…

Downloading the individual resources (an FLV/SWF stream, for example) is easy; much more difficult is providing the context in which that resource was viewed by the suspect…
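
That easy half is at least scriptable. A rough sketch (Python standard library only; the URL and filename are placeholders) that saves one resource and records the provenance details, URL, fetch time and SHA-256, that I'd want in my notes:

```python
# A rough sketch of the "easy" half (Python standard library only; the URL
# and filename are placeholders): save one resource and record the
# provenance details you would want in your notes. It does nothing about
# the hard half, reconstructing the page context in which the suspect
# actually viewed the file.
import hashlib
from datetime import datetime, timezone
from urllib.request import urlopen

url = "http://example.com/media/clip.flv"  # placeholder resource
data = urlopen(url).read()

with open("clip.flv", "wb") as f:  # local copy of the stream
    f.write(data)

print("url:    ", url)
print("fetched:", datetime.now(timezone.utc).isoformat())
print("sha256: ", hashlib.sha256(data).hexdigest())
print("size:   ", len(data), "bytes")
```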

Paul


   
(@armresl)
Noble Member
Joined: 21 years ago
Posts: 1011
Topic starter  

Thanks for the reply, binarybod. I tried to set out in the first message what I had tried, and from off-channel communications, various people have run into the same situation.

A site like snickers.com presents exactly the type of problem I'm talking about.



   
(@patrick4n6)
Honorable Member
Joined: 16 years ago
Posts: 650
 

Get something like Camtasia and record a video of walking through the relevant sections of the site.


   
(@c-r-s)
Estimable Member
Joined: 14 years ago
Posts: 170
 

There are inevitable technical limitations, but in my view MetaProducts does the best job. I'm using Offline Explorer Enterprise for complete mirroring, and Inquiry for batch-saving single pages, e.g. to document manual online research.


   
lucpel
(@lucpel)
Trusted Member
Joined: 14 years ago
Posts: 55
 

You are right… I still got some .swf and .js files from the website you mentioned, but I don't have a Shockwave player right now to test them. Anyway, I got what your post was about. It's an interesting one, so I'll do my own research and compare results…

cheers


   
(@armresl)
Noble Member
Joined: 21 years ago
Posts: 1011
Topic starter  

Have you pointed the product you mention at a site like the one I mentioned, or similar, to see how it works?



   
(@c-r-s)
Estimable Member
Joined: 14 years ago
Posts: 170
 


I ran snickers.com "quick and dirty" using default settings. Acquisition is no problem at all: 240 files and 48 folders at ~96 MB.
Playback on OE's (or any local) web server is perfect, with no difference from the original. Export/presentation is a little tricky for such sites: to avoid any difference in appearance, you have to use the exe viewer, which is in fact its own web server. That is no problem at this size, but loading executable archives of several hundred MB would take some time.
This site itself is quite an easy task, as the project was not configured to follow external links, except for pictures.
The real problems are YouTube and social media, as you don't get a truly functional copy and it is difficult to reasonably restrict the crawl to the necessary objects.
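
For playback outside OE, any local web server pointed at the mirror directory will do. A minimal sketch with the Python standard library, with ./mirror standing in for wherever your tool wrote the copy; serving over HTTP keeps relative links and script paths working, which opening the files directly from disk often breaks:

```python
# Minimal local playback server (Python standard library). "./mirror"
# stands in for wherever your mirroring tool wrote the copy.
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = functools.partial(SimpleHTTPRequestHandler, directory="./mirror")
server = HTTPServer(("127.0.0.1", 8000), handler)
server.serve_forever()  # then browse to http://127.0.0.1:8000/
```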


   