Preserving a copy of a website

13 Posts
6 Users
0 Likes
1,353 Views
(@3rugger)
Posts: 4
New Member
Topic starter
 

Does anyone have any recommendations for software and procedures for preserving a copy of a website? We need to be able to show what content was available on a website at a particular time. Thanks in advance for any replies.

 
Posted : 18/10/2006 8:19 pm
skip
(@skip)
Posts: 57
Trusted Member
 

Does anyone have any recommendations for software and procedures for preserving a copy of a website? We need to be able to show what content was available on a website at a particular time. Thanks in advance for any replies.

Remote?
Um, the nature of your question suggests that you don't have your hands on the hard drive.

Perhaps cached pages on client systems that accessed the site could help.
Proxy, virtual hosts, and dynamic content are all terms that may be interesting to you as well.

You can download an entire site with
wget -r url.com
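
For something closer to a preservation copy, a fuller invocation might look roughly like this (just a sketch; example.com, the log name and the output directory are placeholders):

# Mirror the site, pull in page requisites (images, CSS, embedded files),
# and write everything wget does to a log file.
wget --mirror --page-requisites --no-parent \
     -o wget-log.txt \
     -P ./site-copy \
     http://example.com/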

maybe google for "spiders"

Skip

 
Posted : 18/10/2006 8:28 pm
 dcso
(@dcso)
Posts: 31
Eminent Member
 

I've found httrack to work well for saving websites.

www.httrack.com
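
For reference, a basic command-line run is roughly the following (URL and output directory are placeholders):

# Mirror the site into ./site-copy; HTTrack writes its own hts-log.txt
# in the output directory, recording what was fetched and when.
httrack "http://example.com/" -O ./site-copy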

 
Posted : 18/10/2006 9:13 pm
hogfly
(@hogfly)
Posts: 287
Reputable Member
 

httrack works well.

I wonder... would archive.org be useful?
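
If the site has already been crawled, Wayback Machine snapshots can be pulled straight from a URL that embeds the capture date (a sketch; example.com is a placeholder, and a partial date redirects to the nearest stored capture):

# Wayback URLs carry the capture timestamp as YYYYMMDDhhmmss.
wget -o archive-log.txt "http://web.archive.org/web/2006/http://example.com/"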

 
Posted : 18/10/2006 9:47 pm
(@3rugger)
Posts: 4
New Member
Topic starter
 

To clarify, we are interested in copying the files remotely. The difficulty is that the website makes heavy use of Flash, and we need to make sure the entire site is copied. A log file of some sort indicating when and from where any copied files originated would also be useful.
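
One hedged sketch of getting both the copy and a request log with wget (example.com and the file names are placeholders). Note that crawlers generally cannot follow links that exist only inside the Flash (.swf) files themselves, so pages reached only through Flash navigation may have to be fetched by hand:

# -o writes a timestamped log of every request; --server-response
# adds the HTTP headers (including the server's Date) to that log.
wget --mirror --page-requisites --no-parent \
     --server-response \
     -o fetch-log.txt \
     -P ./site-copy \
     http://example.com/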

 
Posted : 19/10/2006 2:56 am
(@cosimo)
Posts: 20
Eminent Member
 

Besides being able to copy everything from a website and to log when the copy was taken, how can you prove that you did not actually forge the contents, the logs, or both? After all, you don't have a static copy of the web server content taken at the dates registered in the log.
I've been thinking about this problem for a while, but I haven't come up with any satisfactory solution.
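
One partial mitigation, not a full answer to the forgery objection: hash the copy as soon as it is taken and lodge the resulting value with a disinterested third party (counsel, a notary, even a dated e-mail to someone outside your control). A minimal sketch, assuming the copy sits in ./site-copy:

# Build a hash manifest of every file in the copy, then hash the
# manifest itself so a single value can be handed to a third party.
find ./site-copy -type f -exec md5sum {} \; > manifest.md5
md5sum manifest.md5

The hashes only prove the material has not changed since they were lodged; the trustworthiness of the timestamp still rests on the third party.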

– Cosimo

 
Posted : 19/10/2006 8:13 pm
skip
(@skip)
Posts: 57
Trusted Member
 

Besides being able to copy everything from a website and to log when the copy was taken, how can you prove that you did not actually forge the contents, the logs, or both? After all, you don't have a static copy of the web server content taken at the dates registered in the log.
I've been thinking about this problem for a while, but I haven't come up with any satisfactory solution.

– Cosimo

x2

I think that is why my gut went to cached content on PCs that had accessed the website... I guess my reflex was to go there because you could get your hands on a static copy.

A static copy... I suppose that is what your real question should be about.
Because, to a degree, you can't preserve a website remotely.

skip

 
Posted : 19/10/2006 11:47 pm
(@cosimo)
Posts: 20
Eminent Member
 

A static copy... I suppose that is what your real question should be about.
Because, to a degree, you can't preserve a website remotely.

skip

Yes, I agree, the key is to find a way to obtain a time-stamped "snapshot" of the site, possibly taken by a third-party entity and stored in a place that you cannot access directly (otherwise, your opponent might claim that the cached copy has also been forged).
For instance, what about using the Google cached copies? If the web site has been indexed by Google, they very frequently store a cached copy on their servers, and when accessing it you also get the time stamp of its collection.
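
If you do go that route, it may be worth preserving the cached page together with the response headers at the moment you retrieve it (a sketch; the URL is a placeholder, and Google may refuse non-browser user agents). Note that the collection date Google reports appears in the banner of the cached page itself; --save-headers only records when you fetched that copy:

# Save the cached copy with the HTTP response headers prepended.
wget --save-headers -O cached-page.html \
     "http://www.google.com/search?q=cache:example.com/somepage.html"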

– Cosimo

 
Posted : 20/10/2006 4:02 pm
 dcso
(@dcso)
Posts: 31
Eminent Member
 

While not the cleanest way to do it, what about visiting the site while making a video capture of your screen using a program like Camtasia? You could associate the time and date with the video file, which would be harder to alter than a cached copy.

Is that thinking outside the box or in a small padded room?

 
Posted : 20/10/2006 5:33 pm
(@3rugger)
Posts: 4
New Member
Topic starter
 

Google cache is an interesting idea, but for our purposes (litigation) it would probably not remain active long enough. We have referred to archive.org in the past and converted pages from it to PDF to show that certain content was once on a website. The Camtasia idea is interesting. Using a copy of local cache is also an interesting idea, but for an expansive website it could be quite cumbersome to make sure you access every page linked on the site. Getting a copy off the server itself is of course ideal; however, getting access to that server early in the litigation process is difficult.

Thanks for the ideas and responses!

 
Posted : 20/10/2006 6:29 pm