Preserving a copy of a website

13 Posts
6 Users
0 Likes
1,353 Views
(@3rugger)
Posts: 4
New Member
Topic starter
 

Does anyone have any recommendations for software and procedures for preserving a copy of a website? We need to be able to show what content was available on a website at a particular time. Thanks in advance for any replies.

 
Posted : 18/10/2006 8:19 pm
skip
(@skip)
Posts: 57
Trusted Member
 

Does anyone have any recommendations for software and procedures for preserving a copy of a website? We need to be able to show what content was available on a website at a particular time. Thanks in advance for any replies.

Remote?
Um, the nature of your question suggests that you don't have your hands on the hard drive.

Perhaps cached pages on client systems that accessed the site could help.
Proxy, virtual hosts, and dynamic content are all terms that may be interesting to you as well.

You can download an entire site with
wget -r url.com
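
For something closer to a preservation copy, a fuller invocation might look roughly like this (just a sketch; example.com, the log name and the output directory are placeholders):

# Mirror the site, pull in page requisites (images, CSS, embedded files),
# and write everything wget does to a log file.
wget --mirror --page-requisites --no-parent \
     -o wget-log.txt \
     -P ./site-copy \
     http://example.com/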

maybe google for "spiders"

Skip

 
Posted : 18/10/2006 8:28 pm
 dcso
(@dcso)
Posts: 31
Eminent Member
 

I've found httrack to work well for saving websites.

www.httrack.com
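
For reference, a basic command-line run is roughly the following (URL and output directory are placeholders):

# Mirror the site into ./site-copy; HTTrack writes its own hts-log.txt
# in the output directory, recording what was fetched and when.
httrack "http://example.com/" -O ./site-copy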

 
Posted : 18/10/2006 9:13 pm
hogfly
(@hogfly)
Posts: 287
Reputable Member
 

httrack works well.

I wonder... would archive.org be useful?
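
If the site has already been crawled, Wayback Machine snapshots can be pulled straight from a URL that embeds the capture date (a sketch; example.com is a placeholder, and a partial date redirects to the nearest stored capture):

# Wayback URLs carry the capture timestamp as YYYYMMDDhhmmss.
wget -o archive-log.txt "http://web.archive.org/web/2006/http://example.com/"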

 
Posted : 18/10/2006 9:47 pm
(@3rugger)
Posts: 4
New Member
Topic starter
 

To clarify, we are interested in copying the files remotely. The difficulty is that the website makes heavy use of Flash, and we need to make sure the entire site is copied. A log file of some sort indicating when and from where any copied files originated would also be useful.
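
One hedged sketch of getting both the copy and a request log with wget (example.com and the file names are placeholders). Note that crawlers generally cannot follow links that exist only inside the Flash (.swf) files themselves, so pages reached only through Flash navigation may have to be fetched by hand:

# -o writes a timestamped log of every request; --server-response
# adds the HTTP headers (including the server's Date) to that log.
wget --mirror --page-requisites --no-parent \
     --server-response \
     -o fetch-log.txt \
     -P ./site-copy \
     http://example.com/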

 
Posted : 19/10/2006 2:56 am
(@cosimo)
Posts: 20
Eminent Member
 

Besides being able to copy everything from a website and to log when the copy was taken, how can you prove that you did not actually forge the contents, the logs, or both? After all, you don't have a static copy of the web server content taken at the dates registered in the log.
I've been thinking about this problem for a while, but I haven't come up with any satisfactory solution.
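
One partial mitigation, not a full answer to the forgery objection: hash the copy as soon as it is taken and lodge the resulting value with a disinterested third party (counsel, a notary, even a dated e-mail to someone outside your control). A minimal sketch, assuming the copy sits in ./site-copy:

# Build a hash manifest of every file in the copy, then hash the
# manifest itself so a single value can be handed to a third party.
find ./site-copy -type f -exec md5sum {} \; > manifest.md5
md5sum manifest.md5

The hashes only prove the material has not changed since they were lodged; the trustworthiness of the timestamp still rests on the third party.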

– Cosimo

 
Posted : 19/10/2006 8:13 pm
skip
(@skip)
Posts: 57
Trusted Member
 

Besides being able to copy everything from a website and to log when the copy was taken, how can you prove that you did not actually forge the contents, the logs, or both? After all, you don't have a static copy of the web server content taken at the dates registered in the log.
I've been thinking about this problem for a while, but I haven't come up with any satisfactory solution.

– Cosimo

x2

I think that is why my gut went to cached content on PCs that had accessed the website... I guess my reflex was to go there because you could get your hands on a static copy.

A static copy... I suppose that is what your real question should be about.
Because, to a degree, you can't preserve a website remotely.

skip

 
Posted : 19/10/2006 11:47 pm
(@cosimo)
Posts: 20
Eminent Member
 

A static copy... I suppose that is what your real question should be about.
Because, to a degree, you can't preserve a website remotely.

skip

Yes, I agree, the key is to find a way to obtain a time-stamped "snapshot" of the site, possibly taken by a third-party entity and stored in a place that you cannot access directly (otherwise, your opponent might claim that the cached copy has also been forged).
For instance, what about using the Google cached copies? If the web site has been indexed by Google, they very frequently store a cached copy on their servers, and when accessing it you also get the time stamp of its collection.
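
If you do go that route, it may be worth preserving the cached page together with the response headers at the moment you retrieve it (a sketch; the URL is a placeholder, and Google may refuse non-browser user agents). Note that the collection date Google reports appears in the banner of the cached page itself; --save-headers only records when you fetched that copy:

# Save the cached copy with the HTTP response headers prepended.
wget --save-headers -O cached-page.html \
     "http://www.google.com/search?q=cache:example.com/somepage.html"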

– Cosimo

 
Posted : 20/10/2006 4:02 pm
 dcso
(@dcso)
Posts: 31
Eminent Member
 

While not the cleanest way to do it, what about visiting the site while making a video capture of your screen using a program like Camtasia? You could associate the time and date with the video file, which would be harder to alter than a cached copy.

Is that thinking outside the box or in a small padded room?

 
Posted : 20/10/2006 5:33 pm
(@3rugger)
Posts: 4
New Member
Topic starter
 

Google cache is an interesting idea, but for our purposes (litigation) it would probably not remain active long enough. We have referred to archive.org in the past and converted pages from it to PDF to show that certain content was once on a website. The Camtasia idea is interesting. Using a copy of local cache is also an interesting idea, but for an expansive website it could be quite cumbersome to make sure you access every page linked on the site. Getting a copy off the server itself is of course ideal; however, getting access to that server early in the litigation process is difficult.

Thanks for the ideas and responses!

 
Posted : 20/10/2006 6:29 pm