Extract indexed websites  

LeGioN
(@legion)
Junior Member

Hi,

This might be a really dumb question..
But here is the scenario

Somebody creates a webpage.
It gets indexed by google.
It then gets deleted.

The webpage is no longer accessible, but you can still see bits of it through good old-fashioned googling, as it has been indexed.

Is there a way to extract everything that google has indexed?

If this even makes sense :)

/LeGioN

Posted : 25/03/2019 8:25 am
LeGioN
(@legion)
Junior Member

Additional info
I have tried the Wayback Machine website unsuccessfully, as the page I need was not captured.

Posted : 25/03/2019 8:47 am
tootypeg
(@tootypeg)
Active Member

Not sure I fully understand the scenario. Maybe it's still in the browser cache of a suspect? For example, you could make Chrome work offline and rebuild the page from the cache.

Posted : 25/03/2019 9:46 am
jaclaz
(@jaclaz)
Community Legend

As I see it, a page that no longer exists has EITHER been archived (on the Wayback Machine or on other services) or not.
If not, and if it has been crawled by Google (usually it has, since the Google crawler is damn efficient), it may be in the cache.
The Google cache is only temporary, so you might (or might not) be "on time" to still get it.
Also, unlike archive.org/Wayback Machine, the Google cache holds only the "last" version Google visited, so if the page has been - even briefly - replaced by another page, you will find the latter in the Google cache.
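
For the archive.org side, the check can also be scripted instead of done through the website. What follows is only a minimal sketch, assuming the public Wayback Machine availability endpoint (https://archive.org/wayback/available) and its JSON reply with an "archived_snapshots" / "closest" entry. The function names (availability_url, closest_snapshot, wayback_lookup) are made up for illustration and are not part of any tool mentioned in this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL asking whether page_url has a capture.

    timestamp is optional, in YYYYMMDD[hhmmss] form; the service is
    expected to return the capture closest to that moment.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Return the closest snapshot URL from a decoded reply, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def wayback_lookup(page_url, timestamp=None):
    """Query the live service (needs network access)."""
    query = availability_url(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling wayback_lookup("example.com/deleted-page", "20190325") would return a snapshot URL if a capture exists and None otherwise. A None here only says the Wayback Machine has nothing; it says nothing about the Google cache or other archives.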

To access the Google cache easily, you may want to try
http://cachedview.com/

There are other archiving/caching resources. Even if they are "tiny" compared to Google or archive.org, it costs nothing to check whether - by sheer luck - something of interest has been cached/archived by them, for example:
https://www.waybackmachinedownloader.com/blog/alternative-sites-like-archive-org/

A "complete" list is here
https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives
(though most are dedicated to "institutional" websites)
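
Rather than visiting each archive by hand, an aggregator can be asked once. The sketch below assumes the Memento "Time Travel" JSON API (timetravel.mementoweb.org), which is not mentioned in this thread. The reply layout used here ("mementos" / "closest" / "uri") and the treatment of HTTP 404 as "nothing found" are assumptions to verify against the service's own documentation. All function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint of the Memento Time Travel aggregator.
TIMETRAVEL_ENDPOINT = "http://timetravel.mementoweb.org/api/json"


def timetravel_url(page_url, when):
    """Build the query URL; when is a date such as YYYYMMDD."""
    return f"{TIMETRAVEL_ENDPOINT}/{when}/{page_url}"


def closest_mementos(response):
    """Return the archived-copy URLs closest to the requested date.

    Assumes the reply looks like
    {"mementos": {"closest": {"datetime": ..., "uri": [...]}}}.
    """
    closest = response.get("mementos", {}).get("closest", {})
    return list(closest.get("uri", []))


def memento_lookup(page_url, when):
    """Query the live aggregator (needs network access)."""
    try:
        with urllib.request.urlopen(timetravel_url(page_url, when), timeout=60) as resp:
            return closest_mementos(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Assumption: 404 means no archive holds a copy.
            return []
        raise
```

An empty list from memento_lookup("http://example.com/deleted-page", "20190325") would mean none of the archives the aggregator covers reported a copy; archives it does not cover still have to be checked separately.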

jaclaz

Posted : 25/03/2019 10:01 am
LeGioN
(@legion)
Junior Member

This was the sort of stuff I was hoping you'd show up with!
I tried both cachedview and Wayback without much success, but I am going to give Wayback another go.

I had some success with Google Index Retriever by elevenpaths, but it did not quite get me all the good stuff I was hoping for.

And my bad, tootypeg: I did not specify that there are no physical devices involved. Just a deleted URL. :)

/LeGioN

Posted : 25/03/2019 10:42 am