I had an interesting request from a client last week, asking whether there is an easy way to save all of the web pages that relate to a particular Amazon Marketplace trader. I am mulling things over at the moment, but wanted to check whether anyone else has dealt with this challenge.
Greetings,
Three possible approaches:
1) You can download them all with a spider or similar program - wget and htget are my favorites.
2) If there is a lot of active content, you could use Adobe Acrobat Pro to walk the site and turn each page into a PDF.
3) If the site requires a lot of navigation, you could set up a video capture application and capture the user experience as you walk through the site.
-David
Thanks for your ideas.
Yes, I have considered all of these. The automated approach is what I am looking at, due to the number of pages, but at the moment I can't see how a spider application would be clever enough to filter just the pages that relate to the particular trader and not follow links that do not relate to the investigation. I have wondered whether there is a spider utility that searches the HTML and only saves a page if it contains a reference to a specific keyword (in this case, obviously, the name of the trader).
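For what it's worth, the keyword filter described above can be roughed out in a few lines of Python using only the standard library. This is just a sketch under some assumptions, not a tested forensic tool: the names `crawl`, `fetch` and `LinkExtractor` are made up for illustration, the crawl stays on the starting host, the keyword match is a plain case-insensitive substring test on the raw HTML, and links are only followed from pages that contain the keyword.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, keyword, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl that keeps only pages mentioning the keyword.

    Pages without the keyword are neither saved nor used as a source
    of further links. Returns a dict mapping URL to HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    saved = {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        try:
            html = fetch_page(url)
        except OSError:
            continue  # network or HTTP error: skip this page
        if keyword.lower() not in html.lower():
            continue  # keyword absent: do not save, do not follow links
        saved[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # absolute URL, no fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

Calling `crawl(start_url, "trader name")` would return the matching pages, which could then be written to disk with whatever hashing and logging the investigation needs. The obvious weakness is that a relevant page reachable only through a page that does not mention the trader would be missed, so the pruning rule is a trade-off rather than a guarantee.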
I have wondered whether there is a spider utility that searches the HTML and only saves a page if it contains a reference to a specific keyword (in this case, obviously, the name of the trader).
Have you looked at WinHTTrack? It's been a while since I used it, but I know it has a lot of options, and I have a feeling one of them might be the keyword search you mention. The only downside I can see is the length of time it might take to spider something as big as Amazon.
Another option, though I am not sure how well it would work, would be to look into something like Maltego - http//
Maltego is an open source intelligence and forensics application. It will offer you timous mining and gathering of information as well as the representation of this information in an easy-to-understand format. Coupled with its graphing libraries, Maltego allows you to identify key relationships between information and identify previously unknown relationships between them.