Hello all,
I have a formatting question for anyone familiar with IIS logs. I am looking at some IIS logs from an intrusion and see numerous entries for Googlebot, such as the one shown below:
2013-06-07 22:28:06 ... GET / yjd=152-Redacted-Redacted-Redacted 80 - 66.249.73.232 Mozilla/5.0+(compatible;+Googlebot/2.1;
I've removed the IP address and redacted the three words because they are pornographic in nature.
My question is: can anyone tell me what "yjd=152" represents, or point me to some material that might shed some light on this? Thanks!
… and have numerous entries for Googlebot, such as the one shown below
2013-06-07 22:28:06 ... GET / yjd=152-Redacted-Redacted-Redacted 80 - 66.249.73.232 Mozilla/5.0+(compatible;+Googlebot/2.1;
There seems to be a lot missing – at least if I assume the logging is default. I see no HTTP status, for example. I would check IIS log configuration so that I knew exactly what each field corresponds to instead of having to guess.
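To avoid guessing which column is which, the "#Fields:" directive that IIS writes at the top of a W3C-format log can be used to label every value. Below is a minimal sketch in Python, assuming the log is in W3C extended format; the sample lines, including the redacted query string and the 200 status, are made up for illustration and are not taken from the original log.

```python
def parse_w3c_log(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive names the columns of the entries that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software or #Date
        if not fields:
            continue  # no column names seen yet, so the entry cannot be labelled
        # IIS writes spaces inside a value as '+', so splitting on whitespace is safe.
        yield dict(zip(fields, line.split()))


# Hypothetical sample; a real log would be read from the configured log folder.
sample = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Fields: date time cs-method cs-uri-stem cs-uri-query s-port "
    "cs-username c-ip cs(User-Agent) sc-status",
    "2013-06-07 22:28:06 GET / yjd=152-a-b-c 80 - 66.249.73.232 "
    "Mozilla/5.0+(compatible;+Googlebot/2.1) 200",
]

entries = list(parse_w3c_log(sample))
print(entries[0]["cs-uri-query"], entries[0]["sc-status"])  # prints: yjd=152-a-b-c 200
```

With the fields labelled this way, the "/" in the sample entry reads as the URI stem (cs-uri-stem) and "yjd=152-..." as the query string (cs-uri-query).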
But it looks very much like a legitimate Googlebot log entry, so I'm assuming it's genuine. (The client IP (?) makes sense, and the User Agent (?) looks like one of the documented ones over here: https://developers.google.com/webmasters/control-crawl-index/docs/crawlers.)
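Since a User Agent string is trivial to fake, the assumption that the entry is genuine can be checked with a reverse DNS lookup followed by a forward lookup, which is the approach Google describes for verifying Googlebot. A rough sketch in Python follows; the accepted hostname suffixes are my assumption and should be checked against Google's current guidance, and the function name is made up.

```python
import socket

# Assumed suffixes for genuine Googlebot hostnames; verify against Google's documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google hostname that resolves back to ip."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname.lower().endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    # The forward lookup must lead back to the same address.
    return ip in addresses
```

Calling is_verified_googlebot("66.249.73.232") performs live DNS queries, so it needs network access. The two lookup parameters exist only so the function can be exercised with stand-in resolvers.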
My question is can anyone tell me what "yjd=152" represents or send me some links to some material that might shed some light on this?
Googlebot is mainly a web spider – if it makes a request to a server it is most likely because somewhere there's a link containing that information. It could be on the site itself – or it could be somewhere else.
There are also ways of priming Googlebot. For example, a webmaster can upload or register a Sitemap. This lists the links that the webmaster wants Google to visit and index, and it also helps to speed up the indexing a bit. I think one way of doing this is by placing a Sitemap file on the website itself. If there is one in your case, check that it isn't filled with garbage. (A webmaster would probably know more about this.)
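If a Sitemap file does turn up on the server, its URLs can be listed and filtered for the suspicious query parameter. Here is a small sketch in Python, assuming the file follows the sitemaps.org XML format; the sample document and the example.com URLs are invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL found in a Sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def urls_with_query_key(urls, key):
    """Return the URLs whose query string contains the given parameter name."""
    return [u for u in urls
            if key in parse_qs(urlsplit(u).query, keep_blank_values=True)]


# Hypothetical Sitemap content for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/?yjd=152-a-b-c</loc></url>
</urlset>"""

suspicious = urls_with_query_key(sitemap_urls(sample), "yjd")
print(suspicious)
```

Any hits would suggest the Sitemap itself has been tampered with. An empty result only rules out this one source of links.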
There may also still be ways of manually asking Google to index a particular URL, or of submitting URLs through some kind of API, in which case the log entries could stem from such requests.
But it's not until you identify the code that gets executed (or got executed) that you can decide what yjd, etc. actually means.
If you have very many of these requests, you may want to keep in mind the possibility that an intruder is trying to fill the IIS logs with garbage to make analysis more difficult. This is particularly likely if the HTTP status code (which I don't see in the sample entry you provided) indicates that the request could not be fulfilled, say, because there is no such page. (Added: In that case, the URLs must have been produced in some way. They could have been collected from another web server, for example.)
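One way to see whether these requests mostly succeed or mostly fail is to tally the status codes of all entries whose query string starts with the odd parameter. The sketch below assumes the entries have already been parsed into dictionaries keyed by W3C field names (cs-uri-query, sc-status); the sample data is invented.

```python
from collections import Counter


def status_counts(entries, query_prefix):
    """Count HTTP status codes for entries whose query string starts with query_prefix."""
    counts = Counter()
    for entry in entries:
        if entry.get("cs-uri-query", "").startswith(query_prefix):
            # Entries logged without a status field are grouped under "unknown".
            counts[entry.get("sc-status", "unknown")] += 1
    return counts


# Hypothetical parsed entries for illustration only.
sample_entries = [
    {"cs-uri-query": "yjd=152-a-b-c", "sc-status": "200"},
    {"cs-uri-query": "yjd=153-d-e-f", "sc-status": "404"},
    {"cs-uri-query": "yjd=154-g-h-i", "sc-status": "200"},
    {"cs-uri-query": "-", "sc-status": "200"},
]

counts = status_counts(sample_entries, "yjd=")
print(counts.most_common())
```

A tally dominated by 404s would fit the log-flooding idea, while mostly 200s would point towards pages or scripts on the server that actually answer these URLs.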
Thanks for the reply! Sorry about the incomplete entry. I thought I had posted the entire entry.
The requests are returning 200s, indicating that the URLs were found.
Can a webmaster upload a Sitemap to Google, requesting the terms be indexed?
Thanks again!
Thanks for the reply! The requests are returning 200s, indicating that the URLs were found.
In that case, I'd go for the web page or script file that's the target of that particular URL, and examine it more closely. Which physical folder it is in depends on the configuration, so you may need access to the configuration data, unless it's a standard, default set-up.
Can a webmaster upload a Sitemap to Google, requesting the terms be indexed?
The best answer to that question is probably given by Google: search for Webmaster Tools for the main entry page, or together with Submitting Sitemaps for the specific one (or ones; I find a couple of them).
In general, also check sitemaps.org – Google is not the only web indexer that uses Sitemaps. But in your case, concerning Googlebot log entries, this is probably not of primary interest.