While surfing, I came across Nick Carr’s most recent article, “Google preparing to police web.” According to it, Google plans to use new software that automatically identifies compromised web pages in its index and labels them as “potentially harmful” in its search results.
He references a paper written by Google engineers on the subject, The Ghost in the Browser, in which Google explains how it is preparing to respond to the threat by incorporating automated security analysis into its routine spidering and indexing of sites:
To address this problem and to protect users from being infected while browsing the web, we have started an effort to identify all web pages on the Internet that could potentially be malicious. Google already crawls billions of web pages on the Internet. We apply simple heuristics to the crawled pages repository to determine which pages attempt to exploit web browsers. The heuristics reduce the number of URLs we subject to further processing significantly. The pages classified as potentially malicious are used as input to instrumented browser instances running under virtual machines. Our goal is to observe the malware behavior when visiting malicious URLs and discover if malware binaries are being downloaded as a result of visiting a URL. Web sites that have been identified as malicious, using our verification procedure, are labeled as potentially harmful when returned as a search result. Marking pages with a label allows users to avoid exposure to such sites and results in fewer users being infected.
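The passage describes a two-stage pipeline: cheap heuristics narrow down the billions of crawled pages, and only the flagged URLs go on to expensive verification in instrumented browsers running under virtual machines. A minimal sketch of that filtering stage, with made-up heuristics and names purely for illustration (the paper does not disclose its actual rules), might look like this:

```python
import re

# Illustrative heuristics only -- real detection is far more involved.
# Hidden iframes and obfuscated script payloads are classic drive-by
# download tricks, so they make plausible toy rules here.
SUSPICIOUS_PATTERNS = [
    re.compile(r"<iframe[^>]+visibility\s*:\s*hidden", re.I),
    re.compile(r"unescape\s*\(\s*['\"]%u", re.I),
    re.compile(r"document\.write\s*\(\s*unescape", re.I),
]

def looks_malicious(page_html: str) -> bool:
    """Stage 1: cheap heuristic filter applied to a crawled page."""
    return any(p.search(page_html) for p in SUSPICIOUS_PATTERNS)

def verify_in_sandbox(url: str) -> bool:
    """Stage 2 placeholder: per the paper, this would load the URL in an
    instrumented browser inside a VM and watch for malware downloads."""
    raise NotImplementedError

def flag_for_verification(crawled: dict[str, str]) -> list[str]:
    """Return only the URLs that stage 1 flags for further processing."""
    return [url for url, html in crawled.items() if looks_malicious(html)]

pages = {
    "http://example.com/benign": "<html><p>hello</p></html>",
    "http://example.com/bad": '<iframe style="visibility:hidden" src="x">',
}
print(flag_for_verification(pages))
```

The point of the design is cost: the heuristic pass is cheap enough to run over the whole crawl repository, so the expensive VM-based verification only ever sees a small fraction of URLs.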
As far as I’m concerned, it would be even better if Google removed these types of sites from its index completely and re-included them only after cleanup. Given Google’s large share of search traffic, that would take care of the intentionally malicious sites.