Stopping bots crawling code on a page
Is there a way to stop Google from crawling all the formatting code and other non-essential content on webpages? At the moment everything appears to be crawled: put http://www.flyfishinginnewzealand.com/... into http://tools.seobook.com/general/spider-test/... and it shows ukao, menu and sf menu all being crawled, diluting the proper keywords and content.
I'm sure there must be a straightforward solution in C5 to make Google crawl only the metadata and the proper content of a page. Thank you.
You have the following in your robots.txt file, which tells all bots to ignore these directories:
User-agent: *
Disallow: /blocks
Disallow: /concrete
Disallow: /config
Disallow: /controllers
Disallow: /css
Disallow: /elements
Disallow: /helpers
Disallow: /jobs
Disallow: /js
Disallow: /languages
Disallow: /libraries
Disallow: /mail
Disallow: /models
Disallow: /packages
(Only the first 15 of the file's 18 lines are shown.)
If there is another directory that you do not want the bots to go to, add it to the robots.txt in the same format as the other directories.
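If you want to check that a new rule does what you expect before relying on it, here is a rough sketch using Python's standard urllib.robotparser module. The /example-private directory and the is_allowed helper are made up for illustration and are not part of your site or of concrete5.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules above, plus one hypothetical extra
# directory (/example-private) added in the same format.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blocks
Disallow: /concrete
Disallow: /example-private
"""


def is_allowed(robots_txt: str, path: str, user_agent: str = "*") -> bool:
    """Return True if the given rules let user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/blocks/page.php", "/example-private/notes", "/about"):
        print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```

Bear in mind that robots.txt only controls which URLs a bot may fetch. It does not strip markup out of the pages the bot is allowed to read.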
Google isn't going to use that kind of content for keywords. I think that testing tool is incorrectly including those pieces of text in its report.
So I don't think you need to worry about it.
Perhaps a more accurate way to 'see' what Google would see is to view your site with a text-based browser, or in something like Chrome turn off CSS and images (the Web Developer Toolbar is great for this) and check out the text that is left on the page.
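If you would rather script that check, something along these lines gives a rough approximation of the text left once markup, styles and scripts are stripped. It is only a sketch built on Python's standard html.parser module. It is not how Google actually processes pages, and the sample HTML and the visible_text helper are invented for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside script/style/noscript."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the page's text content as one space-separated string."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


if __name__ == "__main__":
    # Made-up sample page, not taken from the real site.
    sample = (
        "<html><head><style>.sf-menu{color:red}</style></head><body>"
        '<ul class="sf-menu"><li>Home</li></ul>'
        "<p>Fly fishing guides</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(visible_text(sample))
```

Class names such as sf-menu live in attributes rather than in text nodes, so they do not show up in the extracted text. Only the link labels and the body copy remain.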