Google is going to town indexing all the concrete directories

They are indexing files like:

concrete/blocks/form/db.xml
and:
concrete/js/tiny_mce_309/plugins/table/merge_cells.htm

Is there any way to stop this via .htaccess? I know how to stop them from indexing via robots.txt, but shouldn't it be done via .htaccess if possible?

jgarcia replied on at Permalink Reply
I could be wrong, but I'm thinking the only way to do this is probably via robots.txt. I don't even think it would be possible via .htaccess, since, as far as I know, the spiders do not (and probably cannot) read the .htaccess file. You just need to disallow the concrete directory in robots.txt and that will fix it... as soon as Google spiders the site again.

It seems odd that it would be indexing those files... you would have had to link to them from somewhere in order for Google to find them...
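In case the syntax helps, a minimal robots.txt along those lines, assuming the file sits in the site's document root, could look like this:

User-agent: *
Disallow: /concrete/

Keep in mind this is only a request to crawlers, and it only takes effect the next time they revisit the site.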
wesyah234 replied on at Permalink Reply
I disabled directory indexing:
Options -Indexes

in the .htaccess

and also added the /concrete directory to the robots.txt disallow statement...

So combined, these should fix my issue.
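For anyone copying this, a sketch of that first change, assuming an Apache server where .htaccess files are allowed to set Options (AllowOverride Options or All):

# in the site-root .htaccess: turn off Apache's auto-generated directory listings
Options -Indexes

Note that this only suppresses the automatic folder listings; files that are linked to directly can still be fetched, which is why the robots.txt disallow is still needed.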
LucasAnderson replied on at Permalink Reply
http://www.concrete5.org/index.php?cID=3621
wesyah234 replied on at Permalink Reply
I looked at the other CMS installations I've done and noticed that they include a robots.txt in the distribution, with exclusions for all the "system" directories the CMS uses...

Just suggesting that maybe Concrete 5 could include one of these in the distribution as well :)
Tony replied on at Permalink Reply
I tend to drop a one-line .htaccess into any folder tree that I want to deny access to, containing just this line:

deny from all
freestylemovement replied on at Permalink Reply
Could you put this in context?

I'm new at .htaccess-type stuff, and would like to see this line with some sample page-tree directory... please?
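To put it in context, here is a sketch using a hypothetical document root (/var/www/mysite is just an example path); the idea is to drop a tiny extra .htaccess inside the folder you want to lock down:

/var/www/mysite/            (the site's document root)
    index.php
    robots.txt
    .htaccess               (the site-wide rules, e.g. Options -Indexes)
    concrete/
        .htaccess           (a one-line file containing only: deny from all)
        blocks/
        js/

With that one-line file in place, Apache answers any request for files under /concrete with a 403 Forbidden (on Apache 2.4 the equivalent line is "Require all denied"). Be aware that this blocks normal visitors' browsers too, so it only suits folders the site never serves directly; a directory like concrete/js that holds scripts the pages load would break if it were blocked this way.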