File Links Creating Duplicate Content? (possible SEO issue)

Hi

I just ran Xenu Link Sleuth over a site and noticed that everywhere I link to a PDF file, the URL is slightly different, even though all the links point to the same file.

e.g.
mysite.com/index.php/download_file/view/1/42/
mysite.com/index.php/download_file/view/1/44/
mysite.com/index.php/download_file/view/1/46/
mysite.com/index.php/download_file/view/1/48/

All these URLs point to the same file, but I noticed the value at the end matches the c5 ID of the page the link appears on.

I'm a little worried this could cause duplicate content (SEO) issues with Google.

Does anyone know how to disable this so that the link is just
mysite.com/index.php/download_file/view/1/
(that is, without the page ID)?

Otherwise, would you know if this could cause a dupe content issue?

Cheers!

malkau
 
johnpaulb replied:
Hello malkau,

Have you tried clearing the page cache? Maybe the download links were generated that way and are just cached; clearing the cache should stop the old links from being used:
1. Login to your Concrete5 Dashboard.

2. Roll your mouse over the Dashboard button and click the System & Settings option. This will bring up the System & Settings menu.

3. Under the Optimization section, select Clear Cache to bring up the Clear Cache Menu.

4. Click the Clear Cache button to clear your cache. Once it completes, you will see a notification stating "Cached files removed."

You can also set the cache to clear periodically in the Cache & Speed settings, for example twice a day (every 720 minutes). Here is a link to an article I did on cache and speed settings:
http://www.webhostinghub.com/support/edu/concrete5/get-started/cach...

I hope this is helpful,
John-Paul
malkau replied:
Hi John-Paul

I tried clearing the cache but that didn't change the file link URLs.

I'm guessing the URLs work like this for tracking purposes... but as I said, I'm just worried that Google will see this as the same PDF uploaded to the site 10 times or something, when it's really one file with the page ID appended to the end of the URL.

Cheers
johnpaulb replied:
Hi again malkau,

If your main concern is Google finding duplicate content, you can use a robots.txt file to limit which pages are crawled. Here is a simple guide on using a robots.txt file:
http://www.webhostinghub.com/support/website/how-tos/using-robotstx...

Also, here is a link to Google's official webmaster tools guide on how to "Block or remove pages using a robots.txt file":
https://support.google.com/webmasters/answer/156449...
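
For example, a minimal robots.txt in the site root along these lines should keep crawlers away from those download URLs. This is just a sketch based on the /index.php/download_file/view/ pattern you posted; if you have pretty URLs enabled, you would drop the /index.php part:

# Keep crawlers out of the per-page file download URLs
User-agent: *
Disallow: /index.php/download_file/

Blocking the whole download_file path means none of the per-page variations get crawled, so they shouldn't be counted as duplicates.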

I hope this is helpful,
John-Paul