Generated sitemap includes pages whose publish dates have not yet passed. Issue?
Quick question for the community:
I work on a site that uses the sitemap to feed pages into its search function. Recently, we discovered that a page set to publish in a week (via its publish_date) was showing up in our sitemap files, and so it was being indexed and was searchable. The other kinds of "hiding" seemed to work fine (for example, the page did not appear in the page navigation).
I took a quick look at the generate_sitemap.php file (<home>/concrete/jobs/generate_sitemap.php). Here is the pertinent code section:
    $c = Page::getByID($row['cID'], 'ACTIVE');
    $g->setPermissionsForObject($c);
    if ($c->isSystemPage()) {
        continue;
    }
    if ($c->getAttribute("exclude_sitemapxml")) {
        continue;
    }
    if ($c->isExternalLink()) {
        continue;
    }
    if ($g->canRead()) {
        *****CREATE SITEMAP ENTRY****
As you can see, nothing in this code snippet checks for publication date.
I know a workaround is to add the Exclude from Sitemap attribute to the page, but I want the code to work so that people don't have to remember to remove that attribute on publication day.
Working on a patch for this now. Is this something that we would want to add to the baseline, or is this working as intended?
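To make the idea concrete, here is a rough sketch of the kind of check I have in mind. It is not the finished patch. The helper name isPublishedAsOf is made up for illustration, and I am assuming the page object exposes its public date through something like getCollectionDatePublic() returning a 'Y-m-d H:i:s' string, which should be verified against the core before relying on it.

```php
<?php
// Hypothetical helper (not part of concrete5): returns true once a page's
// public date has been reached, relative to the timestamp in $now.
function isPublishedAsOf($publicDate, $now)
{
    // Assumption: a missing date means the page is already public.
    if ($publicDate === null || $publicDate === '') {
        return true;
    }

    $timestamp = strtotime($publicDate);

    // An unparseable date is treated as "not yet public" to stay on the safe side.
    if ($timestamp === false) {
        return false;
    }

    return $timestamp <= $now;
}

// Inside the job's loop, this could sit next to the other "continue" checks.
// Assumption: $c->getCollectionDatePublic() returns a 'Y-m-d H:i:s' string.
//
//     if (!isPublishedAsOf($c->getCollectionDatePublic(), time())) {
//         continue;
//     }
```

Passing the current time in as a parameter, rather than calling time() inside the helper, keeps the check easy to test with fixed dates.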
Doki,
I'd be interested in this as a patch... curious to know if you succeeded in getting it to work. Just today I ran into this as an issue: because "future" blog entry pages are getting added to the sitemap, they're getting crawled and indexed by Google. I'd like to stop that from happening... and was just about to dive into the code when I found this thread.
If you've succeeded... care to share? ;D
Thanks!
- John
http://github.com/concrete5