Really off numbers for PageStatistics table
I've been doing some analytics testing on a site that seemed to be getting really low numbers in Google Analytics.
What I was seeing was 14 visitors for a particular date, with 19 page loads in Google Analytics. Yet for the same day, the "PageStatistics" table had 1864 entries.
So I installed Piwik on the site, and also the stats package from Tony. Both seemed to report my personal visits properly, and their results matched what Google Analytics showed.
The weird thing is, I can log in to phpMyAdmin and look at the table and see more entries coming in. Like every 2-3 minutes. But I don't get what can be triggering those entries if nobody is actually on the site.
Has anyone else noticed anything like this? I'm not really sure what would cause it, but I want to make sure that the reporting is working properly. Would something like an unclosed div cause it? Bots hitting pages and leaving before the page loads? It's a pretty quick site (after a lot of work) so I don't think most real visitors would be in and out that fast.
I know that the PageStatistics table does not represent the most accurate info, but a discrepancy of 100x seems really huge. And the fact that I can see things incrementing in the database without any reporting in other places is weird, too...
I don't have that addon installed on this site, so I don't think that's it. The statistics table only gives you cID, date, and timestamp for info, so it's hard to tell where it's coming from.
I didn't figure you were using the add-on. The idea was that if the add-on displays numbers based on the core statistics, it gives an example of how the numbers could be off by 10x! Google and the like keep an eye on IPs, whereas c5 just keeps track of hits...
But the pages in the log are not the same as the ones on the analytics at all...
I have been adding some symbols to Magic Data that do page stats. The main thing I found that could return erroneous xN results was doing a badly constructed join between page stats and another table with page information (such as versions), which could result in N1 x N2 results. So maybe have a look for joins that are not fully thought through in the SQL.
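To illustrate the join problem described above, here is a minimal sketch using an in-memory SQLite database. The table names `page_stats` and `page_versions`, their columns, and the sample rows are all hypothetical stand-ins, not the real concrete5 schema; the point is only that joining hits to a table with several rows per page multiplies the count.

```python
import sqlite3

def count_hits(conn):
    """Return (naive_join_count, actual_hit_count) for the sample tables."""
    # Naive: every hit row is repeated once per version row of the same page.
    naive = conn.execute(
        "SELECT COUNT(*) FROM page_stats ps "
        "JOIN page_versions pv ON pv.cID = ps.cID"
    ).fetchone()[0]
    # Actual: count the hit rows on their own, with no join at all.
    actual = conn.execute("SELECT COUNT(*) FROM page_stats").fetchone()[0]
    return naive, actual

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables (assumption: not the real c5 schema).
conn.executescript("""
CREATE TABLE page_stats (cID INTEGER, date TEXT);
CREATE TABLE page_versions (cID INTEGER, cvID INTEGER);
""")
# Made-up sample data: 3 hits on page 1, 2 hits on page 2.
conn.executemany(
    "INSERT INTO page_stats VALUES (?, ?)",
    [(1, "2013-01-01")] * 3 + [(2, "2013-01-01")] * 2,
)
# Page 1 has 4 versions, page 2 has 1 version.
conn.executemany(
    "INSERT INTO page_versions VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1)],
)

naive, actual = count_hits(conn)
print(naive, actual)
```

With this sample data the joined count is 3 x 4 + 2 x 1 = 14 while the table itself holds only 5 hits, so a query like the first one could easily make a report look inflated even though the stored data is fine.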
There is no sql going on here, at least none that I'm writing. I'm looking directly on the database table in phpMyAdmin. All three reporting tools show that nobody else is on the site, yet I'm getting new records in the PageStatistics table...
So it's the data collection, not the analysis, that is doing funny stuff.
Looking at some of my sites, the number of visitor entries in Page Statistics is not unreasonable and fits in with what I would expect for casual traffic.
In the table, it would be useful to have the IP recorded, especially when the uID is 0. Maybe hacking the Page Stats table and the Page Stats writer to add IP addresses is the way to get to the bottom of it.
Do you have a cron job running ticks for the 5.6.2 queuable jobs?
There are no cron jobs on the site. It's running 5.5.2.1, I believe, so no queueable jobs at all.
I have Tony's stats package installed which does have the IP addresses logged, and that one only has the low numbers. So whatever is hitting the pages isn't getting recorded there, and that's directly in c5, too.
I checked the Apache access log as well. It's definitely showing the hits that are being recorded in the PageStatistics table. I know at least one of them (it just happened) was Googlebot hitting a page that didn't exist, and that hit actually showed up as cID 1.
There were another couple that showed up in the access log that were spambots trying to post to a forum that doesn't exist. I didn't catch what that showed up as.
I guess it's possible that most of these hits are from search engines? I guess they must not be ranking it very high since there's not a lot of search engine traffic.
It does show that I have a couple more things to redirect from the original WordPress site. Things like /tag/something show up as 404; I should make a single page that redirects to the search with the correct akID URL. There's also /feed/, which seems to get a lot of hits and should redirect to the RSS URL for the page list on the home page. The feed reader hits didn't seem to show up in the PageStatistics table.
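For anyone wanting to repeat the access log check above, here is a rough sketch that tallies hits by user agent type and status code. It assumes the Apache "combined" log format; the `BOT_HINTS` list is a crude guess, and the sample lines (IPs, dates, paths, agents) are invented for illustration rather than copied from my log.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format; adjust if your LogFormat differs.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Crude substring hints (assumption): real spambots often fake a browser agent.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")

def classify(lines):
    """Count hits keyed by (kind, status), where kind is 'bot' or 'other'."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            counts[("unparsed", "-")] += 1
            continue
        agent = match.group("agent").lower()
        kind = "bot" if any(hint in agent for hint in BOT_HINTS) else "other"
        counts[(kind, match.group("status"))] += 1
    return counts

# Invented sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [01/Jan/2013:10:00:00 +0000] "GET /tag/something HTTP/1.1" '
    '404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [01/Jan/2013:10:02:00 +0000] "POST /forum/post.php HTTP/1.1" '
    '404 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.3 - - [01/Jan/2013:10:03:00 +0000] "GET / HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/20.0"',
]

result = classify(SAMPLE)
for key, total in sorted(result.items()):
    print(key, total)
```

To run it against a real log, pass an open file instead of `SAMPLE`, e.g. `classify(open(path))` with `path` pointing at your own access log. A large share of 404s from bot agents would fit the idea that the extra PageStatistics rows come from crawlers and spambots rather than real visitors.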
I noticed that, between hitting pages via cID URLs, pretty URLs, non-pretty URLs, etc., the Who's Online add-on would sometimes show up to 8 people online when I knew I was the only person viewing the page. So somewhere along the way the visits table HAD to be growing tremendously from all of these "ghost" hits...