Piwik Web Analytics, and some unfortunate stats about my blog

Back in March when I selected a new template for this blog, I posted that I was looking into open source self-hosted web analytics tools to replace Google Analytics. There were a few reasons for this; most importantly, it started from a discussion with some privacy-conscious coworkers, who said that they use NoScript and specifically block Google from tracking them (which also breaks Google Analytics). This was a serious issue for me, as I no longer processed server-side logs and relied solely on Google Analytics for traffic information. So, I decided to try something other than Google and ended up settling on Piwik as my solution. I will say, in full disclosure, that the amount of information Piwik gives is a bit scary; I can watch users navigate this blog in realtime, and even the initial dashboard page gives a list of the most recent visitors, with their IP address, country of origin, browser, OS, and the pages they visited. However, my decision was made on two main points: first, that I wanted something which could use server-side PHP to log visits (albeit with a lot less information) of people who had JavaScript or tracking disabled, and second, that if someone is going to have such amazingly detailed information on my visitors, it should be me, so I can ensure that I’m the only person who has access to it and that it isn’t used for the wrong purposes.

Aside: The only revenue I get from this site is through Google AdSense, which isn’t a whole lot given the low traffic (certainly not enough to pay for the hosting). Other than that, I keep this blog to try and share my knowledge with others, and hope that someone else can find the solution to their problem here instead of doing the work that I did. So, I find analytics very helpful; I check my stats now and then, go back and update or add to the most popular posts, and try to write relevant posts if it seems like a lot of people are finding their way here for something slightly different than the actual post they landed on. Unfortunately, that last point isn’t as easy since Google switched to HTTPS Search for logged-in users on October 18th, 2011 – I can no longer use Piwik to see the search keywords that got Google users to my site. Luckily, these are still available through Google Webmaster Tools (via Traffic -> Search Queries on the left menu), though it adds an additional step and removes some of my motivation to check regularly and make sure people are getting useful content. Also, perhaps most importantly, it doesn’t let me associate a search query with other stats like time on page, so even if one search query was very popular, I have no way of knowing whether all those people actually read the page, or took one look at it and left.

I really like Piwik. I don’t use most of it terribly often, but it gives me a nice overview visits graph on the WordPress dashboard (via the WP-Piwik plugin), infinitely detailed information (most of which I haven’t even looked at) in the Piwik web interface, and nightly email reports of visits to the site. It also supports multiple sites, so I have it on my ancient wiki, my Redmine instance, and even ViewVC. I’d highly recommend it; it’s full-featured (beyond anything I can even comprehend, really).

I was recently looking through the stats for this blog, and came across some unfortunate, though not surprising, trends. Below is the graph of visits per day, from April 1, 2012 through today (August 26, 2012):

blog visits chart

  1. It’s probably not terribly unusual for a site with as much technical content as mine (and mostly professional stuff, not just for hobbyists), but my weekend traffic is usually a full 50% lower than weekday traffic. This can also be seen in the graph of visits by visitor’s local time, which is decidedly biased towards the 9am-5pm window:
    blog visits chart by visitor local time
    I guess there’s nothing I can really do about that, and it just gives me a nice maintenance window at 4am on Sunday mornings :)
  2. Looking at the overall graph, there also appears to be quite a bit of oscillation of the average visits over time. It’s nothing terribly large, but at a guess, I’d attribute it to my sporadic posting.
  3. Though it’s not visible in these graphs, this site has an 80% bounce rate (the percent of visitors that viewed only one page and then left the site). I guess that’s also not terribly unusual for a site with mostly how-to information on a wide variety of topics.
  4. To add a little more information to some of the previous items, here is the chart of my Feedburner RSS/Atom feed, since I started using Feedburner in February. The number of subscribers is in green, and the reach (number of people who actually clicked through to a post) is in blue:
    Feedburner stats
    This is a clear indication of something even stronger than the “bounce rate”: the apparently high number of people who subscribe to and then unsubscribe from my feed (if these stats are accurate). To me, this is an even stronger indication that what I really need to do is post useful content on a more regular basis – I have a tendency to blog in spurts, and either start a draft and never finish it, or write a few posts and set them to “pending” status with the intent of publishing them over a few days… and then forget the last part.
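As a rough illustration of the two figures quoted in the list above (the bounce rate and the weekend dip), here is a minimal Python sketch. It assumes the raw data is available as a list of pages-per-visit counts and a mapping of dates to daily visit totals; the function names and the sample numbers are made up for illustration and are neither Piwik's API nor this blog's real stats.

```python
from datetime import date


def bounce_rate(pages_per_visit):
    """Share of visits that viewed exactly one page and then left."""
    if not pages_per_visit:
        return 0.0
    bounces = sum(1 for pages in pages_per_visit if pages == 1)
    return bounces / len(pages_per_visit)


def weekend_drop(visits_by_day):
    """Fractional drop of average weekend traffic relative to average weekday traffic."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekday = [n for d, n in visits_by_day.items() if d.weekday() < 5]
    weekend = [n for d, n in visits_by_day.items() if d.weekday() >= 5]
    if not weekday or not weekend:
        raise ValueError("need at least one weekday and one weekend day")
    weekday_avg = sum(weekday) / len(weekday)
    weekend_avg = sum(weekend) / len(weekend)
    return 1 - weekend_avg / weekday_avg


# Hypothetical sample data, invented for this example.
sample_pages = [1, 1, 1, 1, 3]
sample_visits = {
    date(2012, 8, 20): 100,  # Monday
    date(2012, 8, 21): 120,  # Tuesday
    date(2012, 8, 25): 50,   # Saturday
    date(2012, 8, 26): 60,   # Sunday
}

print(f"bounce rate: {bounce_rate(sample_pages):.0%}")
print(f"weekend drop: {weekend_drop(sample_visits):.0%}")
```

Averaging per day type (rather than summing) keeps the comparison fair, since a week has five weekdays but only two weekend days.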

New Blog Theme

In a follow-up to my Some Thoughts on Choosing a New WordPress Theme post from a few days ago, I decided on the Admired theme by Brad Thomas. It’s amazingly full-featured and has a good set of options. I had to manually change a few things in the CSS (I wanted to tweak the top bar colors a bit in a way that’s not supported in the options), but overall it was a very simple transition. While it’s unfortunately very far from valid HTML or CSS, it seems quite nice.

If you happen to read this post and see anything wrong with the theme, or it doesn’t display properly for you, please leave a comment below (with browser version and OS, if you please).

My next project, continuing on from my Inaccuracies in Google Analytics for Website Stats post, is to compare the two self-hosted JavaScript-based open source Google Analytics alternatives I’ve identified (Piwik and Open Web Analytics) and try one out on my site (keeping in mind that my server is pretty heavily loaded, and I don’t want to push it over the edge). Once I come to some sort of conclusion on that, I’ll get back to some useful posts.

Some Thoughts on Choosing a New WordPress Theme

I think I’m going to choose a new theme for my blog. The current theme is iNove (albeit an older version with some custom modifications), and I feel like it looks a bit messy and has gotten a bit cluttered, so it’s time to find something new. I like the 2-column layout, and have a few other things I’m looking for – specifically, aside from something with advanced features like lots of widget support and hooks, something that has good visual separation between different posts and widgets. I also really want something, if possible, with relative column widths. My current home and work desktops both have dual monitors, and the minimum resolution I have on one screen is 1920×1080. When I look at my blog in a maximized window, about half the screen width is wasted with empty space. So, ideally, I’d like a theme that’s based on relative widths, probably with a “min-width” property so it wouldn’t get compressed to an absurdly narrow width on small screens.

I use Google Analytics (as noted in the privacy policy) for visitor statistics on this blog (more about that in a moment). So, I took a peek at the breakdown of visitors by screen resolution, and saw that for the past year, 94% of the 27,500 visits had a screen width of 1024px or more (and the majority of the others looked like mobile device resolutions, so they’d probably zoom the page correctly). So, my first gut reaction was to assume that I could use a theme approximately 1000px wide. Unfortunately, there are two main problems with that: first, as mentioned by Chris Coyier on CSS-Tricks.com, just because someone has a given screen resolution doesn’t mean their browser window (let alone the viewport) is that size. As a matter of fact, I usually have my main browser window set at about 80% of the width of one of my monitors, with my instant messaging client Pidgin taking up the rest of the space. So there’s one inaccuracy. There’s a potentially much greater inaccuracy in my stats as well, which I’m going to discuss in a separate post.

New Project – Blog Migration to WordPress

For quite some time, I’ve been frustrated with Blogger. First of all, its publishing system is horribly inefficient. As everything is static HTML, at this point, writing this blog entry alone will require it to re-publish approximately 6 MB to my server. Seems sort’a pointless. Not to mention, it doesn’t allow any of the stuff that I really want, such as multiple categories with per-category RSS, or good searching. It also means that, though this blog is hosted on my own server, I’m dependent on Blogger to add posts.

I’m still horribly busy dealing with insurance companies and the police in relation to my stolen truck, as well as looking around and trying to figure out what my next vehicle will be, and how much I can spend on it.

Anyway, I’ve decided that at some point in the future, I’ll be migrating to WordPress for the blog. It will, of course, be hosted on my own machine, and will hopefully also include a migration of everything from this Blogger account. And, somehow, it will include some sort of redirection from old posts to the relevant new ones. Most importantly, though, I plan on deferring the project until I get my multiple static IP service from Optimum Online, as the new blog (and the rest of my subdomains) will be moved from GoDaddy forwarding to their own subdomains set up as Apache name-based VHosts.

Stay tuned for progress updates…

Custom MediaWiki Sidebar; New Blog?

As you may have noticed, some Firefox 3 buttons have popped up not only here on my blog, but also on my wiki. While adding the buttons to Blogger was a simple addition to the template, getting them in the sidebar of MediaWiki wasn’t exactly as easy (yeah, I’m considering the arduous project of moving my whole 102+ page wiki to Drupal or another good F/OSS CMS).

After some serious grepping through the source, and adding HTML comments to see where they appeared, I finally found a solution to add the button to the MediaWiki sidebar – though I’d really like it to appear below the search box (I guess that’s something for my to-do list). I’m using the MonoBook skin (though somewhat modified). I’m using “MonoBook nouveau”, and it should be the version that shipped with MW 1.10.1. In this version, I added the code around line 166. Specifically, this was added before the <div id="p-search" class="portlet"> line, and after the end of the foreach ($this->data['sidebar'] as $bar => $cont) loop. This threw the button in a box directly above the search box, and below all of my sidebar links.

The code looked something like:

      <?php } ?>
      <!-- firefox link added to MonoBook.php by jantman 2008-06-18 -->
      <div class='portlet' id='p-logos'>
          <h5>Cool Stuff</h5>
          <div class='pBody'>
              <ul>
                  <li><a href="http://www.spreadfirefox.com/node&id=238326&t=305" target="_blank"><img border="0" alt="Firefox 3" title="Firefox 3" src="http://sfx-images.mozilla.org/affiliates/Buttons/firefox3/110x32_best-yet.png"/></a></li>
              </ul>
          </div>
      </div>
      <!-- end firefox link -->
      <div id="p-search" class="portlet">

In other news, I’m taking a Data Driven Websites class this summer (PHP/MySQL, but for some reason they switched to a Windows server… endless problems, and I can’t even edit with Nano on the server, let alone emacs). Our first project was to build a blog engine, which I’m working on right now. Anyway, it got me thinking… the one thing that Blogger is missing is the ability to post to a given category, and allow users to view or subscribe to a specific category (or everything). So I think I may look into writing something like that myself, if I can’t find a good alternative that’s already done and is F/OSS. Regardless, I’ll probably be keeping the Blogger template as well as (ugh) moving over all of my current posts, which Blogger chose to store in raw HTML. So there’s going to be a lot of parsing in my future…
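To make the category idea concrete, here is a minimal Python sketch of posts that carry several categories and a filter that returns either one category or everything. It is only an illustration of the data model, not the class project (which is PHP/MySQL); the `Post` class, the function name, and the sample titles and categories are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    # A post may belong to any number of categories.
    categories: set = field(default_factory=set)


def posts_in_category(posts, category=None):
    """Return posts tagged with `category`, or every post when category is None."""
    if category is None:
        return list(posts)
    return [post for post in posts if category in post.categories]


# Hypothetical sample posts, invented for this example.
sample_posts = [
    Post("Sidebar hacking", {"mediawiki", "php"}),
    Post("Bacula notes", {"backups"}),
]

for post in posts_in_category(sample_posts, "backups"):
    print(post.title)
```

A per-category feed would then be the same filter applied before rendering the feed, with `category=None` giving the "everything" feed.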

PS – When I get a new blog engine, I’m also going to go for a slightly modified template that uses relative widths and placement – so that code, like the snippet here, fits the screen correctly.

Website, Blog, Bacula

Website – In personal news, I’ve finished migrating all of the information content of JasonAntman.com to a wiki, based on MediaWiki. I’m still getting some kinks ironed out, and working on customization, but it seems to be coming along very well. It’s wonderfully easy to update information and to link between articles. Most of the content is more like notes than articles, but I’m trying to put most of my SysAdmin and programming notes up there, both for my own future reference and that of anyone who happens by the site. As always, though, some content will just live its life as a blog entry, so I encourage searching of my blog as well. This is my fourth instance of MediaWiki, and while I haven’t set them up to play together, they all run wonderfully – and share a lot of common configuration (though I have separate instances of the code). Hopefully I’ll do a bunch of reorganization of the wiki sometime, and keep adding new content. Some of the newer pages include pages on DenyHosts and HPASM (from my blog post).

Blog – I know the template is awful. It’s on my list of things to do, and should be at the top of the queue in approximately 2056.

Bacula – Up to now, my backups have been a total kludge. The mere explanation of this elicits a feeling of nausea. A shell script on my backup storage server executes via cron. Each of the four important servers on my network (mail, web, monitoring, and development) has shell scripts that handle local backups – tar’ing up a list of directories, MySQL dumps, etc. – then tar and gzip the whole thing and plop it in a local directory. The backup server executes these scripts and then copies the temporary files to its own disk via SCP. All of this is handled through an expect script that works through each server consecutively. By morning, I end up with a 6+ hour job that’s finished, and has dumped gigs of files on the backup server. Before finishing each machine, it deletes any backups on the backup server that are older than 10 days. After copying everything, it deletes the client’s local copy. The bottom line is that if a machine goes down, I can re-install the OS and all packages, and then have the backups of just /etc and user data. Not beautiful. Even worse, my backup storage server doesn’t have a tape drive. When I get around to it, I run a script on my development/storage box that copies the latest backups from each machine, located on the backup server, to a tempdir and then writes them to tape. To top it all off, I have only one network, so all of these gigs of data are crawling across my ancient 10/100 switch, along with all other connectivity to the outside world.
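The 10-day retention step described above can be sketched in a few lines. This is a Python illustration rather than the actual shell and expect scripts, and it assumes each backup is a single file in one directory and that a file's modification time reflects when the backup was taken; the function name and parameters are hypothetical.

```python
import time
from pathlib import Path


def prune_old_backups(backup_dir, max_age_days=10, now=None):
    """Delete regular files in backup_dir older than max_age_days.

    Age is judged by modification time. Returns the names of removed files.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in sorted(Path(backup_dir).iterdir()):
        # Skip subdirectories; only plain backup files are candidates.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Passing `now` explicitly makes the cutoff easy to test; in normal use it can be left out so the current time is used.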

Unfortunately, it doesn’t look like I’ll have the money to upgrade to Gig-E any time soon, even just for the 5 machines involved. More to the point, there’s no way that I’ll have the money to buy a manageable Gig-E switch that can come anywhere close to my BayStack 450-24T. So, it’s time to invest… well… time… in a good backup infrastructure. After doing a lot of research, I came to two findings:

  1. The two main options seem to be AMANDA and Bacula.
  2. I don’t like how AMANDA works.

So, I’m going to give Bacula a shot. I did consult the SAGE mailing list for advice, and got some recommendations for BackupPC, but Bacula seems to be more my type of thing. Well, I did an install, and spent about 8 hours hacking around with the config files. No luck. Bacula is designed to be highly modular and scalable, but to be honest, I find the config files to be *very* complicated. Furthermore, I wasn’t able to find any good example configurations with documentation. After brainstorming for a while (lying in bed watching Law & Order and reading the Bacula docs on dead trees) I decided to give in – despite my continued efforts to stop using it, I checked Webmin and, sure enough, they have a Bacula module. After starting with fresh config files, I was able to get Bacula up and running on my development/storage server (a fresh install of openSuSE 10.1) as the director. I got a file daemon installed on the web server. Everything looked wonderful.

The current status: My backup storage server does only that – storage of backups. Nothing else. It’s still running SuSE 9.3. The Bacula RPMs for 9.3 are from the 1.x tree, and all of my other machines are running openSuSE 10.x, with Bacula 2.x. I gave it a shot but, sure enough, a Bacula 2.x director won’t jibe with a 1.x storage daemon. And I’m in dependency hell – Bacula 2.x requires upgrades of everything from the C libs all the way up. So, I’m going to take a shot at upgrading the storage machine via YaST, and see where I get.