On the lighter side, I found a few web sites by Tom Blackwell that do some fun stuff with text overlays on images. Seems like a nice little tool for those end-of-project PowerPoints, or to send out the monthly “most rolled-back commits” medal…




I saw a link to this YouTube video shared on Tom Limoncelli’s blog. It’s a 1953 US Navy instructional video about an all-mechanical fire control computer. Yes, I really mean a computer that can solve continuously changing 25-variable fire control problems using only mechanical means (gears, cams, etc.). Think about it for a minute – it’s truly mind-boggling. It really gives one an appreciation for the power of a simple pocket calculator, and for the engineering that went into solving these problems before electronic computers. I’m usually not much of a math geek, but I watched the whole 40-minute video and was in awe of both the simplicity of using three arms and a pin to multiply numbers, and the amazingly precise engineering and machining it would take to translate various rotation inputs into landing a shell on a moving ship miles away. It’s a really good watch, and will probably leave you astonished both by how far technology has come (and what we take for granted every day), and by the fact that feats of engineering like this one worked quite well.
Back in March when I selected a new template for this blog, I posted that I was looking into open source self-hosted web analytics tools to replace Google Analytics. There were a few reasons for this; most importantly, it started from a discussion with some privacy-conscious coworkers, who said that they use NoScript and specifically block Google from tracking them (which also breaks Google Analytics). This was a serious issue for me, as I no longer process server-side logs but relied solely on Google Analytics for traffic information. So, I decided to try something other than Google and ended up settling on Piwik as my solution. I will say, in full disclosure, that the amount of information Piwik gives is a bit scary; I can watch users navigate this blog in realtime, and even the initial dashboard page gives a list of the most recent visitors, with their IP address, country of origin, browser, OS, and the pages they visited. However, my decision was made on two main points: first, that I wanted something which could use server-side PHP to log visits (albeit with a lot less information) of people who had JavaScript or tracking disabled, and second, that if someone is going to have such amazingly detailed information on my visitors, it should be me, so I can ensure that I’m the only person who has access to it and that it isn’t used for the wrong purposes.
Aside: The only revenue I get from this site is through Google AdSense, which isn’t a whole lot given the low traffic (certainly not enough to pay for the hosting). Other than that, I keep this blog to try and share my knowledge with others, and hope that someone else can find the solution to their problem here instead of doing the work that I did. So, I find analytics very helpful; I check my stats now and then, go back and update or add to the most popular posts, and try to write relevant posts if it seems like a lot of people are finding their way here for something slightly different than the actual post they landed on. Unfortunately, that last point isn’t as easy since Google switched to HTTPS Search for logged-in users on October 18th, 2011 – I can no longer use Piwik to see the search keywords that got Google users to my site. Luckily, these are still available through Google Webmaster Tools (via Traffic -> Search Queries on the left menu), though it adds an additional step and removes some of my motivation to check regularly and make sure people are getting useful content. Also, perhaps most importantly, it doesn’t let me associate search queries with other stats like time on page, so even if one search query was very popular, I have no way of knowing whether all those people actually read the page, or took one look at it and left.
I really like Piwik. I don’t use most of it terribly often, but it gives me a nice overview visits graph on the WordPress dashboard (via the WP-Piwik plugin), infinitely detailed information (most of which I haven’t even looked at) in the Piwik web interface, and nightly email reports of visits to the site. It also supports multiple sites, so I have it on my ancient wiki, my Redmine instance, and even ViewVC. I’d highly recommend it; it’s full-featured (beyond anything I can even comprehend, really)
I was recently looking through the stats for this blog, and came by some unfortunate, though not surprising, trends. Below is the graph of visits per day, from April 1, 2012 through today (August 26, 2012):



As a follow-up to my CVS to SVN to Git post, I have the PHP EMS Tools repository migrated from my SVN to GitHub. Since I’m moving the website and all development to a Redmine instance, the next step is setting up GitHub to work as a revision control repository in Redmine. Well, it’s dead simple. I just followed the instructions for the Redmine plugin: Github hook, with the exception that I followed the Redmine instructions for setting up the repository clone instead of Step #2 in the plugin instructions. All worked well, though I’ll admit I only tried it talking to Redmine over plain HTTP, not HTTPS.
In a follow-up to my Some Thoughts on Choosing a New WordPress Theme post from a few days ago, I decided on the Admired theme by Brad Thomas. It’s amazingly full-featured and has a good set of options. I had to manually change a few things in the CSS (I wanted to tweak the top bar colors a bit in a way that’s not supported in the options), but overall it was a very simple transition. While it’s unfortunately very far from valid HTML or CSS, it seems quite nice.
If you happen to read this post and see anything wrong with the theme, or it doesn’t display properly for you, please leave a comment below (with browser version and OS, if you please).
My next project, continuing on from my Inaccuracies in Google Analytics for Website Stats post, is to compare the two self-hosted JavaScript-based open source Google Analytics alternatives I’ve identified (Piwik and Open Web Analytics) and try one out on my site (keeping in mind that my server is pretty heavily loaded, and I don’t want to push it over the edge). Once I come to some sort of conclusion on that, I’ll get back to some useful posts.
I think I’m going to choose a new theme for my blog. The current theme is iNove (albeit an older version with some custom modifications), and I feel like it looks a bit messy and has gotten a bit cluttered, so it’s time to find something new. I like the 2-column layout, and have a few other things I’m looking for – specifically, aside from something with advanced features like lots of widget support and hooks, something that has good visual separation between different posts and widgets. I also really want something, if possible, with relative column widths. My current home and work desktops both have dual monitors, and the minimum resolution I have on one screen is 1920×1080. When I look at my blog in a maximized window, about half the screen width is wasted with empty space. So, ideally, I’d like a theme that’s based on relative widths, probably with a “min-width” property so it wouldn’t get compressed to an absurdly narrow width on small screens.
I use Google Analytics (as noted in the privacy policy) for visitor statistics on this blog (more about that in a moment). So, I took a peek at the breakdown of visitors by screen resolution, and saw that for the past year, 94% of the 27,500 visits had a screen width of 1024px or more (and the majority of the others looked like mobile device resolutions, so they’d probably zoom the page correctly). So, my first gut reaction was to assume that I could use a theme approximately 1000px wide. Unfortunately, there are two main problems with that: first, as mentioned by Chris Coyier on CSS-Tricks.com, just because someone has a given screen resolution doesn’t mean their browser window (let alone the viewport) is that size. As a matter of fact, I usually have my main browser window set at about 80% of the width of one of my monitors, with my instant messaging client Pidgin taking up the rest of the space. So there’s one inaccuracy. There’s a potentially much greater inaccuracy in my stats as well, which I’m going to discuss in a separate post.
I use Google Analytics for visitor stats on this blog. Not because I’m trying to direct-market to my readers or become Big Brother, but for a number of simple reasons:
Obviously not for Google, but for me, all of these stats are totally anonymous – I just get percentages or numbers of visits; it’s not like I can see all of the details per IP address. The most important aspect to me is just the ease of use – I sign up and put a little snippet of code on my pages, and I get an amazing dashboard interface with all of this information. Nothing to install and update on my server, and (most importantly, since I’m now running everything of mine on one virtualized server) no massive program to run as a cron job that has to read all my server log files.
Last week I was talking with a couple of my co-workers, specifically about the stats that I get from Google Analytics. While I know it’s not uncommon to run NoScript, especially among the more security- and privacy-conscious groups of people, I was a bit disturbed to hear that they all block Google’s tracking code in their browsers via NoScript. I assume there’s also a percentage of people who still just turn off JavaScript altogether (although I can’t imagine how they use the modern Web), and many who use the Google Analytics Opt-Out feature. So, especially with as technical an audience as I have, I guess that means I’m likely missing a large number of visitors in my stats. On one hand, I want to respect the privacy of my visitors, and respect their desire to opt out of advanced tracking. On the other hand, since I no longer parse web server logs for statistics, these privacy-conscious visitors aren’t even showing up in what I think of as my monthly visit count, or in my information on what posts and search keywords are most popular, which I only use for “good” purposes – to make my blog more useful. So that’s a bit of a conundrum.
I’ll admit that I do run Google AdSense Ads on my blog, and I’m sure there are some people who block the ads. On one hand, that upsets me a bit; I run this blog to try and share information that I find or learn with others, and the hosting costs aren’t insignificant. If I can get paid to just show some ads, to try and help offset the cost of running the site, I think that’s good. And if other people can help support the site by just letting the ads stay on the page, why not? On the other hand, my hosting costs $50/month (granted the server also handles all of my email, and a whole bunch of other sites). I’ve been participating in Google AdSense since March 5, 2010 (two years and two weeks), and my “estimated earnings” are currently $80. The payout is in $100 increments. So, I haven’t seen a cent from it in two years, so I’ve given up being concerned with it. If you want to be nice, and find my posts interesting, click on one of the ads. Unfortunately, unless I get famous, the ads aren’t going to come close to offsetting even part of the cost of running the site.
Google itself says, “In order for Google Analytics to record a visit, the visitor must have JavaScript, images, and cookies enabled for your website.” There seems to be some buzz about this on the ‘net, and I’ve seen a number of (translated; original post in Dutch) posts advocating building a request to a Google Analytics GIF manually on the server side, and then including it as an image element inside a <noscript> tag in the page. While this is probably one of the nicer solutions (and much more likely to reduce double-counting), it doesn’t capture any of the advanced data (screen resolution, etc.) that JavaScript-based Analytics does, and more importantly, I imagine that many of the ad-blocking extensions also block traffic to google-analytics.com, so this is an incremental improvement at best. There are also quite a few posts about how to make a request to Analytics purely server-side. This has a few disadvantages as well; it bypasses the Google domain blacklisting problem that the client-side image has, but it also means you lose the client IP address (and therefore geolocation), and that you double-track any user who allows JavaScript (so you need a separate profile, and then need to average out the results). It also means that you track every search engine and bot that crawls your site, and possibly every person who clicks a link and then hits “back” before the page finishes loading. I found another blogger who commented about the wide disparity he saw between Google Analytics, the StatsPress WordPress plugin, and AWstats (a server-side log file analyzer).
So what’s the solution?
Most of the options have a down side, but I’m looking for something that’s the best I can reasonably do. As much as I’d rather not, I’m going to look into self-hosted alternatives to Google Analytics (a self-hosted JavaScript-based stats provider), in the hopes that NoScript users will be more friendly to scripts coming from my own domain, and sending requests to my own domain, than ones from Google or other major trackers. I don’t think I want to try anything that parses web server logs as a primary approach, as I don’t think I could ever get a meaningful comparison to Google Analytics or something else JavaScript-based.
I did have one other idea which I think is interesting, though a bit of an overhead. I could have Apache (or, more likely, a Perl script called in the Apache configuration) generate a random string for each request, and save it in an Apache environment variable. The environment variable would then be added to a field in the server logs, and also added (via PHP or whatever else generates the pages server-side) as a custom parameter for the JS tracking code, enabling page hits to be correlated between the JS tracking and the server logs. Assuming the JS tracking backend stores its data in a sane format (and as raw data, not just aggregated), and at the cost of a serious performance penalty, a server-side statistics program like AWstats or Webalizer could be patched to look up the unique identifier in the JS stats data store, and ignore all hits which were tracked that way.
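To make the correlation idea a bit more concrete, here is a minimal sketch in Python (not Perl or a patched AWstats) of just the filtering step. It assumes a hypothetical log format in which the per-request identifier is the last space-separated field of each log line, and that the identifiers recorded by the JS tracker can be exported as a plain list; the function name and the sample IDs are made up for illustration. Apache's mod_unique_id module, which sets a UNIQUE_ID environment variable for each request, might be one way to get the identifier without a custom script.

```python
def untracked_hits(log_lines, js_tracked_ids):
    """Return the log lines whose request ID was NOT recorded by the JS tracker.

    Assumption (hypothetical log format): the per-request identifier is the
    last space-separated field on every log line.
    """
    tracked = set(js_tracked_ids)  # set membership keeps each lookup cheap
    remaining = []
    for line in log_lines:
        parts = line.rstrip().rsplit(" ", 1)
        if len(parts) != 2:
            continue  # skip malformed lines with no trailing ID field
        request_id = parts[1]
        if request_id not in tracked:
            remaining.append(line)
    return remaining


if __name__ == "__main__":
    logs = [
        '1.2.3.4 - - [x] "GET / HTTP/1.1" 200 10 abc123',
        '5.6.7.8 - - [x] "GET /a HTTP/1.1" 200 10 def456',
    ]
    # Only the hit that the JS tracker never saw is left for the log analyzer.
    for hit in untracked_hits(logs, ["abc123"]):
        print(hit)
```

The lines this returns are the hits that only the server saw (NoScript users, bots, aborted page loads), which is the set a log-based stats program would need to count on its own.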
I’m going to start by looking into self-hosted open source alternatives to Google Analytics, which I’ll post about sometime hopefully soon.
Two interesting presentations – which, unfortunately, I only heard the audio for (they came up in my podcast playlist during my commute today).
For the non-techie-geeks out there, All Your Brains Suck – Known Bugs And Exploits In Wetware: OSCON 2011 – O’Reilly Conferences, July 25 – 29, 2011, Portland, OR is an interesting talk from Paul Fenwick. Yes, it’s from a techie conference (wetware == the brain), but it’s mostly psychology-based, and covers some REALLY interesting things about how the human brain works, especially with an emphasis on how the brain is manipulated in advertising and otherwise.
For the techie-geeks among us (well, given the other link, “geeks” would be too vague), Velocity 2011: Theo Schlossnagle, “Career Development”. It’s a bit of a rant, but a good talk for us infrastructure/ops people, and developers as well, and covers some thoughts on both careers and how we should do our jobs.
While working on a particularly long documentation page in MediaWiki today, I came by a convenient little extension:
It adds a nice little “top” link next to the “edit” link in each section header. Very useful for long pages.
If you’re anything like me, you often find yourself working on multiple computers. Today I left a few tabs open in Firefox on my work laptop, and wanted to continue reading from my desktop. Normally I’d just grab the laptop, or RDP into it if it was my work desktop that had the open tabs, but at the moment my girlfriend is neck-deep in WoW on the MacBook. Having had this problem before (getting tabs back remotely, not a laptop occupied with WoW), I started thinking about a solution.
I could have closed my local Firefox session, moved the sessionstore.js somewhere else, copied the one from the laptop over, re-opened Firefox, … well, you get the idea.
But that sounds like a really sub-optimal solution. So I started looking around a bit. It seems that sessionstore.js is almost JSON, but as per Mozilla bug 407110, it’s not quite standards-compliant. Luckily, it seems that PHP’s JSON module is quite tolerant, so once I stripped off the leading and trailing parens from the file contents, it parsed quite nicely.
I’ve written a small dumpFirefoxSession.php script that reads the sessionstore.js file (in cwd or a specified path), decodes the JSON into an array, and then dumps the tabs. It dumps as either plain text or HTML (currently just elements inside the body, not a full HTML file). The HTML includes an ordered list (ol) for each window listing its tabs, with links to the current content (sessionstore.js also holds history for each tab, but I don’t need this), and it shows which tab is currently selected.
You can grab the script from Subversion at: http://svn.jasonantman.com/misc-scripts/dumpFirefoxSession.php. The current version is 3. You’ll need PHP (probably 5) with JSON support.
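For anyone without PHP handy, the same approach can be sketched in a few lines of Python. This is not the script above, just a rough illustration of the idea: strip the wrapping parentheses, parse the rest as JSON, and walk the windows and tabs. The key names used here (windows, tabs, entries, index, selected) and the 1-based numbering of index and selected are assumptions about the sessionstore.js layout, and the function names are made up. Also note that Python's json module is strict, so a file that deviates from standard JSON in other ways would need extra cleanup first.

```python
import json


def load_session(text):
    """Parse sessionstore.js content, stripping the wrapping parentheses if present."""
    text = text.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    return json.loads(text)


def dump_tabs(session):
    """Return plain-text lines listing each window's tabs; '*' marks the selected tab.

    Assumed layout: session["windows"][n]["tabs"][m]["entries"] holds the tab
    history, "index" is the 1-based position of the current entry, and the
    window's "selected" is the 1-based number of the active tab.
    """
    lines = []
    for win_num, window in enumerate(session.get("windows", []), start=1):
        lines.append(f"Window {win_num}")
        selected = window.get("selected", 0)
        for tab_num, tab in enumerate(window.get("tabs", []), start=1):
            entries = tab.get("entries", [])
            if not entries:
                continue  # nothing loaded in this tab
            # Fall back to the newest entry if "index" is missing or out of range.
            current = tab.get("index", len(entries))
            current = min(max(current, 1), len(entries))
            entry = entries[current - 1]
            url = entry.get("url", "")
            title = entry.get("title") or url
            marker = "*" if tab_num == selected else " "
            lines.append(f"{marker} {title} <{url}>")
    return lines


if __name__ == "__main__":
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "sessionstore.js"
    with open(path, encoding="utf-8") as fh:
        print("\n".join(dump_tabs(load_session(fh.read()))))
```

Run it with the path to a copied sessionstore.js as the only argument (it defaults to the file in the current directory) to get a plain-text list of tabs per window.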