CVS to SVN to Git

Thanks to some new interest, I’ve decided to resurrect an old project of mine, PHP EMS Tools. It’s a web-based tool for small emergency services organizations, mainly aimed at volunteer EMS/ambulance providers. The tool handles roster tracking, scheduling, equipment maintenance and checks, and a bunch of other administrative tasks. I first started it in 2007 for the Midland Park Ambulance Corps (MPAC), where I was a volunteer EMT from 2005 through 2011. I’ll admit that it’s a perfect model of how not to run a software project. The first few releases are plain awful code. I was keeping the project in CVS at the time, and posted some early releases on SourceForge and Freshmeat (now Freecode). Sometime in 2009, I migrated the contents of the CVS module’s trunk to an SVN repository, but discarded the history. I also set up a MediaWiki-based website for the project, giving some information and mainly asking for feedback. Around that time I started working on a new and heavily updated (fixed) version for MPAC. Since there appeared to be no interest in the project, and MPAC’s version had many local customizations and organization-specific features, I let that codebase diverge from what was released and, as a result, stopped keeping it in version control. That held until now: MPAC needs to migrate to a new server, and I’ve also gotten some outside interest in the project.

So, as of this morning, I was left with at least four code bases:

  1. the original CVS repository with branches and tags and some history, untouched since 2007
  2. the SVN repository circa 2009, with only 3 commits, all related to the migration from CVS to SVN
  3. a “release” tarball that at least one outside organization is actually using.
  4. the code that MPAC is running, which has been largely rewritten since 2009, but also contains a lot of organization-specific customizations.

As a first step, I created a new SVN repository, migrated the original CVS repo to it (complete with history, branches, and tags) using cvs2svn, and then removed write permissions on the actual module in the repository. This gave me an SVN repository with all of the history of previous so-called releases, with a trunk matching r1 of the “current” SVN repository. I then manually applied patches to trunk/ for the two commits in the current SVN repository, and set the svn:date revision property to the correct 2009 date for those commits. I also confirmed that the correct tag matches up with the “release” tarball mentioned above. So, I’m down to a “current” trunk, plus the locally modified code running on MPAC’s current server. My plan of action from this point is as follows:

  1. Move the PHP EMS Tools website from MediaWiki to my local Redmine installation, and update the news with a link to this post.
  2. Migrate the SVN repository, which now contains full history, to Git hosted at GitHub. Add GitHub integration to Redmine.
  3. Update Freshmeat, SourceForge, and anywhere else online that knows about the project.
  4. Working in a Git branch, begin converging the code MPAC is currently running with the latest (now Git) trunk, trying to provide configuration options for anything organization-specific, and testing as I go.
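
Step 2 of the plan can be handled by git svn. What follows is a minimal sketch, assuming the rebuilt repository uses the standard trunk/branches/tags layout that cvs2svn creates by default. The repository URL, destination directory, and committer mapping are hypothetical placeholders, and the helper names are mine, not part of any tool.

```python
import subprocess

def authors_lines(mapping):
    """Render {svn_user: (name, email)} as the 'user = Name <email>' lines
    that git svn's --authors-file option expects."""
    return ["%s = %s <%s>" % (user, name, email)
            for user, (name, email) in sorted(mapping.items())]

def git_svn_clone_cmd(svn_url, dest, authors_file=None):
    """Build a 'git svn clone' command for a repository that uses the
    standard trunk/branches/tags layout (--stdlayout)."""
    cmd = ["git", "svn", "clone", "--stdlayout"]
    if authors_file:
        # Maps SVN usernames to Git author identities
        cmd.append("--authors-file=" + authors_file)
    return cmd + [svn_url, dest]

def migrate(svn_url, dest, authors, authors_file="authors.txt"):
    """Write the authors file, then run the clone (raises if git svn fails)."""
    with open(authors_file, "w") as f:
        f.write("\n".join(authors_lines(authors)) + "\n")
    subprocess.run(git_svn_clone_cmd(svn_url, dest, authors_file), check=True)
```

For example, `migrate("file:///srv/svn/php-ems-tools", "php-ems-tools", {"jdoe": ("Jane Doe", "jdoe@example.com")})` would clone the whole history into ./php-ems-tools. Two things to keep in mind: git svn aborts if it meets a committer that is missing from the authors file, and it imports SVN tags as remote-tracking branches, so turning them into real Git tags before pushing to GitHub is a separate step.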

If all works well, I’ll end up with MPAC running the current trunk with just some different configuration options, plus a working, up-to-date release. The biggest issues are going to be how I handle the MPAC-specific additions and customizations (a lot of stuff hard-coded for our position titles, plus our very custom call report and telephone-based call-in software, which is pretty tightly linked with the PHP EMS Tools core), and how I balance abstracting things to be configurable for other users against getting this all done in a reasonable amount of time.
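
For the svn:date fix-up mentioned earlier, the new value has to be in Subversion’s own timestamp format: UTC, six fractional digits, and a trailing Z. Here is a minimal sketch, assuming a timezone-aware datetime and a repository whose pre-revprop-change hook allows svn:date to be changed. The helper names are hypothetical, and the function only builds the command rather than running it.

```python
from datetime import datetime, timezone

def svn_date(dt):
    """Format a timezone-aware datetime the way Subversion stores svn:date."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

def set_date_cmd(revision, dt, repo_url):
    """Build (not run) the svn command that rewrites one revision's svn:date.
    The repository needs a pre-revprop-change hook that permits this."""
    return ["svn", "propset", "svn:date", "--revprop", "-r", str(revision),
            svn_date(dt), repo_url]
```

For example, `svn_date(datetime(2009, 6, 1, 14, 30, tzinfo=timezone.utc))` gives `2009-06-01T14:30:00.000000Z`. That date, like any revision number or URL you pass in, is only a placeholder. Pass the resulting list to `subprocess.run` once the hook is in place.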

Stay tuned…

Adding Piwik Web Analytics Integration to ViewVC

All of my public Subversion and CVS repositories are available online through ViewVC, a great Python application that provides a web-based interface to CVS and SVN repositories, including history browsing, graphical diffs, etc. A surprisingly large share of the traffic to my web server is for the vhosts that serve it, so I decided to add some analytics. I’m in the process of trying out Piwik, a full-featured, GPL-licensed, self-hosted alternative to Google Analytics. It gives lots of useful information like the number of visits and unique visits per page, search engine keywords, referrers, average time on page, bounce rate (the percentage of visits that view only one page), etc.

I have ViewVC installed from the RPMforge packages, so there’s one code base for both of my vhosts. This means that I can’t simply slap the tracking code at the bottom of the templates and call it a day. I opted for a nicer solution. What follows is a patch (diff -ru) against the current version of ViewVC (1.1.13) that adds a “piwik” section to viewvc.conf and inserts the Piwik tracking code, with the configured base URL and site ID, into all ViewVC pages. Enjoy.

diff -ru viewvc-ORIG/lib/config.py viewvc/lib/config.py
--- viewvc-ORIG/lib/config.py	2012-01-25 08:31:52.000000000 -0500
+++ viewvc/lib/config.py	2012-03-23 21:57:08.000000000 -0400
@@ -108,6 +108,7 @@
     'query',
     'templates',
     'utilities',
+    'piwik',
     )
   _force_multi_value = (
     # Configuration values with multiple, comma-separated values.
@@ -127,6 +128,7 @@
                'options',
                'templates',
                'utilities',
+               'piwik',
                ),
     'root'  : ('authz-*',
                'options',
@@ -461,7 +463,14 @@
     self.cvsdb.check_database_for_root = 0
 
     self.query.viewvc_base_url = None
-    
+
+    # begin <jason@jasonantman.com> patch for piwik integration
+    self.piwik.use_piwik = 0
+    self.piwik.base_url = ''
+    self.piwik.site_id = ''
+    self.piwik.use_jsindex = 0
+    # end <jason@jasonantman.com> patch for piwik integration
+   
 def _startswith(somestr, substr):
   return somestr[:len(substr)] == substr
 
diff -ru viewvc-ORIG/templates/include/footer.ezt viewvc/templates/include/footer.ezt
--- viewvc-ORIG/templates/include/footer.ezt	2012-01-25 08:31:52.000000000 -0500
+++ viewvc/templates/include/footer.ezt	2012-03-23 22:03:04.000000000 -0400
@@ -13,5 +13,17 @@
 </tr>
 </table>
 
+[is cfg.piwik.use_piwik "1"]
+<script type="text/javascript">
+var pkBaseURL = (("https:" == document.location.protocol) ? "https://[cfg.piwik.base_url]/" : "http://[cfg.piwik.base_url]/");
+document.write(unescape("%3Cscript src='" + pkBaseURL + "[is cfg.piwik.use_jsindex "1"]js/[else]piwik.js[end]' type='text/javascript'%3E%3C/script%3E"));
+</script><script type="text/javascript">
+try {
+var piwikTracker = Piwik.getTracker(pkBaseURL + "piwik.php", [cfg.piwik.site_id]);
+piwikTracker.trackPageView();
+piwikTracker.enableLinkTracking();
+} catch( err ) {}
+</script><noscript><p><img src="http://[cfg.piwik.base_url]/piwik.php?idsite=[cfg.piwik.site_id]" style="border:0" alt="" /></p></noscript>
+[else][end]
 </body>
 </html>
Only in viewvc-ORIG/templates/include: header.ezt~
diff -ru viewvc-ORIG/viewvc.conf.dist viewvc/viewvc.conf.dist
--- viewvc-ORIG/viewvc.conf.dist	2012-01-25 08:31:52.000000000 -0500
+++ viewvc/viewvc.conf.dist	2012-03-23 21:44:02.000000000 -0400
@@ -1131,3 +1131,29 @@
 #viewvc_base_url =
 
 ##---------------------------------------------------------------------------
+[piwik]
+
+## This section enables Piwik <http://piwik.org> web analytics tracking.
+## If piwik is enabled (use_piwik = 1) all other options must be specified.
+##
+## This is based on a patch by Jason Antman <jason@jasonantman.com> <http://www.jasonantman.com>
+## to ViewVC 1.1.13, written 2012-03-23.
+## The latest version of the patch, and information on it, can always be found at:
+## <http://blog.jasonantman.com/2012/03/adding-piwik-web-analytics-integration-to-viewvc/>
+##
+##
+## To enable piwik, change use_piwik to 1. Set to 0 to disable
+use_piwik = 1
+##
+## Set base_url to the hostname and path to your piwik installation, with no trailing slash.
+## e.g. piwik.example.com or www.example.com/piwik
+base_url = piwik.example.com
+##
+## Set to the numeric id of your website in Piwik
+site_id = 5
+##
+## Set to 1 if you want to use js/index.php to serve the tracking code, 
+## or leave at 0 if you want to call piwik.js directly
+use_jsindex = 0
+
+##---------------------------------------------------------------------------
\ No newline at end of file
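
The patch’s config comment says that once use_piwik = 1, every other option must be set. A quick pre-flight check can catch a bad value before a broken tracking snippet ends up on every page. This is a hypothetical standalone helper, not part of ViewVC or of the patch above. It reads the INI-style viewvc.conf text with Python’s configparser and checks the rules stated in the comments: no scheme and no trailing slash on base_url, a numeric site_id, and use_jsindex of 0 or 1.

```python
import configparser

def check_piwik(conf_text):
    """Return a list of problems in the [piwik] section of viewvc.conf text."""
    cp = configparser.ConfigParser(interpolation=None, strict=False)
    cp.read_string(conf_text)
    if not cp.has_section("piwik"):
        return []  # no section: the patched defaults keep tracking off
    sec = cp["piwik"]
    if sec.get("use_piwik", "0").strip() != "1":
        return []  # tracking disabled, nothing else matters
    problems = []
    base = sec.get("base_url", "").strip()
    site_id = sec.get("site_id", "").strip()
    if not base:
        problems.append("base_url is not set")
    elif base.startswith(("http://", "https://")):
        # the footer template adds http:// or https:// itself
        problems.append("base_url must not include a scheme")
    elif base.endswith("/"):
        # the template appends "/" before piwik.js and piwik.php
        problems.append("base_url must not end with a slash")
    if not site_id.isdigit():
        problems.append("site_id must be a numeric Piwik site id")
    # a missing use_jsindex falls back to the patch's default of 0
    if sec.get("use_jsindex", "0").strip() not in ("0", "1"):
        problems.append("use_jsindex must be 0 or 1")
    return problems
```

Usage might look like `print(check_piwik(open("/etc/viewvc/viewvc.conf").read()))`, where the path is an assumption. Use wherever your package actually installed viewvc.conf.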

Meld – Graphical Diff Tool for SVN Directories

I’ve been in the process of manually merging two directories in a Subversion repo. The second started out as a “development” copy of the first (without branching, unfortunately). Since there are quite a few files, I decided that a graphical diff program is a must. I usually use KDiff3, but my requirements for this are a bit more stringent than usual: it has to handle recursive diffs on two directories, and it has to be able to ignore SVN keywords (or an arbitrary regex), since all of the files have keyword substitution on LastChangedRevision and HeadURL. KDiff3 supports preprocessor commands, which can include filtering the text through sed before performing the diff (so I modified their regex to ignore version control keywords), but for some reason (perhaps binary differences, or metadata differences) I couldn’t get the file difference indicator in the directory tree view to reflect this. Even when ignoring keyword lines and whitespace, it still showed every pair of files as different.

Enter Meld, a graphical diff tool. I’ve only used it for half an hour or so, but it seems wonderful. It’s easy to use, has a pleasing interface similar to Kompare, and even has simple check boxes in the options menu to ignore whitespace and SVN keywords – and they work! So far, I’m about halfway through my 300+ file tree, and the merge is going wonderfully.
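
To illustrate roughly what “ignore SVN keywords” has to do in a recursive comparison, here is a minimal Python sketch. It is not how Meld or KDiff3 implement it. It collapses expanded keywords such as `$HeadURL: ... $` back to `$HeadURL$`, skips .svn metadata directories, and reports files that still differ or exist on only one side. The keyword list covers the standard Subversion keywords, and the function names are hypothetical.

```python
import os
import re

# Expanded keywords look like "$HeadURL: http://... $" or "$LastChangedRevision: 42 $".
KEYWORD_RE = re.compile(
    r"\$(LastChangedRevision|Revision|Rev|HeadURL|URL|Id|Header|"
    r"LastChangedDate|Date|LastChangedBy|Author)(?::[^$\n]*)?\$")

def normalize(text):
    """Collapse every expanded keyword to its unexpanded form, e.g. $HeadURL$."""
    return KEYWORD_RE.sub(lambda m: "$" + m.group(1) + "$", text)

def differing_files(left, right):
    """Relative paths that differ once keywords are collapsed, plus paths
    present in only one tree. .svn directories are skipped."""
    seen = set()
    for top in (left, right):
        for dirpath, dirnames, filenames in os.walk(top):
            dirnames[:] = [d for d in dirnames if d != ".svn"]  # prune metadata
            for name in filenames:
                seen.add(os.path.relpath(os.path.join(dirpath, name), top))
    diffs = []
    for rel in sorted(seen):
        a, b = os.path.join(left, rel), os.path.join(right, rel)
        if not (os.path.isfile(a) and os.path.isfile(b)):
            diffs.append(rel)  # only on one side
            continue
        with open(a, encoding="utf-8", errors="replace") as fa, \
             open(b, encoding="utf-8", errors="replace") as fb:
            if normalize(fa.read()) != normalize(fb.read()):
                diffs.append(rel)
    return diffs
```

This reproduces only the “which files really differ” indicator that KDiff3 got wrong for me. For the actual merging, a graphical tool is still the right choice.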

Client-side subversion commit message hooks

While I know this isn’t best practice, we usually do work on some boxes as root (sudo su -), since we use LDAP-based auth for our Linux boxes (including a sudoers file based on LDAP group membership). This includes our puppetmaster, where configs are kept in Subversion and edited as root. The one problem with this is getting the username of the actual committer, not root, into Subversion commit messages.

Subversion has no real client-side hooks, so the workaround I came up with is a shell script that finds out who the actual user is and tacks that onto the beginning of the commit message. It does this by preloading svn-commit.tmp and then calling the text editor. It’s actually quite simple.

First, in your .bashrc (or wherever you set up environment variables), export SVN_EDITOR=/usr/local/bin/svnPreCommitClientHook.sh. This way, every time you run svn commit, instead of calling your text editor with svn-commit.tmp as an argument, svn calls the bash script, which preloads the commit message in svn-commit.tmp and then calls your editor to finish the message.

/usr/local/bin/svnPreCommitClientHook.sh:

#!/bin/bash
# svn runs $SVN_EDITOR with the path of the log message file (e.g. svn-commit.tmp) as $1
MSGFILE="$1"
LOGNAME=`/usr/local/bin/getLogname.py` # script to get user's actual login name, even if using sudo su
TMPFILE=`mktemp`
echo -e "\nBY: $LOGNAME" > "$TMPFILE"
cat "$MSGFILE" >> "$TMPFILE"
mv "$TMPFILE" "$MSGFILE"
exec "${EDITOR:-vi}" "$MSGFILE"

Using this method, running svn commit will pull up your text editor with “BY: username” already inserted in the commit message.
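
The hook relies on /usr/local/bin/getLogname.py, which isn’t shown here. What follows is a hypothetical stand-in, not the original script. It relies on os.getlogin(), which reports the user logged in on the controlling terminal and so survives sudo su -. It falls back to SUDO_USER, which plain sudo sets but su - wipes, and finally to the current account name.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for getLogname.py: print the name of the person
who logged in, even inside a 'sudo su -' root shell."""
import os
import pwd

def get_real_login():
    # getlogin() reports the user logged in on the controlling terminal,
    # which 'sudo su -' does not change; it raises OSError with no terminal.
    try:
        name = os.getlogin()
        if name:
            return name
    except OSError:
        pass
    # Plain 'sudo' sets SUDO_USER; 'su -' resets the environment and drops it.
    if os.environ.get("SUDO_USER"):
        return os.environ["SUDO_USER"]
    # Last resort: the account we are running as, which may well be root.
    uid = os.getuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)

if __name__ == "__main__":
    print(get_real_login())
```

Since the commit is driven from a terminal anyway, the getlogin() path should cover the normal case. The fallbacks only matter for non-interactive use, where the result may honestly be root.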

The Newest Generation of Hackers

Note for non-technical readers (not that I expect there to be many): the title of this post includes the word “hacker”. If you think that has anything to do with illegal acts or unethical behavior, you’ve fallen victim to what happens when the mainstream media latches on to a term it doesn’t understand. The word is far from negative. Within the geek community, the title “hacker” is the utmost compliment, something like Grand Master in the martial arts, or whatever title is given to an eminent artist. It describes someone who is an expert in their field or, more generally, someone who enjoys seeking knowledge simply for the sake of knowledge: figuring out how things work, how to make them, and how to make them better. If you’re looking for a term that describes a criminal, “attacker”, “malicious user” or “computer criminal” work fine. While I wouldn’t by any stretch consider myself a hacker in the super-genius-wizard sense of the term, I definitely subscribe to the hacker ethic: the burning need to figure out how things work and make them better.

Thanks to the snow at the end of last week, and a long weekend, I actually got to do some reading that didn’t involve man pages or books strictly about software. I finally finished The Daemon, the Gnu, and the Penguin by Peter H. Salus, a wonderful book (with a great foreword by maddog Hall). I also finally got a chance to start reading The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary by Eric S. Raymond (ESR). I’m only up to page 50 or so, but it’s an equally good book, and I’ve been looking forward to reading it for years.

I’ve always been very interested in history (heck, I have a minor in it), and specifically the history of my other interests. When photography took up most of my time, I read every photo history book I could get my hands on (including many primary sources on now-archaic techniques). In the past few years, I’ve been amassing books on computing history (specifically ARPANET/the Internet and Unix/Linux/Free software) at a near-alarming rate.

Through all of my reading, two main things have struck me: the utterly amazing feats accomplished by previous generations, and how my own generation takes them for granted. I was born in 1987 which, I feel, makes me part of a very small group who were lucky enough to grow up during the real rise of the Internet. I remember playing simple games on my grandmother’s (business) 386DX long before I could read most of the words on the screen. But I also remember my father dialing in to an ISP (I honestly don’t remember which one) on a 9600 baud serial modem, and how unique that was at the time (at least among kids my age). By 13 or so, I had a 10BaseT network in my house, sharing a 56k dial-up connection between two computers. I feel that I’m part of a short historical period of kids who “grew up” with computers, used them in middle school, are perfectly at home with them, but still remember dial-up, the launch of Windows 98, and ordering Linux on CDs because you just couldn’t get it any other way (we were too young to have access to a college’s resources, and had only dial-up).

Anyway, on to my point…

As I read about those who came before me (and my generation), those who thought up such amazing ideas as Unix, the Internet, networking and most of the software and protocols we have today, I realize how big their shoes are, and how difficult it will be for the next generation to fill them. Sure, we have Facebook, RSS feeds, Web 2.0 and smartphones, but will we be able to innovate on the level that those who came before us did? And then it strikes me how much we young aspiring hackers take for granted: how many aspects of today’s technology would have seemed impossible 20 years ago, yet we use them without a second thought.

The last generation of hackers and programmers were raised on software distribution tapes. Their idea of “open” was formed by what they were used to – a Cathedral development model, with regular releases (production, perhaps beta, perhaps even less) and accompanying source code. However, in the pre-Internet days, they were still bound by physical media. They were still bound to the Cathedral development model, to a small and tight-knit group of sages determining when the world was ready to see the fruits of their labor.

The current generation – those of us just out of college or grad school, or even younger – think of Linux as the quintessential open source project. For those of us who came into computing when Linux was already around (I first ran Linux in 2001 when, at 14, I bought the newly released CD set of SuSE 7.3), Linux sets the bar. It’s what we were raised on (at least in terms of open source). Fixed releases – even with source – seem antiquated, pre-Internet, our fathers’ open source. To us, open means nightly builds, world-readable ticket/bug trackers, anonymous Git or SVN access, and RSS feeds of every commit. It means being able to see every line of code at every moment in time, even if we’ve never e-mailed one of the developers.

Even just a few years ago, the word “open” was used by vendors to mean almost anything – everything from software based on Linux, to software that included source (regardless of the license) to software that just used (patent encumbered) documented protocols or formats. For the next generation, even the generation entering the workforce now, open means much more. It means transparency in development, in code, in documentation, in management.

Many times, I’ve found an “open” software project and searched its web site endlessly for links to Git or SVN or CVS, or for the (internal) bug tracker. Every time, I had to remind myself that the world, even much of the open source world, is still far behind my expectations. Even Google’s Android Open Source Project only has code merged in periodically from the production (closed) tree, and maintains a separate bug tracker. That falls well short of what I expect, which is that at most some parts of the tree are unavailable on the Internet and some classes of bugs are filtered out of public view.

Nobody – not even Microsoft – can deny that the world is moving more and more to open source. It’s already the de facto standard on the Internet, and it’s moving onto the desktop every day. As this happens, expectations of what “open” means (increasingly, something more transparent than just “open”) are also rising. The software world, both proprietary and open source, will have to keep up. And, hopefully, as the generation raised on the Internet begins to fill the ranks of geeks in the workforce, we’ll see more and more open source usage.

As a side note, I’d be very interested to see how open source use compares to demographics. I know that Linux use (on student-owned computers) at most colleges is way above the global average, and the same goes for Firefox.

Subversion ‘is missing or not locked’ error

Recently I was doing some work on a few PHP scripts, and came across a rather annoying error while trying to commit to Subversion:

svn: Commit failed (details follow):
svn: Working copy '/srv/www/htdocs/newcall/stats/generated' is missing or not locked

The problem was a directory, “generated”. This particular app uses libchart to draw simple charts in PHP. Libchart writes the charts to files, and therefore needs a directory writable by the Apache user. So, I created the generated/ directory for these output files and chowned it to wwwrun:www. Apparently, Subversion’s svn add command doesn’t check ownership/write permissions before adding a directory. So, it added generated/ to the parent directory’s list of entries, but couldn’t create the .svn directory inside it or add a lock. IMHO, this is a bug in the svn client.

I couldn’t find any solutions to the problem online. Essentially, I have an empty directory (or at least nothing useful in it) that got partially added to svn – it was added to the .svn/entries file in the parent directory, but never had its own .svn directory created.

The only solution that I found is to manually edit the .svn/entries file in the parent directory. WARNING: this isn’t for the faint of heart. Be sure you don’t screw anything up.

  1. Open the .svn/entries file in the parent directory in a text editor (e.g. if the problem directory is stats/generated, edit stats/.svn/entries).
  2. Find the entry node with the correct “name” attribute for the directory in question. For stats/generated, in the stats/.svn/entries file, it should look like:
    <entry
       name="generated"
       kind="dir"
       schedule="add"/>
  3. Make the entries file writable (chmod u+w entries)
  4. Remove the entry from the file.
  5. Set the entries file back to non-writable (chmod u-w entries)
  6. Remove any backup files if they were created (e.g. entries~ for emacs)
  7. Remove the directory itself and re-create it, this time adding to svn before setting the ownership.
  8. Commit. It should now work.
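
Steps 3 through 5 can also be scripted, which avoids leaving the file writable by accident. This is a minimal sketch that applies only to the XML-format entries file shown above (newer Subversion working copies use a different format). It removes the first self-closing `<entry .../>` element whose name attribute matches, using a regex so the rest of the file is left byte-for-byte intact. It then restores the file’s original permissions. The function name is hypothetical, and backing up the entries file first is still wise.

```python
import os
import re
import stat

def remove_entry(entries_path, name):
    """Delete the self-closing <entry name="..."/> element for `name` from an
    XML-format .svn/entries file, restoring the original permissions after."""
    with open(entries_path, encoding="utf-8") as f:
        text = f.read()
    pattern = re.compile(r'<entry\s[^>]*?\bname="%s"[^>]*/>\s*' % re.escape(name))
    new_text, count = pattern.subn("", text, count=1)
    if count == 0:
        raise ValueError("no entry named %r in %s" % (name, entries_path))
    mode = stat.S_IMODE(os.stat(entries_path).st_mode)
    os.chmod(entries_path, mode | stat.S_IWUSR)  # step 3: make it writable
    try:
        with open(entries_path, "w", encoding="utf-8") as f:
            f.write(new_text)                     # step 4: drop the entry
    finally:
        os.chmod(entries_path, mode)              # step 5: restore permissions
    return new_text
```

For the example above, that would be `remove_entry("stats/.svn/entries", "generated")`, after which steps 7 and 8 proceed as before.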