Information, thoughts, tips and tricks from a professional system administrator / monitoring and automation nut, open source junkie, troubleshooter, sometime software developer and ceaseless tinkerer. And some occasional commentary on my hobbies and non-tech interests.
I’ve been deploying some new software lately (specifically selenesse, which combines Selenium and FitNesse, and Xvfb). None of these seem to come with init scripts to run as daemons, and the quality of the few Fedora/RedHat/CentOS init scripts I was able to find was quite poor. The Fedora project has a Specification for SysV-style Init Scripts in their Packaging wiki, which specifies in excruciating detail what a Fedora/RedHat/CentOS init script should look like. What follows is an overview of the more important points, which I’m using to develop or modify the scripts I’m currently working on.
Scripts must be put in /etc/rc.d/init.d, not in the /etc/init.d symlink. They should have 0755 permissions.
Scripts must make use of a lockfile in /var/lock/subsys/, and the name of the lockfile must be the same as the name of the init script. (There is a technical reason for this: at shutdown, the rc script only calls stop for services whose lockfile exists in /var/lock/subsys/.) The lockfile should be touched when the daemon successfully starts, and removed when it successfully stops.
Init scripts should not depend on any environment variables set outside the script. They should operate gracefully with an empty/uninitialized environment (or with only LANG and TERM set and a CWD of /, as enforced by service(8)), or with a full environment if they are called directly by a user.
Required actions – all of the following actions are required, and have specific definitions:
start: starts the service
stop: stops the service
restart: stop and restart the service if the service is already running, otherwise just start the service
condrestart (and try-restart): restart the service if the service is already running, if not, do nothing
reload: reload the configuration of the service without actually stopping and restarting the service (if the service does not support this, do nothing)
force-reload: reload the configuration of the service and restart it so that it takes effect
status: print the current status of the service
usage: by default, if the initscript is run without any action, it should list a “usage message” that has all actions (intended for use)
They must “behave sensibly”. I’ve found this to be one of the biggest problems with homegrown init scripts. If servicename start is called while the service is already running, it should simply exit 0. Likewise, servicename stop should exit 0 if the service is already stopped. Init scripts must not kill unrelated processes. I don’t know how many times I’ve seen scripts that kill every java or python process on a machine.
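To make the checklist above concrete, here is a rough skeleton of how the pieces could fit together. It is only a sketch under assumptions: mydaemon, /usr/bin/mydaemon and the PID file path are placeholders I made up, the daemon is assumed to stay in the foreground, and the status return codes follow the LSB convention (0 running, 1 dead with PID file, 2 dead with lockfile, 3 stopped) rather than anything quoted from the Fedora page.

```shell
#!/bin/bash
# Skeleton only. Every name and path below is a placeholder.
prog="mydaemon"                      # hypothetical; same as the init script's name
exec_cmd="/usr/bin/mydaemon"         # hypothetical daemon that stays in the foreground
pidfile="/var/run/${prog}.pid"
lockfile="/var/lock/subsys/${prog}"

# True only if the PID recorded at start is still alive.
is_running() {
    [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

start() {
    is_running && return 0           # already running: succeed quietly
    $exec_cmd >/dev/null 2>&1 &
    echo $! > "$pidfile"             # record the PID so stop never has to guess
    sleep 1                          # give the daemon a moment to fail
    if is_running; then
        touch "$lockfile"            # lockfile only after a successful start
        return 0
    fi
    rm -f "$pidfile"
    return 1
}

stop() {
    if is_running; then
        pid=$(cat "$pidfile")
        kill "$pid" || return 1      # signal only the process we started
        for i in 1 2 3 4 5; do       # wait up to about 5 seconds for it to exit
            kill -0 "$pid" 2>/dev/null || break
            sleep 1
        done
        kill -0 "$pid" 2>/dev/null && return 1
    fi
    rm -f "$pidfile" "$lockfile"     # an already-stopped service also lands here
    return 0
}

status() {
    if is_running; then
        echo "$prog (pid $(cat "$pidfile")) is running"; return 0
    elif [ -e "$pidfile" ]; then
        echo "$prog dead but pid file exists"; return 1
    elif [ -e "$lockfile" ]; then
        echo "$prog dead but subsys locked"; return 2
    fi
    echo "$prog is stopped"; return 3
}

dispatch() {
    case "$1" in
        start)        start ;;
        stop)         stop ;;
        restart)      stop && start ;;
        condrestart|try-restart)
                      is_running || return 0
                      stop && start ;;
        reload)       return 0 ;;    # placeholder daemon cannot reload: do nothing
        force-reload) stop && start ;;
        status)       status ;;
        *)  echo "Usage: $0 {start|stop|restart|condrestart|try-restart|reload|force-reload|status}"
            return 2 ;;
    esac
}
```

A real script would end with dispatch "$@" followed by exit $?, and on a RedHat-style system it would normally source /etc/rc.d/init.d/functions and lean on its daemon, killproc and status helpers instead of the hand-rolled versions here.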
I intend to use this as a quick checklist when developing or evaluating init scripts for RedHat/Fedora based systems. In my experience, the biggest problems with most init scripts revolve around poor handling of PID files and lockfiles, mainly:
Killing processes other than the one that the script started (e.g. killing all java or python processes), usually because the PID isn’t tracked at start
Starting a second instance of the subsystem because lockfiles aren’t used, or the status function is broken.
Improper exit codes
Either explicitly relying on environment variables (and therefore breaking when called through service(8)), or conversely, not cleaning/resetting environment variables that are used by dependent code or processes.
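One quick way to catch the environment problem is to run a script with roughly the environment described earlier: only LANG and TERM set, and a working directory of /. The helper below is a sketch of that idea using env -i; the function name run_like_service and the SECRET_SETTING variable are made up for the demo, and this approximates the behaviour described above rather than reproducing what service(8) actually executes.

```shell
#!/bin/sh
# Run a command from / with an environment containing only LANG and TERM.
run_like_service() {
    ( cd / && env -i LANG="${LANG:-C}" TERM="${TERM:-dumb}" "$@" )
}

# Demo: a variable exported in the calling shell is not passed through,
# and the command sees / as its working directory.
SECRET_SETTING="from-my-login-shell"
export SECRET_SETTING
run_like_service /bin/sh -c 'echo "SECRET_SETTING=${SECRET_SETTING:-<unset>} cwd=$(pwd)"'
```

With the function defined, something like run_like_service /etc/rc.d/init.d/mydaemon start (script name hypothetical) shows fairly quickly whether a script was quietly depending on your login environment.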
If you’re like me and most humans, the Nagios logfile timestamp (a unix timestamp) isn’t terribly useful when trying to grep through the logs and correlate events: # head -2 nagios.log  LOG ROTATION: DAILY  LOG VERSION: 2.0
Here’s a nifty Perl one-liner that you can pipe your logs through: perl -pe 's/(\d+)/localtime($1)/e' to get nicer output like: # head -2 nagios.log [Tue Oct 16 00:00:00 2012] LOG ROTATION: DAILY [Tue Oct 16 00:00:00 2012] LOG VERSION: 2.0
On the lighter side, I found a few web sites by Tom Blackwell that do some fun stuff with text overlays on images. They seem like nice little tools for those end-of-project PowerPoints, or for sending out the monthly “most rolled-back commits” medal…
I saw a link to this YouTube video shared on Tom Limoncelli’s blog. It’s a 1953 US Navy instructional video about an all-mechanical fire control computer. Yes, I really mean a computer that can solve continuously changing 25-variable fire control problems using only mechanical means (gears, cams, etc.). Think about it for a minute – it’s truly mind-boggling. It really gives one an appreciation for the power of a simple pocket calculator, and for the engineering that went into solving these problems before electronic computers. I’m usually not much of a math geek, but I watched the whole 40 minute video and was in awe of both the simple ability to use three arms and a pin to multiply numbers, and the amazingly precise engineering and machining it would take to translate various rotation inputs into landing a shell on a moving ship miles away. It’s a really good watch, and will probably leave you astonished both by how far technology has come (and what we take for granted every day), and by the fact that feats of engineering like this one worked quite well.
I’ve been doing some work with RabbitMQ lately, and have been doing some testing against its HTTP-based API, which returns results in JSON. If you’re looking to pretty-print a JSON response for easier viewing, here’s a nice way to do it at the command line using Python and json.tool: curl http://username:pass@hostname:55672/api/overview | python -m json.tool
Since I started my last job, I’ve been using Nagstamon on my workstation; it’s a really handy little system tray application that monitors a Nagios/Icinga instance and shows status updates/summary in a handy fashion, including flashing and (optionally) a sound alert when something changes. Unfortunately, there doesn’t seem to be a Fedora 17 package for it, though there is an entry on the Fedora package maintainers wishlist. The closest I was able to find is a repoforge/RPMforge package of Nagstamon 0.9.7.1, along with a source RPM.
Today is my last day in my almost-year-long stint as a System Administrator at TechTarget. Monday, I start a new contract-to-perm position as a Linux Engineer with Cox Media Group Digital & Strategy. I can’t say a whole lot about the new job, other than it will hopefully be a great change for me, and they make heavy use of Django. If you want to get a bit of an idea of what they’re about, here’s a document on their departmental ethos. Hopefully I’ll be able to post more useful information here, and post more often, in the future. I’m really psyched about the new gig.
A while back, I did a technical phone screen with a big online “social” company (I won’t say who, but they’re a household name, growing fast, and doing cool things; that doesn’t leave too many options). I rarely remember to write down interview questions, but I was cleaning out my desk this morning and came across a ripped-out sheet of notebook paper with a handful of the interview questions written on it. Most of them weren’t terribly difficult, or terribly unusual for competent technical interviewers, but since I happen to actually have the list written down, I thought I’d share it. I don’t remember why the programming questions are all Python; likely, I was asked to choose between Python (which I’ve used, though not lately), Ruby (which I can barely muddle my way through reading on a good day), and something else I don’t know. Here are some of them…
What is an inode? What does it store?
What is a hard link?
What is the difference between a hard link and a soft link?
What is a list in Python?
Name some data structures that you’d use in Python. Describe them, and tell me why you would use them.
How would you list all the man pages containing the keyword “date”?
If the chmod binary had its permissions set to 000, how would you fix it?
I’ve been doing a lot of RPM packaging lately, on different (and very old) distros and versions. Sometimes I lose track of all of the macros used in specfiles (_bindir, _sbindir, dist, _localstatedir, etc.). There’s no terribly easy way to dump a list of all of the available macros. There is, however, a bit of a kludge. Insert the following two lines in your specfile before the %prep or %setup lines:
%dump
exit 1
The %dump macro will dump all defined macros to STDERR. The exit 1 will prevent rpmbuild from going on and trying to build the package. If you want to view the output nicely, you can pipe it through a pager like less: rpmbuild -ba filename.spec 2>&1 | less.
Just make sure to remove those two lines when you want to actually build the package.