Pretty-Print a JSON response at the command line

I’ve been doing some work with RabbitMQ lately, and have been doing some testing against its HTTP-based API, which returns results in JSON. If you’re looking to pretty-print a JSON response for easier viewing, here’s a nice way to do it at the command line using Python and json.tool:

curl http://username:pass@hostname:55672/api/overview | python -m json.tool
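If the response is already inside a Python script, the same json module can do the formatting directly. This is a minimal sketch, not what json.tool does internally; the function name pretty and the sample document are made up for illustration.

```python
import json


def pretty(raw: str) -> str:
    """Parse a JSON string and return it re-indented for reading."""
    # sort_keys makes the output stable, which helps when diffing responses
    return json.dumps(json.loads(raw), indent=4, sort_keys=True)


if __name__ == "__main__":
    # Made-up sample; a real API response would go here instead
    print(pretty('{"b": [1, 2], "a": {"c": null}}'))
```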

Nagstamon on Fedora 17

Since I started my last job, I’ve been using Nagstamon on my workstation; it’s a really handy little system tray application that monitors a Nagios/Icinga instance and shows status updates/summary in a handy fashion, including flashing and (optionally) a sound alert when something changes. Unfortunately, there doesn’t seem to be a Fedora 17 package for it, though there is an entry on the Fedora package maintainers wishlist. The closest I was able to find is a repoforge/RPMforge package of Nagstamon 0.9.7.1, along with a source RPM.

Here are the steps to build that package on F17:

  1. Download and install rpm-macros-rpmforge.
  2. As root, edit /etc/rpm/macros.rpmforge and comment out the %dist macro, so we’ll still have the default “fc17” dist tag.
  3. wget http://apt.sw.be/source/nagstamon-0.9.7.1-2.rf.src.rpm
  4. rpmbuild --rebuild nagstamon-0.9.7.1-2.rf.src.rpm
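Step 2 can also be scripted rather than done by hand. The sketch below assumes the macro file defines %dist on a line of its own and that a leading # marks a comment in that file; the function name and the sample file contents are hypothetical, so check the real file before overwriting it.

```python
import re


def comment_out_dist(macros_text: str) -> str:
    """Return the macro file text with any %dist definition commented out."""
    out = []
    for line in macros_text.splitlines(keepends=True):
        # \b keeps longer macro names that merely start with "dist" untouched
        if re.match(r"\s*%dist\b", line):
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only
    print(comment_out_dist("%packager example\n%dist .rf\n"), end="")
```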

Hopefully this will help someone else as well. At the moment, Nagstamon is actually up to version 0.9.9, so hopefully I’ll build a newer package sometime soon.

New Job

Today is my last day in my almost-year-long stint as a System Administrator at TechTarget. Monday, I start a new contract-to-perm position as a Linux Engineer with Cox Media Group Digital & Strategy. I can’t say a whole lot about the new job, other than it will hopefully be a great change for me, and they make heavy use of Django. If you want to get a bit of an idea of what they’re about, here’s a document on their departmental ethos. Hopefully I’ll be able to post more useful information here, and post more often, in the future. I’m really psyched about the new gig.

Some questions from a tech interview with a big Internet company

A while back, I did a technical phone screen with a big online “social” company (I won’t say who, but they’re a household name, growing fast, and doing cool things; that doesn’t leave too many options). I rarely remember to write down interview questions, but I was cleaning out my desk this morning and came across a ripped-out sheet of notebook paper with a handful of the interview questions written on it. Most of them weren’t terribly difficult, or terribly unusual for competent technical interviewers, but since I happen to actually have the list written down, I thought I’d share it. I don’t remember why the programming questions are all Python; likely, I was asked to choose between Python (which I’ve used, though not lately), Ruby (which I can barely muddle my way through reading on a good day), and something else I don’t know. Here are some of them…

  • What is an inode? What does it store?
  • What is a hard link?
  • What is the difference between a hard link and a soft link?
  • What is a list in Python?
  • Name some data structures that you’d use in Python. Describe them, and tell me why you would use them.
  • How would you list all the man pages containing the keyword “date”?
  • If the chmod binary had its permissions set to 000, how would you fix it?
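For the hard link versus soft link questions, a quick experiment is probably the easiest way to see the difference. This is only a sketch of one way to explore it, not the answer the interviewer was after; the file names are arbitrary, and it assumes a filesystem that supports both link types.

```python
import os
import tempfile


def link_demo(directory: str) -> dict:
    """Create a file, a hard link and a symlink to it, then delete the original."""
    target = os.path.join(directory, "original.txt")
    hard = os.path.join(directory, "hard.txt")
    soft = os.path.join(directory, "soft.txt")
    with open(target, "w") as fh:
        fh.write("hello\n")
    os.link(target, hard)      # second directory entry for the same inode
    os.symlink(target, soft)   # separate file that only stores a path
    info = {
        "same_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(target).st_nlink,
        "soft_is_symlink": os.path.islink(soft),
    }
    os.remove(target)
    # The data is still reachable through the hard link...
    with open(hard) as fh:
        info["hard_survives"] = fh.read() == "hello\n"
    # ...but the symlink now points at a name that no longer exists
    info["soft_dangles"] = not os.path.exists(soft)
    return info


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(link_demo(tmp))
```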

Dumping all Macros from an RPM Spec File

I’ve been doing a lot of RPM packaging lately, and on different (and very old) distros and versions. Sometimes I lose track of all of the macros used in specfiles (_bindir _sbindir dist _localstatedir, etc). There’s no terribly easy way to dump a list of all of the available macros. There is, however, a bit of a kludge. Insert the following code in your specfile before the %prep or %setup lines:

%dump
exit 1

The %dump macro will dump all defined macros to STDERR. The exit 1 will prevent rpmbuild from going on and trying to build the package. If you want to view the output nicely, you can pipe it through a pager like less: rpmbuild -ba filename.spec 2>&1 | less.

Just make sure to remove those two lines when you want to actually build the package.

Getting oVirt up and running

The bulk of this post was written way back in April 2012. If you’re just coming here, and looking to set up oVirt, you should probably skip down to the postscript for an update, and ignore most of the content here (as it’s applicable to an older oVirt version).

I recently started setting up oVirt, the community version of Red Hat Enterprise Virtualization, at work for some testing (mainly a “sandbox” VM environment, and because Foreman supports it). To start with, I had two nodes, each with two dual-core Xeon processors (VT-x capable) with 20GB RAM, one with 600GB internal storage and one with 140GB internal. While oVirt’s documentation isn’t exactly wonderful, I found a blog post by Jason Brooks, How to Get Up and Running with oVirt, which gives a great walkthrough of getting the oVirt Engine set up on a machine, and also setting up that same machine as a VM host. As oVirt is still fairly young, this is all done on Fedora. I performed my installation via Cobbler, though I’m afraid to admit it was an entirely manual, interactive install.

I did run into a few bumps during Jason’s tutorial. In step 15, adding the data NFS export as a Storage Domain, I was unable to add the NFS export. I found the Troubleshooting NFS Storage Issues page on the oVirt wiki, ensured that SELinux was disabled and that the export had the correct permissions, confirmed that /etc/nfsmount.conf specified Nfsvers=3, rebooted, and then ran the nfs-check.py script. At this point, I was able to add the other storage domains in steps 15 and 16.

My second issue was that even on Fedora 16, I simply can’t get the spice client (through the spice-xpi browser plugin) to work. As far as I can tell from the logs, it looks like spicec is being sent a value of “None” for the secured port parameter, instead of the correct port number. I assume this is a bug in oVirt, but I’ll revisit this problem when I have time. In the meantime, I changed my test VM to use VNC, which is launched by installing the ovirt-engine-cli package (see below) on your client computer, connecting to the oVirt API with ovirt-shell:

ovirt-shell --connect --url=https://ovirt-engine.example.com:8443/api --user=admin@internal --password adminpassword

and then running console vm_name. This launches the vncviewer binary, which is in the “tigervnc” package on Fedora.

Installing ovirt-engine-cli

To run ovirt-shell on your workstation (Fedora 16, of course…) you’ll need the ovirt-engine-cli and ovirt-engine-sdk packages. I manually downloaded them from http://www.ovirt.org/releases/nightly/fedora/16/, versions 2.1.3 and 1.6.2, respectively. The SDK and CLI are Python based, so there are a few Python dependencies, all of which were automatically solved by yum. I know there are SDK and CLI packages out there for other distros, but haven’t tried them yet.

Installing Linux Guests

Installing a CentOS 6.2 x86_64 guest was relatively straightforward, and my usual kickstart infrastructure worked fine. The only catch was the VirtIO storage interface, which shows up as /dev/vdx instead of /dev/sdx; I just added another kickstart metadata option in Cobbler that allows me to use sdx by specifying “virtual=yes” (for our VMware hosts), or vdx by specifying “virtual=ovirt”.

Setting up Authentication

As installed, oVirt only has one user, “admin@internal”; it requires an external directory service for user authentication. Currently, it supports IPA, Red Hat’s Enterprise Identity Management tool (combines RHEL, oVirt Directory Server, Kerberos and NTP; perhaps FreeIPA would work as well?) and Microsoft Active Directory. As much as I’d like to give IPA or FreeIPA a try, my company already has an AD infrastructure, so I opted to go that route. Documentation is given in the oVirt 3.0 Installation Guide, starting on page 96. Unfortunately, I was never able to get AD auth working correctly, so I just worked with the one admin user.

Adding a Node

The biggest issue I had was adding the second node to oVirt. I attempted to use the DVD Import feature of Cobbler on the oVirt Node Image ISO, but that failed. I then found the image’s LiveOS/livecd-iso-to-pxeboot script and used that to make a kernel and initrd, and kernel parameters, for Cobbler. PXE works fine.

Postscript: I ended up blowing away my oVirt installation in favor of testing other things. At some point, the engine install got corrupted in a way that I just couldn’t fix; even though I spent all day one Saturday working on it, it took more time than I could allocate to a personal project. So this post is really semi-complete at best. However, there is some good news. Jason Brooks’ original post, How to Get Up and Running with oVirt, was written for oVirt 3.0, as was this post. Since then, there has been a new release, oVirt 3.1, which apparently has a better UI and a better installer. Jason Brooks has a new post, Up and Running with oVirt, 3.1 Edition, which covers installation and configuration of both an all-in-one machine and a separate node. If you’re looking to try oVirt, I’d recommend you give that a shot. Unfortunately (and strangely, given that this is supposed to be the “upstream” of Red Hat’s proprietary RHEV) it’s still all based on Fedora.

Project – Storing and Analyzing Apache httpd Logs from Many Hosts

I’ve recently started casual work on a side-project to collect, store, and analyze apache logs from a bunch of servers – for the initial implementation, I’m looking to handle about 15M access_log lines per day (that works out to 173 lines/second assuming an even distribution, which there certainly isn’t). Here is a selection of links that I’ve been using for ideas and inspiration, both for the technical side (data collection, transport, storage and analysis) and visualization:
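The per-second figure is just the daily volume divided by the 86,400 seconds in a day. A small sketch of the arithmetic follows; the peak_factor parameter is my own assumption for sizing against uneven traffic, and the value 3.0 is illustrative, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def average_rate(lines_per_day: int) -> float:
    """Average log lines per second, assuming a perfectly even distribution."""
    return lines_per_day / SECONDS_PER_DAY


def peak_rate(lines_per_day: int, peak_factor: float) -> float:
    """Rate to size for if the busiest period runs peak_factor times the average."""
    return average_rate(lines_per_day) * peak_factor


if __name__ == "__main__":
    print(int(average_rate(15_000_000)))  # 173
    # peak_factor=3.0 is a made-up multiplier for illustration
    print(int(peak_rate(15_000_000, 3.0)))
```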

  • RRDtool – RRDtool Gallery – I’m starting a graphing/log analysis project, and looked here for some inspiration for my proof-of-concept code
  • Creating pretty graphs with RRDTOOL from Girish Venkatachalam.
  • There’s some good information on RRDtool’s “Aberrant Behavior Detection” (Holt-Winters prediction, deviation and failure detection) on the rrdtool, rrdgraph_examples and rrdcreate documentation pages, but unfortunately no anchors to link directly to.
  • Cube – “Cube is a system for collecting timestamped events and deriving metrics. By collecting events rather than metrics, Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB and available under the Apache License on GitHub.”
  • Cubism.js – “Cubism.js is a D3 plugin for visualizing time series. Use Cubism to construct better realtime dashboards, pulling data from Graphite, Cube and other sources. Cubism is available under the Apache License on GitHub.” The demo on that page looks pretty cool.
  • Highcharts Demo Gallery – JS chart/graph library. It requires a paid license for commercial use (though it’s a bit unclear to me whether an internal ops dashboard would fall under this license provision) so I probably wouldn’t go with this one. They have some cool charts, including a dynamic line chart updating every second, a scatter plot and a nice zoomable time-series graph, though IMHO it’s not as nice as the Google Chart Tools (formerly Google Visualization) annotated timeline.
  • [ HOWTO ] Graphing Holt-Winters Predictive Analysis – Cacti forums
  • dygraphs – an impressive permissive-license JS chart library dedicated to visualizing dense time-series data. Developed by Google and now used by them (Google Correlate, Google Latitude) as well as NASA, 10gen and others. There are some very cool demos on that main page, and also on the tests page.
  • Graphite, JMXTrans, Ganglia, Logster, Collectd, say what ? « Planet DevOps
  • Visage
  • kgorman/mongo_graph – a tool to pull data from MongoDB and put it in RRD files
  • drraw – a perl-based graphing frontend (web UI) for RRDtool
  • etsy/logster · GitHub – Etsy’s Python tool to maintain a pointer on a log file, and parse at a regular rate feeding the data into a tool like Graphite or Ganglia.
  • cebailey59/charcoal – a Sinatra app that allows creation of dashboards from Graphite, collectd, or any other service that creates images from URL calls.
  • etsy/dashboard – some examples of how Etsy builds monitoring dashboards.
  • GDash – Graphite Dashboard | R.I.Pienaar – a Sinatra dashboard app for Graphite, using Twitter bootstrap for visualization.
  • paperlesspost/graphiti – a Ruby and JavaScript front-end for Graphite.
  • Graphite Screenshots – just two, but they get the idea across pretty well.
  • Graylog2 – a centralized log management application with a powerful web interface. Stores logs in ElasticSearch (which is built on Lucene, a Java-based index and search server) and statistics/graphs in MongoDB. It does analytics, alerting, monitoring/graphing and searching all through a web interface, and accepts log data via syslog, AMQP and GELF (its own log format). Java server and Ruby on Rails web UI.
  • Logstash – another centralized log project that stores and indexes logs, with search via a web UI. “Ship any event to anywhere over any protocol.” Takes many inputs including files, syslog, AMQP, Flume, STOMP, HTTP and even twitter, performs a number of filters including timestamp checks, parsing, dropping, joins, etc, and then sends logs back on an output including AMQP, Graylog2 GELF, STOMP, MongoDB, ElasticSearch, syslog, WebSockets and to Nagios. One particularly cool feature is its “file” input, which continuously tails a file and claims to be log rotation safe. Just cool.
  • jordansissel’s Logstash intro slides.
  • Kibana – an alternative interface for Logstash and ElasticSearch that allows searching, graphing and analysis of log data stored in Logstash.
  • Pivotal Labs: Talks – Metrics Metrics Everywhere (Coda Hale)
  • PaperlessPost – @quirkey’s talk on metrics – very good high level stuff, but slides only
  • paperlesspost/graphiti – graphiti, a JS/Ruby frontend for Graphite that does graphs, dashboards, and point-in-time snapshots of graphs. Lots of functionality.
  • Redis – a distributed key/value store that’s really popular with the cool kids. Another Redis Use Case: Centralized Logging • myNoSQL
  • Charcoal – a Sinatra (Ruby) dashboard app (ready for use on Heroku but usable anywhere). Graphite-oriented but will work with any tool that generates images from URLs.
  • etsy/logster – etsy’s Logster tool, which keeps a tail on log files, parses them, and ships metrics to Graphite or Ganglia.

Some PowerDNS Links and Interesting Features

At $WORK we lost a disk in the RAID1 of one of our external nameservers, and it rekindled an occasional discussion of migration from ISC BIND to PowerDNS. PowerDNS has separate authoritative and recursive servers, and doesn’t seem to natively support views or split-horizon the way BIND does, but it has some really cool features including very mature database backends, load balancing, Lua scripting support to modify how recursive queries are answered, and geolocation or IP-range based query results.

While this project is still just casual research, I thought I’d share some of the useful links and information I’ve found:

PowerDNS Front-ends:

  • JPowerAdmin – One of the two most popular, a GPLv3 Java (JBoss SEAM) based web UI with a RESTful API, with support for “multiple” database backends. Sponsored by Nicmus, Inc. Online demo (demo:demo). Looks nice, simple UI, but no support for split-horizon.
  • PowerAdmin – the other most popular, though it seems to be undergoing a large overhaul at the moment. Has full support for most of PowerDNS’s features, written in PHP, supports “large” databases, fine-grained user permissions, RFC validation, zone templates. Online demo (demo:demo). I don’t really like that it manages the SOAs as full text (without any templating, dropdowns or default values), and that it doesn’t prepopulate default values for TTL in the new record form, but it looks like a good starting place for someone (like me) who’s handy with PHP.
  • pdns-gui – PowerDNS GUI – Google Project Hosting – PHP/MySQL GUI. Online demo. Handles templates nicely but won’t scale to too many of them. Window-based UI is visually pleasing but will probably be a problem for big zones.
  • powerdns-webinterface – PowerDNS Webinterface – Google Project Hosting – A nice but relatively simplistic UI written in PHP. It has some nice features like multi-user authentication (and logging, though I haven’t looked into how detailed it is), automatic SOA serial update, automatic PTR creation, etc. Unfortunately not geared towards people with lots of domains and multiple records; it has only one template for new domains (and no way to update domains created from a template), no easy filtering, and still treats SOA like a single text record.
  • ZoneAdmin | SourceForge.net and Project website – Maybe not the fastest tool to use in bulk, but a nice, relatively intuitive and full-featured admin tool. Online demo (demo:demo).

Some links on PowerDNS split-horizon

It looks to me that split-horizon is going to be the hardest part for us, at least to also have a web UI to manage it. It looks like with PowerDNS, the most common way to run split horizon DNS (views) is to run two separate sets of servers or instances, either on different boxes or multi-homed; one for internal and one for external. While that sounds like quite a bit of overhead beyond what BIND does, the real problem is finding a web UI that supports it; I don’t care if it’s in two separate databases, but what I want is a logical (web UI) view that has zones made up of resource names (i.e. the leftmost column in a zone file) with one or two RRs (type, ttl, priority, value) – one for each view. That’s the real catch – all of our machines are in private IP space behind a firewall, so I need to be able to manage the internal and external records on one screen. While it’s not exactly scalable, and the code stagnated quite a bit once I got it to a point that was usable for me, this was the main goal of my MultiBIND Admin project.
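To make the “one name, one or two RRs” idea concrete, here is a rough sketch of the logical model I have in mind. It is not how PowerDNS or MultiBIND Admin actually store anything; every class, field and method name below is made up for illustration, and the addresses are documentation examples.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

VIEWS = ("internal", "external")


@dataclass
class Record:
    rtype: str                      # e.g. "A", "MX"
    ttl: int
    value: str
    priority: Optional[int] = None  # only meaningful for types such as MX


@dataclass
class SplitName:
    """One resource name (leftmost zone-file column) with one RR per view."""
    name: str
    internal: Optional[Record] = None
    external: Optional[Record] = None


@dataclass
class Zone:
    origin: str
    names: Dict[str, SplitName] = field(default_factory=dict)

    def set_record(self, name: str, view: str, record: Record) -> None:
        if view not in VIEWS:
            raise ValueError("unknown view: " + view)
        entry = self.names.setdefault(name, SplitName(name))
        setattr(entry, view, record)

    def records_for(self, view: str) -> Dict[str, Record]:
        """Flatten the zone to what a single view (one server set) would serve."""
        if view not in VIEWS:
            raise ValueError("unknown view: " + view)
        return {n: getattr(e, view) for n, e in self.names.items()
                if getattr(e, view) is not None}


if __name__ == "__main__":
    zone = Zone("example.com")
    zone.set_record("www", "internal", Record("A", 3600, "10.0.0.5"))
    zone.set_record("www", "external", Record("A", 3600, "192.0.2.5"))
    zone.set_record("db", "internal", Record("A", 3600, "10.0.0.9"))
    print(sorted(zone.records_for("external")))
```

A UI built on something like this could show both columns side by side and still write each view out to its own backend database.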

WordPress – Automatically publish a pending post each weekday morning from a PHP script

In an earlier post, Piwik Web Analytics, and some unfortunate stats about my blog, I mentioned that the Feedburner stats for this blog show a relatively high subscribe/unsubscribe rate. I think a large part of that is my tendency to blog in spurts, and even worse, my tendency to write drafts and not publish them. In an effort to combat this, I’ve been trying to finish blog posts and then set them to “Pending” status, and go back and publish one every day (well, every day that I have some still sitting unpublished). Of course, that counts on me logging in to WordPress every day, which isn’t something I do. The following script is, at least for now, the answer for me.

This script (a standalone PHP script) uses wp-load.php to load the WordPress environment, and then finds the oldest post with a given status (“pending” in my case) and attempts to publish it. It only does this if there has not been another post published in the last 24 hours. The following script can be found in subversion at http://svn.jasonantman.com/misc-scripts/wordpress_daily_post.php:

#!/usr/bin/php
<?php
/**
 * wordpress_daily_post.php
 * Script to publish the oldest post with a given status, if no
 * other post has been published in 24 hours. Intended to be run
 * via cron on weekdays.
 *
 * Copyright 2012 Jason Antman 
 *
 * Licensed under the Apache License, Version 2.0 
 *
 * use it anywhere you want, however you want, provided that this header is left intact,
 * and that if redistributed, credit is given to me.
 *
 * It is strongly requested, but not technically required, that any changes/improvements
 * be emailed to the above address.
 *
 * The latest version of this script will always be available at:
 * $HeadURL: http://svn.jasonantman.com/misc-scripts/wordpress_daily_post.php $
 * $LastChangedRevision: 40 $
 *
 * Changelog:
 * 2012-09-03 Jason Antman  - 1.0
 *  - first version
 */
 
# BEGIN CONFIGURATION
define('WP_LOAD_LOC', '/var/www/vhosts/blog.jasonantman.com/wp-load.php'); // Configure this to the full path of your Wordpress wp-load.php
define('SOURCE_POST_STATUS', 'pending'); // post status to publish
# END CONFIGURATION

$VERBOSE = false;
$DRY_RUN = false;
array_shift($argv);
while(count($argv) > 0) {
  if(isset($argv[0]) && $argv[0] == "-d" || $argv[0] == "--dry-run"){
    $DRY_RUN = true;
    fwrite(STDERR, "DRY RUN ONLY - NOT ACTUALLY PUBLISHING.\n");
  }
  if(isset($argv[0]) && $argv[0] == "-v" || $argv[0] == "--verbose"){
    $VERBOSE = true;
    fwrite(STDERR, "WP_LOAD_LOC=".WP_LOAD_LOC."\n");
    fwrite(STDERR, "SOURCE_POST_STATUS=".SOURCE_POST_STATUS."\n");
  }
  array_shift($argv);
}
 
$_SERVER['HTTP_HOST'] = 'localhost'; // needed for wp-includes/ms-settings.php:100
require_once(WP_LOAD_LOC);
 
# check that we're running on a weekday
if(date('N') >= 6) {
#  if($VERBOSE){ fwrite(STDERR, "today is a saturday or sunday, dieing.\n"); }
#  exit(1);
}
 
# find the publish date/time of the last published post
$published = get_posts(array('numberposts' => 1, 'orderby' => 'post_date', 'order' => 'DESC', 'post_status' => 'publish'));
$post = $published[0];
$pub_date = $post->post_date;
$pub_id = $post->ID;
 
if(strtotime($pub_date) >= (time() - 86400)) {
  if($VERBOSE){ fwrite(STDERR, "last post (ID $pub_id) within last day ($pub_date). Nothing to do. Exiting.\n"); }
  exit(0);
} else {
  if($VERBOSE){ fwrite(STDERR, "Found last post (ID $pub_id) with post date $pub_date.\n"); }
}
 
 
# find the earliest post of status SOURCE_POST_STATUS, if there is one.
$to_post = get_posts(array('numberposts' => 1, 'orderby' => 'post_date', 'order' => 'ASC', 'post_status' => SOURCE_POST_STATUS));
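if(count($to_post) < 1) {
  # nothing waiting to be published (this block is reconstructed from the surrounding code)
  if($VERBOSE){ fwrite(STDERR, "No posts with status '".SOURCE_POST_STATUS."' found. Nothing to do. Exiting.\n"); }
  exit(0);
}
$post = $to_post[0];
$to_pub_id = $post->ID;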
$to_pub_date = $post->post_date;
$to_pub_title = $post->post_title;
$now = time();
$new_date = date("Y-m-d H:i:s", $now);
$new_date_gmt = gmdate("Y-m-d H:i:s", $now);
 
if($VERBOSE){ fwrite(STDERR, "Post to publish: ID=$to_pub_id DATE=$to_pub_date NEW_DATE=$new_date TITLE=$to_pub_title\n"); }
 
# actually publish it
if(! $DRY_RUN){
  $arr = array('ID' => $to_pub_id, 'post_status' => 'publish', 'post_date' => $new_date, 'post_date_gmt' => $new_date_gmt);
  $ret = wp_update_post($arr); // publish the post
  if($ret == 0) {
    fwrite(STDERR, "ERROR: Post $to_pub_id was not successfully published.");
    exit(1);
  }
  if($VERBOSE){ fwrite(STDERR, "Published post. New ID: $ret\n"); }
}
else {
  fwrite(STDERR, "Dry run only, not publishing post.\n");
  exit(0); // nothing was published, so skip the verification below
}
 
# check that the post really was published
$published = get_posts(array('numberposts' => 1, 'orderby' => 'post_date', 'order' => 'DESC', 'post_status' => 'publish'));
$post = $published[0];
$pub_date = $post->post_date;
$pub_id = $post->ID;
$pub_title = $post->post_title;
$pub_guid = $post->guid;
 
if($pub_title != $to_pub_title) {
  fwrite(STDERR, "ERROR: title of most recent post does not match title of what we wanted to post.");
  exit(1);
}
 
fwrite(STDOUT, "Published post $pub_id at $pub_date\n");
fwrite(STDOUT, "Title: $pub_title\n");
fwrite(STDOUT, "\n\n\n GUID/Link: $pub_guid\n");
fwrite(STDOUT, "\n\n".__FILE__." on ".trim(shell_exec('hostname --fqdn'))." running as ".get_current_user()."\n");
 
?>

You’ll need to set WP_LOAD_LOC (line 29) to the full path of your WordPress installation’s wp-load.php (it should be in the top-level directory of your WordPress installation). I run this script from cron like:

0 6 * * 1-5 /home/jantman/bin/wordpress_daily_post.php --verbose # publish WP pending posts daily

so that it runs at 6AM (local time) each weekday. Assuming you have cron set up to send you mail, you’ll get a daily message saying what was (or wasn’t) done.

Interesting Systems Links for September 3, 2012

Here is a small selection of sysadmin links that I recently found, and wanted to share: