Project – Storing and Analyzing Apache httpd Logs from Many Hosts

I’ve recently started casual work on a side project to collect, store, and analyze Apache logs from a bunch of servers – for the initial implementation, I’m looking to handle about 15M access_log lines per day (that works out to roughly 174 lines/second assuming an even distribution, which it certainly isn’t). Here is a selection of links that I’ve been using for ideas and inspiration, both for the technical side (data collection, transport, storage and analysis) and visualization:

  • RRDtool – RRDtool Gallery – I’m starting a graphing/log analysis project, and looked here for some inspiration for my proof-of-concept code
  • Creating pretty graphs with RRDTOOL from Girish Venkatachalam.
  • There’s some good information on RRDtool’s “Aberrant Behavior Detection” (Holt-Winters prediction, deviation and failure detection) on the rrdtool, rrdgraph_examples and rrdcreate documentation pages, but unfortunately no anchors to link directly to.
  • Cube – “Cube is a system for collecting timestamped events and deriving metrics. By collecting events rather than metrics, Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB and available under the Apache License on GitHub.”
  • Cubism.js – “Cubism.js is a D3 plugin for visualizing time series. Use Cubism to construct better realtime dashboards, pulling data from Graphite, Cube and other sources. Cubism is available under the Apache License on GitHub.” The demo on that page looks pretty cool.
  • Highcharts Demo Gallery – JS chart/graph library. It requires a paid license for commercial use (though it’s a bit unclear to me whether an internal ops dashboard would fall under this license provision) so I probably wouldn’t go with this one. They have some cool charts, including a dynamic line chart updating every second, a scatter plot and a nice zoomable time-series graph, though IMHO it’s not as nice as the Google Chart Tools (formerly Google Visualization) annotated timeline.
  • [ HOWTO ] Graphing Holt-Winters Predictive Analysis – Cacti forums
  • dygraphs – an impressive permissive-license JS chart library dedicated to visualizing dense time-series data. Developed by Google and now used by them (Google Correlate, Google Latitude) as well as NASA, 10gen and others. There are some very cool demos on that main page, and also on the tests page.
  • Graphite, JMXTrans, Ganglia, Logster, Collectd, say what ? « Planet DevOps
  • Visage
  • kgorman/mongo_graph – a tool to pull data from MongoDB and put it in RRD files
  • drraw – a perl-based graphing frontend (web UI) for RRDtool
  • etsy/logster · GitHub – Etsy’s Python tool to maintain a pointer on a log file, and parse at a regular rate feeding the data into a tool like Graphite or Ganglia.
  • cebailey59/charcoal – a Sinatra app that allows creation of dashboards from Graphite, collectd, or any other service that creates images from URL calls.
  • etsy/dashboard – some examples of how Etsy builds monitoring dashboards.
  • GDash – Graphite Dashboard | R.I.Pienaar – a Sinatra dashboard app for Graphite, using Twitter bootstrap for visualization.
  • paperlesspost/graphiti – a Ruby and JavaScript front-end for Graphite.
  • Graphite Screenshots – just two, but they get the idea across pretty well.
  • Graylog2 – a centralized log management application with a powerful web interface. Stores logs in ElasticSearch (which is built on Lucene, a Java-based index and search server) and statistics/graphs in MongoDB. It does analytics, alerting, monitoring/graphing and searching all through a web interface, and accepts log data via syslog, AMQP and GELF (its own log format). Java server and Ruby on Rails web UI.
  • Logstash – another centralized log project that stores and indexes logs, with search via a web UI. “Ship any event to anywhere over any protocol.” Takes many inputs including files, syslog, AMQP, Flume, STOMP, HTTP and even twitter, performs a number of filters including timestamp checks, parsing, dropping, joins, etc, and then sends logs back on an output including AMQP, Graylog2 GELF, STOMP, MongoDB, ElasticSearch, syslog, WebSockets and to Nagios. One particularly cool feature is its “file” input, which continuously tails a file and claims to be log rotation safe. Just cool.
  • jordansissel’s Logstash intro slides.
  • Kibana – an alternative interface for Logstash and ElasticSearch that allows searching, graphing and analysis of log data stored in Logstash.
  • Pivotal Labs: Talks – Metrics Metrics Everywhere (Coda Hale)
  • PaperlessPost – @quirkey’s talk on metrics – very good high level stuff, but slides only
  • paperlesspost/graphiti – graphiti, a JS/Ruby frontend for Graphite that does graphs, dashboards, and point-in-time snapshots of graphs. Lots of functionality.
  • Redis – a distributed key/value store that’s really popular with the cool kids. Another Redis Use Case: Centralized Logging • myNoSQL
  • Charcoal – a Sinatra (Ruby) dashboard app (ready for use on Heroku but usable anywhere). Graphite-oriented but will work with any tool that generates images from URLs.
  • etsy/logster – etsy’s Logster tool, which keeps a tail on log files, parses them, and ships metrics to Graphite or Ganglia.
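
To put the 15M lines/day figure in perspective, here’s a minimal sketch that buckets access_log lines by timestamp and compares the busiest second against the flat average. It assumes the usual Apache common/combined log timestamp (for example `[10/Oct/2000:13:55:36 -0700]`); the function name and the sample lines are made up for illustration and aren’t part of any of the tools listed above.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp field of the Apache common/combined log format (assumed here).
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def lines_per_second(lines):
    """Count log lines per one-second bucket; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp] += 1
    return counts


# Flat average for 15M lines spread evenly over a day.
AVERAGE_RATE = 15_000_000 / 86_400

# Hypothetical sample lines, only to show the call.
sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 512',
    '192.0.2.3 - - [10/Oct/2000:13:55:37 -0700] "GET /b HTTP/1.0" 404 0',
]
counts = lines_per_second(sample)
peak = max(counts.values())
print(f"average {AVERAGE_RATE:.1f} lines/s, sample peak {peak} lines/s")
```

Run against a real day of logs, the gap between the peak and the average is what the collection and storage side actually has to be sized for.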

Vyatta NetworkOS router/firewall on Alix board / Compact Flash

With the impending move to an apartment in Georgia and the migration of my rack full of servers to a hosting provider, there’s no longer a need for me to run my Vyatta VC router on a beefy dual-CPU RAIDed DL360 G3 HP Proliant server chassis. I found an older PCEngines Alix 2c1 single board computer (433 MHz AMD Geode LX700, 128MB DDR DRAM, CompactFlash (CF) card socket, MiniPCI, 3x 10/100 ethernet) lying around, and decided to turn that into the new router. I’ve been so spoiled by Vyatta’s good performance (well, at least on an x86 server) and the real CLI that I don’t think I can go back to something like m0n0wall or pfSense, and since it’s going to be my only network services box (also doing DNS, DHCP, firewalling, NAT, and maybe IPsec VPN) it’s not viable to use the type of older Cisco or Juniper hardware that I can afford.

The down side is that Vyatta isn’t really designed or tuned for small systems, let alone CF media that doesn’t take too well to lots of writes. So, I’m going to begin experimentation with doing a CF install of the current Vyatta Core 6.3, and we’ll see how it goes and what tuning I do over time.

I found two relatively good references: a post on the vyatta.org forum from 2008, relating to Vyatta version 4 (also on the author’s blog), and a blog post detailing a more complex SquashFS/tmpfs/UnionFS read-only Vyatta install. Given my relatively short timeframe and little free time, I decided to try the former approach for now, and plan to make a more customized and tuned CF version of Vyatta in the future.

Creating the actual disk image:

My development platform at the moment is an Intel-based MacBook Pro, running Mac OS X 10.6.4 and VirtualBox 4.0.12. As much of a Linux fan as I am, my work laptop runs Mac (like everyone else in the office) and lately I can’t guarantee that I’ll be at my desktop long enough to finish anything. The target is an Alix2c1 with a 2GB SanDisk Ultra CF card (yes, I know an industrial card would be better, but I couldn’t get my hands on one). For starters, I created a new VirtualBox VM with the following settings:

  • OS Type: Linux 2.6
  • Base Memory: 128MB
  • Boot Order: Floppy, CD-ROM, Hard Disk
  • IDE Controller Primary: mounted vyatta-livecd_VC6.3 ISO image
  • IDE Controller Secondary: RAW VMDK image (created below)
  • Audio: None
  • Network: Disabled (this is important, as Vyatta saves the interfaces by hardware address, and it would require some config editing and reboots to change them)
  • Serial Port: disconnected (but present)

One difficulty I ran into on Mac is mounting the raw CF card in the VirtualBox guest. I plugged it in via a USB reader, and of course it automatically mounted in MacOS. I ejected it and the /dev/disk1 device disappeared. It turns out that the full procedure (as far as I could tell) for Mac is:

  • Plug in the CF card and reader.
  • It should automount. Run mount to see what the actual device is – in my case, the /dev/disk1s1 partition was mounted, so the disk is /dev/disk1.
  • Run sudo umount -f /dev/disk1. It seems that the MacOS automounter has a god complex, so you may need to re-run this command quite a few times throughout the process if you get device or resource busy errors.
  • In an appropriate directory, create the raw VMDK image with: VBoxManage internalcommands createrawvmdk -filename rawdisk.vmdk -rawdisk /dev/disk1.
  • When creating your VM, you’ll have an option to select Use an Existing Virtual Disk. Use that option, and select the file created in the last step.
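
The two commands in that procedure can be sketched as a small helper. This is only an illustration: it derives the whole-disk device from the mounted partition and builds the same `umount` and `VBoxManage` command lines given above, without running anything. The function names are mine, and the device-name pattern (`/dev/diskN` with an optional `sN` partition suffix) is an assumption based on what I saw on my Mac.

```python
import re


def whole_disk(device):
    """Map a partition such as /dev/disk1s1 to its whole disk, /dev/disk1."""
    match = re.fullmatch(r"(/dev/disk\d+)(s\d+)?", device)
    if match is None:
        raise ValueError(f"not a recognized disk device: {device}")
    return match.group(1)


def raw_vmdk_commands(mounted_partition, vmdk_name="rawdisk.vmdk"):
    """Return the two commands from the steps above as argv lists (nothing is run)."""
    disk = whole_disk(mounted_partition)
    return [
        ["sudo", "umount", "-f", disk],
        ["VBoxManage", "internalcommands", "createrawvmdk",
         "-filename", vmdk_name, "-rawdisk", disk],
    ]


for argv in raw_vmdk_commands("/dev/disk1s1"):
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run` as-is; given how persistent the automounter is, expect to repeat the `umount` step if the second command reports that the device is busy.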

Once that’s done, and you’ve set up the VM with the raw disk, boot the VM (it should boot to the Vyatta LiveCD), log in as usual for an install (vyatta:vyatta), and then the fun begins:

  1. At the prompt after logging in, sudo su -
  2. Edit /opt/vyatta/sbin/install-system (hint: Vyatta has nano and vi installed. nano -c filename shows line numbers) and change the ROOT_FSTYPE variable (line 78 in VC6.3) from “ext4” to “ext2”.
  3. Run install-system. I used all default options (including one partition) and it seemed to work fine. It took a minute or two to create the ext2 filesystem on my 2GB CF card.
  4. The file copy took even longer… so be patient, or have a book handy.
  5. When install-system finishes and you get the root prompt back, before rebooting, continue with some minor tweaks:
  6. mkdir /mnt/temp
  7. mount /dev/sda1 /mnt/temp
  8. cd /mnt/temp
  9. Edit boot/grub/grub.cfg and change all occurrences of “root=UUID=…” entries for the “linux” lines (lines 13, 18, 23, 28 in my grub.cfg) to “root=/dev/sda1”. My only real reason for this change is so that I can move my altered config files (config.boot, fstab and grub.cfg) with minimal changes when I upgrade or make a different Vyatta CF card, without having to update the UUID for the new partition.
  10. Edit etc/fstab and change the “UUID=…” device to “/dev/sda1”.
  11. shutdown. Once the VM is stopped, you can remove the CF card.
  12. The PCEngines Alix.2 boards use a default serial console speed of 38400 baud. Pretty much every network device, plus Linux and Vyatta, uses a default speed of 9600 baud. Once I got the CF card installed in the Alix board and hooked it up to my laptop (null modem cable to a PL-2303 based USB to serial adapter, minicom for terminal emulation), I set my terminal emulator to 38400 8N1, powered the board, and then pressed ‘s’ during POST to get into BIOS settings. Option ‘9’ sets the Alix to 9600 baud, ‘Q’ quits, and ‘Y’ saves the changes permanently. The board will reboot, and once the terminal emulator is set back to 9600 baud, serial console should work fine both in BIOS and in the OS.
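
The grub.cfg and fstab edits in steps 9 and 10 are simple enough to script if you end up doing this for more than one card. Here’s a rough sketch of the text substitution only; the function names and the sample lines are hypothetical (the kernel path and UUID below are placeholders, not copied from a real Vyatta install), and it assumes the fstab device field starts at the beginning of the line.

```python
import re


def pin_grub_root(grub_cfg, device="/dev/sda1"):
    """Replace root=UUID=... with root=<device>, but only on "linux" lines."""
    out = []
    for line in grub_cfg.splitlines(keepends=True):
        if line.lstrip().startswith("linux"):
            line = re.sub(r"root=UUID=\S+", f"root={device}", line)
        out.append(line)
    return "".join(out)


def pin_fstab_device(fstab, device="/dev/sda1"):
    """Replace a UUID=... device at the start of an fstab entry with <device>."""
    return re.sub(r"^UUID=\S+", device, fstab, flags=re.MULTILINE)


# Placeholder content, only to show the substitution.
grub_sample = (
    "menuentry \"example\" {\n"
    "    linux /boot/vmlinuz root=UUID=PLACEHOLDER-UUID ro console=ttyS0\n"
    "}\n"
)
fstab_sample = "UUID=PLACEHOLDER-UUID / ext2 noatime 0 1\n"

print(pin_grub_root(grub_sample))
print(pin_fstab_device(fstab_sample))
```

Both functions just return the modified text, so you can diff it against the original before writing anything back to the CF card.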

If all worked well, you should be able to boot into Vyatta and log in as the default “vyatta” user (which you set a password for during the install). Assuming you know your way around Vyatta, it’s pretty standard from here, though there are a few things you may want to check or configure right away:

  1. In configuration mode (configure) run show interfaces. All of your physical ethernet interfaces should appear, along with their MAC addresses.
  2. Some changes to reduce the number of log writes to the CF card: set system syslog console facility all level notice and set system syslog global facility protocols level notice.
  3. Configure interfaces with firewalls, IP addresses or DHCP, etc.
  4. Do whatever other configuration you need for a minimal system – dhcp, dns, nat, etc.

And that’s it – this should give you a working Vyatta system on CF on an Alix board. Stay tuned, hopefully in a month or so I’ll get around to customizing it a bit more, based on the second blog entry linked above.

Downtime Last Night; Coping with outages on a shoestring budget

Anyone who tried to visit any of my sites last night, or send me email, probably noticed that I dropped off the face of the earth. I take my uptime pretty seriously – even with virtually no budget, about 10 minutes of UPS time, and everything hosted out of my basement. I’ve only had about a 6-hour block of downtime in the past 21 months; aside from that, nothing externally visible except a few sub-3-minute hiccups. And the 6 hours was due to a major power outage that quickly overwhelmed my UPS. Without a generator, I can’t really blame anyone for that. Anyway, last night into this morning I had partial to full outages for about 12 hours, by far the most I’ve had since I literally ripped apart my entire infrastructure, moved it from shelves to a rack, and put it back together on the fly.

I’d like to apologize to the few people whose web sites or other services I host out of my house. On one hand, I think those few people are getting their money’s worth from their free hosting :) On the other hand I know how frustrating it can be, especially when you can’t even resolve DNS and my email is down. Rest assured that this situation deeply bothers me, and I’m already hard at work on plans to at least keep people (both the people I do hosting for and their visitors) informed if something like this should happen in the future.

The Story:

Well, yesterday afternoon as I was at my father’s house about an hour away, I started to get a slew of Nagios notification SMSes (well, email to SMS). They started at 14:22 and stopped after a few minutes, but since I was driving, I didn’t check them. When I got home, I found a scenario I hadn’t really anticipated – the TV worked fine, but I had virtually no connectivity on my cable Internet line. I got home around 16:22 and had spotty-at-best connectivity. I could get some DNS in and out, but it was pretty much impossible to load a web page on my desktop. I was getting Nagios alerts out in bursts, and the postfix queue was pretty full. A cursory inspection showed both routers (mine and Optimum’s) online with link, and the modem had link and was blinking away, but I was passing less than 100 Kbps of data. My DOCSIS-MIB checks on the cable modem were spitting back all sorts of bad values. Not good. I checked coax connections, rebooted the modem and router, and went outside to inspect the aerials and the splitters on the outside of the house. Nothing visibly wrong, and no positive change after the reboot. Now the modem wasn't showing link at all, and I couldn't even ping its LAN IP.

I called Optimum at 17:12 and went through the initial troubleshooting with the technical support guy. I told him I’d already power-cycled the modem, and gave him a rundown of the status lights. He confirmed that they couldn’t even see the modem on the WAN side, and would have to send a tech out. The big plus to Optimum Business is that, despite it being after 5 PM on a Sunday night, I was given an ETA of 2-4 hours. After about 15 minutes, I power-cycled the modem again, and was able to get link. I was seeing some data pushed through the routers, but only about 50Kbps. I called Optimum back, spoke with another tech, and was told that they couldn’t even get diagnostics back from the modem, and were seeing 94% packet loss on ping. Time to wait for the tech.

The field tech arrived at 17:53. Utterly amazing… about 30 minutes after I got off the phone with tech support. I don’t know if they keep their better techs waiting around for business customers, but this guy – Jason – was one of the most knowledgeable and experienced that I’ve ever met. He poked around the modem a bit, re-did some of the shoddy work that the original installers left, and then climbed the pole to figure out what was going on. About half an hour later, he came back with the bad news. His test scope wouldn’t even lock on to the 609MHz carrier used for the cable modem, so there was something definitely wrong, and it was past the pole in front of the house. He told me they’d need to escalate the problem to the outside plant engineers, but since I was a business customer, I could expect some update or fix in 6-18 hours. He left around 18:30. Well, I was bummed, but I used the time to get other stuff done and start planning for at least minimal DR plans for the future.

According to my off-site Nagios, I at least got some mail out and SSH in from 19:55 to 20:44, and then had another total loss of connectivity. Everything came back around 02:20 today, meaning a full 12 hours of downtime.

Analysis and Future Plans:

Well, for the foreseeable future I’m just working my day job and probably not doing much (paid) consulting, so purchasing a backup connection is out of the question – especially since FiOS charges almost twice what Optimum does for static IP service. There’s really no way I could’ve prevented this outage, and it turns out that the problem wasn’t even on my property, so it’s not anything I could have fixed myself (or prevented by convincing Optimum to let me purchase a spare modem to keep on hand). Once again, for something that isn’t directly money-making for me, it’s not really worth it to try and get hosting as a backup, since I’ve got all sorts of complex postfix configurations, BIND master/slave replication, etc. Within my budget, I can’t really say there’s anything I could do to solve this problem, or to get even half of my services back up. My offsite Nagios is behind a dynamic residential cable connection, so that won’t really fix any problems either.

My plan for the short term is to find a static IP somewhere that I can run a box behind, add it as an NS record, and at the minimum set up a caching Postfix server, a catch-all Apache server with a “we know about it, we’re coming back soon” page, and hacked BIND zone files that point everything at this one box (albeit with a low TTL).
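
As a rough idea of what those “hacked” zone files might look like, here’s a sketch that renders a minimal zone where every name points at the one fallback box with a low TTL. Everything in it is a placeholder: the domain, the name servers and the 192.0.2.10 address are documentation examples, the SOA timers are arbitrary, and the output hasn’t been run through named-checkzone.

```python
def fallback_zone(origin, nameservers, fallback_ip, names, ttl=300, serial=1):
    """Render a minimal zone in which every listed name resolves to fallback_ip.

    The MX record targets the relative name "mail", so "mail" should be in names.
    """
    lines = [
        f"$ORIGIN {origin}.",
        f"$TTL {ttl}",
        # SOA fields: serial, refresh, retry, expire, negative-caching TTL.
        f"@ IN SOA {nameservers[0]}. hostmaster.{origin}. "
        f"({serial} 3600 600 604800 {ttl})",
    ]
    lines += [f"@ IN NS {ns}." for ns in nameservers]
    lines.append(f"@ IN A {fallback_ip}")
    lines.append("@ IN MX 10 mail")
    lines += [f"{name} IN A {fallback_ip}" for name in names]
    return "\n".join(lines) + "\n"


zone = fallback_zone(
    "example.com",
    ["ns1.example.net", "ns2.example.net"],
    "192.0.2.10",
    ["www", "mail"],
)
print(zone)
```

The low `$TTL` is the important part: it limits how long resolvers keep pointing at the fallback box once the real servers are back.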

If anyone out there happens to read this, any comments on how to deal with a total loss of connectivity on a budget of, say, $15/month above the cost of my Optimum connection?

Virtualization Options

As I mentioned in “Downtime past few days, coping with storms”, as a result of some things I noticed during a recent power outage, I’ve decided to take the leap to virtualization. Given the cost of current hardware that supports HVM (Intel VT-x or AMD-V), I immediately decided that I might as well give up on any thoughts of doing full virtualization or getting new-ish hardware. So I settled on the next step up from what I have now – a set of HP Proliant DL360 G3 servers. I got them with a 90 day warranty from a reputable dealer, dual 2.8GHz Xeon (512K cache), 2GB RAM, dual 36.4GB U320 15k RPM SCSI disks and dual power supplies for $99 each. My next step is to decide what virtualization software to use.

My main goals for the project are:

  • Lower power consumption through consolidation of servers.
  • Possibility to add capacity or resources by remotely powering up an idle server and migrating VMs to it.
  • Limited fault tolerance – ability to manually restore a VM that was running on failed hardware, onto an idle server.

I originally thought Xen, just out of reflex. However, given that all of my servers have the same base – the same distribution and, ideally, the same kernel and patch level – it seemed like a lot of overhead to duplicate that for multiple VMs. So I started looking into OS-level virtualization. There are relatively few options, and I’ll admit that aside from Solaris Containers (which I learned about while working at Sun) I don’t know much about it. But OpenVZ seems to be the front runner in that area. My initial impression was that it made a lot of sense – keep one common kernel, but allow containers/virtual environments (CTs/VEs) to have, essentially, their own userland. Unfortunately, it doesn’t seem to be as hyped as Xen, and I haven’t heard very much about it in the enterprise context. And it requires running a kernel from the OpenVZ project, which means I can’t just script updates through yum as easily as normal.

On the up side, OpenVZ would allow me to eliminate the duplication of the kernel, and seems to have much less overhead than Xen (and logically so). On the down side, I lose the ability to virtualize other OSes, kernel versions, or make pre-packaged VMs. I’ve decided that if I wanted to do that, I could dedicate a single machine.

I’ve spent the last day or so doing a lot of research, and have come up with the following questions and concerns about OpenVZ which I hope to be able to answer (I’ll post the answers in a follow-up).

  • How do I handle distribution and kernel upgrades? The logical solution would be to migrate the CT to another host while I upgrade CT0 (the hardware OS/host/dom0 in Xen speak). But if the guest and host kernels must match, how does this work?
  • Can I do package upgrades within the guest/CT easily? Will this play well with Puppet?
  • How will I handle backups? Is it logical to run bacula within each CT, or just on CT0? If just on CT0, how do I easily verify that a particular CT was backed up?
  • Will everything play well with Puppet? (see below)
  • Am I willing to throw away my KickStart-based installs? And, similarly, am I willing to give up the possibility of migrating from a container to a Xen host or a physical host (easily)?
  • OpenVZ live migration relies on rsync. This means that there’s a significant delay (compared to shared storage) and also that I can’t migrate off of a host that’s down. Is there a way around this?
  • Similarly, live migration requires root SSH key exchange (passwordless) between the hosts. This seems about equivalent to using hosts.equiv. Do I really want root on one box to mean root on another box (and all of the containers on that box)?
  • Can I still firewall CT0? How will this work?

It seems to me that OpenVZ may be significantly less enterprise-class than Xen. Sure, this is just my home setup, but I hold it to the same standards I use for my work systems. In fact, I usually test new technologies at home before I suggest them at work. A lot of the writing on the OpenVZ wiki seems to be riddled with spelling errors. They claim “zero downtime” live migration, but if they have to rsync 2GB of MySQL tables, that sounds like a lot more than “zero”. And, most shockingly, the Hardware testing wiki page talks about making sure your hosts aren’t overclocked or undercooled, and running cpuburn to test your system under high load. Sorry, but the engineers at HP, Sun, IBM, etc. handle that for me and most people I know. So, I’m a bit worried about the seriousness of the OpenVZ project.
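
For a rough sense of why 2GB of tables worries me, here’s a back-of-the-envelope sketch of the best-case copy time over the network. The link speeds are assumptions for illustration, and the estimate ignores rsync overhead, disk throughput, and however much of the copy can happen before the container is actually suspended, so it says nothing definite about how long a container would really be unavailable.

```python
def transfer_seconds(size_bytes, link_mbps):
    """Best-case seconds to move size_bytes over a link of link_mbps megabits/s."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


two_gb = 2 * 10**9  # 2GB of MySQL tables, using decimal gigabytes
for mbps in (100, 1000):
    print(f"{mbps} Mbps: at least {transfer_seconds(two_gb, mbps):.0f} s")
```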

Most worrisome is a post I found in the OpenVZ forum, “Stopping puppet on hn stops it in all VE”. It seems that, since CT0 is aware of all of the guest container processes, they show up in ps lists. Most, if not all, Red Hat init scripts use killproc to stop and restart services. This means that a service syslog stop on CT0 (the host) will stop all syslog processes, including all of those in the CTs. This seems like a major issue. Sure, I could replace killproc on CT0 with a script that parses the process list, isolates the PIDs for those running on CT0, and kills them. But what else needs to be fixed? Nagios check scripts would need to be adjusted. Is there anything else that would come back and bite me?
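
Here’s a minimal sketch of the filtering half of that killproc replacement. It rests on an assumption I haven’t verified on my own hardware: that the OpenVZ kernel adds an `envID` field to /proc/<pid>/status, with 0 meaning the process belongs to CT0. The function names and the status snippets are made up, and the sketch only selects PIDs; it doesn’t kill anything.

```python
def parse_status(text):
    """Parse the "Key: value" lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def host_pids(statuses, name):
    """From {pid: status_text}, return the PIDs named `name` whose envID is 0 (CT0)."""
    pids = []
    for pid, text in statuses.items():
        fields = parse_status(text)
        if fields.get("Name") == name and fields.get("envID") == "0":
            pids.append(pid)
    return sorted(pids)


# Synthetic status snippets; real files have many more fields.
statuses = {
    812: "Name:\tsyslogd\nenvID:\t0\n",
    4410: "Name:\tsyslogd\nenvID:\t101\n",
    4522: "Name:\tsyslogd\nenvID:\t102\n",
    901: "Name:\tcrond\nenvID:\t0\n",
}
print(host_pids(statuses, "syslogd"))
```

On a real host the dictionary would be filled by reading /proc/<pid>/status for each numeric entry under /proc, and only the returned PIDs would be signalled.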

The bottom line is that (I guess this is logical) containers in OpenVZ will look – and act – a lot less like a logical host than they would under Xen.

New web server, WP optimization

Tonight, more or less on a whim, I moved my blog from my older (dual 1GHz Pentium III Coppermine, 1GB RAM, 10k RPM SCSI disks, Compaq Proliant DL360 G1, OpenSuSE 10.2 32-bit) web server to my newer one (dual 1.4GHz Pentium III, 2GB RAM, 10k RPM SCSI disks, HP Proliant DL360 G2, CentOS 5.3 32-bit). I did some profiling with ab (ApacheBench), and just moving from one server to the other got some serious performance gains (I was profiling with runs of 1000 requests total, 10 concurrent requests). I also added the W3 Total Cache WordPress plugin, which got the numbers to look even better!

As a side note, this was all done pretty quickly (moving the database and tarball for the vhost, installing the plugin, changing DNS), so please give me a heads-up if you experience any problems.

The numbers are rather impressive:

                      Total Time (s)   RPS         Avg. Connection Time (ms)
Old Server            1192.252         838.75      11,893
New Server            569.121          1757.09     5,667
Default W3tc Config   23.754           42,098.44   237
Tuned W3tc            12.281           81,428.76   122

All tests were performed on my workstation, a Dell Precision 470, two dual-core Xeons at 2.8 GHz, 2GB RAM, 16GB swap, OpenSuSE 11.1 64-bit. This was on the same LAN and subnet as the servers, with the workstation connected via a 1Gbps copper Ethernet link and the web-serving interfaces of the servers connected via 100Mbps (there’s a trunk in between, from the gigabit aggregation switch to the 100Mbps distribution switch).
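
To make the relative gains easier to see, here’s a small sketch that turns the Total Time column into speedup factors against the old server. The numbers are the total times as I read them from the table above (three decimal places each); nothing else is assumed.

```python
# Total time in seconds for each 1000-request run, from the table above.
total_seconds = {
    "Old Server": 1192.252,
    "New Server": 569.121,
    "Default W3tc Config": 23.754,
    "Tuned W3tc": 12.281,
}

baseline = total_seconds["Old Server"]
speedups = {name: baseline / secs for name, secs in total_seconds.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

Comparing ratios rather than absolute numbers also sidesteps any differences in how each column was measured.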

Project Announcement – PHPsa

So, here’s the “official” scoop on the new project that I’m planning/starting to work on. I’m calling it PHPsa for now, and it’s going to (hopefully) be an integrated dashboard/portal for SysAdmins. While there are a number of tools that fit into this general category (perhaps with OSSIM being the closest, though it’s security-minded), I feel that there’s a real gap in terms of tool integration. My daily workflow, which includes multiple trips to and correlation among Nagios, Cacti, DNS, DHCP, Puppet, logs, and other tools, really leaves something to be desired. So, I’m setting out to create a modular SysAdmin dashboard that unifies many of the common SysAdmin-related tools in one place.

The first overall design goals that I’ve set are:

  1. A modular, plugin-based architecture that allows admins to select which features/tools they want, and allows easy development of new modules.
  2. Design with legacy tools in mind – easy ways to tie in to tools that weren’t written with PHPsa in mind, both in terms of linking to information and gathering/unifying information.
  3. RBAC, including per-module rules and the possibility for a limited read-only view (client/user mode).
  4. Use of data sources, specifically web-based/REST APIs where available, and databases otherwise, from existing tools with as little modification as possible.
  5. Support for database abstraction, though I’ll be using MySQL.
  6. Eventually, implement RSS feeds of pertinent information.
  7. Balance Ajax/DHTML with the desire for important things to have canonical, static, bookmark-able URLs.
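
To make goals 1 and 3 a bit more concrete, here’s a tiny sketch of a plugin registry with a per-module role check. It’s written in Python purely for illustration (PHPsa itself would be PHP), and the class, the decorator, the role names and the module outputs are all hypothetical; it isn’t a design I’ve committed to.

```python
class Dashboard:
    """Tiny plugin registry: modules register a callable and the role needed to view it."""

    def __init__(self):
        self._modules = {}

    def module(self, name, role="admin"):
        """Decorator that registers a function as a dashboard module."""
        def register(func):
            self._modules[name] = (role, func)
            return func
        return register

    def render(self, host, roles):
        """Run every module the caller's roles allow and collect the output per module."""
        return {
            name: func(host)
            for name, (role, func) in self._modules.items()
            if role in roles
        }


dashboard = Dashboard()


@dashboard.module("dns", role="readonly")
def dns_summary(host):
    # Placeholder output; a real module would query the tool's API or database.
    return f"DNS records for {host}"


@dashboard.module("puppet", role="admin")
def puppet_summary(host):
    return f"Puppet config for {host}"


print(dashboard.render("web1.example.com", roles={"readonly"}))
```

A read-only user only ever sees the modules registered for that role, which is roughly what I mean by a client/user mode.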

So, here are some of the things that I’m planning on integrating, with obvious bias towards getting my own projects done before I integrate pre-existing tools:

  • MultiBindAdmin, my DNS and DHCP administration tool (specifically geared towards split-view DNS with the inside view behind NAT).
  • RackMan, my tool for mapping devices’ physical locations in racks (and tracking patching).
  • My simple config tool for Puppet.
  • Nagios.
  • Cacti.
  • Nathan Hubbard’s MachDB.
  • Bacula (monitoring/status only).
  • Syslog via rsyslog (or any other syslog-to-SQL solution).
  • Possibly a front-end to Google Analytics.
  • Some of my custom scripts for graphing SpamAssassin, DNS queries, etc.
  • Some sort of Apache log analysis, like Webalizer.
  • Mail log analysis, possibly AWstats.

So, the first big issues that I’m going to tackle:

  1. General layout. Specifically, how to handle a more-or-less consistent layout while integrating tools that weren’t designed for PHPsa. I’ll probably end up using iFrames (or even a frameset) for tools that don’t integrate well.
  2. How to correlate data/objects between different tools (i.e. how to display information from Nagios, Cacti, MultiBindAdmin and MachDB for a given host?).
  3. Do I want to use a templating engine like Smarty or hand-code all of the HTML?
  4. How will I handle plugins?
  5. How much code do I want to re-write and how much can I use as-is from other tools? And, on a related note, how much existing data can I access easily from other tools, vs having to use grabber scripts that dump data in MySQL?
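
For issue 2, one simple approach is to key everything on a normalized hostname. The sketch below is only one possible answer, again in Python for illustration: the tool names match the list above, but the record shapes, the default domain and the function names are placeholders rather than the tools’ real schemas.

```python
from collections import defaultdict


def host_key(name, default_domain="example.com"):
    """Normalize a hostname so short names and FQDNs from different tools compare equal."""
    name = name.strip().lower().rstrip(".")
    if "." not in name:
        name = f"{name}.{default_domain}"
    return name


def correlate(sources):
    """Turn {tool: {hostname: record}} into {normalized_host: {tool: record}}."""
    merged = defaultdict(dict)
    for tool, hosts in sources.items():
        for hostname, record in hosts.items():
            merged[host_key(hostname)][tool] = record
    return dict(merged)


# Hypothetical data; each tool names the same host slightly differently.
sources = {
    "nagios": {"WEB1": {"state": "OK"}},
    "cacti": {"web1.example.com.": {"graphs": 4}},
    "machdb": {"web1.example.com": {"ram_mb": 2048}},
}
by_host = correlate(sources)
print(by_host["web1.example.com"])
```

This falls apart for hosts with several names or interfaces, so a real implementation would probably need an alias table as well.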

Update 2010-02-03: I think this may become a semi-official project for me at $work, which means that I’ll be able to dedicate quite a bit more time to it. Unfortunately, it also means that I will, most likely, have to give up Nathan Hubbard’s MachDB in favor of OCS Inventory NG, a more mature project that already includes inventory support for Linux, Windows and Mac.

DNS Move

Yesterday I finally began moving DNS for my sites from GoDaddy to my own in-house system of master/slave BIND9. While both DNS servers are currently at the same location and on the same WAN connection (heck, they’re behind the same router, too), so is all of the rest of my infrastructure. Migrating jasonantman.com was definitely the most critical task, and it has allowed me to easily use my new project, MultiBIND Admin, to manage DNS. In addition to just being simpler than using GoDaddy’s tool, it allows me to manage DNS for both the external view and the NATed internal view in one tool. I did have a brief mail outage thanks to some incorrect MX records being served by the slave, and a few other issues with the caching DNS servers at work not expiring the old records, but all seems to be well now. It was a relatively smooth transition, though I haven’t yet moved over some of my older, less used domains.

The next part of my project, when I move the ambulance corps hosted services in-house, will be trying to find a decently-priced DNS hosting company that will just act as a slave, to keep DNS up if my WAN connection goes down.

September 2009 Project Updates

I know I haven’t been posting a lot, but here’s an update on some of my projects:

  • PHP EMS Tools – I’ve done quite a bit of work for the ambulance corps, and intend on rolling this into the main distribution. I’ve also added an Asterisk/AGI module to handle crew call-ins. It’s going to be a long road, as I have to manually diff the ambulance corps version to the trunk version and merge the changes (leaving out anything specific to our organization), but I plan on doing it. The next version will also include historical tracking of roster information (member information, status, positions, committees, etc.) and LDAP integration for authentication.
  • PHPsa – My new project, tentatively called PHPsa, is an integrated dashboard for sysadmins. The idea is to develop a plugin-based portal for SA tools. Currently, I will be including some of my own projects – MultiBindAdmin (a tool to administer BIND and DHCPd, specifically geared towards split-view DNS with the inside behind NAT) and RackMan (a tool to track and visualize the location of devices within racks, including ability to temporarily move devices around) – as well as my updates to Nathan Hubbard’s MachDB.

I’ve also done quite a bit of customization of the current version of Nathan Hubbard’s MachDB. My local version is in subversion. It adds detailed network interface information, information on expansion slots, and some extra details for the system and storage. I plan on developing a patch and contacting Nathan once I get a chance. It also includes a Python collector script that I developed.

New Projects

In terms of ongoing projects, I should be updating RackMan sometime soon, and also adding the demo site.

I’ve begun to move DNS for all of my domains in-house, mostly because since everything is behind NAT, it’s a real pain to manage DNS entries in two places (one of them being GoDaddy’s web interface). Because of the NAT issue, I’m also writing my own BIND configuration tool, currently named MultiBIND Admin. In addition to managing multiple zones in a sane way, it stores all configuration in MySQL. Among other things, it can store different IP addresses for A records for the inside and outside views. Zone files can either be pulled by a script on the name server (push capability is being worked on) or downloaded (for uploading to a DNS hosting provider like GoDaddy).
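
The inside/outside idea boils down to one record with two possible addresses. Here’s a minimal sketch of that data model, in Python for illustration only; it isn’t MultiBIND Admin’s actual schema or code, and the record names and addresses are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ARecord:
    name: str
    outside: Optional[str]         # public address, None if internal-only
    inside: Optional[str] = None   # NATed address; falls back to outside when None


def render_view(records, view):
    """Return the A record lines for one view ("inside" or "outside")."""
    if view not in ("inside", "outside"):
        raise ValueError(f"unknown view: {view}")
    lines = []
    for rec in records:
        address = (rec.inside or rec.outside) if view == "inside" else rec.outside
        if address is not None:
            lines.append(f"{rec.name} IN A {address}")
    return lines


records = [
    ARecord("www", outside="192.0.2.10", inside="10.0.0.10"),
    ARecord("ns1", outside="192.0.2.53"),
    ARecord("printer", outside=None, inside="10.0.0.40"),
]
print(render_view(records, "outside"))
print(render_view(records, "inside"))
```

Internal-only hosts simply drop out of the outside view, which is most of what makes managing both views from one place less painful.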

For my final project for my XML web design class, I’m going to be making some “mashup” with RackMan, Google Maps, Google Visualizer, Nagios, and a few other tools…

Stay tuned…

Blinkenlights (blinkenlichten)

I’ll be posting more on this in the next few days, but I did a few more upgrades at home, including a Proliant DL380 G2 to replace my aged ML370 (G1) storage box (the array is failing badly) and a Proliant DL360 G2 as a second web server (and possibly moving Nagios over to that box).

I’m running into some problems with the old management card for the Tripp Lite UPS, and I have a few other issues to sort out, but here’s a photo that I took this weekend after the upgrades (yes, it’s a bit blurry – that happens handheld at 1/10 sec).

[Photo: blinkenlights]

ACHTUNG! ALLES LOOKENSPEEPERS!
Alles touristen und non-technischen looken peepers! Das computermachine ist nicht fuer gefingerpoken und mittengrabben.
Ist easy schnappen der springenwerk, blowenfusen und poppencorken mit spitzensparken. Ist nicht fuer gewerken bei das dumpkopfen. Das rubbernecken sichtseeren keepen das cotten-pickenen hans in das pockets muss; relaxen und watchen das blinkenlichten.

(For those of you who aren’t familiar with it, blinkenlights).