
Posts Tagged ‘Nagios’

Parsing Nagios status.dat in PHP

February 21st, 2010

If you’re just looking for the script or PHP module, you can get them via Subversion at: http://svn.jasonantman.com/nagios-xml/.

A while ago (back in late 2008), I wrote a PHP script that parses the Nagios status.dat file into an associative array. My original use was to output XML which was then read by another script on another server and used for a small custom GUI. It’s a very simple PHP script that just takes the path of the status.dat file (which, obviously, must be readable by the user running the script).

At that time, I was using Nagios v2. Since then, I’ve moved to Nagios v3, and have updated the script to include the ability to parse v3 status.dat files, as well as a function to detect the version of a status file. I also refactored the code so that the parsing functions are all contained in a single file (statusXML.php.inc) which is safe to include in other scripts. The actual statusXML.php file now just includes examples of how to call all of the functions and output XML (though it is equally useful to output the serialized array, or use it directly).
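For readers who want the idea without the PHP, here is a minimal C sketch of the two pieces described above: telling a Nagios 2 status.dat from a Nagios 3 one, and splitting the indented key=value lines. It is not the code in statusXML.php.inc. It assumes that Nagios 3 names its blocks hoststatus/servicestatus/programstatus while Nagios 2 uses host/service/program, and the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Guess the Nagios major version of a status.dat stream from its block
 * headers. Assumes Nagios 3 writes "hoststatus {", "servicestatus {" and
 * "programstatus {" blocks, while Nagios 2 writes "host {", "service {"
 * and "program {". Returns 3, 2, or 0 if neither kind is found. */
int detect_status_version(FILE *fp)
{
    char line[4096];

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, "hoststatus {", 12) == 0 ||
            strncmp(line, "servicestatus {", 15) == 0 ||
            strncmp(line, "programstatus {", 15) == 0)
            return 3;
        if (strncmp(line, "host {", 6) == 0 ||
            strncmp(line, "service {", 9) == 0 ||
            strncmp(line, "program {", 9) == 0)
            return 2;
    }
    return 0;
}

/* Split an indented "key=value" line in place. Only the first '=' is used,
 * because values such as plugin_output may contain '=' themselves.
 * Returns 1 on success, 0 for lines without '=' (block headers, "}"). */
int split_key_value(char *line, char **key, char **value)
{
    char *eq;
    size_t len;

    while (*line == ' ' || *line == '\t')
        line++;
    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    *eq = '\0';
    *key = line;
    *value = eq + 1;
    len = strlen(*value);
    while (len > 0 && ((*value)[len - 1] == '\n' || (*value)[len - 1] == '\r'))
        (*value)[--len] = '\0';
    return 1;
}
```

A full parser would loop over the lines, start a new record on each `name {` header, and collect key/value pairs until the closing `}`.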

Since I posted my script online, two people have been kind enough to send back their modifications: Artur, with changes to the PHP parsing code, and Whitham, with a PHP module written in C.

Both of these generous contributions have been included in my Subversion repository as of the current revision, 5. Unfortunately, due to my delay in putting my Nagios3 code into svn, both of these contributions are Nagios v2 only.

As time permits, I plan on merging Artur’s changes into the current version of statusXML.php.inc. Unfortunately, C isn’t one of my strong points, but I plan on also updating Whitham’s PHP module code to work with Nagios3 as soon as possible.

Stay tuned for updates, and thanks to both gentlemen for contributing their work. I’m always interested in hearing how people are using my code, and how they are making it better.

Also: While I added this project to Nagios Exchange, and plan on adding it to Monitoring Exchange, I don’t always keep those sites up to date (I can’t access Nagios Exchange right now, and who knows if I’ll have time to update it tomorrow). I strongly recommend directly checking out from Subversion at http://svn.jasonantman.com/nagios-xml/ or taking a look at the code through ViewVC at http://viewvc.jasonantman.com/cgi-bin/viewvc.cgi/nagios-xml/.

Projects

pnp4nagios, CentOS 5.3 and pcre

February 11th, 2010

I started testing out the pnp4nagios tool to incorporate graphs of performance data into Nagios. Despite what Klein and Sellens suggest (p. 57), I really don’t want separate tools for monitoring and trending. Cacti already handles UPS metrics, switch ports, router traffic, etc. For everything else – system load, etc. – I see no reason to have two checks run rather than just one (Nagios).

There was a CentOS package for the older pnp4nagios 0.4.x, but I opted to build and install the new 0.6.x from source. I hit one snag: it requires PCRE compiled with support for Unicode properties, and I couldn’t find any CentOS package built with that option. So, with a simple edit of the %configure macro in the SPEC file (adding --enable-unicode-properties), I built one. Since I wasn’t working in a real build environment, just on one of my web servers, I only built the .i386 version, but feel free to build from the source RPM.

Tech HowTos

Nagios and check plugins run as root

November 5th, 2009

No matter how much we may not like it, and no matter how insecure it can potentially be, we occasionally have to run Nagios check scripts (written in scripting languages) as root. (On a side note, this method is also used for my MultiBindAdmin project’s DNS file push). Here’s how to do it:

  1. Write your check script in the language of your choice and test as root.
  2. Grab setuid-prog.c.
  3. Uncomment the #define for FULL_PATH and change the string to the full path of your script.
  4. Be sure your script is owned by root and its mode is no more permissive than 755.
  5. Compile setuid-prog.c:
    gcc -o {check_script_name}-wrapper setuid-prog.c
  6. Put the resulting binary in your plugin directory.
  7. Assuming your checks run as user nagios and group nagios, chown the binary to root:nagios and chmod 4755.

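Linux ignores the SUID bit on interpreted scripts, so the bit goes on the compiled wrapper instead; this is what effectively lets the script run as root.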

Use at your own risk. I only recommend this on systems where the Nagios account is strongly authenticated, and where ALL users are trusted.

Tech HowTos

Nagios check_by_ssh and NAT

October 22nd, 2009

At a remote location, I have a number of machines to monitor but only one IP (dynamic, on a residential connection). Most of my remote monitoring with Nagios uses check_by_ssh. Previously, I’d used one host for Nagios to SSH to, and then chained another check_by_ssh from there to reach the remote hosts. Unfortunately, this means nothing past that first host can be monitored if the first host is down. All of the hosts are behind NAT, and each has SSH exposed externally on its own port.

SSH itself doesn’t like one IP/hostname with SSH on different ports: host key verification will fail, because the SSH client only looks at the address it’s connecting to, not the port number. Normally, this is worked around with a .ssh/config file like:

Host foo1
        Hostname foo.example.com
        HostKeyAlias foo1
        CheckHostIP no
        Port 22
        User nagios
 
Host foo2
        Hostname foo.example.com
        HostKeyAlias foo2
        CheckHostIP no
        Port 222
        User nagios
 
Host foo3
        Hostname foo.example.com
        HostKeyAlias foo3
        CheckHostIP no
        Port 10022
        User nagios

And then you SSH using the “Host” named in the config file, not the actual hostname.
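With more than a handful of NATed machines, stanzas like these are repetitive enough to generate. A small, hypothetical C helper (the struct and function names are illustrative) that writes one stanza in the same layout as the config above:

```c
#include <stdio.h>

/* One NATed machine reachable through a shared public hostname. */
struct ssh_alias {
    const char *alias;     /* name Nagios and ssh will use, e.g. "foo2" */
    const char *hostname;  /* real, shared public name */
    int port;              /* forwarded SSH port for this machine */
    const char *user;
};

/* Write one Host stanza. HostKeyAlias makes ssh store and look up each
 * machine's key under its alias, so several keys can coexist for one
 * hostname; CheckHostIP no stops ssh from also checking the shared IP.
 * Returns the fprintf() result (negative on error). */
int write_ssh_stanza(FILE *out, const struct ssh_alias *a)
{
    return fprintf(out,
                   "Host %s\n"
                   "        Hostname %s\n"
                   "        HostKeyAlias %s\n"
                   "        CheckHostIP no\n"
                   "        Port %d\n"
                   "        User %s\n\n",
                   a->alias, a->hostname, a->alias, a->port, a->user);
}
```

Looping over an array of `struct ssh_alias` and writing each entry to the config file keeps the aliases, ports and Nagios host definitions in one place.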

Unfortunately, the only way to get check_by_ssh to do this was a bit messy, and required defining a bunch of extra macros for each host:

./check_by_ssh -o Hostname=foo.example.com -o HostKeyAlias=foo2 -o CheckHostIP=no -o Port=222 -o User=nagios -H foo.example.com -C uptime

So, I made a quick little patch for check_by_ssh.c (patched against the released nagios-plugins-1.4.14):

--- check_by_ssh_ORIG.c 2009-10-22 14:12:15.000000000 -0400
+++ check_by_ssh.c      2009-10-22 14:32:26.000000000 -0400
@@ -181,6 +181,7 @@
                {"skip", optional_argument, 0, 'S'}, /* backwards compatibility */
                {"skip-stdout", optional_argument, 0, 'S'},
                {"skip-stderr", optional_argument, 0, 'E'},
+               {"ssh-config", required_argument, 0, 'F'},
                {"proto1", no_argument, 0, '1'},
                {"proto2", no_argument, 0, '2'},
                {"use-ipv4", no_argument, 0, '4'},
@@ -198,7 +199,7 @@
                        strcpy (argv[c], "-t");
 
        while (1) {
-               c = getopt_long (argc, argv, "Vvh1246fqt:H:O:p:i:u:l:C:S::E::n:s:o:", longopts,
+               c = getopt_long (argc, argv, "Vvh1246fqt:H:O:p:i:u:l:C:S::E::n:s:o:F:", longopts,
                                 &option);
 
                if (c == -1 || c == EOF)
@@ -221,7 +222,7 @@
                                timeout_interval = atoi (optarg);
                        break;
                case 'H':                                                                       /* host */
-                       host_or_die(optarg);
+                       /* host_or_die(optarg); */     /* commented out 2009-10-22 by jantman for ssh config file use */
                        hostname = optarg;
                        break;
                case 'p': /* port number */
@@ -299,6 +300,12 @@
                        else
                                skip_stderr = atoi (optarg);
                        break;
+               /* added 2009-10-22 by jantman for ssh -F option (config file) */
+               case 'F':                                                                       /* ssh config file */
+                       comm_append("-F");
+                       comm_append(optarg);
+                       break;
+               /* END added 2009-10-22 by jantman */
                case 'o':                                                                       /* Extra options for the ssh command */
                        comm_append("-o");
                        comm_append(optarg);
@@ -404,6 +411,8 @@
   printf ("    %s\n", _("Ignore all or (if specified) first n lines on STDERR [optional]"));
   printf (" %s\n", "-f");
   printf ("    %s\n", _("tells ssh to fork rather than create a tty [optional]. This will always return OK if ssh is executed"));
+  printf (" %s\n", "-F");
+  printf ("    %s\n", _("path to ssh config file [optional]"));
   printf (" %s\n","-C, --command='COMMAND STRING'");
   printf ("    %s\n", _("command to execute on the remote machine"));
   printf (" %s\n","-l, --logname=USERNAME");

It works fine. The only problem is that I disabled the check that the given hostname/IP is valid, so instead of getting a nice “Invalid hostname/address – foobar” error, you’ll get the usual “Remote command execution failed: ssh: foobar: Name or service not known” error (though it still exits with code 3, UNKNOWN). I had to do this because check_by_ssh validates the hostname itself, but the “Host” alias from the config file, which is what SSH needs to be given, is not a resolvable hostname.

With the patch, we now have something nice and clean like:

./check_by_ssh -H foo1 -F /home/nagios/.ssh/config -l nagios -i /home/nagios/.ssh/id_dsa -C uptime

This only adds the “-F” flag to what I was already using, and is safe to use for all hosts.

When I get a chance, I’ll figure out a way to gracefully deal with the host aliases (“fake hostnames”) and submit a patch. Most likely, I’ll add another option so that you have to specify both the actual hostname (so it can check that it exists) and the alias used in the config file (perhaps “-a”?).
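One possible shape for that future option, sketched in C under assumptions (nothing like this exists in nagios-plugins, and the function names are hypothetical): resolve the real hostname with getaddrinfo(3), restoring the kind of check host_or_die() performed before the patch, then give ssh only the alias plus -F so the matching config stanza applies.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return 1 if the real hostname (or numeric address) resolves, else 0. */
int real_host_resolves(const char *hostname)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(hostname, NULL, &hints, &res) != 0)
        return 0;
    freeaddrinfo(res);
    return 1;
}

/* Build a NULL-terminated ssh argument vector that names the alias, never
 * the real hostname, so ssh picks up the alias's stanza from the -F file.
 * Returns the argument count, or -1 if out[] has fewer than 6 slots. */
int build_ssh_argv(const char *config, const char *alias, const char *command,
                   const char *out[], size_t max)
{
    const char *args[] = { "ssh", "-F", config, alias, command };
    size_t n = sizeof args / sizeof args[0];
    size_t i;

    if (max < n + 1)
        return -1;
    for (i = 0; i < n; i++)
        out[i] = args[i];
    out[n] = NULL;
    return (int)n;
}
```

Keeping the two names separate means a typo in the real hostname still produces the friendly “Invalid hostname/address” style error, while ssh continues to see only the alias.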

Tech HowTos

Project Announcement – PHPsa

September 29th, 2009

So, here’s the “official” scoop on the new project that I’m planning/starting to work on. I’m calling it PHPsa for now, and it’s going to (hopefully) be an integrated dashboard/portal for SysAdmins. While there are a number of tools that fit into this general category (perhaps with being the closest, though it’s security-minded), I feel that there’s a real gap in terms of tool integration. My daily workflow, which includes multiple trips to and correlation among Nagios, Cacti, DNS, DHCP, Puppet, logs, and other tools, really leaves something to be desired. So, I’m setting out to create a modular dashboard that unifies many of the common SysAdmin tools in one place.

The first overall design goals that I’ve set are:

  1. A modular, plugin-based architecture that allows admins to select which features/tools they want, and allows easy development of new modules.
  2. Design with legacy tools in mind – easy ways to tie in to tools that weren’t written with PHPsa in mind, both in terms of linking to information and gathering/unifying information.
  3. RBAC, including per-module rules and the possibility for a limited read-only view (client/user mode).
  4. Use of data sources, specifically databases, from existing tools with as little modification as possible.
  5. Support for database abstraction, though I’ll be using MySQL.
  6. Eventually, implement RSS feeds of pertinent information.
  7. Balance Ajax/DHTML with the desire for important things to have canonical, static, bookmark-able URLs.

So, here are some of the things that I’m planning on integrating, with obvious bias towards getting my own projects done before I integrate pre-existing tools:

  • MultiBindAdmin, my DNS and DHCP administration tool (specifically geared towards split-view DNS with the inside view behind NAT).
  • RackMan, my tool for mapping devices’ physical locations in racks (and tracking patching).
  • My simple config tool for Puppet.
  • Nagios.
  • Cacti.
  • Nathan Hubbard’s MachDB.
  • Bacula (monitoring/status only).
  • Syslog via rsyslog (or any other syslog-to-SQL solution).
  • Possibly a front-end to Google Analytics.
  • Some of my custom scripts for graphing SpamAssassin, DNS queries, etc.
  • Some sort of Apache log analysis, like Webalizer.
  • Mail log analysis, possibly AWstats.

So, the first big issues that I’m going to tackle:

  1. General layout. Specifically, how to handle a more-or-less consistent layout while integrating tools that weren’t designed for PHPsa. I’ll probably end up using iFrames (or even a frameset) for tools that don’t integrate well.
  2. How to correlate data/objects between different tools (i.e. how to display information from Nagios, Cacti, MultiBindAdmin and MachDB for a given host?).
  3. Do I want to use a templating engine like Smarty or hand-code all of the HTML?
  4. How will I handle plugins?
  5. How much code do I want to re-write and how much can I use as-is from other tools? And, on a related note, how much existing data can I access easily from other tools, vs having to use grabber scripts that dump data in MySQL?

Update 2010-02-03: I think this may become a semi-official project for me at $work, which means that I’ll be able to dedicate quite a bit more time to it. Unfortunately, it also means that I will, most likely, have to give up Nathan Hubbard’s MachDB in favor of OCS Inventory NG, a more mature project that already includes inventory support for Linux, Windows and Mac.

PHPsa, Projects

Daily Work – Nagios SNMP traps, Vyatta, JasonAntman.com upgrades

June 27th, 2009

So it’s been a very busy day. I was up until 5 AM or so working on implementing Puppet at home. I’m building two new boxes – a storage (centralized home directory)/syslog (to MySQL) server and a second web server (possibly also to handle Nagios) – and I decided that they’ll be totally built by Puppet. The only things I had to put off were setting up the NFS share for my home directory on the new storage box, and installing and testing rsyslog on it.

This afternoon around 7, I started on my weekend projects for the ambulance corps: setting up Nagios to receive SNMP traps from the APC UPS, and moving over to the new Vyatta-based router (from m0n0wall). I’d attempted the router switch before, but had to roll back. I’m using an old BlueSocket controller for hardware; it’s just a nice black 1U enclosure with a stock Intel motherboard, 20GB HDD, 512MB RAM and three 10/100 NICs. The first time, I was unable to get link on either of the two NICs I was using, so I decided to roll back.

Nagios SNMP Traps

I found a good starting point for Nagios SNMP traps on the OpsView blog. I set up `snmptrapd` on the Nagios server and hacked together a little Python script to just write all of the traps to a file. After some testing with `snmptrap` on my laptop, I did a real test by pulling the power plug of the UPS, waiting about 30 seconds, and then plugging it back in. Sure enough, the little old AP9605 PowerNet SNMP card generated two SNMP traps, one for power loss and one for power regained, both of which showed up in the test file.
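The trap logger above was a Python script; as a rough C equivalent, the sketch below reads one trap the way snmptrapd hands it to a traphandle program on standard input (a hostname line, a transport address line, then one "OID value" line per varbind) and appends it to a log as a single timestamped record. The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Copy one trap from `in` (snmptrapd traphandle input) to `log`.
 * Returns the number of varbind lines copied, or -1 if the hostname or
 * address line is missing. Lines longer than the buffer are split. */
int log_trap(FILE *in, FILE *log, time_t when)
{
    char host[256], addr[256], line[1024];
    char stamp[32];
    int varbinds = 0;

    if (fgets(host, sizeof host, in) == NULL ||
        fgets(addr, sizeof addr, in) == NULL)
        return -1;
    host[strcspn(host, "\n")] = '\0';
    addr[strcspn(addr, "\n")] = '\0';

    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", localtime(&when));
    fprintf(log, "[%s] trap from %s (%s)\n", stamp, host, addr);
    while (fgets(line, sizeof line, in) != NULL) {
        fprintf(log, "    %s", line);
        if (strchr(line, '\n') == NULL)
            fputc('\n', log);
        varbinds++;
    }
    fflush(log);
    return varbinds;
}
```

Wired up with a line such as `traphandle default /usr/local/bin/traplog` in snmptrapd.conf (the program path is a placeholder), main() would just open the log in append mode and call `log_trap(stdin, fp, time(NULL))`.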

The next step will be deciding how to get the traps into Nagios: specifically, whether I want to go with something heavyweight like SNMPtt, which can handle other devices too, or code a simple script myself just to deal with the APC cards.
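Whichever route wins, the last hop looks the same: a passive check result written to Nagios's external command file (the command_file setting in nagios.cfg). A C sketch of formatting that line; the host and service names in any call are placeholders, and deciding which trap maps to which state is left to the handler.

```c
#include <stdio.h>
#include <time.h>

/* Standard Nagios plugin states. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Format a PROCESS_SERVICE_CHECK_RESULT external command, e.g. CRITICAL
 * for a power-loss trap and OK for power-restored. Returns the snprintf()
 * result: the length needed, which is >= len if the line was truncated. */
int format_passive_result(char *buf, size_t len, time_t when,
                          const char *host, const char *service,
                          int state, const char *output)
{
    return snprintf(buf, len,
                    "[%ld] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
                    (long)when, host, service, state, output);
}
```

The service would need to be defined in Nagios with passive checks enabled for the result to be accepted.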

Router

The main reason why I wanted to make the switch from m0n0 to Vyatta was to ease the setup and maintenance of an IPsec tunnel from the ambulance HQ to my house, so I could push backups (relatively small) over the WAN to my infrastructure (or, rather, have Bacula pull the backups). Another big bonus was finally having a way of configuring and checking things through SSH without having to port-forward a web GUI. Another bonus of having a real Linux system under the router is the ability to make custom Nagios check scripts and easily execute them. Something I hadn’t thought of – but became obvious during the switchover – is the ability to run full-fledged `tcpdump` on the router itself.

After building the new config myself, and confirming that the system ran in isolation, I moved it over to production. The first issue was a bit of a thinko on my part: the interfaces on the BSC are actually arranged on the back of the box in the order eth0, eth2, eth1, so I originally had the LAN uplink in the wrong interface. After correcting that and waiting for the network to stabilize, I noticed a total external connectivity failure. After some troubleshooting – thanks to tcpdump on the router – it occurred to me that the (ancient) cable modem needs to be rebooted when the router MAC changes.

I honestly don’t remember the other problems that I ran into, but eventually I ended up getting almost-full functionality – and then a total network outage. A tcpdump on my laptop showed some really really weird BOOTP traffic with addresses of 255.255.255.255. After doing some troubleshooting and monitoring port counters on the switch, I narrowed it down to coming from a single Windows box and the wireless access point. After shutting off both ports, things seemed to stabilize. I also had some “martian address” issues with one of the boxes, but decided to roll the box and that solved it.

Over the next day or so, I’ll be reconfiguring Nagios both at home and at the ambulance corps to cope with the changes and add in the requisite monitoring, and keep an eye on things. Assuming all goes well, I’ll power down the old router on Sunday.

On the home front, I’ve moved over from my old storage machine to the new one: essentially just the NFS mount, plus a tarball of everything else. I also added a 1000Base-SX card to the new box, though it appears that I’m out of fiber patch cords. The old storage box was brought down for the first time in about 3 years (aside from brief outages for hardware upgrades or array rebuilds). Assuming I got everything off of it, it will be relegated to the spares pile.

I’m going to make a serious effort to post on a daily basis, if only for my own future reference. I should have the demo of RackMan out soon, and I’m also about to start on integrating it with Nathan Hubbard’s MachDB as well as a PHP script I wrote to pull port names and MACs from Cisco switches and associate them with NICs in machines. Hopefully I’ll also have some interesting Puppet stuff out soon.

Miscellaneous Geek Stuff

Massive Updates

July 17th, 2008

I know I’ve been quiet for a while. I’ve been quite busy. Unfortunately, due to changing priorities, there are a lot of projects I’ve been working on, but few of them have gotten finished. A sampling, in no specific order:

  • Migrating my network/service monitoring to Nagios 3, totally re-writing my config files to make use of the new features, and making one coherent list of all the services that should be in it and aren’t.
  • Planning to totally re-wire all networking at the ambulance corps building to eliminate some problems. This includes building an 8U wall-mount rack, and also trying a PC Engines ALIX.2c1 board as a router (still undecided on WAN/LAN/DMZ or WAN/LAN/WLAN). It also means a long day of work at some point in the future, and lots of cable drops.
  • tuxOstat, the linux-controlled thermostat, is pretty much on the back burner. It’s a stable beta with severely reduced functionality, but has been handling my cooling needs without any major bugs in the past month or so. It still only has a basic CLI interface and a very simple kludge of a web GUI, but it works. Other modes (heating, fan only), predictive temperature calculation, other temperature/zone calculation modes, and physical controls (buttons, menu on LCD) are still to come, as well as the move from PC to Soekris (if I can ever figure it out, and get one with USB). I now feel that an ALIX board might be a better shot, as they take CF (more space than the Soekris), have a slightly faster processor, and also support USB at about half the price point.
  • I’m considering moving my main web site to a CMS, and letting the wiki serve more as a knowledge base.
  • I’m working on patching together a new access point for the ambulance corps, based on Pyramid Linux. I needed something which would run on the Soekris net4526, had at least WEP, and supported some sort of captive portal. Pyramid has WifiDog, but that only wants to do local authentication or RADIUS, and I wanted direct auth to LDAP and MySQL logging. On the positive side, it just uses some PHP pages hosted under Apache to handle authentication – the WAP redirects the user to a login page on a (separate) web server, the user does their stuff, and then the WAP makes a request to the server to determine whether it should open up the firewall, keep the user locked down, or totally kick them. So, once I figure out some routing issues, I’ll get back to working on the new project – BlackLabAuth, a re-write of the WifiDog auth server software that’s geared towards a closed-access network (i.e. only people and/or MACs already listed in LDAP can login) with full logging to MySQL. I already have some code in CVS, but some issues with my development Soekris board have slowed the project for the time being. When finished, I’ll have not only the new auth server available for download (with documentation) but also a ready-to-run (well, some configuration time needed, but minor and scripted) image for the net4526.
  • My desktop that I use for MythTV filled up its disk. Totally. I ordered a cheap Syba SATA card (PCI) from NewEgg, along with a 500GB WD SATA-150 disk, but no luck. Though the card (Syba / Initio 1622 chipset, shows up as Class 0106: 1101:1622 (rev 02)) said it was supported under Linux, the driver CD mentioned nothing about it. Some investigation on the Syba website turned up a zipped archive. After extraction, I found a readme that gave (poor) instructions on how to re-compile a kernel, and warned that you MUST have 2.6.15. Oh well, I wasn’t going to give up 2.6.16.27 (the newest RPM’d kernel for OpenSuSE 10.1). The standard drivers for it didn’t appear until 2.6.25 or so. So… after many debates with myself as to whether I should blow away my whole MythTV installation and upgrade from the now-ancient 10.1, I decided that, since I’ll only be in my apartment for another year, I should make it last. Some investigation turned up a $24 Silicon Image-based card that should work fine, and it’s now in the mail…

I’m sure I missed something big, but I’ll update as needed, and attempt to make it a daily habit to post something interesting or, at the very least, hard-to-find. After all, I’m sure that I use this blog and my wiki as an informational resource (my bad memory) more than anyone else would…

Projects

Web Traffic for JasonAntman.com – Webalizer, Site Maps

April 22nd, 2008

I’ve been working on the design of a town council campaign site for a friend – www.mikennick08.com. It’s hosted by an additional Apache vhost on my personal server (and running off of port 10015 – ugh). I set up Webalizer for him, so I figured I’d give my own webalizer installation a check. Wow – 30,215 hits this month alone. That reminded me of a problem with my ignored hosts – 17,600 of those hits were Googlebot, and another 2,065 were Yandex (a Russian search engine).

Amazingly, though, it seems like Google is only indexing my blog. My precious wiki seems to be left out, not to mention my CVS repository.

So, this reminded me of two long-overdue tasks:

  1. Get webalizer to properly ignore the common bots.
  2. Get sitemaps of my entire site.

So, off to the races!

First, I added Googlebot, Yandex, and a few others to webalizer.conf with IgnoreAgent directives. Then, after clearing out my entire output directory – and waiting a LONG time for it to run – bingo! Real stats. Down to about 8000 hits for the month, which seems more logical, even including the ~2,000 hits from Google feedfetcher.

Next stop was sitemaps. It took some PHP magic to hack apart the MediaWiki sitemaps, put in the correct URLs (it was showing an internal-only hostname), and drop all that and my Blogger rss.xml in an index file. It’s now 2 AM, and it just crashed and burned – the PHP script worked fine, but for some reason my entries in sitemaps_index.xml – which pointed to sitemaps in a subdirectory – came back with errors. Well, something to work on tomorrow.
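For reference, the index file itself is simple. A C sketch (the function name is hypothetical, and it does not try to explain the errors above) that writes a sitemaps.org 0.9 sitemap index from a list of absolute child-sitemap URLs:

```c
#include <stdio.h>

/* Write a sitemap index listing each child sitemap by absolute URL.
 * Locations must be XML-escaped; this sketch assumes they contain no
 * '&', '<', '>' or quote characters. Returns 0 on success, -1 on error. */
int write_sitemap_index(FILE *out, const char *const locs[], size_t count)
{
    size_t i;

    if (fputs("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
              "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n",
              out) == EOF)
        return -1;
    for (i = 0; i < count; i++)
        if (fprintf(out, "  <sitemap>\n    <loc>%s</loc>\n  </sitemap>\n", locs[i]) < 0)
            return -1;
    if (fputs("</sitemapindex>\n", out) == EOF)
        return -1;
    return 0;
}
```

Each `<sitemap>` entry may also carry an optional `<lastmod>` element, which this sketch leaves out.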

This morning I checked my backups and noticed that nothing had run in 3 days. It turns out I just had one failed job holding everything up. And I screwed up – I was home this weekend and forgot to swap tapes. It’ll be another 2 weeks before I can. But, I took the time to setup a backup status box on my administrative portal (more on that later) and will also be revising my apparently ineffectual Nagios check script.
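A check that only reports failed jobs can miss the situation above, where one stuck job quietly holds everything else up. As a hedged alternative, the C sketch below classifies the age of the newest successful backup using the standard Nagios plugin exit codes. Obtaining that timestamp (for example from the backup catalog) is assumed and not shown, and the function name and thresholds are hypothetical.

```c
#include <time.h>

enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Classify the newest successful backup by age. last_ok == 0 means no
 * successful job was found; a timestamp in the future is also treated as
 * UNKNOWN. warn and crit are age thresholds in seconds. */
int backup_age_state(time_t now, time_t last_ok, double warn, double crit)
{
    double age;

    if (last_ok == 0 || last_ok > now)
        return STATE_UNKNOWN;
    age = difftime(now, last_ok);
    if (age >= crit)
        return STATE_CRITICAL;
    if (age >= warn)
        return STATE_WARNING;
    return STATE_OK;
}
```

With a one-day warning and two-day critical threshold, three days without a good job would have gone CRITICAL regardless of which job was blocking the queue.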

On a few side notes: First, I’m seriously thinking of dumping Verizon FiOS. While I really like the service, their static IP (business) variant is $100/month for 15 Mbps down / 2 Mbps up, whereas Cablevision’s Optimum Business with static IP is $55/month for 30 Mbps down / 5 Mbps up!

Most of the previous projects have been put on hold for the time being (mainly because of impending final exams at school) – the new Gigabit Ethernet switch for backups, testing Zenoss and upgrading monitoring (to a new product or Nagios 3), etc.

Projects