Daily Work - Nagios SNMP traps, Vyatta, JasonAntman.com upgrades

June 27th, 2009

So it’s been a very busy day. I was up until 5 AM or so working on implementing Puppet at home. I’m building two new boxes - a storage (centralized home directory)/syslog (to MySQL) server and a second web server (possibly also to handle Nagios) - and I decided that they’ll be totally built by Puppet. The only thing I had to give up on was setting up the NFS share for my home directory on the new storage box and installing and testing rsyslog on it.

This afternoon around 7, I started on my weekend projects for the ambulance corps - setting up Nagios to receive SNMP traps from the APC UPS and moving over to the new Vyatta-based router (from m0n0wall). I’d attempted the router switch before, but had to roll back. For hardware I’m using an old BlueSocket controller - it’s just a nice black 1U enclosure with a stock Intel motherboard, 20GB HDD, 512MB RAM and three 10/100 NICs. The first time, I was unable to get link on either of the two NICs I was using, so I decided to roll back.

Nagios SNMP Traps

I found a good starting point for Nagios SNMP traps on the OpsView blog. I set up `snmptrapd` on the Nagios server and hacked together a little Python script to just write all of the traps to a file. After some testing with `snmptrap` on my laptop, I did a live test by pulling the power plug of the UPS, waiting about 30 seconds, and then plugging it back in. Sure enough, the little old AP9605 PowerNet SNMP card generated two SNMP traps - one for power loss and one for power regained - both of which showed up in the test file.
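For reference, a trap-to-file handler along those lines might look roughly like the sketch below. It is not the original script. It assumes the handler is hooked in with a `traphandle default` line in snmptrapd.conf, relies on snmptrapd feeding the handler the sender's hostname, then its address, then one "OID value" pair per line on stdin, and takes the log file path as a command-line argument. The function name and record layout are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of an snmptrapd trap handler that appends every trap to a file."""
import sys
import time


def format_trap(lines, now=None):
    """Turn the lines snmptrapd writes to stdin into one log record."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    lines = [line.rstrip("\n") for line in lines]
    # First line is the sender's hostname, second its transport address.
    host = lines[0] if len(lines) > 0 else "unknown"
    address = lines[1] if len(lines) > 1 else "unknown"
    record = ["=== %s trap from %s (%s)" % (stamp, host, address)]
    # Every remaining line is a variable binding: "OID value".
    for varbind in lines[2:]:
        if not varbind.strip():
            continue
        oid, _, value = varbind.partition(" ")
        record.append("  %s = %s" % (oid, value))
    return "\n".join(record) + "\n"


def main(log_path):
    """Read one trap from stdin and append it to log_path."""
    with open(log_path, "a") as log:
        log.write(format_trap(sys.stdin.readlines()))


# Only runs when a log path is given, e.g. as an extra traphandle argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Each trap ends up as a small indented block in the file, which is enough to see what the UPS card actually sends before deciding how to process it.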

The next step will be deciding how to get the traps into Nagios - specifically whether I want to go with something heavy-weight, like SNMPtt that can handle other devices, or whether I want to code a simple script myself just to deal with the APC cards.
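If I go the simple-script route, the core of it would be turning a trap into a Nagios passive check result. Here's a minimal sketch of that idea, not a finished design: the trap names in `TRAP_STATES` are placeholders (the real ones would come from the APC PowerNet MIB), and the host and service names are up to the caller. The only fixed part is the `PROCESS_SERVICE_CHECK_RESULT` external command format that Nagios reads from its command file.

```python
import time

# Placeholder trap names mapped to (Nagios state, plugin output).
# The real identifiers would have to be taken from the APC PowerNet MIB.
TRAP_STATES = {
    "power_lost": (2, "UPS on battery power"),         # CRITICAL
    "power_restored": (0, "Utility power restored"),   # OK
}


def passive_result(host, service, trap_name, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    # Anything we don't recognize is reported as UNKNOWN (state 3).
    state, message = TRAP_STATES.get(
        trap_name, (3, "Unhandled trap: %s" % trap_name))
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, state, message)


def submit(line, command_file):
    """Write the command line to the Nagios external command file."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The service on the Nagios side would need to accept passive checks for this to have any effect. SNMPtt does essentially the same job, with proper MIB translation, for any device.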

Router

The main reason why I wanted to make the switch from m0n0 to Vyatta was to ease the setup and maintenance of an IPsec tunnel from the ambulance HQ to my house, so I could push backups (relatively small) over the WAN to my infrastructure (or, rather, have Bacula pull the backups). Another big bonus was finally having a way of configuring and checking things through SSH without having to port-forward a web GUI. Another bonus of having a real Linux system under the router is the ability to make custom Nagios check scripts and easily execute them. Something I hadn’t thought of - but became obvious during the switchover - is the ability to run full-fledged `tcpdump` on the router itself.

After building the new config myself, and confirming that the system ran in isolation, I moved it over to production. The first issue was a bit of a thinko on my part - the interfaces on the BSC are actually arranged on the back of the box in the order eth0, eth2, eth1, so I originally had the LAN uplink in the wrong interface. After correcting that and waiting for the network to stabilize, I noticed a total external connectivity failure. After some troubleshooting - thanks to tcpdump on the router - it occurred to me that the (ancient) cable modem needs to be rebooted when the router MAC changes.

I honestly don’t remember the other problems that I ran into, but eventually I ended up getting almost-full functionality - and then a total network outage. A tcpdump on my laptop showed some really really weird BOOTP traffic with addresses of 255.255.255.255. After doing some troubleshooting and monitoring port counters on the switch, I narrowed it down to coming from a single Windows box and the wireless access point. After shutting off both ports, things seemed to stabilize. I also had some “martian address” issues with one of the boxes, but decided to roll the box and that solved it.

Over the next day or so, I’ll be reconfiguring Nagios both at home and at the ambulance corps to cope with the changes and add in the requisite monitoring, and keep an eye on things. Assuming all goes well, I’ll power down the old router on Sunday.

On the home front, I’ve moved over from my old storage machine to the new one - essentially just the NFS mount, plus a tarball of everything else. I also added a 1000Base-SX card to the new box, though it appears that I’m out of fiber patch cords. The old storage box was brought down for the first time in about 3 years (aside from brief outages for hardware upgrades or array rebuilds). Assuming I got everything off of it, it will be relegated to the spares pile.

I’m going to make a serious effort to post on a daily basis, if only for my own future reference. I should have the demo of RackMan out soon, and I’m also about to start on integrating it with Nathan Hubbard’s MachDB as well as a PHP script I wrote to pull port names and MACs from Cisco switches and associate them with NICs in machines. Hopefully I’ll also have some interesting Puppet stuff out soon.

Miscellaneous Geek Stuff

Netgear ReadyNAS 1100 and long UID numbers

June 23rd, 2009

I spent part of the day installing a Netgear ReadyNAS 1100 for backup storage. It’s a cute little 1U storage appliance with 4 SATA disk bays, two Gig copper ports and about every sort of hokey service you could possibly want in something that’s billed as a small business storage server but running a sort-of-scaled-up home appliance OS. That being said, it comes at a wonderful price (we bought one empty and added 4x 1.5TB disks) and runs Linux.

I had a few minor issues with the installation (more along the lines of trying to do things that weren’t clearly documented rather than problems with the unit). NetGear tech support (“ProSupport Labs”) was quite good once I got past the first level or two, and Mark H., who helped me with most of my issues, was one of the best tech support people I’ve ever spoken to. In fact, he probably ties with Paulo from HP for the best tech support person I’ve ever dealt with.

Anyway, the only issue with the NetGear installation that we weren’t able to resolve was the fact that the web configuration tool (“Frontview”) will only accept UIDs of at most five digits. Here at Rutgers, we have a unified UID space, with numbers well in excess of 100,000. As a result, my plan of having NFS play well wasn’t really going to work. Mark wasn’t able to come up with a solution (obviously - it’s something that, at best, can get fixed in the next version of the firmware) but he took copious notes, had me confirm them, and told me he’d bring it up with the ReadyNAS engineering team when they meet Thursday, and will try to e-mail me back to follow up. He also spent quite some time on the phone with me, both of us researching what kernel the ReadyNAS 4.1.5 firmware runs (2.6.17.8, mostly vanilla Debian as per Mark) and when long UID support became available (since at least kernel 2.4). Overall, I was very impressed that Mark knew quite a bit about Linux - even more than your average Linux desktop user - and quite a bit about the internals of the ReadyNAS.

Just on a hunch, after my tech support call, I used the Frontview “Config Backup” tool’s “Users and Groups” option to back up the user and group information. Sure enough, it was just a force-download of a Zip archive… of the pertinent files. I was able to hand-hack /etc/passwd and re-upload it, and it seemed to work fine.

The following is NOT endorsed by Netgear in any way, shape or form. It’s a hack that may or may not work. Do this at your own risk - I have no idea if some future (or past) firmware change might make bad things happen, or whether there are some features which don’t jibe with this. All I’ve tested is HTTP logins, the fact that the web interface shows the >2^16 UID correctly, and NFS.

Procedure:

  1. Create desired user in Frontview tool, leaving UID field blank. (In my case, the user is assigned the next sequential UID, 1002).
  2. Frontview “System” -> “Config Backup”, “Backup” tab, select only “Users and Groups”.
  3. Download config files and unzip.
  4. Open etc/passwd from the config archive in a text editor, change the automatically assigned UID (1002) to the desired UID (101739).
  5. Re-zip the directory tree and re-upload to ReadyNAS.
  6. Enjoy.
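If you'd rather not hand-edit the file in step 4, the change can be scripted. The following is just a sketch of that one step, assuming the etc/passwd in the archive uses the standard seven-field colon-separated passwd layout. The function name and the `newuser` account are made up for illustration, and it carries the same no-warranty caveat as the rest of this hack.

```python
def set_uid(passwd_text, username, new_uid):
    """Return passwd_text with the UID field of username set to new_uid."""
    out = []
    found = False
    for line in passwd_text.splitlines(keepends=True):
        # passwd layout: name:password:UID:GID:gecos:home:shell
        fields = line.rstrip("\n").split(":")
        if len(fields) == 7 and fields[0] == username:
            fields[2] = str(new_uid)
            newline = "\n" if line.endswith("\n") else ""
            line = ":".join(fields) + newline
            found = True
        out.append(line)
    if not found:
        raise KeyError("no such user: %s" % username)
    return "".join(out)


# Example with a made-up account that was auto-assigned UID 1002:
sample = "newuser:x:1002:100:New User:/home/newuser:/bin/bash\n"
print(set_uid(sample, "newuser", 101739), end="")
```

Only the UID field is touched; the GID and everything else in the file is passed through unchanged. Re-zipping and uploading (step 5) still has to be done as described above.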

Tech HowTos

Blinkenlights (blinkenlichten)

June 23rd, 2009

I’ll be posting more on this in the next few days, but I did a few more upgrades at home, including a Proliant DL380G2 to replace my aged ML370 (G1) storage box (array is failing badly) and a Proliant DL360 G2 as a second web server (and possibly moving Nagios over to that box).

I’m running into some problems with the old management card for the Tripp Lite UPS, and I have a few other issues to sort out, but here’s a photo that I took this weekend after the upgrades (yes, it’s a bit blurry - that happens handheld at 1/10 sec).

blinkenlights

ACHTUNG! ALLES LOOKENSPEEPERS!
Alles touristen und non-technischen looken peepers! Das computermachine ist nicht fuer gefingerpoken und mittengrabben.
Ist easy schnappen der springenwerk, blowenfusen und poppencorken mit spitzensparken. Ist nicht fuer gewerken bei das dumpkopfen. Das rubbernecken sichtseeren keepen das cotten-pickenen hans in das pockets muss; relaxen und watchen das blinkenlichten.

(For those of you who aren’t familiar with it, look up “blinkenlights”.)

Projects

Please Don’t resize my browser

June 22nd, 2009

It always amazes me to see how much “old school” web design practice is still out there. I’m talking about commercial sites (not MySpace pages) that blatantly ignore web standards about both content and user experience. This isn’t just a Linux thing, though some aspect of it certainly is. The web site of my home town, mpnj.com uses a Flash-based navigation menu that even the official, proprietary Flash player for Linux won’t support - the transparency renders as white, obscuring the text beneath the fully extended size of the menu. I emailed the developer about this on the launch day, and was told in no uncertain terms that - despite the fact that he had a fully-functional alternate version - Linux wasn’t important enough to fix the site. Ironically for a town government web page, it also doesn’t incorporate any accessibility features, which seems to be standard for most of these poor designs.

There are still countless large news sites whose Flash-based video players won’t run under Linux, and even Citibank’s credit card site has a Flash ad that plays incorrectly under Linux.

The real pain that I happened to see today was a company who uses coupons.com to allow customers to print out retail coupons. My first surprise was that to print the coupons, you have to download Windows or Mac software. I’m not quite sure how many people will do this, but it’s probably how viruses spread so quickly (people who will download anything that claims to get them half a dollar off of a roll of toilet paper, or whatever the coupons are for). So, that’s not cool - most coupons I’ve gotten were just HTML emails or PDFs. If their thinking is to control the distribution (they make some comment about a “paper-based printer, not a fax or PDF creator”), they’ve obviously forgotten about photocopy machines and scanners, let alone capturing the spool file on Mac.

More striking, however, was the shock of opening their help page. My primary monitor is a 24″ widescreen, and I generally keep a browser window occupying half the screen width and a terminal next to it. Once I opened their “help” site, it promptly resized my browser window to a tiny 640×480!

This problem, unfortunately, isn’t as rare as it should be. There are still sites that force browser size, disable right clicks (I hadn’t seen that since about 2004 until a few weeks ago… obviously someone who’s never used `wget`) or have a page that doesn’t fully work in Firefox on any platform. Even worse, my personal pet peeve (as at the time of writing this I have 50+ tabs open in Firefox, and it’s only using a small sliver of my 2GB RAM) is sites that don’t play well with tabbed browsing - either using only JavaScript for all navigation links, or opening all links (site-wide) in the same tab/window. I don’t know how many web sites have lost my business because of this. Or the one I know of that starts a new shopping cart for every tab opened (so if I open each product I want to buy in a new tab, when I add them all to the cart, it ends up with only one).

I don’t know how there can be anyone out there who’s still not using valid XHTML with all of the accessibility features for anything new, especially a commercial site. But even more so, how can there still be people designing web sites who disregard the golden rule of web design: Don’t mess with someone’s browser. Leave things like where to open the link and how big to make the browser to the user. If they’re not technically literate, changing what “usually happens” will just confuse them. If they’re well-versed in how to use a web browser, like me, they’ll just get aggravated by having someone else change their workflow (I doubt the guys who designed those sites would like it if I told them they had to design the whole thing in Emacs). If they’re somewhere in the middle (just found Ctrl+click in Firefox), you’ll confuse them. And God forbid they’re blind and using a page reader… good luck with JavaScript or Flash navigation.

Ideas and Rants

Cisco CatOS GBIC Information

June 20th, 2009

I have a Cisco WS-G4912 (12-port Gigabit aggregation switch) that I’m using to bring my network up to Gig-E. It’s about all that I could afford, and works fine. Most of my older servers are running 1000BASE-SX multimode fiber, but I decided to use copper GBICs for the new boxes that have onboard Gig-E ports. Unfortunately, $100+ for Cisco GBICs was way too much for me, so I found some third-party GBICs on eBay from TNet USA right in Fairfield, NJ.

I wanted to make sure the GBICs worked properly, and in the process found out about the undocumented CatOS command `show sprom [mod/port]`, which shows the serial PROM information.

Uncategorized

HP Proliant iLO SSH Problems

June 11th, 2009

There’s a known issue with the SSH implementation in the iLO firmware for HP Proliant servers (specifically G2 and G3) and OpenSSH 5.1p1. There was a thread on the OpenSSH developers list that referenced this problem and suggested a solution, but it doesn’t seem to be a sure fix.

This problem is present on my DL360 G2’s which are running the 1.84 2006-05-05 version of the iLO firmware (iLO 1.84 pass9) with the P26 2004.05.01 version of the system firmware. I also see the issue on a DL380G3 running iLO 1.92 2008.04.24 and system firmware P29 2004.09.15. The only way that I can reliably get into the iLO is by SSHing from a box with an older version of SSH, such as 4.2p1.

Most of the things that I could find online referenced unsetting the LANG environment variable:

unset LANG

and then SSHing with agent forwarding disabled:

ssh -a hostname-ilo

Unfortunately this combination doesn’t seem to do it for me.

I happened to stumble upon a post to the debian-ssh mailing list, which suggested that shortening the new OpenSSH version string fixed the problem.

I was able to confirm that the version string is, in fact, the sole problem. I downloaded the source of OpenSSH 5.2p1 and, with the following small patch to version.h, managed to get SSH working to the iLO perfectly:

--- openssh-5.2p1/version.h     2009-02-22 19:09:26.000000000 -0500
+++ openssh-patched/version.h   2009-06-12 00:35:48.000000000 -0400
@@ -1,6 +1,6 @@
 /* $OpenBSD: version.h,v 1.55 2009/02/23 00:06:15 djm Exp $ */
 
-#define SSH_VERSION    "OpenSSH_5.2"
+#define SSH_VERSION    "OpenSSH"
 
-#define SSH_PORTABLE   "p1"
+#define SSH_PORTABLE   ""
 #define SSH_RELEASE    SSH_VERSION SSH_PORTABLE

I patched version.h, ran `./configure`, `make`, and then copied the compiled ssh binary to /usr/bin/ilossh, so that my original ssh binary would be intact, and the ilossh binary would be left alone by RPM upgrades.

Tech HowTos

300 hits

May 22nd, 2009

I know I’ve been letting my blog die off a bit lately, mainly due to the giant amount of work I’ve been doing. I plan on updating a lot more over the next few weeks.

I happened to get a comment on one of my posts today, so I decided to take a peek at webalizer and see how things are doing. Apparently, I’m getting 300+ hits/day, including a quite nice array of search queries. Unfortunately, due to the way I redid my logging infrastructure, I don’t have any historical data (and it seems like the webalizer history is still broken, as I’m only seeing data for this month). But cool!

Miscellaneous Geek Stuff

Changing the title of a Konsole window

May 19th, 2009

For those of you who use KDE and Konsole, you can easily change the title of the Konsole window with the command:

dcop $KONSOLE_DCOP_SESSION renameSession 'NewSessionName'

This is pretty handy if, like me, you end up having a bunch of screen sessions running in different Konsole windows.

Tech HowTos

Acer X233Hbid Review

May 18th, 2009

I just bought myself a new monitor for my MythTV box, as I’ve moved my beautiful Acer AL2416W 24″er to my new desktop. The chosen monitor, based on price, reviews and features, is the Acer X233Hbid. It’s a 23″ 16:9 (not 16:10) monitor that runs at 1920×1080, provides true 1080p, and has an HDMI input (not that I’d ever use a restricted connection). After a few minutes of having it turned on and running, the picture quality is quite nice, even with quite a bit of glare.

However, I have two major complaints within the first ten minutes of unboxing it:

  1. No real manual, nor an online copy. The monitor comes only with a Quick Start Guide. There’s no printed full manual. More distressingly, the monitor isn’t even listed among the models on Acer’s Support site, and there’s no copy of the manual online either. There was a CD provided with the user’s manual on it. However, for a company that sells netbooks with no CD drive, this seems like quite a bad decision. But why, you ask, would I need a manual for my monitor?
  2. No VESA mounting instructions. One of my main criteria in choosing a monitor was that it allow VESA mounting, as I have my MythTV monitor on a monitor arm (easily adjustable angle so others in the room can see). The Acer X233Hbid has a 100×100mm VESA mounting space on the back. However, in a rare design mistake (unlike my 24″ Acer AL2416W), the monitor stand is two parts - one rectangular column about 4″ long attached to the back of the monitor, and a base with a column which mates with the one on the back of the monitor. Unfortunately, the column part on the back of the monitor came pre-attached, and there was no mention in the manual of VESA mounting or how to remove the column.

Column removal: The part of the monitor base which ships attached to the monitor is a fairly easy removal. Though I was originally worried about breaking something on my beautiful new screen, I found two plastic pieces on either side of the pre-attached part of the base which appeared to be snap-in trim pieces. Prying them off with a screwdriver revealed four screws which hold this piece to the monitor. Not only was removal easy, but the trim pieces snapped back into place for a nice clean look.

Reviews

OpenSuSE Internet2 Mirror

May 13th, 2009

I just bought a “new” desktop - I was thinking of doing an insane AMD Phenom II x4 940 (quad-core 3.0GHz) box - but I happened to find a used machine from $WORK; a Dell Precision 470 workstation, dual Xeon Nocona 2.8GHz processors, 4GB RAM (takes up to 16GB). So, I need a DVD of my usual desktop distro (OpenSuSE) for x64. Being that I’m at work (Rutgers University), I figured the quickest thing would be to find an Internet2 mirror, as Rutgers has 400Gbps peering on NJedgeNet.

Unfortunately, the OpenSuSE Mirror List doesn’t mention which sites have I2 peering. Luckily, the first logical one I tried - the Harvard mirror - was showing an I2/MAGPI route via traceroute.

If anyone else needs an I2 mirror of OpenSuSE, http://mirrors.med.harvard.edu/opensuse/ seems to do it. My desktop was getting a sustained +/- 160 Mbps transfer rate, and I got the entire 4.3GB DVD image in under 2-1/2 minutes.
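As a rough sanity check on those figures, here's the arithmetic as a quick sketch. It assumes the 4.3GB is decimal gigabytes and takes 2-1/2 minutes as 150 seconds. The helper names are just for illustration.

```python
def transfer_seconds(size_bytes, rate_bits_per_sec):
    """Time needed to move size_bytes at a steady bit rate."""
    return size_bytes * 8 / rate_bits_per_sec


def average_mbps(size_bytes, seconds):
    """Average rate in megabits per second for a finished transfer."""
    return size_bytes * 8 / seconds / 1e6


DVD_BYTES = 4.3e9  # assumption: decimal gigabytes

# How long the image would take at a steady 160 Mbps:
print(round(transfer_seconds(DVD_BYTES, 160e6)), "seconds")
# The average rate implied by finishing in 150 seconds:
print(round(average_mbps(DVD_BYTES, 150)), "Mbps")
```

By this arithmetic, a steady 160 Mbps would take about 215 seconds, while finishing in 150 seconds implies an average near 229 Mbps, so treat both the rate and the time as ballpark numbers.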

Uncategorized