Archive for June, 2009

Daily Work – Nagios SNMP traps, Vyatta, JasonAntman.com upgrades

June 27th, 2009

So it’s been a very busy day. I was up until 5 AM or so working on implementing Puppet at home. I’m building two new boxes – a storage (centralized home directory)/syslog (to MySQL) server and a second web server (possibly also to handle Nagios) – and I decided that they’ll be totally built by Puppet. The only things I had to give up on were setting up the NFS share for my home directory on the new storage box, and installing and testing rsyslog on it.

This afternoon around 7, I started on my weekend projects for the ambulance corps – setting up Nagios to receive SNMP traps from the APC UPS and moving over to the new Vyatta-based router (from m0n0wall). I’d attempted the router switch before, but had to roll back. For hardware I’m using an old BlueSocket controller, which is just a nice black 1U enclosure with a stock Intel motherboard, 20GB HDD, 512MB RAM and three 10/100 NICs. The first time, I was unable to get link on either of the two NICs I was using, so I rolled back.

Nagios SNMP Traps

I found a good starting point for Nagios SNMP traps on the OpsView blog. I set up `snmptrapd` on the Nagios server and hacked together a little Python script to just write all of the traps to a file. After some testing with `snmptrap` on my laptop, I did a test by pulling the power plug of the UPS, waiting about 30 seconds, and then plugging it back in. Sure enough, the little old AP9605 PowerNet SNMP card generated two SNMP traps – one for power loss and one for power regained – both of which showed up in the test file.
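For reference, here is a minimal sketch of what a trap-to-file handler of this kind could look like. It is not the script from that night, just an illustration under stated assumptions: it relies on the documented `traphandle` behaviour of `snmptrapd` (hostname on the first line of stdin, transport address on the second, then one "OID VALUE" pair per line), and it takes the log file path as its first command-line argument, which is my own choice for the example.

```python
#!/usr/bin/env python
"""Sketch of an snmptrapd traphandle script that appends traps to a file."""
import sys
import time


def format_trap(lines, timestamp):
    """Turn the lines snmptrapd writes to stdin into one log record.

    Assumed input layout (per snmptrapd.conf(5)): sender hostname,
    then transport address, then one "OID VALUE" pair per line.
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    hostname = lines[0] if lines else "unknown"
    address = lines[1] if len(lines) > 1 else "unknown"
    record = ["--- %s host=%s addr=%s" % (timestamp, hostname, address)]
    for varbind in lines[2:]:
        # Split on the first space only; the value may contain spaces.
        oid, _, value = varbind.partition(" ")
        record.append("    %s = %s" % (oid, value))
    return "\n".join(record) + "\n"


def main(log_path):
    """Read one trap from stdin and append it to log_path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(format_trap(sys.stdin.readlines(), stamp))


# Only run as a handler when a log path is supplied as an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

It would be wired up with a line such as `traphandle default /usr/local/bin/trap_logger.py /var/log/snmptraps.log` in snmptrapd.conf, where both paths are placeholders.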

The next step will be deciding how to get the traps into Nagios – specifically whether I want to go with something heavy-weight, like SNMPtt that can handle other devices, or whether I want to code a simple script myself just to deal with the APC cards.
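If I go the simple-script route, the Nagios side would most likely be a passive service check fed through the external command file. The sketch below shows only that piece, assuming the standard `PROCESS_SERVICE_CHECK_RESULT` external command format. The event names in `EVENT_STATES` are made-up placeholders (not real APC trap names or OIDs), and the command file location is whatever `command_file` is set to in nagios.cfg.

```python
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical event names for illustration; a real handler would key
# on the trap OIDs the UPS card actually sends.
EVENT_STATES = {
    "power-lost": CRITICAL,
    "power-restored": OK,
}


def passive_result(host, service, state, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if now is None:
        now = time.time()
    # A newline would terminate the command early, so flatten the output.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        int(now), host, service, state, output)


def submit(command_file, line):
    """Write a command line to the Nagios external command file (a FIFO)."""
    with open(command_file, "w") as fifo:
        fifo.write(line)
```

The host and service names passed in have to match a service defined in Nagios with passive checks enabled, otherwise the result is discarded.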

Router

The main reason why I wanted to make the switch from m0n0 to Vyatta was to ease the setup and maintenance of an IPsec tunnel from the ambulance HQ to my house, so I could push backups (relatively small) over the WAN to my infrastructure (or, rather, have Bacula pull the backups). Another big bonus was finally having a way of configuring and checking things through SSH without having to port-forward a web GUI. Another bonus of having a real Linux system under the router is the ability to make custom Nagios check scripts and easily execute them. Something I hadn’t thought of – but became obvious during the switchover – is the ability to run full-fledged `tcpdump` on the router itself.

After building the new config myself, and confirming that the system ran in isolation, I moved it over to production. The first issue was a bit of a thinko on my part – the interfaces on the BSC are actually arranged on the back of the box in the order eth0, eth2, eth1, so I originally had the LAN uplink in the wrong interface. After correcting that and waiting for the network to stabilize, I noticed a total external connectivity failure. After some troubleshooting – thanks to tcpdump on the router – it occurred to me that the (ancient) cable modem needs to be rebooted when the router MAC changes.

I honestly don’t remember the other problems that I ran into, but eventually I ended up getting almost-full functionality – and then a total network outage. A tcpdump on my laptop showed some really weird BOOTP traffic with addresses of 255.255.255.255. After doing some troubleshooting and monitoring port counters on the switch, I narrowed it down to coming from a single Windows box and the wireless access point. After shutting off both ports, things seemed to stabilize. I also had some “martian address” issues with one of the boxes, but decided to roll the box and that solved it.

Over the next day or so, I’ll be reconfiguring Nagios both at home and at the ambulance corps to cope with the changes and add in the requisite monitoring, and keep an eye on things. Assuming all goes well, I’ll power down the old router on Sunday.

On the home front, I’ve moved over from my old storage machine to the new one – essentially just the NFS mount, and moved over a tarball of everything else. I also added a 1000Base-SX card to the new box, though it appears that I’m out of fiber patch cords. The old storage box was brought down for the first time in about 3 years (aside from brief outages for hardware upgrades or array rebuilds). Assuming I got everything off of it, it will be relegated to the spares pile.

I’m going to make a serious effort to post on a daily basis, if only for my own future reference. I should have the demo of RackMan out soon, and I’m also about to start on integrating it with Nathan Hubbard’s MachDB as well as a PHP script I wrote to pull port names and MACs from Cisco switches and associate them with NICs in machines. Hopefully I’ll also have some interesting Puppet stuff out soon.

Miscellaneous Geek Stuff

Netgear ReadyNAS 1100 and long UID numbers

June 23rd, 2009

I spent part of the day installing a Netgear ReadyNAS 1100 for backup storage. It’s a cute little 1U storage appliance with 4 SATA disk bays, two Gig copper ports and about every sort of hokey service you could possibly want in something that’s billed as a small business storage server but running a sort-of-scaled-up home appliance OS. That being said, it comes at a wonderful price (we bought one empty and added 4x 1.5TB disks) and runs Linux.

I had a few minor issues with the installation (more along the lines of trying to do things that weren’t clearly documented rather than problems with the unit). NetGear tech support (“ProSupport Labs”) was quite good once I got past the first level or two, and Mark H., who helped me with most of my issues, was one of the best tech support people I’ve ever spoken to. In fact, he probably ties with Paulo from HP for the best tech support person I’ve ever dealt with.

Anyway, the only issue with the NetGear installation that we weren’t able to resolve was the fact that the web configuration tool (“Frontview”) will only accept UIDs of a maximum of five characters. Here at Rutgers, we have a unified UID space, with numbers well in excess of 100,000. As a result, my plan of having NFS play well wasn’t really going to work. Mark wasn’t able to come up with a solution (obviously – it’s something that, at best, can get fixed in the next version of the firmware) but he took copious notes, had me confirm them, and told me he’d bring it up with the ReadyNAS engineering team when they meet Thursday, and would try to e-mail me back to follow up. He also spent quite some time on the phone with me, both of us researching what kernel the ReadyNAS 4.1.5 firmware runs (2.6.17.8, mostly vanilla Debian as per Mark) and when long UID support became available (at least 2.4). Overall, I was very impressed that Mark knew quite a bit about Linux – even more than your average Linux desktop user – and quite a bit about the internals of the ReadyNAS.

Just on a hunch, after my tech support call, I used the Frontview “Config Backup” tool’s “Users and Groups” option to back up the user and group information. Sure enough, it was just a force-download of a Zip archive… of the pertinent files. I was able to hand-hack /etc/passwd and re-upload it, and it seemed to work fine.

The following is NOT endorsed by Netgear in any way, shape or form. It’s a hack that may or may not work. Do this at your own risk – I have no idea if some future (or past) firmware change might make bad things happen, or whether there are some features which don’t jibe with this. All I’ve tested is HTTP logins, the fact that the web interface shows the >2^16 UID correctly, and NFS.

Procedure:

  1. Create desired user in Frontview tool, leaving UID field blank. (In my case, the user is assigned the next sequential UID, 1002).
  2. Frontview “System” -> “Config Backup”, “Backup” tab, select only “Users and Groups”.
  3. Download config files and unzip.
  4. Open etc/passwd from the config archive in a text editor, change the automatically assigned UID (1002) to the desired UID (101739).
  5. Re-zip the directory tree and re-upload to ReadyNAS.
  6. Enjoy.
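I did step 4 by hand in a text editor, but for more than a couple of users it could be scripted. Here is a rough sketch of the UID rewrite, assuming the archive’s etc/passwd is in the standard seven-field passwd(5) layout. The function name and the example user are made up for illustration, and the same at-your-own-risk warning applies.

```python
def set_uid(passwd_text, username, new_uid):
    """Return passwd_text with username's UID field replaced by new_uid.

    Assumes the standard passwd(5) layout:
    name:password:UID:GID:GECOS:home:shell
    """
    out = []
    found = False
    for line in passwd_text.splitlines(True):  # keep line endings
        fields = line.rstrip("\n").split(":")
        if len(fields) == 7 and fields[0] == username:
            fields[2] = str(new_uid)
            newline = "\n" if line.endswith("\n") else ""
            line = ":".join(fields) + newline
            found = True
        out.append(line)
    if not found:
        # Refuse to silently hand back an unchanged file.
        raise KeyError(username)
    return "".join(out)
```

For example, `set_uid(text, "exampleuser", 101739)` changes only the third field of that user’s line and leaves every other line byte-for-byte intact, which matters since the file gets re-zipped and uploaded straight back to the ReadyNAS.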

Tech HowTos

Blinkenlights (blinkenlichten)

June 23rd, 2009

I’ll be posting more on this in the next few days, but I did a few more upgrades at home, including a Proliant DL380G2 to replace my aged ML370 (G1) storage box (array is failing badly) and a Proliant DL360 G2 as a second web server (and possibly moving Nagios over to that box).

I’m running into some problems with the old management card for the Tripp Lite UPS, and I have a few other issues to sort out, but here’s a photo that I took this weekend after the upgrades (yes, it’s a bit blurry – that happens handheld at 1/10 sec).

blinkenlights

ACHTUNG! ALLES LOOKENSPEEPERS!
Alles touristen und non-technischen looken peepers! Das computermachine ist nicht fuer gefingerpoken und mittengrabben.
Ist easy schnappen der springenwerk, blowenfusen und poppencorken mit spitzensparken. Ist nicht fuer gewerken bei das dumpkopfen. Das rubbernecken sichtseeren keepen das cotten-pickenen hans in das pockets muss; relaxen und watchen das blinkenlichten.

(For those of you who aren’t familiar with it, blinkenlights).

Projects

Please Don’t resize my browser

June 22nd, 2009

It always amazes me to see how much “old school” web design practice is still out there. I’m talking about commercial sites (not MySpace pages) that blatantly ignore web standards about both content and user experience. This isn’t just a Linux thing, though some aspect of it certainly is. The web site of my home town, mpnj.com uses a Flash-based navigation menu that even the official, proprietary Flash player for Linux won’t support – the transparency renders as white, obscuring the text beneath the fully extended size of the menu. I emailed the developer about this on the launch day, and was told in no uncertain terms that – despite the fact that he had a fully-functional alternate version – Linux wasn’t important enough to fix the site. Ironically for a town government web page, it also doesn’t incorporate any accessibility features, which seems to be standard for most of these poor designs.

There are still countless large news sites whose Flash-based video players won’t run under Linux, and even Citibank’s credit card site has a Flash ad that plays incorrectly under Linux.

The real pain that I happened to see today was a company who uses coupons.com to allow customers to print out retail coupons. My first surprise was that to print the coupons, you have to download Windows or Mac software. I’m not quite sure how many people will do this, but it’s probably how viruses spread so quickly (people who will download anything that claims to get them half a dollar off of a roll of toilet paper, or whatever the coupons are for). So, that’s not cool – most coupons I’ve gotten were just HTML emails or PDFs. If their thinking is to control the distribution (they make some comment about a “paper-based printer, not a fax or PDF creator”), they’ve obviously forgotten about photocopy machines and scanners, let alone capturing the spool file on Mac.

More striking, however, was the shock of opening their help page. My primary monitor is a 24″ widescreen, and I generally keep a browser window occupying half the screen width and a terminal next to it. Once I opened their “help” site, it promptly resized my browser window to a tiny 640×480!

This problem, unfortunately, isn’t as rare as it should be. There are still sites that force browser size, disable right clicks (I hadn’t seen that since about 2004 until a few weeks ago… obviously someone who’s never used `wget`) or have a page that doesn’t fully work in Firefox on any platform. Even worse, my personal pet peeve (as at the time of writing this I have about 50+ tabs open in Firefox, and it’s only using a small sliver of my 2GB RAM) is sites that don’t play well with tabbed browsing – either using only JavaScript for all navigation links, or opening all links (site-wide) in the same tab/window. I don’t know how many web sites have lost my business because of this. Or the one I know of that starts a new shopping cart for every tab opened (so if I open each product I want to buy in a new tab, when I add them all to the cart, it ends up with only one).

I don’t know how there can be anyone out there who’s still not using valid XHTML with all of the accessibility features for anything new, especially a commercial site. But even more so, how can there still be people designing web sites who disregard the golden rule of web design: Don’t mess with someone’s browser. Leave things like where to open the link and how big to make the browser to the user. If they’re not technically literate, changing what “usually happens” will just confuse them. If they’re well-versed in how to use a web browser, like me, they’ll just get aggravated by having someone else change their workflow (I doubt the guys who designed those sites would like it if I told them they had to design the whole thing in Emacs). If they’re somewhere in the middle (just found Ctrl+click in Firefox), you’ll confuse them. And God forbid they’re blind and using a page reader… good luck with JavaScript or Flash navigation.

Ideas and Rants

Cisco CatOS GBIC Information

June 20th, 2009

I have a Cisco WS-C4912G (12-port Gigabit aggregation switch) that I’m using to bring my network up to Gig-E. It’s about all that I could afford, and works fine. Most of my older servers are running 1000BASE-SX multimode fiber, but I decided to use copper GBICs for the new boxes that have onboard Gig-E ports. Unfortunately, $100+ for Cisco GBICs was way too much for me, so I found some third-party GBICs on eBay from TNet USA right in Fairfield, NJ.

I wanted to make sure the GBICs work right, so I happened to find out about the undocumented CatOS command `show sprom [mod/port]` which shows the serial PROM information.

Uncategorized

HP Proliant iLO SSH Problems

June 11th, 2009

There’s a known issue with the SSH implementation in the iLO firmware for HP Proliant servers (specifically G2 and G3) and OpenSSH 5.1p1. There was a thread on the OpenSSH developers list that referenced this problem and suggested a solution, but it doesn’t seem to be a sure fix.

This problem is present on my DL360 G2s, which are running the 1.84 2006-05-05 version of the iLO firmware (iLO 1.84 pass9) with the P26 2004.05.01 version of the system firmware. I also see the issue on a DL380G3 running iLO 1.92 2008.04.24 and system firmware P29 2004.09.15. The only way that I can reliably get into the iLO is by SSHing from a box with an older version of SSH, such as 4.2p1.

Most of the things that I could find online referenced unsetting the LANG environment variable:

unset LANG

and then SSHing with agent forwarding disabled:

ssh -a hostname-ilo

Unfortunately this combination doesn’t seem to do it for me.

I happened to stumble upon this post to the debian-ssh mailing list, which suggested that shortening the new OpenSSH version string fixed the problem.

I was able to confirm that the version string is, in fact, the sole problem. I downloaded the source of OpenSSH 5.2p1 and, with the following small patch to version.h, managed to get SSH working to the iLO perfectly:

--- openssh-5.2p1/version.h     2009-02-22 19:09:26.000000000 -0500
+++ openssh-patched/version.h   2009-06-12 00:35:48.000000000 -0400
@@ -1,6 +1,6 @@
 /* $OpenBSD: version.h,v 1.55 2009/02/23 00:06:15 djm Exp $ */
 
-#define SSH_VERSION    "OpenSSH_5.2"
+#define SSH_VERSION    "OpenSSH"
 
-#define SSH_PORTABLE   "p1"
+#define SSH_PORTABLE   ""
 #define SSH_RELEASE    SSH_VERSION SSH_PORTABLE

I patched version.h, ran `./configure`, `make`, and then copied the compiled ssh binary to /usr/bin/ilossh, so that my original ssh binary would be intact, and the ilossh binary would be left alone by RPM upgrades.

Tech HowTos