Archive

Posts Tagged ‘apc’

Daily Work – Nagios SNMP traps, Vyatta, JasonAntman.com upgrades

June 27th, 2009

So it’s been a very busy day. I was up until 5 AM or so working on implementing Puppet at home. I’m building two new boxes – a storage (centralized home directory)/syslog (to MySQL) server and a second web server (possibly also to handle Nagios) – and I decided that they’ll be totally built by Puppet. The only thing I had to give up on was setting up the NFS share for my home directory on the new storage box and installing and testing rsyslog on it.

This afternoon around 7, I started on my weekend projects for the ambulance corps – setting up Nagios to receive SNMP traps from the APC UPS and moving over to the new Vyatta-based router (from m0n0wall). I’d attempted the router before, but had to roll back. I’m using an old BlueSocket controller for hardware; it’s just a nice black 1U enclosure with a stock Intel motherboard, 20GB HDD, 512MB RAM and three 10/100 NICs. The first time, I was unable to get link on either of the two NICs I was using, so I decided to roll back.

Nagios SNMP Traps

I found a good starting point for Nagios SNMP traps on the OpsView blog. I set up `snmptrapd` on the Nagios server and hacked together a little Python script to just write all of the traps to a file. After some testing with `snmptrap` on my laptop, I did a test by pulling the power plug of the UPS, waiting about 30 seconds, and then plugging it back in. Sure enough, the little old AP9605 PowerNet SNMP card generated two SNMP traps – one for power loss and one for power regained – both of which showed up in the test file.
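For reference, here is a minimal sketch of what such a trap-logging handler could look like. It is not the original script. It assumes the handler is wired up through a `traphandle` line in `snmptrapd.conf`, in which case `snmptrapd` feeds the trap to the handler on standard input as a hostname line, a transport address line, and then one "OID VALUE" line per varbind. The function names and the record layout are made up for illustration.

```python
import time


def parse_trap(lines):
    """Split snmptrapd's handler input into host, address and varbinds."""
    lines = [line.rstrip("\n") for line in lines]
    host = lines[0] if len(lines) > 0 else ""
    address = lines[1] if len(lines) > 1 else ""
    varbinds = []
    for line in lines[2:]:
        if not line.strip():
            continue
        # Each remaining line is "OID VALUE"; the value may contain spaces.
        oid, _, value = line.partition(" ")
        varbinds.append((oid, value))
    return host, address, varbinds


def handle(stream, log, now=None):
    """Read one trap from `stream` and append a one-line record to `log`."""
    host, address, varbinds = parse_trap(stream)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    body = "; ".join(f"{oid}={value}" for oid, value in varbinds)
    log.write(f"{stamp} host={host} addr={address} {body}\n")
```

A real handler script would call something like `handle(sys.stdin, open(path, "a"))`, with `path` being whatever test file you want the traps written to.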

The next step will be deciding how to get the traps into Nagios – specifically whether I want to go with something heavy-weight, like SNMPtt that can handle other devices, or whether I want to code a simple script myself just to deal with the APC cards.
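If I go the simple-script route, the usual way in is a passive service check written to the Nagios external command file. The sketch below only shows the shape of that idea. The event labels, the host and service names, and the status text are all hypothetical, and a real handler would have to derive the event from the trap OID. The only part taken from Nagios itself is the `PROCESS_SERVICE_CHECK_RESULT` line format and the 0/1/2/3 return codes.

```python
import time

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical event labels; a real handler would map trap OIDs to these.
EVENT_STATES = {
    "power_lost": (CRITICAL, "UPS on battery: input power lost"),
    "power_restored": (OK, "UPS input power restored"),
}


def passive_result(host, service, event, now=None):
    """Build one external-command line reporting a passive service check."""
    state, text = EVENT_STATES.get(
        event, (UNKNOWN, f"Unrecognized trap event: {event}")
    )
    stamp = int(time.time() if now is None else now)
    return f"[{stamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{text}\n"


def submit(line, command_file):
    """Write one command line to the Nagios command file (normally a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The service has to exist in the Nagios config with passive checks enabled, and the location of the command file depends on how Nagios was installed.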

Router

The main reason why I wanted to make the switch from m0n0 to Vyatta was to ease the setup and maintenance of an IPsec tunnel from the ambulance HQ to my house, so I could push backups (relatively small) over the WAN to my infrastructure (or, rather, have Bacula pull the backups). Another big bonus was finally having a way of configuring and checking things through SSH without having to port-forward a web GUI. Another bonus of having a real Linux system under the router is the ability to make custom Nagios check scripts and easily execute them. Something I hadn’t thought of – but became obvious during the switchover – is the ability to run full-fledged `tcpdump` on the router itself.

After building the new config myself, and confirming that the system ran in isolation, I moved it over to production. The first issue was a bit of a thinko on my part – the interfaces on the BSC are actually arranged on the back of the box in the order eth0, eth2, eth1, so I originally had the LAN uplink in the wrong interface. After correcting that and waiting for the network to stabilize, I noticed a total external connectivity failure. After some troubleshooting – thanks to tcpdump on the router – it occurred to me that the (ancient) cable modem needs to be rebooted when the router MAC changes.

I honestly don’t remember the other problems that I ran into, but eventually I ended up getting almost-full functionality – and then a total network outage. A tcpdump on my laptop showed some really really weird BOOTP traffic with addresses of 255.255.255.255. After doing some troubleshooting and monitoring port counters on the switch, I narrowed it down to coming from a single Windows box and the wireless access point. After shutting off both ports, things seemed to stabilize. I also had some “martian address” issues with one of the boxes, but decided to roll the box and that solved it.

Over the next day or so, I’ll be reconfiguring Nagios both at home and at the ambulance corps to cope with the changes and add in the requisite monitoring, and keep an eye on things. Assuming all goes well, I’ll power down the old router on Sunday.

On the home front, I’ve moved over from my old storage machine to the new one – essentially just the NFS mount, and moved over a tarball of everything else. I also added a 1000Base-SX card to the new box, though it appears that I’m out of fiber patch cords. The old storage box was brought down for the first time in about 3 years (aside from brief outages for hardware upgrades or array rebuilds). Assuming I got everything off of it, it will be relegated to the spares pile.

I’m going to make a serious effort to post on a daily basis, if only for my own future reference. I should have the demo of RackMan out soon, and I’m also about to start on integrating it with Nathan Hubbard’s MachDB as well as a PHP script I wrote to pull port names and MACs from Cisco switches and associate them with NICs in machines. Hopefully I’ll also have some interesting Puppet stuff out soon.

Miscellaneous Geek Stuff

Cable Management, Power Measurements, Major Outage, Cacti

March 6th, 2008

So, once again, still really busy. But a few new things.

First, my racks both at home and at the apartment are atrocious. They have no cable management at all. Both started with 1-3 machines, and no real plans for upgrades (since they’re just my personal/development machines). Unfortunately, the “rack” (a metal workshop shelving unit) at home now has 8 machines and a host of ancillary equipment. The one at the apartment – an actual 42U rack – has 5 plus a few switches, rackmount KMM, etc. They’re both a jumble of wires in the back. Unfortunately, it seems like cable management hardware is *expensive*. $30 for a 2U metal blank with a few plastic split D-rings, or almost $40 for a 2-meter vertical hunk of plastic channel with slits in the sides? So, I’ve been vaguely considering what it will take to fabricate some cable management hardware of my own. Probably just building something out of rack blanks for the horizontal off of the switches, and buying some sort of vertical channel for power and networking/KVM. Man, those KVM cables sure do take up a lot of space. Also at the moment, at home my power is all coming directly out of two UPSs, whereas at the apartment it’s straight from mains off of a surge suppressor. I’d like to buy another UPS for the apartment from RefurbUPS.com, where I got the ones for home, and also add a PDU at home and a vertical power strip at the apartment.

Also, at the apartment, the roommates and I have had some discussion lately about how much power the machines draw. This mainly stemmed from our plans to move this June, into a rented house with two more people. This seems to be falling through, so I don’t have to worry about moving and re-cabling everything, but I’m still interested in finding out how much power is being drawn. Granted, my UPSs at home give me a more-or-less good idea of power consumption, but I’d like to know in detail. The ideal solution would be a clamp ammeter around the mains line to the equipment – one with a serial interface. Unfortunately, I can’t seem to find such a thing, short of a digital multimeter left on all the time. So, I guess I’ll be looking around, and if I can’t find anything specific, maybe I’ll work on a microcontroller that can read 1-200mV in 1mV increments, and use it with an inductive clamp ammeter (usual output for them is 1mV per A).
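To put numbers on the clamp idea, here is a rough sketch of the conversion. The 1 mV per A scale comes from the figure above. The 120 V mains voltage and the power factor of 1.0 are assumptions, and the function names are just for illustration.

```python
def clamp_mv_to_amps(millivolts, mv_per_amp=1.0):
    """Convert a clamp reading in mV to current in A (1 mV per A by default)."""
    if millivolts < 0:
        raise ValueError("reading must be non-negative")
    return millivolts / mv_per_amp


def apparent_power_va(amps, mains_volts=120.0):
    """Apparent power in VA; 120 V is an assumed nominal mains voltage."""
    return amps * mains_volts


def energy_kwh(volt_amps, hours, power_factor=1.0):
    """Energy over a period; the power factor is an assumption, not measured."""
    return volt_amps * power_factor * hours / 1000.0


amps = clamp_mv_to_amps(5)      # a 5 mV reading is 5 A
va = apparent_power_va(amps)    # 600 VA at the assumed 120 V
daily = energy_kwh(va, 24)      # 14.4 kWh per day at power factor 1.0
```

One thing this makes obvious: with 1 mV steps, the resolution is a whole amp, or about 120 VA per step at 120 V. That is pretty coarse for a handful of machines, so the microcontroller would need finer resolution or an amplifier in front of it.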

So, on Monday I got into work and couldn’t access my mailserver. Weird. I never even got any Nagios alerts. I checked Nagios and… nothing. As in no connection. I SSH’d home and pinged both boxes, but nothing. The switch showed the mail server totally offline, and the Nagios box connected but with ZERO data out. I reset the counters and waited. Still nothing. After an hour or so of poking around, I determined that both devices were on the same 6-port group on the switch, and nothing else there was up either. So, after five long hours, I got someone back home to switch the cables. Still nothing. On a hunch, I asked her to check the mail server (the “new” Sun Blade 150) and, sure enough, it wasn’t powered on. A click of the power button, and the mail server was back online. Along with an ominous last email from Nagios, stating that the UPS running my switch lost power, and 6 minutes later, was going down hard. Then quiet.

I don’t usually have power outages. So I’ll admit, when I added some of the new machines, I committed a high sin – I “never got around” to setting up everything power-wise. I also have the switch running off of an old BackUPS 500VA unit, USB, without automatic self-tests. As a result of all this:

  1. The little UPS powering the switch only held out for 6-7 minutes. Once that died, the bigger units didn’t even matter, as all hope was lost. The switch needs to be on a bigger UPS – maybe one of the 1000VA’s until it gets its own.
  2. APCupsd requires a network to initiate shutdown, so the rest of the machines came down hard (as confirmed by looking through log files).
  3. The SunBlade was never set up to power on after power interruption, so it just sat there like a brick.

Most disturbingly, while my Nagios/monitoring box is up (according to the switch, power draw figures from the UPS, and the lights, as confirmed by someone on-site), it’s dead. No ping, nothing out. I’ll have to look into it, but it made me realize that this really is my only way of analyzing problems. That needs to stop.

Maybe one day I’ll have the money for a nice SmartUPS RT or even a Symmetra – though getting 208V into my basement is even more of a dream than spending $4000 on a UPS.

Also, I decided (after all this) to set up graphing of UPS data (load, voltage in and out, temp, capacity, run time, etc.). While I haven’t gotten around to setting up Zenoss yet, I did a quick (well, 4 hours later I’m done configuring it) Cacti installation on my web server (I should already have it running on the monitoring box, but who knows what that will look like when I get home). I also dropped a Cacti host template in CVS for the AP9605 PowerNet SNMP card in my UPSs.

Projects

APC AP9605 PowerNet SNMP Card

March 1st, 2007

In the theme of upgrades, I also purchased two APC SmartUPS1000 units from refurbUPS.com. Now, I know that a lot of people are perfectly happy with serial connectivity. And it has its positives. But I’m running 2-3 older servers per UPS, and I wanted to be able to monitor the UPSs, and perhaps control server shutdown, over the network. So, I found that refurbUPS.com also sells SNMP management cards for them. They sell a refurbished AP9605 – it’s an old 10BaseT PowerNet SNMP-only card (with telnet management). Seemed good, and the price of $15 was right.

They showed up, but I couldn’t find much about them online, let alone anything useful.

After a phone call to APC, I managed to get the user’s manual emailed to me. The few instructions I found online were totally wrong.

The general setup goes like this:

  1. Connect network cable to card.
  2. Connect serial cable between a computer and the UPS’s serial port.
  3. Get a terminal emulator, like minicom. Speed is 2400bps.
  4. Connect and press enter. You’ll be asked for a username and password. Use “apc” for both.
  5. Set up the network – IP, mask, gateway, etc.
  6. Ready-to-go!
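For step 3, a small sketch of building the minicom invocation. The 2400bps speed is from the steps above, but the device path `/dev/ttyS0` is only an assumption (a USB serial adapter would show up under a different name), and the helper name is made up. `-b` and `-D` are minicom's baud rate and device options.

```python
import shlex


def minicom_command(device="/dev/ttyS0", baud=2400):
    """Return the argv for opening `device` in minicom at `baud` bps."""
    return ["minicom", "-b", str(baud), "-D", device]


# Print a copy-pasteable command line for the assumed default port.
print(shlex.join(minicom_command()))
```

Pass a different `device` if the UPS is hanging off some other serial port.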

I also have some information about the card on my wiki at: AP9605.

Tech HowTos