Archive

Posts Tagged ‘UPS’

Downtime past few days, coping with storms

March 17th, 2010

As far as I can see from Google Analytics, though I’m now up to about 4,000 visitors per month for this blog, it seems like most are one-time visitors. So hopefully the few hiccups of the past few days weren’t noticed (except by me, and my mailserver…). As many of you in the US may know, New Jersey was hit by a pretty big nor’easter last weekend and my area was especially hard hit. I lost power for about five hours Saturday night (the 13th) and again for an hour or so the next morning. Needless to say, these were both much longer than my small UPSes could cope with (the smaller of the two has a mere 12 minutes of runtime with brand new batteries). While I do have a small (2500W) gas generator, it was buried in the back of the garage. The time it would take to dig it out and string extension cords from the UPSes to a relatively dry area outside would probably be more than 12 minutes – not to mention that as first lieutenant (second-in-command) of the town’s volunteer ambulance corps, I was running in the opposite direction once things got bad.

Luckily, the problems caused or uncovered by the power loss were relatively minor:

  1. The batteries in my smaller UPS lasted about 90 seconds. I’d noticed the bad battery light a few weeks ago, but put off replacing them (hey, it’s my home setup, I don’t exactly have a budget). RefurbUPS, where I got the two APC SU1000NETs, got me a replacement RBC6 (aftermarket, not original APC) in two days, for around $73. Unfortunately, my Tripp Lite SMART2200RMXL2U (the one without the external battery pack) also just had its orange light pop on, so there goes another $90.
  2. The battery replacement caused another short (~5 minute) downtime tonight, including my router. Unfortunately the router is plugged into the smaller of the two UPSes, and none of my current boxes have more than one power supply. Mistake. Especially considering my limited resources. On the positive side, I took the opportunity to move all of the equipment onto Liebert MicroPODs.
  3. The external SATA disk that my desktop uses to store media is dead. I’m getting incessant read errors, followed by the disk being offlined, on two USB adapters and on internal SATA. I guess I should have a better surge protector for the desktop.
  4. A few of my boxes didn’t come up clean, mainly due to entropy in the configs, or services that were never chkconfig’ed. Well, I don’t use any configuration management at home.
  5. One of the disks in my storage server (RAID 1+0 of 6x 36.4GB 10k RPM SCSI disks) died – actually a few days before the power outage. I was able to find a new replacement on eBay, with warranty, for about $40.
  6. Many of my internal services – including a lot of the Nagios checks – use a separate gigabit management network, run off of an older GigE aggregation switch. This was on the smaller UPS, so it went down quickly. As both this and my main switch are of the same series, perhaps an RPS unit would be worth the money.

Plans for the future:

  • Get Puppet working at home. As part of this, buy a spare server so I can migrate services one at a time to a puppet-ized version, test, and then rebuild (while the production machine is still running).
  • In the future, given my limited infrastructure, purchasing dual power supply machines – and putting the PSes on separate UPSes – would be a good thing.
  • Configure the Tripp Lite UPS to do load shedding.
  • Set up UPS management software to bring down boxes in a logical order.
  • Power savings is also important – so I’m thinking about rebuilding things using Xen, and using live migration to both distribute load logically under normal circumstances, and to move around loads in the event of power failure (kill non-critical/internal-only stuff, consolidate the rest).
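
As a rough sketch of what "a logical order" could look like, the snippet below sorts boxes into tiers (non-critical first) and checks that the whole sequence fits inside the UPS runtime with some margin. The host names, tiers, shutdown times and the 60-second margin are all made-up assumptions for illustration, not my actual setup or any UPS vendor's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str           # hypothetical host name
    tier: int           # 0 = non-critical; higher tiers are halted later
    shutdown_secs: int  # rough guess at how long a clean halt takes


def shutdown_plan(hosts, runtime_secs, margin_secs=60):
    """Return (name, start_offset_secs) pairs, lowest tier first.

    Hosts are halted one after another. Raises ValueError if the whole
    sequence would not finish before runtime_secs minus margin_secs.
    """
    ordered = sorted(hosts, key=lambda h: (h.tier, h.name))
    plan = []
    offset = 0
    for host in ordered:
        plan.append((host.name, offset))
        offset += host.shutdown_secs
    if offset > runtime_secs - margin_secs:
        raise ValueError(
            f"shutdown needs {offset}s but only "
            f"{runtime_secs - margin_secs}s of runtime is usable"
        )
    return plan


if __name__ == "__main__":
    # Made-up inventory; 720 s matches the 12-minute runtime of the small UPS.
    example = [
        Host("storage", 2, 120),
        Host("media-desktop", 0, 60),
        Host("mail", 1, 90),
        Host("web", 1, 90),
    ]
    for name, start in shutdown_plan(example, runtime_secs=720):
        print(f"t+{start:>3}s  halt {name}")
```

The useful part is the failure case: if the plan does not fit in the runtime of the weakest UPS on the path, it is better to find that out before the storm than during it.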

Miscellaneous Geek Stuff

Cable Management, Power Measurements, Major Outage, Cacti

March 6th, 2008

So, once again, still really busy. But a few new things.

First, my racks both at home and at the apartment are atrocious. They have no cable management at all. Both started with 1-3 machines, and no real plans for upgrades (since they’re just my personal/development machines). Unfortunately, the “rack” (a metal workshop shelving unit) at home now has 8 machines and a host of ancillary equipment. The one at the apartment – an actual 42U rack – has 5 plus a few switches, rackmount KMM, etc. They’re both a jumble of wires in the back. Unfortunately, it seems like cable management hardware is *expensive*. $30 for a 2U metal blank with a few plastic split D-rings, or almost $40 for a 2-meter vertical hunk of plastic channel with slits in the sides? So, I’ve been vaguely considering what it will take to fabricate some cable management hardware of my own. Probably just building something out of rack blanks for the horizontal off of the switches, and buying some sort of vertical channel for power and networking/KVM. Man, those KVM cables sure do take up a lot of space. Also at the moment, at home my power is all coming directly out of two UPSs, whereas at the apartment it’s straight from mains off of a surge suppressor. I’d like to buy another UPS for the apartment from RefurbUPS.com, where I got the ones at home, and also add a PDU at home and a vertical power strip at the apartment.

Also, at the apartment, the roommates and I have had some discussion lately about how much power the machines draw. This mainly stemmed from our plans to move this June, into a rented house with two more people. This seems to be falling through, so I don’t have to worry about moving and re-cabling everything, but I’m still interested in finding out how much power is being drawn. Granted, my UPSs at home give me a more-or-less good idea of power consumption, but I’d like to know in detail. The ideal solution would be a clamp ammeter around the mains line to the equipment – one with a serial interface. Unfortunately, I can’t seem to find such a thing, short of a digital multimeter left on all the time. So, I guess I’ll be looking around, and if I can’t find anything specific, maybe I’ll work on a microcontroller that can read 1-200mV in 1mV increments, and use it with an inductive clamp ammeter (usual output for them is 1mV per A).
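
To put numbers on the clamp idea, here is a minimal sketch of the conversion the microcontroller (or whatever reads it) would have to do. It assumes the 1mV-per-amp clamp output mentioned above and a nominal 120 V mains voltage, which is my assumption for a US outlet. It only gives apparent power (VA), since working out real watts would also need the power factor.

```python
def amps_from_clamp_mv(millivolts, mv_per_amp=1.0):
    """Convert a clamp ammeter's output voltage to current in amps."""
    if mv_per_amp <= 0:
        raise ValueError("mv_per_amp must be positive")
    return millivolts / mv_per_amp


def apparent_power_va(amps, mains_volts=120.0):
    """Apparent power in volt-amps; 120 V nominal mains is an assumption."""
    return amps * mains_volts


if __name__ == "__main__":
    reading_mv = 8  # hypothetical ADC reading, in 1 mV steps
    amps = amps_from_clamp_mv(reading_mv)
    print(f"{reading_mv} mV -> {amps} A -> {apparent_power_va(amps)} VA")
```

One thing the arithmetic makes obvious: with 1mV steps and 1mV per amp, the resolution is a whole amp, or about 120 VA per step at 120 V. For a handful of servers that is pretty coarse, so a more sensitive clamp or a finer ADC would probably be needed for "in detail" numbers.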

So, on Monday I got into work and couldn’t access my mailserver. Weird. I never even got any Nagios alerts. I checked Nagios and… nothing. As in no connection. I SSH’d home and pinged both boxes, but nothing. The switch showed the mail server totally offline, and the Nagios box connected but with ZERO data out. I reset the counters and waited. Still nothing. After an hour or so of poking around, I determined that both devices were on the same 6-port group on the switch, and nothing else there was up either. So, after five long hours, I got someone back home to switch the cables. Still nothing. On a hunch, I asked her to check the mail server (the “new” Sun Blade 150) and, sure enough, it wasn’t powered on. A click of the power button, and the mail server was back online. Along with it came an ominous last email from Nagios, stating that the UPS running my switch had lost power and, 6 minutes later, was going down hard. Then quiet.

I don’t usually have power outages. So I’ll admit, when I added some of the new machines, I committed a high sin – I “never got around” to setting up everything power-wise. I also have the switch running off of an old BackUPS 500VA unit, USB, without automatic self-tests. As a result of all this:

  1. The little UPS powering the switch only held out for 6-7 minutes. Once that died, the bigger units didn’t even matter, as all hope was lost. The switch needs to be on a bigger UPS – maybe one of the 1000VAs until it gets its own.
  2. APCupsd requires a network to initiate shutdown, so the rest of the machines came down hard (as confirmed by looking through log files).
  3. The SunBlade was never set up to power on after power interruption, so it just sat there like a brick.

Most disturbingly, while my Nagios/monitoring box is up (according to the switch, power draw figures from the UPS, and the lights, as confirmed by someone on-site), it’s dead. No ping, nothing out. I’ll have to look into it, but it made me realize that this really is my only way of analyzing problems. That needs to stop.

Maybe one day I’ll have the money for a nice SmartUPS RT or even a Symmetra – though getting 208V into my basement is even more of a dream than spending $4000 on a UPS.

Also, I decided (after all this) to set up graphing of UPS data (load, voltage in and out, temp, capacity, run time, etc.). While I haven’t gotten around to setting up Zenoss yet, I did a quick (well, 4 hours later I’m done configuring it) Cacti installation on my web server (I should already have it running on the monitoring box, but who knows what that will look like when I get home). I also dropped a Cacti host template in CVS for the AP9605 PowerNet SNMP card in my UPSs.
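
For anyone scripting their own checks against the same data, here is a small sketch of turning an SNMP run-time reading into minutes. It assumes the value comes back as SNMP TimeTicks (hundredths of a second) and is printed in the usual net-snmp command-line style, e.g. `Timeticks: (72000) 0:12:00.00`. Whether the AP9605 reports run time exactly this way is an assumption to check against the card's MIB.

```python
import re

# Matches the raw tick count that net-snmp prints in parentheses.
_TICKS = re.compile(r"Timeticks:\s*\((\d+)\)")


def timeticks_to_minutes(snmp_value):
    """Convert a 'Timeticks: (N) ...' string to minutes.

    TimeTicks are hundredths of a second, so N / 100 / 60 gives minutes.
    """
    match = _TICKS.search(snmp_value)
    if match is None:
        raise ValueError(f"no Timeticks value found in {snmp_value!r}")
    return int(match.group(1)) / 100 / 60


if __name__ == "__main__":
    sample = "Timeticks: (72000) 0:12:00.00"  # hypothetical reading
    print(timeticks_to_minutes(sample))
```

A check built on this could alert when the reported run time drops below the time needed for a clean shutdown, rather than waiting for the "on battery" event itself.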

Projects

APC AP9605 PowerNet SNMP Card

March 1st, 2007

In the theme of upgrades, I also purchased two APC SmartUPS1000 units from refurbUPS.com. Now, I know that a lot of people are perfectly happy with serial connectivity. And it has its positives. But I’m running 2-3 older servers per UPS, and I wanted to be able to monitor the UPSs, and perhaps control server shutdown, over the network. So, I found that refurbUPS.com also sells SNMP management cards for them. They sell a refurbished AP9605 – it’s an old 10BaseT PowerNet SNMP-only card (with telnet management). Seemed good, and the price of $15 was right.

They showed up, but I couldn’t find much about them online, let alone anything useful.

After a phone call to APC, I managed to get the user’s manual emailed to me. The few instructions I found online were totally wrong.

The general setup goes like this:

  1. Connect network cable to card.
  2. Connect serial cable between a computer and the UPS’s serial port.
  3. Get a terminal emulator, like minicom. Speed is 2400bps.
  4. Connect and press Enter. You’ll be asked for a username and password. Use “apc” for both.
  5. Set up the network – IP, mask, gateway, etc.
  6. Ready-to-go!
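
Since step 5 is typed in over a 2400bps serial link, it is worth sanity-checking the values first. Here is a minimal sketch using Python's standard `ipaddress` module. The addresses are made-up examples, and the checks are just the obvious ones (gateway on the same subnet, address not the network or broadcast address), not anything the card itself enforces as far as I know.

```python
import ipaddress


def check_card_network(ip, mask, gateway):
    """Return a list of problems with the proposed settings (empty if none)."""
    iface = ipaddress.ip_interface(f"{ip}/{mask}")
    gw = ipaddress.ip_address(gateway)
    net = iface.network
    problems = []
    if gw not in net:
        problems.append(f"gateway {gw} is not inside {net}")
    if gw == iface.ip:
        problems.append("gateway is the same as the card's own address")
    if iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address")
    return problems


if __name__ == "__main__":
    # Hypothetical values for a typical home LAN.
    print(check_card_network("192.168.1.50", "255.255.255.0", "192.168.1.1"))
    print(check_card_network("192.168.1.50", "255.255.255.0", "192.168.2.1"))
```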

I also have some information about the card on my wiki at: AP9605.

Tech HowTos