The electric company knocked on my door today to say they wanted to replace my meter. I said that'd be cool and asked when. Then I saw the guys carrying one to the back of the house. We had a few moments' notice.
Fair enough, they'd earlier knocked on the door and left a post-it warning us. But by the time I'd gotten to the front door to peek, no one was there. I figured it was the crew fixing the lawn, 'cause they'd knocked earlier, and looking out the window, it looked like they might be done.
I warned the wife, zipped inside, and shut down the computers not protected by UPS. I made it, all good. The systems in the basement are all hooked up to some hefty 1500VA UPSs. Well, half of the power supplies are, each to its own UPS. The little server shares its UPS with the network gear, so the two routers and the LAN switch are all safe from short power outages. Short as in 15-30 minutes, with the bigger server lasting the shorter amount and the little server pushing the longer.
The electric company fella said "about five minutes," so I didn't fret, because I had lots more time than that.
My desk computers shut down a minute or so before they turned off the power. Nice. Five minutes after the power went out, it came back on. Also nice.
My laptop was never interrupted. Annoyingly, the big monitor connected to it did turn off, but the meeting I was in was uninterrupted. So my attention stayed there. I did try to open the browser page to my router, but forgot the credentials to log in, so I let it go. I figured if the page came up, the router was still on, so all must be well.
My phone started to buzz with warnings about not being able to reach the server that sits under my desk, which I did power down. It has a UPS, but it's an older 200VA model, really meant to protect it from brownouts. It'll beep for maybe five minutes before it dies, so I figured this outage would run close enough to its limit that I should just turn the server off. The warnings were expected.
A few minutes later, another knock on the door. I thought maybe they were returning to tell me the power was back, but they said "we did something wrong and need to turn it off." Sure thing, and off it went again, for about 10 seconds, as they probably capped something they forgot to cap before.
I continued my meeting and waited to make sure the electric company was solid before turning the system on. They never returned, so after the meeting, I turned it on anyway. My phone buzzed with messages saying things were coming back. All is good.
After the next meeting, I checked the monitors directly, and found one was still reporting a failure. The HTTP database check was failing: it's a simple PHP page that runs a small, fast query against the MySQL database, and it was reporting a connection failure. I poked at the server, but couldn't SSH in. It's a nice Sun enterprise server, so it has an Integrated Lights Out Manager (ILOM), but I couldn't SSH there, either. Not because it wasn't available, but because my desktop has decided to stop supporting its encryption cypher. I pulled out the iPad, which does still work, and connected to the ILOM. I checked, and there was a PSU fault, indicating maybe one of the PSUs didn't cycle back on after the power outage. Probably the PSU connected to the UPS's surge port and not the battery port. Normally the server survives PSU disconnections without a hitch, but maybe there was something else going on, maybe with the little hiccup they gave us.
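That HTTP check is worth sketching, since it's what first told me something was wrong. Here's a hedged, minimal shell version of the idea -- the real check is a PHP page doing a fast MySQL query, and the URL and the "OK" marker below are assumptions, not what my monitor actually looks for:

```shell
#!/bin/sh
# Sketch of an HTTP health check: fetch a status page and look for a
# success marker. The page itself (a PHP script doing a quick MySQL
# query) would print OK on success and an error message on failure.

db_check() {
  # "$@" is the command that produces the page body; in real use that's
  # curl, but passing it in keeps the logic easy to exercise offline.
  body=$("$@" 2>/dev/null) || return 1
  case $body in
    *OK*) return 0 ;;   # page rendered and the query succeeded
    *)    return 1 ;;   # connection-failure text, or an empty body
  esac
}

# Real use would look like (URL is hypothetical):
#   db_check curl -fsS --max-time 5 http://server/dbcheck.php
```

The `--max-time` matters: a hung database makes the page hang too, and a check that never returns is as useless as one that lies.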
I went to the basement and was greeted with the angry orange lights of an unsettled server. The other was still running fine. All the UPSs reported full batteries and no alarms. One of the PSUs on the server also had a fault light. So I did the thing we're all told first with tech support--I turned it off and on again. I pulled the plug. Waited for the lights to die, waited a couple breaths more, and plugged it back in. The green flipped on, the fan started, and the orange fault light returned.
I had left my iPad upstairs, so except for running back and forth, there wasn't anything I could do. The other server is there, with a monitor and keyboard, and normally I'd SSH to the headless server from there (I've always meant to set up the serial port between them, but never have). But this server runs the same angry newer SSH that doesn't like the cyphers offered by the ILOM ports (including its own!). So I did the next best thing--I pulled both of the PSU cords. I waited for the lights to fade, for a really long time, and then some more, and returned the power cords. All the lights flickered green and none of the orange lights returned. On a full power-up, the system goes nuts, checking every fan at the same time, and some other things, and then it does a bunch of basic tests and takes a few minutes to get to the point of availability. So I went upstairs to wait instead.
While I waited, I searched for the SSH cypher problem, and found the key is simply to add a HostKeyAlgorithms line to my SSH config file. I did, cleaned up some "bad key" warnings, and was able to SSH to the ILOM. From there I could watch the memory count and the other checks, and finally see the login prompt. I logged into the running OS and started the database's zone. I think I have it configured to start automatically, but it doesn't. A computer eternity later, the zone was up, and half an eternity after that, the database was running again.
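For the record, the fix is a couple of lines in ~/.ssh/config. The host alias is made up, and the exact algorithm names are my assumption about what this class of hardware needs -- old Sun ILOMs generally only offer ssh-rsa host keys, and some firmware also wants a legacy key exchange re-enabled:

```
# Hypothetical host alias standing in for the ILOM's address.
Host ilom
    # Re-accept the old host key algorithm the ILOM offers; the "+"
    # appends to the default list instead of replacing it.
    HostKeyAlgorithms +ssh-rsa
    # Some older firmware also needs legacy key exchange; uncomment if
    # the connection still fails at the kex step:
    #KexAlgorithms +diffie-hellman-group14-sha1
```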
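Starting the zone by hand, and the autoboot setting I still need to double-check, look roughly like this from the global zone. The zone name "db" is a stand-in -- I'm not claiming that's what mine is called:

```shell
# Boot the database zone by hand ("db" is a hypothetical zone name)
zoneadm -z db boot
zoneadm list -cv        # watch it go from "installed" to "running"

# For the zone to come up on its own, autoboot must be set...
zonecfg -z db set autoboot=true
# ...and the zones service itself must be online:
svcs svc:/system/zones:default
```

If autoboot is set and the zone still doesn't start, the zones SMF service being disabled is the usual suspect.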
This blog uses that database, so the fact that I can post this entry shows it's working again. I'll run through a few checks and fix anything that went awry from the impolite shutdown.
I'm glad the SSH problem is fixed. I'm glad there weren't any noticeable real failures.