I have no idea why, but I’m just exhausted today. I slept pretty well last night, and I’ve had plenty to eat, so I’m not sure why I feel run over and sleepy. And there’s gaming tonight, so it’ll be fairly late before I get to fall down, too.
I’ve spent the last two days at work dodging bullets, so to speak. I had gone over to the data centre to install the replacement for a downed server, and I heard one of the RAID enclosures sounding an alarm. So I went to find out what was wrong with it and discovered that the RAID-5 array had a bad disk. (The enclosure actually had two bad disks. One of those had been the global spare, which had been pulled in when the first went bad.) Now, a RAID-5 can operate with one disk down, but not two, so this system was one disk away from a fatal crash.
I pulled out the bad disks and went to contact the vendor, who luckily turned out to have a couple of spares of that drive on their shelf. (This is a stroke of luck, because those specific drives aren’t manufactured anymore.) I took them the old drives to RMA (they are still under warranty, despite not being made anymore) and drove back to the data centre to replace them and rebuild the array.
Once that was done and the array was healthy, I decided to make a cursory check of the other systems to ensure they were also working properly, since it had been a while since I had audited their health. To my horror, I found that the database server was in a similar state to the first server. While the first server going down and needing to be rebuilt and restored from backup would have been inconvenient, the database server going down would be a disaster. I immediately contacted the vendor again, but they said they didn’t have any of these drives in stock, and it would take up to a week to turn around a warranty replacement. I said that wasn’t good enough and asked for a quote on 24-hour delivery of a new drive, which they promised me by first thing this morning. I went back to my office and, after conferring with the brass, gave the drive to our IT guy to see if he could track down a replacement drive. He found one and I got it first thing this morning (about an hour after the quote from the RAID vendor), so I rushed back over to the NOC once again to rebuild THAT array. Whew! (We’re now getting warranty replacements on the bad drives, so we can put another spare in the system and avoid unpleasantness in the future.)
Personally, I don’t like my work being quite this exciting. 🙂
I have since created a policy of explicitly checking all of these systems every time the off-site backup tapes are rotated (every 2 weeks). The enclosure on the database server had an alarm sounding, but it was impossible to hear over the noise of the room.
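For what it’s worth, the reasoning behind that check is simple enough to write down. Here is a rough sketch in Python of how I think about it, not anything we actually run: the `ArrayStatus` record, its field names, and the `classify` function are all made up for illustration, and the numbers would have to be read off the enclosure by hand or by whatever tool the vendor provides. The tolerance table assumes a plain two-disk mirror for RAID-1 and standard RAID-5 and RAID-6.

```python
from dataclasses import dataclass

# Hypothetical record of what one enclosure reports; the field names are
# illustrative and not taken from any vendor's tool.
@dataclass
class ArrayStatus:
    name: str
    raid_level: int
    failed_members: int   # disks currently missing from the array
    healthy_spares: int   # hot spares still available to be pulled in

# How many member-disk failures each level can survive
# (assumes a two-disk mirror for RAID-1).
TOLERANCE = {1: 1, 5: 1, 6: 2}

def classify(status: ArrayStatus) -> str:
    """Return a rough health label for one array."""
    margin = TOLERANCE[status.raid_level] - status.failed_members
    if margin < 0:
        return "FAILED"      # more disks lost than the level can absorb
    if margin == 0:
        return "CRITICAL"    # one more bad disk means data loss
    if status.healthy_spares == 0:
        return "WARNING"     # healthy, but nothing to fall back on
    return "OK"

if __name__ == "__main__":
    # Roughly the state I found: one member down, spare already used up.
    print(classify(ArrayStatus("database", 5, failed_members=1, healthy_spares=0)))
```

The point of the "WARNING" case is that an array with no spare left is the stage at which I would have wanted to know about the problem, well before it got to "CRITICAL".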