Gwnewch y pethau bychain (Do the little things)

Tired…

I have no idea why, but I’m just exhausted today. I slept pretty well last night, and I’ve had plenty to eat, so I’m not sure why I feel run over and sleepy. And there’s gaming tonight, so it’ll be fairly late before I get to fall down, too.

I’ve spent the last two days at work dodging bullets, so to speak. I had gone over to the data centre to install the replacement for a downed server, and I heard one of the RAID enclosures sounding an alarm. So I went to find out what was wrong with it and discovered that the RAID-5 array had a bad disk. (The enclosure actually had two bad disks. One of those had been the global spare, which had been pulled in when the first went bad.) Now, a RAID-5 can operate with one disk down, but not two, so this system was one disk away from a fatal crash.
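To illustrate why a RAID-5 survives one dead disk but not two, here is a minimal Python sketch of XOR parity. It assumes a hypothetical four-disk stripe with made-up block contents; real controllers also rotate the parity block across disks and rebuild onto the spare, which this ignores.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-disk stripe: three data blocks plus one parity block.
# (Real controllers rotate the parity block across disks; ignored here.)
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]


def rebuild(stripe, lost):
    """Recompute the single block at index `lost` from every surviving block."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)


def survivors_after_double_failure(stripe, lost_a, lost_b):
    """With two blocks gone, XOR of the survivors yields only lost_a XOR lost_b."""
    survivors = [blk for i, blk in enumerate(stripe) if i not in (lost_a, lost_b)]
    return xor_blocks(survivors)


print(rebuild(stripe, 1))  # b'BBBB': one lost disk is fully recoverable
print(survivors_after_double_failure(stripe, 1, 2))  # b'\x01\x01\x01\x01'
```

With one disk gone, the parity equation has a single unknown and the missing block falls right out. With two gone, the survivors only tell you the XOR of the two missing blocks, and any pair of blocks with that XOR fits, so the data is unrecoverable.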

I pulled out the bad disks and went to contact the vendor, who luckily turned out to have a couple of spare drives of that model on the shelf. (This was a stroke of luck, because those specific drives aren’t manufactured anymore.) I took the old drives to them for RMA (they are still under warranty, despite not being made anymore) and drove back to the data centre to install the replacements and rebuild the array.

Once that was done and the array was healthy, I decided to make a cursory check of the other systems to ensure they were also working properly, since it had been a while since I had audited their health [1]. To my horror, I found that the database server was in a similar state to the first server. While the first server going down and needing to be rebuilt and restored from backup would have been inconvenient, the database server going down would be a disaster. I immediately contacted the vendor again, but they said they didn’t have any of these drives in stock, and it would take up to a week to turn around a warranty replacement. I said that wasn’t good enough and asked for a quote on 24-hour delivery of a new drive, which they promised me by first thing this morning.

I went back to my office and, after conferring with the brass, gave the drive to our IT guy to see if he could track down a replacement. He found one, and I got it first thing this morning (about an hour after the quote from the RAID vendor), so I rushed back over to the NOC once again to rebuild THAT array. Whew! (We’re now getting warranty replacements for the bad drives, so we can put another spare in the system and avoid unpleasantness in the future.)

Personally, I don’t like my work being quite this exciting. 🙂

[1] I have since created a policy of explicitly checking all of these systems every time the off-site backup tapes are rotated (every 2 weeks). The enclosure on the database server had an alarm sounding, but it was impossible to hear over the noise of the room.

6 Comments

  1. No, I can see why you don’t want work to be quite that exciting; still, major disasters (or even minor ones) appear to have been staved off, which is good. *hugs*

    Wishing you a restful night tonight to alleviate the tiredness for tomorrow…

    • Yeah, all’s well that ends well!

      Didn’t sleep well at all last night, but it was more for a lack of ability to get comfortable. But after today is the weekend, so I shall persevere!

  2. Now, what could I possibly do for you that would give you a nice burst of energy and some happy happy endorphins? 😉 -H…

  3. Well done!

    Thank you for being proactive. Unfortunately, in this biz, you don’t hear compliments from the end users for being proactive and averting disaster. You’d only hear their complaints after the servers crashed.

    • Re: Well done!

      Thanks! I’m feeling a bit guilty about it being such a near-disaster, since, as the leader of the Systems team, it’s my job to think of things like this and make sure they’re executed. (Things like, say, regularly auditing the RAID boxes rather than doing it every 3 months or so whenever someone thinks of it. To be honest, if the first box hadn’t been in an alarm state, I’m not sure the database server’s problem would have been caught. *shudder*)

      And of course, that’s one of the drawbacks to being in a colo facility. When our data centre was in the same building as our office, I saw my machines every day. Now, I’m probably not over there more than every 2 weeks or so. The positives do outweigh the negatives, but I sometimes miss having a more intimate rapport with my boxen.
