My main computer is a somewhat dated Supermicro server/workstation. It’s a big, noisy system with six cooling fans to manage the heat from dual 2.5 GHz Intel Xeon quad-core processors, 16 GB of ECC (error-correcting code) RAM, and an 8-disk hardware RAID (Redundant Array of Inexpensive Disks) level 6 disk system. While not as fast as the latest i7 from Intel, it’s been fast enough for my needs and, most importantly, it has been super reliable. At least until yesterday morning!
Yesterday, when I tried to turn it on, all I got was the sound of all six cooling fans going to full throttle and a faint beep that was barely audible above the leaf-blower-like fan noise. The video was blank. Hmmm… “That’s weird.” I powered off, checked all the connections, and tried again with the same result. “This isn’t good.” I opened the cover. Nothing unusual there either. I got the manual to look up the POST beep codes. The beeps were faint and hard to count, but it was clear that if it was beeping at all, the problem was going to be a video, memory, or motherboard failure. I pulled the video card and put it in a second machine, where it worked fine. That left a memory or motherboard failure. “Today isn’t starting off well.”
At that point I started to realize that the machine might be down for a while, maybe days, so I figured that I had better find a way to get the data off it. Being the “good soldier,” I have BackupPC scheduled to run every evening, so my user files were safe on an external USB drive. In situations like this, however, I never trust Murphy (myself) with just one copy of important files. While I had some older file system snapshots on USB drives, I had been busy and hadn’t updated those for a couple of weeks. The mission was clear: before I did any further testing and debugging, I had to get the system working long enough to make at least one more snapshot of the RAID disk. To do this, I pulled the Supermicro motherboard and temporarily replaced it with an Asus quad-core motherboard borrowed from my “Netflix” computer in the living room. It was a messy and time-consuming task, but I was relieved when I was able to boot the cobbled-together system and see the RAID disk files intact. Being doubly cautious, I took two snapshots using Beyond Compare before removing the Asus motherboard and reassembling the living room PC.
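If you want some extra reassurance that a snapshot really matches the original, a small script can compare the two directory trees file by file. What follows is only a minimal sketch in Python, not what Beyond Compare does and not something I ran that day. The function names `file_digest` and `verify_snapshot` are made up for illustration, and it assumes both trees are mounted and readable at the same time.

```python
import hashlib
import sys
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_snapshot(source_root, snapshot_root):
    """Compare every file under source_root with its copy under snapshot_root.

    Returns (missing, changed): relative paths that are absent from the
    snapshot, and relative paths whose contents differ.
    Files that exist only in the snapshot are not reported.
    """
    source_root = Path(source_root)
    snapshot_root = Path(snapshot_root)
    missing, changed = [], []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = snapshot_root / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            changed.append(rel.as_posix())
    return missing, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python verify_snapshot.py <source dir> <snapshot dir>
    missing, changed = verify_snapshot(sys.argv[1], sys.argv[2])
    for rel in missing:
        print("MISSING", rel)
    for rel in changed:
        print("CHANGED", rel)
    print(f"{len(missing)} missing, {len(changed)} changed")
```

Hashing every file is slow on a large RAID volume, but unlike a quick size-and-date comparison it reads every byte, which is the point when you are about to start pulling hardware.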
While the snapshots were running, I started researching options for rebuilding the server should I find that I did have a complete motherboard failure. Ouch! It wasn’t going to be cheap, and there were issues that made it seem that I might be about as well off replacing the system as repairing or upgrading it. I had several web cart options ready for checkout but decided to work the problem a bit longer before I plunked down the cash for new hardware. I reinstalled the Supermicro board and checked everything. At this point, the only cards in the machine were the video card and the disk controller, both of which were known to be good. I pulled the CMOS backup battery long enough to effect a CMOS reset. Then I started checking RAM. I pulled the four modules from bank 2 first. Voila! It POSTed normally and displayed the usual startup screen. Yippee! I’m getting off cheap! For good measure, I swapped RAM modules around in various banks to confirm the finding. I ordered some new RAM modules and buttoned up the machine. By then it was 10:30 PM! I had wasted an entire day solving a problem, but the good news is that I didn’t lose a single bit of data!
Someday something similar is going to happen to one of your computers. You may not be as lucky as I was. What are you going to do? If your data isn’t backed up, you may do more crying than anything else. My experience is your reminder to BACK UP YOUR DATA! Do it now before it’s too late!