A record of a RAID controller failure on a Dell server, from discovery through to restoration



A problem occurred in the RAID controller installed in one of GIGAZINE's servers. Since the details may be useful to someone somewhere, I decided to report the entire repair process from start to finish.

◆ Discovery

When the failure occurred, the server's LED blinked orange to signal that something was wrong with the machine.


For reference, it is lit blue during normal operation, as shown below.


To locate the fault, we ran Dell OpenManage Server Administrator (SA) commands and checked the server's status as follows. The hardware log is marked "Critical", which indicates that an abnormality has occurred.

# omreport chassis
Health

Main System Chassis

SEVERITY: COMPONENT
Ok: Fans
Ok: Intrusion
Ok: Memory
Ok: Power Supplies
Ok: Power Management
Ok: Processors
Ok: Temperatures
Ok: Voltages
Critical: Hardware Log
Ok: Batteries
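
When there are several servers to watch, reading this table by eye gets tedious. The following is a minimal Python sketch, not something used in the actual repair, that picks out every component whose severity is not "Ok". It works on the text shown above; the function name `failing_components` is a hypothetical helper, and the sketch assumes the "SEVERITY: COMPONENT" header always precedes the rows.

```python
# Sample text copied from the report above. On a live system this would be
# the captured output of `omreport chassis` (an assumption of this sketch).
SAMPLE = """\
Health

Main System Chassis

SEVERITY: COMPONENT
Ok: Fans
Ok: Intrusion
Ok: Memory
Ok: Power Supplies
Ok: Power Management
Ok: Processors
Ok: Temperatures
Ok: Voltages
Critical: Hardware Log
Ok: Batteries
"""


def failing_components(report):
    """Return (severity, component) pairs whose severity is not 'Ok'."""
    results = []
    in_table = False
    for line in report.splitlines():
        if ":" not in line:
            continue  # titles and blank lines carry no severity
        severity, component = (part.strip() for part in line.split(":", 1))
        if severity.upper() == "SEVERITY":
            in_table = True  # header row: the data rows start after this
            continue
        if in_table and severity.lower() != "ok":
            results.append((severity, component))
    return results


print(failing_components(SAMPLE))
```

For the report above, this leaves only the "Hardware Log" row, which is the cue to look at the ESM log next.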


Next, we ran the following command:

# omreport system esmlog

Severity: Critical
Date and Time: Tue Jun 19 11:08:06 2012
Description: The disk drive bay battery has failed.
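
The log entry can be handled the same way. Below is a rough Python sketch that splits ESM log text into entries and keeps only the critical ones. The helper names `parse_esmlog` and `critical_entries` are hypothetical, and the sketch assumes each entry starts with a "Severity" line and that the timestamp has the form shown in the entry above.

```python
from datetime import datetime

# Sample entry based on the log above. On a live system this would be the
# captured output of `omreport system esmlog` (an assumption of this sketch).
SAMPLE = """\
Severity: Critical
Date and Time: Tue Jun 19 11:08:06 2012
Description: The disk drive bay battery has failed.
"""


def parse_esmlog(text):
    """Split 'Key: value' lines into one dict per log entry."""
    entries, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # blank line or a line without a key
        key, value = key.strip(), value.strip()
        if key == "Severity" and current:
            entries.append(current)  # a new entry begins
            current = {}
        current[key] = value
    if current:
        entries.append(current)
    return entries


def critical_entries(entries):
    """Keep only the entries whose severity is Critical."""
    return [e for e in entries if e.get("Severity") == "Critical"]


for entry in critical_entries(parse_esmlog(SAMPLE)):
    when = datetime.strptime(entry["Date and Time"], "%a %b %d %H:%M:%S %Y")
    print(when.isoformat(), entry["Description"])
```

Splitting at the first colon only matters here, because the timestamp itself contains colons.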


This output showed that the battery was faulty. Note that this is the same machine whose RAID controller battery failure was covered in an earlier article. Since the same error came back only about a month later, the previous repair seems to have missed something.

When I consulted Dell's hardware support, their conclusion was: "The RAID controller itself may be faulty. We will replace the battery as well, just in case."

◆ Parts arrival

A new RAID controller (right) and a new battery arrived from Dell.


Unpacking the RAID controller box.


The contents look like this.


Here it is after being taken out of the bag.


The battery box was opened as well.


This one also comes in a bag.


It is a rechargeable lithium-ion battery.



The cable plugs in here and supplies power to the controller.


With all the necessary parts on hand, all that was left was to wait for the support technician to arrive.


◆ Start repair

Once the technician arrived, we had him come into the server room and check the logs first.


Since a RAID controller failure was already suspected, that part was checked closely.


We shut down the OS, powered off the server, and started the parts replacement.

First, identification tags were attached to the cables so that they could be reconnected correctly after the repair.


A sheet was laid on the floor to prevent static electricity and scratches.


It feels like a slightly thick picnic mat.


The machine was taken off the shelf ...


... and opened.


The RAID controller was removed.



The new controller was taken out and its cable connected.


Once that was done, the new card was pushed into place with a solid click.


Next, the battery was removed together with its case.


Again, the new part only had to be connected and put back in the original location.


Referring to the tags attached earlier, the cables were reconnected as before. The side panel was left open, because the hard disk cables have to be disconnected while the RAID controller's firmware is updated and the controller is initialized.


A CD was taken out and used to update the RAID controller's firmware.



The update is complete.


After the controller is reset to its factory default state, the correct settings can be restored by loading the RAID configuration from the HDDs.


Then the side cover of the case was closed ...


... and the machine was put back on the shelf.


At this point the hardware status lamp was blue, showing that there was no problem.


The RAID configuration was checked from the BIOS to confirm that it was functioning normally.


Everything was normal.


All that remained was to clear the RAID log, and the work would have been complete ...


... but the orange lamp indicating an abnormality began flashing again.


Just as before the repair, a battery problem was reported. This was a rather puzzling result, since both the battery and the RAID controller had already been replaced with new ones.


The support technician was also at a loss and consulted with his company.


In the end, the causes still possible at this stage were a damaged cable between the RAID controller and the battery, or a fault in the motherboard itself, so the repair was put off to a later date.

◆ Second repair

The previous repair had narrowed the suspects down to the motherboard and the cable. Of the two, we decided to start with the cable, which is easier to replace.


As before, the logs were checked, the OS was shut down, and the case was opened to remove the part.


The old cable was removed ...


... and a new one was connected.


This time the blue lamp stayed lit, even after some time had passed.


When the battery status was checked after starting the server, it showed "Charging", and charging finished in about an hour, which confirmed that the repair had succeeded this time. I hope the server keeps running without any more incidents like this, but since the previous repair failed only a month ago, I cannot shake an uneasy feeling.

in Review,   Hardware, Posted by darkhorse_log