I tried pinpointing where a server had failed with DSET (Dell System E-Support Tool)



GIGAZINE's servers are run under two policies: "as cheap as possible" and "don't let problems pile up". As a painful compromise that puts cost first, we manage tower-type machines lined up side by side as if they were home servers. While we were daydreaming along the lines of "Amazon's AWS is not worth the cost for us, but maybe we could move to Sakura's cloud", the system detected that about two servers were issuing error notifications.

Thinking "Of all times, at the end of the year ......", I headed to the server room.


Physically, it looks like this


An orange lamp is lit. This is the one, confirmed


Looking at the back, the lamp on the right-hand one of the two power supply units is off


The left unit is alive, but the right one has died. Because the power supply is redundant, the server itself keeps running, but of course we will replace the dead unit as soon as possible.


The power supply units of the other server are fine, and as far as one can tell from looking at it, there is no knowing what kind of error it has. This one needs to be investigated in detail.


There are many ways to pin down where a server has failed, but because we wanted to receive DELL support this time, we used DELL's "DSET (Dell System E-support Tool)". It is operated from the CUI rather than a GUI, so download the appropriate file listed below. (On DSET's top page, "Dell System E-Support Tool", the bit labels and the link destinations are mismatched: even if you click "32 bit", the link leads to the 64-bit file. This time we carefully changed the URL parameters and confirmed the correct link destinations.)

32 bit version: File name "delldset_v2.2.125_x86_A01.bin"

64 bit version: File name "delldset_v2.2.125_x64_A01.bin"
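Since the download page mislabels the two builds, it can help to decide from the server's own architecture which file name to expect. The following is only a minimal sketch: the function name `dset_file_for` and the list of architecture strings are assumptions for illustration, and only the two file names come from the list above.

```python
import platform

# File names exactly as listed above for DSET v2.2.125.
DSET_FILES = {
    32: "delldset_v2.2.125_x86_A01.bin",
    64: "delldset_v2.2.125_x64_A01.bin",
}


def dset_file_for(machine: str) -> str:
    """Map a `uname -m` style string to the matching DSET file name.

    The recognized strings below are an assumption covering common
    x86 Linux values, not an exhaustive list.
    """
    m = machine.lower()
    if m in ("x86_64", "amd64"):
        return DSET_FILES[64]
    if m in ("i386", "i486", "i586", "i686", "x86"):
        return DSET_FILES[32]
    raise ValueError(f"unrecognized architecture: {machine!r}")


if __name__ == "__main__":
    # Run this on the target server, not on the PC used for downloading.
    try:
        print(dset_file_for(platform.machine()))
    except ValueError as exc:
        print(exc)
```

Comparing the printed name with the file that was actually downloaded is a quick way to catch a mislabeled link before transferring the file.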

The procedure itself is summarized on the DELL support page below.

How to use the Dell PowerEdge series DSET tool for Linux (v2.x) - JPFAQ_198510 | Dell Japan

Transfer the downloaded diagnostic tool file with WinSCP or a similar tool and log in over SSH. This time we used Poderosa ("4.3.6b-experimental" was released on December 1, 2011).

The license agreement is displayed, so press the "q" key to proceed.


Press the "y" key, then press the "2" key to run the diagnostic tool.


Enter your company name and email address


Wait for a while.


Set the various options as shown below and wait for the report to be generated


Wait a little longer ...


When this appears, report generation is complete


If you send the generated ZIP file to DELL support, DELL will identify the faulty part, but looking at the report inside this ZIP file makes it easy to identify the problem yourself. According to the manual, the password for this ZIP file is "dell"; unpack it and open "dsetreport.hta" inside to view the neatly formatted report.
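Unpacking the report can also be scripted. The following is a minimal sketch, assuming the archive uses the legacy ZIP encryption that Python's standard `zipfile` module can decrypt (it cannot handle AES-encrypted archives). The function name `extract_dset_report` and the example paths are hypothetical; the password "dell" and the file name "dsetreport.hta" are the ones given above.

```python
import zipfile
from pathlib import Path


def extract_dset_report(zip_path, dest_dir, password=b"dell"):
    """Extract a DSET report ZIP and return the path to dsetreport.hta.

    Returns None if the archive does not contain that file.
    Assumes legacy ZIP encryption, the only kind zipfile can decrypt.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        # The password must be passed as bytes, not str.
        zf.extractall(dest, pwd=password)
    # Search recursively, since the report may sit in a subdirectory.
    matches = sorted(dest.rglob("dsetreport.hta"))
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical file names; replace with the ZIP that DSET generated.
    archive = Path("dset_report.zip")
    if archive.exists():
        print(extract_dset_report(archive, "dset_report"))
```

An .hta file is an HTML application, so the extracted report is meant to be opened on a Windows PC rather than on the server itself.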


For the server whose power supply appears to have failed, looking at "Hardware Log" under "System" shows the error log entries marked with an "X" like this.


Selecting "Main Chassis" under "System" and looking at "Power Supplies" confirms that one of the power supplies is marked with an "x" as dead. To isolate the problem more rigorously, shut the server down, remove the power supply unit that is suspected to be dead, reseat the live power supply unit, and turn the power back on. If the server boots, the removed power supply unit can be judged dead, and replacing it with a new power supply unit is all that is needed. If it does not boot, the fault lies beyond the power supply unit, in the motherboard, so the server's motherboard will be replaced (though there are exceptions).
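The isolation test above boils down to a single yes/no observation. Purely as an illustration of that reasoning, here is a tiny sketch; the names `Verdict` and `isolate_fault` are made up for this example, and as noted above real cases can differ.

```python
from enum import Enum


class Verdict(Enum):
    REPLACE_PSU = "replace the removed power supply unit"
    REPLACE_MOTHERBOARD = "fault is beyond the power supply; replace the motherboard"


def isolate_fault(boots_on_live_psu_alone: bool) -> Verdict:
    """Interpret the result of booting with only the live power supply unit.

    The suspect unit has been removed and the live unit reseated
    before the server is powered on.
    """
    if boots_on_live_psu_alone:
        # The server runs fine without the suspect unit, so that unit is dead.
        return Verdict.REPLACE_PSU
    # Even the known-good unit cannot start the server.
    return Verdict.REPLACE_MOTHERBOARD


if __name__ == "__main__":
    print(isolate_fault(True).value)
```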


For that server, the problem is easy enough to identify from the outside alone, but for the other one there is no telling from its appearance where the error is coming from. When we ran DSET with the same procedure and generated a report, it showed a tremendous number of log entries, and the message "Log is full" suggests that the area for writing logs has filled up.


Judging from the contents of the error log, the battery of the RAID controller seems to have caused the errors, but the controller's battery itself is not reporting an error "at the present time".


In that case, either physically restart the server to clear the log, or choose "3) Clear ESM Hardware Log Only" in DSET, which clears the log without a physical restart. Either way, the error warning goes away and the lamp turns off. Happy ending.

And so, GIGAZINE is casually recruiting server administrators. If you think "I could probably handle this level of management and maintenance", we would be very happy if you could click here and send us your resume and work history. To prevent spam, we are using "reCAPTCHA Mailhide".

...... Yes, you thought this article was a recovery record of server trouble, but it was actually a recruitment article. How about that!

in Review,   Software,   Hardware, Posted by darkhorse