"Raw data" of model failure rate obtained by operating more than 40 thousand HDDs is made available for download


By Daniel "LWS" Nimmervoll

The "HDD reliability report by manufacturer", which the online storage service Backblaze publishes based on its HDD operational data, clearly shows things such as the high reliability of a specific manufacturer and how overwhelmingly often a specific model from a specific manufacturer breaks. It has not only attracted the attention of industry stakeholders but also greatly helps the general public when weighing up HDDs. However, once a model with an outstandingly high failure rate is revealed, the simple question "Is this data really true?" naturally comes up. To answer that question, Backblaze has released, all at once, the "raw data" of the S.M.A.R.T. (SMART) values of more than 40,000 HDDs, and it also explains how to compile the failure rates.

Hard Drive Test Data - Determining Failure Rates and More
https://www.backblaze.com/hard-drive-test-data.html

At the bottom of the page above, the SMART values of the HDDs that Backblaze has operated so far are published as CSV files, one set per year. Of course, this data can be downloaded. The huge file sizes, about 800 MB for 2013 and about 3 GB for 2014, speak to the "density" of the data.


For example, the data for April 10, 2013 looks like this. Each HDD is listed not only with its maker and model number but also with its serial number, and there are 80 kinds of SMART values in all, with "0" being the normal value. However, some fields are blank depending on the model, because the SMART attributes that are reported differ by manufacturer. The manufacturers also do not disclose the meaning of some values, so the values need to be compared relatively, model by model.
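As a rough illustration of why the comparison has to be done model by model, the following Python sketch lists which SMART columns actually hold a value for each model. The sample rows and the column names (serial_number, model, failure, smart_N_raw) are assumptions made for this example, not a copy of the published file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-row sample; column names are assumed for illustration only.
SAMPLE_CSV = """date,serial_number,model,failure,smart_1_raw,smart_5_raw,smart_194_raw
2013-04-10,SN0001,MODEL-A,0,0,0,
2013-04-10,SN0002,MODEL-B,0,,0,27
"""

def populated_columns_by_model(csv_file):
    """Return {model: set of SMART column names that are not blank}."""
    populated = defaultdict(set)
    for row in csv.DictReader(csv_file):
        for name, value in row.items():
            # Blank fields come back as "" (or None if the row is short).
            if name.startswith("smart_") and value not in ("", None):
                populated[row["model"]].add(name)
    return dict(populated)

columns = populated_columns_by_model(io.StringIO(SAMPLE_CSV))
for model in sorted(columns):
    print(model, sorted(columns[model]))
```

With a real daily file, the StringIO object would be replaced by an opened CSV file, and only the columns shared by two models could be compared directly.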


The published raw data is a daily record of every HDD in operation. The very fact that it is the data actually used for the business makes it seem reliable. Because it is completely raw, however, it is hard to tell just by looking at it which HDDs have a high failure rate and which are reliable. Backblaze therefore gives an example of how to calculate HDD failure rates from the raw data, so I actually tried it.
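To give an idea of what such a calculation involves, here is a minimal Python sketch. It assumes that each daily row counts as one "drive day" and that the failure rate is annualized as failures per drive-year, expressed as a percentage. This formula is an assumption for illustration; the exact calculation in Backblaze's SQL files is not reproduced here.

```python
from collections import Counter

def annual_failure_rates(rows):
    """rows: iterable of (model, failure) pairs, one pair per drive per day.

    Assumed formula: 100 * failures / (drive_days / 365).
    """
    drive_days = Counter()
    failures = Counter()
    for model, failure in rows:
        drive_days[model] += 1
        failures[model] += int(failure)
    return {
        model: 100.0 * failures[model] / (drive_days[model] / 365.0)
        for model in drive_days
    }

# Hypothetical data: MODEL-A has 730 drive days and 1 failure,
# MODEL-B has 365 drive days and no failures.
rows = [("MODEL-A", 0)] * 729 + [("MODEL-A", 1)] + [("MODEL-B", 0)] * 365
rates = annual_failure_rates(rows)
print(rates)
```

Counting drive days rather than drives means a model that was only in service for part of the year is not treated as if it had run for the whole year.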

Download the files "docs.zip" and "2014_data.zip" from the bottom of the site's page, save them to the desktop or a similar location, and unpack them with a tool such as "Explzh".


Next, go to the official SQLite page and click "Download".


Since a Windows PC is used this time, click the command-line shell program under "Precompiled Binaries for Windows" to download it, then unpack the ZIP file.


Make a folder wherever you like and copy the following into it: the "2014" folder from inside the "2014_data" folder that you downloaded and unpacked earlier, all of the SQL files from the "docs" folder, and "sqlite3.exe" from the unpacked SQLite folder. For the sake of clarity, this time I created a folder called "test" on the desktop and copied all the files there.


Next, open the command prompt, enter "cd desktop" and then "cd test" to move to the test folder, and then run "sqlite3 drive_stats.db".


Run ".read create_table.sql".


Run ".read import.sql".


The CSV files in the 2014 folder are then imported, so wait a while. When the import is complete, run ".read stats.sql".


Run ".mode columns".


Finally, enter and run "select * from failure_rates order by model;" ...


The aggregated failure rate data for each HDD model was output like this.
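The same create, import, and query flow can also be sketched with Python's built-in sqlite3 module instead of the command-line shell. This is only an illustration: the table name drive_stats, the four columns, and the two tiny daily files are hypothetical stand-ins, and the real create_table.sql, import.sql, and stats.sql supplied by Backblaze are not reproduced here.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

def import_csv_folder(conn, folder):
    """Load date, serial_number, model and failure from every CSV in folder."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS drive_stats "
        "(date TEXT, serial_number TEXT, model TEXT, failure INTEGER)"
    )
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as f:
            rows = [
                (r["date"], r["serial_number"], r["model"], int(r["failure"]))
                for r in csv.DictReader(f)
            ]
        conn.executemany("INSERT INTO drive_stats VALUES (?, ?, ?, ?)", rows)
    conn.commit()

# Two hypothetical daily files stand in for the contents of the 2014 folder.
with tempfile.TemporaryDirectory() as folder:
    header = "date,serial_number,model,failure\n"
    Path(folder, "2014-01-01.csv").write_text(
        header + "2014-01-01,SN0001,MODEL-A,0\n2014-01-01,SN0002,MODEL-B,0\n"
    )
    Path(folder, "2014-01-02.csv").write_text(
        header + "2014-01-02,SN0001,MODEL-A,1\n2014-01-02,SN0002,MODEL-B,0\n"
    )
    conn = sqlite3.connect(":memory:")  # the article uses the file drive_stats.db
    import_csv_folder(conn, folder)
    # Per-model drive days and failures, ordered by model like the final query.
    results = conn.execute(
        "SELECT model, COUNT(*), SUM(failure) "
        "FROM drive_stats GROUP BY model ORDER BY model"
    ).fetchall()
    conn.close()

for row in results:
    print(row)
```

Pointing import_csv_folder at the copied "2014" folder and connecting to a file instead of ":memory:" would keep the imported rows on disk, as the shell session does.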


Graphing this output gives the (colored) graph used in the article "It turned out that the operation data of more than 40 thousand HDDs was released and the tendency of reliable hard disk manufacturers changed".


In addition, the raw HDD failure data published by Backblaze may be used and shared freely, on the conditions that you state that the source is Backblaze, that you use the data at your own responsibility, and that you do not sell the data.

in Note,   Hardware, Posted by darkhorse_log