More than a week has passed since NTT DATA's blog service "Doblog" went down due to a hard disk failure. What on earth is going on?



As I wrote in a headline the other day, NTT DATA's blog service "Doblog" went down at 10 AM on February 8. More than a week has passed and it still has not recovered. With about 2,000 to 3,000 active users, it is a blog service of reasonable size, so what on earth happened?

Was the storage not configured as RAID? What was the backup system like? Is 100% recovery possible? Why has restoration been delayed in the first place, and why is there still no firm target date for it? Despite this prolonged failure, NTT DATA itself did not report on the outage until February 16, more than a week after it occurred on February 8. Is Doblog a service that NTT DATA has already abandoned?

So, below is a summary of the current outage and of what kind of position Doblog held for NTT DATA.
Doblog - Maintenance
http://www.doblog.com/


■ The outage in chronological order

The first report came out at 10:15 am on February 8 (Sunday).

A failure is currently occurring in Doblog.
We apologize to everyone for the inconvenience.

2009/2/8 10:15 Doblog Editorial Department

About two and a half hours later, at 12:42, it turned out that the trouble was in the hardware. Then, about 13 hours later at 23:50, a notice was posted stating that "the database server has a hardware failure", giving the following assessment as of 23:25.

23:25 At present there is no prospect of restoration. Taking into account the opinions of the hardware vendor and others,
we will contact you again around noon tomorrow.

Finally, on the next day, February 9 (Monday), more than 24 hours after the failure occurred, it was reported that the fault was in the database server's hard disk and that recovery would take a few days. At 16:00 that day, a prospect for restoration was shown for the first time.

As for the prospect of recovery, if everything goes smoothly we expect it to be around Friday night.

The fact that five days are needed from the occurrence of the failure to recovery shows that this is a considerable emergency.

Then, around 12:45 on February 13 (Friday), five days after the failure occurred and the day recovery had been scheduled, it was found that "data inconsistency has occurred in the server recovery process", and a shocking notice was posted at 21:50 that night.

Around 10:00 on 2/8, hard disk failures occurred on both the Doblog database server and the backup server,
and we have been working to recover the internal data since that day.
The restoration work is still ongoing, but errors were found in the data that had originally been scheduled to be recovered on 2/13,
so as of 2/13 21:50 we have been forced to postpone restoration to next week (the week of 2/16).
We sincerely apologize for the long-term service suspension.

At this point it begins to look as though 100 percent data recovery is hopeless, and it is now certain that the service outage really will last more than a week from the occurrence of the trouble. Furthermore, the notice points not to restoration on February 16 but only to sometime during the week of February 16, so in the worst case the service may be down for nearly two weeks. Even at the time of writing this article, no precise recovery date and time has been given.

■ The history of Doblog: what position did the service hold for NTT DATA?

Originally Doblog began in November 2003.

NTT DATA: "Doblog", an experimental service for opening a blog for free

NTT DATA expects 10,000 trial users to participate in this experiment. It says that "blog sites can be created easily while maintaining compatibility with "Movable Type", a typical tool for creating blogs", and in the future it aims to commercialize services such as providing companies with data mining of what is written in each blog.

One month before this, in November 2003, a "beta test" had begun, and as the "Doblog - Announcement list" shows, the service was run quite close to the users' point of view and in a fairly friendly manner. It would be unthinkable now, but on April Fools' Day, April 1, 2004, the team apparently had enough leeway to do things like this.

[2004/04/01] [Correction] Announcement of server maintenance


They were clearly in high spirits. The official release of this Doblog was planned for July 15, 2004, but on July 12, three days before, it was announced that the official release would be delayed because "given the instability of the current servers, we have judged that the environment is not sufficient for providing the official version of the service".

We have tried various kinds of tuning to improve the instability of the servers, but the growth in the number of users and page views is approaching the performance limit of the hardware faster than we anticipated.
We are currently considering bringing forward the planned addition of server equipment, so we ask for your patience for a while.

Indeed, the announcement page from this period shows new features being added one after another, while at the same time server outages and emergency maintenance stops of 30 minutes to 1 hour occurred frequently. On July 9, 2004, although the cause is unknown, it was reported that "from 10:50 to 12:17 a failure occurred that made Doblog inaccessible", which shows that the situation was quite critical.

Several notices from around 2004 offer hints about the current outage. Here are some excerpts.

[2004/09/13] Notice concerning postponement of release and schedule of system restart

Today we worked on database multiplexing and feature additions for the official version, but an unexpected failure occurred near the end of the work and we had to abandon today's release.

We will return the system to its state as of 9:00 on September 13, 2004. (We always back up blogs, comments, and all other data you have registered, so there is no risk that the data itself will be erased. Please rest assured on that point.)

[2004/11/05] Announcement of future plans for system expansion / decentralization

The DB will be reinforced in the middle of the week of November 15.
At that time we will add the originally planned DB server, cluster it, and implement a fundamental fix.

So the database server was supposed to be properly clustered, with the data itself always backed up. Since it has nevertheless not recovered after more than a week, this must be a failure on an unprecedented scale.

Some of the background can be glimpsed in the following article.

ITmedia Enterprise: Chronic failures with Postgres tuning, Doblog schedules large-scale maintenance

Doblog originally started as an experimental service on the assumption of a limit of 10,000 users. Its system configuration consists of a web server, an application server, and a DB server, with one of each.

In addition, the current DB uses the open-source PostgreSQL. It is clear that the DB is the bottleneck behind these malfunctions. Clustering for PostgreSQL was developed to improve matters, but in fact it failed.

In November 2004, on the Doblog staff blog, the editor-in-chief made clear what direction Doblog was pursuing as a business and what it had achieved.

Since then, Doblog-related business has also emerged. The aim is to grow the experience and technology cultivated at Doblog.com into business seeds. As a result, we have provided consulting on blogs and sold OEM versions of Doblog. So far we have sold and delivered several OEM versions of Doblog.

As described above, the Doblog Editorial Department is required, as a project, to keep the two wheels of "operating Doblog.com" and "promoting the Doblog business" turning smoothly and in balance. I believe that smooth operation is the editorial department's responsibility to all of our users, and that the business side is an essential activity for continuing Doblog as a project within a company.

From December 2004, an advertisement display experiment was started jointly with CyberAgent. In February 2005, because the service had no capacity limit, cases of "storing only images in Doblog and requesting just those images from external sites" were found, and image requests from external sites were refused. Furthermore, maintenance on March 3, 2005 (Thursday) to introduce RSS caching to eliminate response degradation did not finish as scheduled, and the service was stopped for close to four hours, from 13:30 to 17:35. Then on March 14 (Monday), maintenance planned to stop the service for 8 hours from 10:00 to 18:00 actually ran until 19:00, a stoppage of nearly 9 hours. Server slowdowns late at night and access difficulties were frequent, and in September 2005 the service finally had to stop for just under two days (no browsing, writing, or editing) in order to relocate the data center.

And on November 26, 2005, a hardware failure finally occurred, stopping the service for 9 hours from 5:30 to 14:30. From 23:00 on December 12, 2005 until 2:00 the next day, the 13th, a network problem stopped it for 3 hours. In the following year, 2006, a failure lasted three and a half hours from 6:25 to 9:55 on January 10 (Tuesday), and the system was down for about 3 hours from 2:30 to 5:20 on January 12 (Thursday). Even after that the service was at times unavailable for several hours for maintenance, but at 12:00 on Friday, June 16, 2006, it finally reached its official release.

However, the failures showed no sign of stopping. On July 27, 2006, it was announced that "we are currently repairing the SQL that is one of the causes of the high load on the DB server, which has become an operational problem" and that "as a provisional measure against the high load, we are currently restricting the number of accesses to the server, so an error (Service Temporarily Unavailable) may appear when displaying pages". On July 31 it was announced that "a memory shortage, which was part of the problem, has been occurring, and we are taking countermeasures".

Failures also occurred in 2007: posting from mobile phones failed; the access counter's "Yesterday's access number" and "Today's access number" were not reset normally at midnight and kept being added cumulatively to "Today's Accesses"; and the database server had a machine failure. Still, the number of maintenance windows and their total duration decreased, and things seemed to be improving.

Put this way, Doblog seems riddled with failures, but this was not a problem unique to Doblog. In fact, most blog services spent 2004 to 2007 in much the same state, to a greater or lesser degree. For that reason, users migrating to services with fewer failures was, in a sense, an everyday sight.

Doblog's downtime and load status are summarized on the following page.

Doblog access load check

In addition, Doblog was apparently developed by "Hotlink Company", with NTT DATA, Hotlink, and others holding the sales rights.

Although it is unknown what this blog + SNS system was specifically called, Mitsukoshi's community site "Mitsukoshi community salon" was built with Doblog in the past, and there are several other cases, so it seems to have been a reasonable business at the time.

In other words, NTT DATA's aim seems to have been not "advertising revenue from Doblog itself" but "selling the Doblog system". Seen that way, I can understand why NTT DATA's current introduction page for Doblog lists, under "Uses and Applications", "Community Generation Tool for Enterprises and Local Governments", "Communication Tool for Companies and Customers", and "Review and Product Evaluation Tool".

It is unknown whether many companies are still adopting Doblog at present, but it may be that the service was simply kept running as it was.

What concerns me most is what will happen to the remaining users if the service ends because of this incident. At the very least, I hope users will somehow be able to keep the data they have posted to their blogs.

in Web Service, Column, Posted by darkhorse