Archive of Q&A site Stack Overflow has stopped updating, casting a shadow over its role as an open knowledge base



It has come to light that the Stack Exchange Data Dump, an archive of past posts, has been disabled at Stack Overflow, the Q&A site where many volunteer moderators are on strike over a conflict with the operating company about the handling of AI-generated posts. Striking users have raised concerns that the move undermines the platform's raison d'etre.

June 2023 Data Dump is missing - Meta Stack Exchange
https://meta.stackexchange.com/questions/389922/june-2023-data-dump-is-missing/390023

Since 2009, Stack Overflow has produced the Stack Exchange Data Dump, a database compiling the site's questions and answers, and has updated it once every three months. However, a post on Meta Stack Exchange, the site for discussing matters such as the operation of Stack Overflow, pointed out that the June 2023 data dump could not be found on the Internet Archive and that updates had stopped after the March dump.

In response to the post, a former database administrator who was recently laid off from Stack Exchange, the company behind Stack Overflow, said, 'The job to upload data dumps to Archive.org was disabled on March 28, and it is marked so that it will not be reactivated without senior leader approval,' revealing that the data dump had stopped updating because of a company decision.



Some Stack Overflow users linked the halt in data dump uploads to the strike, but March 28 is before the strike started, so the two are seen as at least not directly related.

In a statement quoted by a respondent, Stack Overflow Chief Technology Officer Jody Bailey said, 'We're looking at ways to gate access to dumps, APIs, and Stack Exchange Data Explorer (SEDE). We are looking for something that allows individuals to access their data while preventing abuse by organizations looking to make money from the work of our community.' The organizations Bailey mentioned are believed to be AI companies that train large language models on data gathered from the internet.

According to the respondent, SEDE is updated every weekend, and the data dump uploaded to the Internet Archive is a dump of the SEDE database. The contents of the data dump and SEDE are not identical, but they overlap, so Stack Overflow data is still available.



However, concern has also been voiced about the suspension of updates to data dumps that had been widely published under Creative Commons licenses. In a post reporting on the progress of the strike, Stack Overflow user Nick, who also signed the letter announcing the strike, said that the data dumps were disabled without prior notice or warning, which only came to light when users asked, and called disabling them in this way an example of a lack of communication with the community.

Nick continued, 'More importantly, the data dump underscores the raison d'etre of this platform,' which is to guarantee access to a treasure trove of knowledge and provide it for free. The platform was established so that information would flow freely, as an alternative to paid platforms, and the data dump was also insurance that 'everyone will always have free access to the shared information, no matter what the future holds for the company.' Disabling it, he said, would betray Stack Overflow's founding philosophy.

In addition, Nick reported that Stack Exchange asked the strikers to select three moderators to represent the strike and that voting for the selection is under way; that Stack Exchange has told the media that '11% of moderators are participating in the strike,' although in fact the work of many moderators is delayed; and that the data released by Stack Exchange on the accuracy of moderators' judgments of GPT content is questionable.

in Web Service, Posted by log1l_ks