A high school teacher led the construction of the free dataset 'LAION' used by the image generation AI 'Stable Diffusion'



LAION is a non-profit organization that builds training datasets for generative AI, and its data is known to be used by the image generation AI "Stable Diffusion". The leader of LAION is Mr. Christoph Schumann, who works as a high school teacher in Hamburg, Germany.

A High School Teacher's Free Image Database Powers AI Unicorns - Bloomberg
https://www.bloomberg.com/news/features/2023-04-24/a-high-school-teacher-s-free-image-database-powers-ai-unicorns



Christoph Schumann studied computer science and physics at the University of Vienna and spent six years studying acting in workshops. After graduation he worked as an IT administrator and teacher in the city of Hamburg while shooting films for children.



Mr. Schumann's involvement in establishing LAION began with his participation in a Discord server for AI enthusiasts. At that time, OpenAI, an AI development organization, was developing an image generation model called DALL-E, and Mr. Schumann was concerned that major technology companies would monopolize the data needed to train such models.

Therefore, Mr. Schumann and his colleagues on the Discord server launched the 'Large-scale AI Open Network' (LAION) project to create an open source dataset for training diffusion models. An image dataset is not just an assortment of images; it requires annotations that describe what is in each image. Schumann and his colleagues used HTML code collected by Common Crawl, a non-profit organization in California, to locate images on the Internet and associate descriptive text with them.
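The step of locating images in crawled HTML and pairing them with descriptive text can be sketched roughly as follows. This is a minimal illustration and not LAION's actual pipeline: it assumes the descriptive text is taken from each image's alt attribute, and the names ImageTextExtractor and extract_pairs, the sample HTML, and the example.com URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageTextExtractor(HTMLParser):
    """Collect (image URL, descriptive text) pairs from <img> tags.

    Assumption: the descriptive text is the tag's alt attribute.
    """

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        # A valueless attribute is reported as None, so fall back to "".
        alt = (attr.get("alt") or "").strip()
        # Keep only images that have both a location and a description.
        if src and alt:
            self.pairs.append((urljoin(self.base_url, src), alt))


def extract_pairs(html, base_url=""):
    """Return a list of (absolute image URL, alt text) pairs found in html."""
    parser = ImageTextExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical page content, standing in for HTML from a web crawl.
sample = """
<html><body>
<img src="/img/cat.jpg" alt="A cat sleeping on a sofa">
<img src="spacer.gif" alt="">
<img src="https://cdn.example.com/dog.png" alt="  Dog catching a ball ">
</body></html>
"""

pairs = extract_pairs(sample, base_url="https://example.com/page/")
for url, text in pairs:
    print(url, "|", text)
```

Images without a description, such as the spacer image above, are skipped, which is why only a fraction of the images on a page would end up as image-text pairs under this approach.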

As a result, Schumann and his colleagues succeeded in collecting a set of 3 million image-text pairs in just a few weeks. Three months later, they released a dataset containing 400 million image-text pairs. At the time of writing, "LAION-5B", which contains more than 5 billion image-text pairs, has also been released, making it the largest free-to-use dataset of its kind. LAION also publishes tools such as open versions of the image recognition model CLIP and benchmarks for it.



Many of the images and links included in the LAION dataset are visual data on Pinterest, Shopify, and Amazon Web Services, YouTube thumbnails, portfolios on the art sharing social site DeviantArt, photos from news sites, and images on US government websites such as that of the Department of Defense. In short, they cover anything found on the Internet. Therefore, the images and links collected by LAION may contain violent, discriminatory, and sexual content.

Before building the LAION dataset, Schumann consulted with a lawyer and ran an automated tool to filter illegal content. In addition, problematic content is reportedly deleted immediately once LAION is notified of it. However, Schumann said he was more interested in what could be learned from the dataset than in filtering it perfectly: "We could filter violent content from the published data, but we decided not to filter because the violent content in the data would speed up the development of violence detection software."

In July 2021, LAION became a non-profit organization and Mr. Schumann became its leader. Mr. Schumann apparently also serves as LAION's point of contact: a piece of paper with 'LAION' scribbled on it in pencil is said to be pasted on the mailbox of his home in the suburbs of Hamburg.



Work on the dataset is completely unpaid, and everyone contributes for free. To help cover costs, LAION received a one-time donation in 2021 from Hugging Face, an online repository for AI.

In addition, Emad Mostaque, CEO of Stability AI, offered in a Discord chat to bear the computational cost. Mr. Mostaque wanted to launch an open source generative AI business and wanted to use LAION's data to train its AI. Mr. Schumann said, "I was very skeptical about Mr. Mostaque at first," and the LAION team apparently did not take Mr. Mostaque's idea seriously. Nevertheless, in August 2022, Stability AI, led by Mr. Mostaque, released Stable Diffusion, trained on the LAION dataset. At the time of writing, Stability AI has a valuation of $4 billion (about 537 billion yen).

Mr. Schumann himself has not received any compensation from LAION and has stated that he does not intend to do so in the future. "I'm still a high school teacher," he said.

in Note, Software, Posted by log1i_yk