The history of 'knowledge graphs' that are the basis of artificial intelligence and machine learning



Knowledge graphs, which are essential to internet search and machine learning, use a graph structure to link pieces of knowledge and data so that they can be explored and analyzed in depth. Communications of the ACM, an academic journal in the field of computing and information technology, explains the history of knowledge graphs, which underpin artificial intelligence and machine learning.

Knowledge Graphs – Communications of the ACM
https://cacm.acm.org/research/knowledge-graphs/



The concept of knowledge graphs grew out of scientific advances in a variety of research fields, including the Semantic Web, databases, natural language processing, and machine learning. According to Communications of the ACM (hereafter ACM), knowledge graphs are important for understanding ideas and technologies across many fields, yet many people are unfamiliar with the history and concepts behind them, which is why ACM decided to explain them.

The core element of the knowledge graph concept is the idea of 'representing knowledge in a diagram'. The image below is a conceptual diagram of a knowledge graph, in which words are linked to each other by their meaning. For example, 'dog' and 'cow' are each linked to 'animal' by the relation 'is', and 'cow' is linked to 'grass' by the relation 'eat'. The idea itself dates back to a visual form of reasoning proposed by Aristotle around 350 BC. Research on diagrammatic reasoning deepened around the 19th century, with contributions from famous figures such as Charles Sanders Peirce, known as the founder of pragmatism, and Gottlob Frege, the father of analytic philosophy. The idea of representing knowledge in a diagram is relevant to a variety of fields, including mathematics, philosophy, linguistics, library science, and psychology.


Image by Jayarathina
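The 'dog', 'cow', and 'grass' example above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that each link is stored as a (subject, relation, object) triple. The helper names `subjects` and `objects` are hypothetical and do not come from ACM's article.

```python
# Toy knowledge graph: each entry is a (subject, relation, object) triple.
# The data mirrors the example in the text; the representation is an assumption.
triples = [
    ("dog", "is", "animal"),
    ("cow", "is", "animal"),
    ("cow", "eat", "grass"),
]


def subjects(relation, obj):
    """Return every subject linked to `obj` by `relation`."""
    return [s for s, r, o in triples if r == relation and o == obj]


def objects(subject, relation):
    """Return every object that `subject` is linked to by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


print(subjects("is", "animal"))  # ['dog', 'cow']
print(objects("cow", "eat"))     # ['grass']
```

Storing links as triples keeps the structure uniform: adding new knowledge means appending another triple, not changing a schema.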

In tracing the history of knowledge graphs, ACM begins with the 'advent of the digital age.' Between 1955 and 1956, scientists developed 'Logic Theorist,' described as the 'world's first artificial intelligence program,' and in 1957 they created a general-purpose problem-solving program called 'General Problem Solver.' From then until the 1970s, 25% of the time that computers were running was spent 'sorting data to make any search procedure possible,' and attempts were made to 'search for knowledge and inference from large spaces.' One famous example is Dijkstra's algorithm, which finds the shortest paths from a starting point to the other nodes of a graph. The image below is a conceptual diagram of Dijkstra's algorithm.
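As a rough illustration of the shortest path search mentioned above, the following is a minimal Python sketch of Dijkstra's algorithm using a priority queue. The sample graph, its node names, and its edge weights are hypothetical and are not taken from ACM's article.

```python
import heapq


def dijkstra(graph, start):
    """Return the shortest distance from `start` to every reachable node.

    `graph` maps each node to a list of (neighbor, weight) pairs.
    Weights must be non-negative.
    """
    dist = {start: 0}
    queue = [(0, start)]  # priority queue of (distance so far, node)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to `node` was already found
        for neighbor, weight in graph.get(node, []):
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(queue, (new_d, neighbor))
    return dist


# Hypothetical undirected graph, written as an adjacency list.
graph = {
    "A": [("B", 7), ("C", 9), ("F", 14)],
    "B": [("A", 7), ("C", 10), ("D", 15)],
    "C": [("A", 9), ("B", 10), ("D", 11), ("F", 2)],
    "D": [("B", 15), ("C", 11), ("E", 6)],
    "E": [("D", 6), ("F", 9)],
    "F": [("A", 14), ("C", 2), ("E", 9)],
}

distances = dijkstra(graph, "A")
print(distances["F"])  # 11, via A -> C -> F, which beats the direct edge of 14
```

The algorithm always expands the closest unfinished node first, which is why it requires non-negative edge weights.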



Another important development of this period was the retrieval of information from unstructured text. In 1964, American computer scientist Bertram Raphael developed a computer program for semantic information retrieval, establishing a system that extracts semantic content from conversational text by putting that text into a formal structure. Other ideas that developed significantly during the 1950s and 1960s included systems for managing the data needed for inference and structuring, and memory for storing large amounts of knowledge.

In the 1970s, computers were widely adopted in industry, leading to the founding of companies such as Apple and Microsoft. In addition, the birth of data processing systems and increases in storage and processing power created an urgent need for a way to manage large amounts of data. Thanks to contributions from Edgar F. Codd, who invented the relational model of database management, and Peter Chen, who developed the entity-relationship model for describing conceptual data models, the 'relational database management system' was developed and implemented. In such a system, data is modeled as tables using the concept of a 'relation', and data is searched and changed by issuing queries to the database.
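To make the idea of tables and queries concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `animal`, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A relation is modeled as a table: each row is one fact, each column one attribute.
conn.execute("CREATE TABLE animal (name TEXT PRIMARY KEY, eats TEXT)")
conn.executemany(
    "INSERT INTO animal (name, eats) VALUES (?, ?)",
    [("cow", "grass"), ("dog", "meat")],  # hypothetical sample data
)

# Searching data: a declarative query states what is wanted, not how to find it.
rows = conn.execute(
    "SELECT name FROM animal WHERE eats = ?", ("grass",)
).fetchall()
print(rows)  # [('cow',)]

# Changing data: another query updates the matching row.
conn.execute("UPDATE animal SET eats = ? WHERE name = ?", ("kibble", "dog"))
updated = conn.execute(
    "SELECT eats FROM animal WHERE name = ?", ("dog",)
).fetchone()
print(updated)  # ('kibble',)

conn.close()
```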

The 1980s saw the personal computer boom and the spread of computing from industry into the home. As computing power improved and multiple ways of processing complex data emerged, the data being generated grew more complex and had to be managed, which spurred the study of 'graphs,' a field that focuses on structures composed of collections of objects. David Harel's 'higraphs,' introduced in 1988, formalize relationships as visual structures and are still used today in industrial applications and in diagrammatic studies of philosophical reasoning.



The term 'knowledge graph' first appeared in 1987. The idea presented in the paper 'Knowledge Graphs: Representation and Structuring of Scientific Knowledge' by Rene Ronald Bakker, a researcher at HAN University of Applied Sciences in the Netherlands, was studied in depth in the 1990s and became widely used in the 2000s.

The 1990s saw two major world-changing computer-related events. The first was the emergence of the World Wide Web, the idea of an information space where anyone could post and read information, completely changing all the theories, philosophies, and practices of knowledge and data management. The second was the digitalization of many aspects of society, where data was moved from being stored on paper to being stored on computers, marking the beginning of modern big data, with both research and industry simultaneously addressing it as a new area of development.

The 2000s were a time of explosive growth in e-commerce and social media, and major advances in hardware and software made it possible to generate, store, process, manage, and analyze data on a much larger scale. It was also a time when deep learning began to be introduced in AI and statistical methods emerged. Companies such as Google and Amazon introduced infrastructure for processing large amounts of data, pushing the boundaries of data management to handle a variety of formats, including text, sound, images, and video. While the 2000s saw the advancement and success of statistical methods for large-scale data processing, the world of big data called for new formats to store, manage, and integrate data and knowledge. ACM points out that this was the driving force behind the emergence of the concept of knowledge graphs.

Closely related to knowledge graphs is the paper 'Semantic Web,' published in 2001 by Tim Berners-Lee, who developed the World Wide Web, and his co-authors, which had a major impact on industry and academia. The Semantic Web is a concept in which the 'meaning' of web pages is added using XML, in contrast to traditional HTML, which conveys 'document structure.' The paper gave rise to various frameworks and protocols, and in 2006 Berners-Lee coined the term 'Linked Data' to emphasize the network structure of data on the web as a way to enhance knowledge. The Linked Data project gave birth to large-scale graph-based knowledge bases and ultimately influenced the format of major search engines.
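The network structure that Linked Data emphasizes can be illustrated with a small Python sketch. It assumes that every thing and every relation is named by a URI, so that statements published by different sources can be combined. The `example.org` URIs, the two sources, and the `describe` helper are hypothetical, and plain tuples stand in for a real Semantic Web data format.

```python
# Hypothetical namespace: every name below is a placeholder under example.org.
EX = "http://example.org/"

# Two independent sources publish statements about the same resource.
source_a = {
    (EX + "cow", EX + "is", EX + "animal"),
}
source_b = {
    (EX + "cow", EX + "eat", EX + "grass"),
}

# Because both sources use the same URI for "cow", merging is a plain set union.
merged = source_a | source_b


def describe(uri):
    """Return the sorted (relation, object) pairs known about `uri`."""
    return sorted((p, o) for s, p, o in merged if s == uri)


for relation, obj in describe(EX + "cow"):
    print(relation, obj)
```

Shared identifiers are what make the merge trivial here. With ordinary words as names, two sources could not be sure they meant the same 'cow'.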

In 2012, Google released a product called 'Google Knowledge Graph'. Google Knowledge Graph is a database that links data and enriches search results through semantic search, which takes 'meaning' into account, and it was added to Google's search engine. After that, countless companies and organizations began to use the keyword Knowledge Graph to refer to the integration of data. In academia, the Semantic Web was reborn as a way of structuring data as graphs, and the term Knowledge Graph became common. When you search on Google, information about what you searched for may be displayed in the upper right corner, as shown in the red frame in the image below. This information is taken from Google's Knowledge Graph and is called a 'Knowledge Panel'.



ACM said, 'History reminds us that there are no absolute successes or failures, and that each idea, theory, or technique needs the right circumstances to reach its full potential. The concept of knowledge graphs has existed, and research on it has been developing, since ancient times, but it only came to fruition once the right technology was available at the time the need arose. The future is difficult to predict, and although statistical and logical methods are merging today, it is impossible to know which way things will go. Past ideas and developments that were unsuccessful or little known at the time certainly contain useful ideas that will inspire and guide future research.'

According to ACM, the details of the historical roots behind the concept of knowledge graphs are unknown. 'We hope that by looking back at the history that can be identified, we will contribute to the research on the roots of knowledge graphs,' ACM said. It should also be noted that ACM's description does not necessarily cover all aspects of the phenomenon, as it is not a survey study or a quantitative analysis of papers.

in Software, Web Service, Science, Posted by log1e_dh