"Give a new sense to the user" What is necessary for the next-generation character AI?



Mr. Yoichiro Miyake of Square Enix's Technology Promotion Division talked about the concepts and framework for realizing next-generation character AI that "gives users a new sensation."

Building a next-generation character AI architecture
http://cedec.cesa.or.jp/2012/program/PG/C12_P0159.html

I am Miyake of Square Enix's Technology Promotion Division, and I will talk about building a next-generation character AI architecture.


This talk explains, in fairly abstract terms, a framework for creating next-generation character AI. At Square Enix's Technology Promotion Division, where I work, I lead the AI team building our next-generation game engine. Since the team started last year, the three of us have held more than 120 meetings to work out what character AI needs.

Next-generation character AI should (1) participate more deeply in the game world, (2) recognize both its environment and itself, (3) have self-awareness and feel its own body, and (4) make a wide range of decisions, from high-level to low-level. By building this kind of character AI, we aim to give users new sensations.


To design the AI, we first imagine, as stories, what AI will be doing some ten years from now. For example, a character senses the direction of the wind and sets fire to the grassland from the windward side to attack the enemy. We prepare 100 to 200 such stories and carefully work out how to realize each one.



Part of what those concepts require is shown on this slide. We will bring these into the AI part of the game engine one by one.


This time I will focus only on the main concepts and explain the framework built on them.


Character AI is a field in which it is very hard to pin down a framework. So the first aim of this lecture is to explain the general theory, based on what the industry has accumulated over the last ten years. I also hope to share concepts that give the field a common foundation for its overall development.


The lecture consists of Chapters 1 to 7.


Chapter 1 covers the flow of information within intelligence.


"Intelligence" is a hierarchical structure centered on consciousness.


Here there is one vector that tries to become independent of the environment and another that tries to blend into it. These are two opposing forces: "the force by which consciousness tries to control the world" and "the force by which the world tries to control the human."


Intelligence operates within this harmony and conflict.


Next, within this hierarchical structure there are two kinds of signal: sensory signals and command signals. The body acquires information from the world through sensation. Intelligence processes that information, thinks, and decides on actions. This is the central concept.


Sometimes this loop is not completed, and the signal is turned around as a reflex at the body level alone...


Or the reflex occurs within intelligence. Salivating is one example.


When making character AI, you first need to stack up reflexes at various levels. Besides unconscious, body-level reflexes such as "hit back when struck," there is decision making, such as "form a strategy." Intelligence is a multilayered structure of reflex levels and decision making.




The levels of the hierarchy correspond to levels of information, running from "physical information" up to "abstract information." The same hierarchy applies on the way down.


There is a reflex at each level. Only the top level makes decisions.


The basis of character AI is to build this kind of information flow at each level of the hierarchy. Please remember this figure, because it appears often in this talk.


To summarize the story so far:

Information enters from the world and goes back out to it. Along the way there are reflexes at various levels, and beyond them there is decision making. Each of these information flows contains research themes, so this serves as a broad frame in which various structures become visible as you dig into them one by one.


Put more plainly, information triggers various reactions while it is abstracted from physical information into abstract information. On the way back down, the flow runs in reverse.


The mechanism by which an upper level controls the levels below it is the subsumption architecture. Character AI also uses it as a way to combine reflexes and decision making.


As Tips 1: first build reflexes and automatic control at the body level. Next, build the reflex system within intelligence. Then build the decision-making system, and finally combine them all in multiple layers. That is the overall design policy.
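
The layering described in Tips 1 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the engine's implementation: the `Percept` fields, the three layer functions, and the action names are all hypothetical, and "the upper level controls the lower level" is simplified to "the highest layer that produces an output suppresses the layers below it."

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Percept:
    # Hypothetical sensory snapshot; the field names are illustrative only.
    was_hit: bool = False
    enemy_visible: bool = False
    enemy_count: int = 0


# A layer returns an action name, or None when it has no opinion.
Layer = Callable[[Percept], Optional[str]]


def decision(p: Percept) -> Optional[str]:
    # Top layer: deliberate decision making (e.g. forming a strategy).
    return "retreat" if p.enemy_count >= 3 else None


def mind_reflex(p: Percept) -> Optional[str]:
    # Reflex within intelligence: a fixed response to a recognized situation.
    return "raise_guard" if p.enemy_visible else None


def body_reflex(p: Percept) -> Optional[str]:
    # Body-level reflex: "hit back when struck"; otherwise stay idle.
    return "hit_back" if p.was_hit else "idle"


# Ordered from the upper level to the lower level.
LAYERS: Sequence[Layer] = (decision, mind_reflex, body_reflex)


def act(p: Percept, layers: Sequence[Layer] = LAYERS) -> str:
    # The first (highest) layer that produces an output suppresses those below.
    for layer in layers:
        action = layer(p)
        if action is not None:
            return action
    return "idle"
```

Building the layers as independent functions mirrors the order given in the talk: the body reflex works on its own, and the upper layers can be added later without changing it.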


So what kind of information actually flows through this information flow?

Information handled by intelligence has two aspects, "perception" and "action".


For example, a character perceives a watermelon as "green" or "round." On the other hand, there are actions it can perform on it, such as "eat" and "split." This is two-sided information, formed by the perceptual organs and by the organs through which the body acts on the environment.


Preparing these two kinds of information is an important form of knowledge representation in character AI. On the perception side this corresponds to qualia; on the action side, to affordance. In essence it is a hint for action, or the nature of the thing.


The important point is that an organism grasps things through both its perceptual organs and the organs with which it acts on the environment.


Even insects have this. An insect has only a primitive version, but as you go up from insects to squirrels to humans, more advanced sensations and actions are built up.



In other words, to make intelligence more advanced, it is important to upgrade three things: "sensation," the "ring world" (Umwelt), and "action."


To summarize, the first step is to make sensation more advanced, for example a sense of balance or an economic sense. The frame also covers not only single actions such as "hit" or "throw" but combinations of such actions.


So how do you build this? From here I will talk about the sensation side. The advancement of action comes up later, in the chapters on thought, the body, and decision making.


I mentioned that living organisms deal with things through their effector organs and sensory organs; this is called the ring world (Umwelt).



In other words, an organism grasps an object with its effector organs while acquiring information through its sensory organs. A tick, for example, senses humidity, finds a spot to suck blood, and bites. Sensing a target and acting on it always go together.


Babies learn this kind of behavior: by throwing and breaking the things they sense, they acquire the information flow that forms the loop of perception and action.


A character holds this kind of relationship for each object. A goblin is a "bad guy," "black," and "scary." On the other hand, you can "hit" it and "deal damage" to it.
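
The two-sided knowledge representation described above (what an object feels like, and what can be done to it) might be stored like this. It is a minimal Python sketch; the `ObjectKnowledge` class, the `available_actions` helper, and the dictionary layout are assumptions made for illustration and do not come from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class ObjectKnowledge:
    # Hypothetical record: one entry of a character's subjective world.
    name: str
    perceptions: Set[str] = field(default_factory=set)  # how it appears to the character
    affordances: Set[str] = field(default_factory=set)  # what the character can do to it


def build_subjective_world(entries: Iterable[ObjectKnowledge]) -> Dict[str, ObjectKnowledge]:
    # The subjective world is simply the collection of known objects, keyed by name.
    return {entry.name: entry for entry in entries}


def available_actions(world: Dict[str, ObjectKnowledge],
                      visible: Iterable[str]) -> Dict[str, List[str]]:
    # Objects the character has no knowledge of offer it no actions at all.
    return {name: sorted(world[name].affordances) for name in visible if name in world}


world = build_subjective_world([
    ObjectKnowledge("watermelon", {"green", "round"}, {"eat", "split"}),
    ObjectKnowledge("goblin", {"bad guy", "black", "scary"}, {"hit", "deal damage"}),
])
```

Giving two characters different `world` contents is one simple way to make them react differently to the same scene.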


The same applies to the world as a whole: it is dark, it is spooky, and so on.


The environment appears to the character as a composite of these impressions.


In the ring-world scheme, this means holding a bidirectional information representation for each object.


The world is captured as a collection of various objects. Because this faculty is highly developed in humans, they build a rich ring world.


In this figure, the ring world consists of the sensors, through which information comes in, and the effector organs, through which it goes out. As the character acts on the world, information comes back in. We send and receive sensory and physical signals through our various organs.



These two kinds of organ give rise to a subjectively constructed world, the ring world. The scheme is that an organism exchanges information with the world using this subjective world as its interface.


As on the slide, this extends in all directions. We integrate the body and its various sensations to grasp the world, and try to act on the world through our effector organs. The ring world appears as this world of interaction between perception and action.



And from this, various subjectivities emerge.


Constructing such a subjective world is important for character AI, because it makes it easy to give each character individuality. Whatever subjective world a character has, if you prepare it in advance you can build a certain degree of personality without much additional thinking logic.


To summarize as Tips 2: create knowledge representations for objects, situations, and environments, each with both a perception representation and an action representation. Represent as many of the targets you want the AI to handle as possible; together they make up the AI's subjective world. Creating an advanced subjective world amounts to creating advanced intelligence.


Then what is the place of thinking, through which this knowledge flows? Chapter 3 deals with thought.


Thought has already been explored quite deeply, but for the next generation I want to go one step further and look more deeply at how a character recognizes itself and its environment.

Chapter 3 proceeds in this order: machine consciousness, attention, multi-attention, and multi-stage.


Machine consciousness is the field that studies consciousness in machines. This research has become lively recently and is starting to be introduced into games as well. For games, however, I do not want to go far into philosophical discussion, so please read "consciousness" here as meaning roughly "attention."


Consciousness falls into two categories. P-consciousness (phenomenal consciousness) is the subjective experience we have. A-consciousness (access consciousness) is awareness of one's own mental activity.

The question here is how to build a model of A-consciousness with artificial intelligence.


First, I will introduce three ideas about A-consciousness.


The first is the blackboard architecture. It was taken up around 2000 as the architecture of MIT's characters, and over the past decade it has been the basis of character AI in triple-A FPS titles.

The idea is very simple. The blackboard sits in the middle, surrounded by modules called KSs (knowledge sources), which read information from it and analyze it. After analyzing, they write the results back to the blackboard, and the Arbiter coordinates those KSs.
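
A blackboard with knowledge sources and an arbiter can be sketched in a few lines. The following Python is a minimal illustration under stated assumptions: the two KS functions, the priority scheme, and the blackboard keys are hypothetical, and real game architectures are considerably more elaborate.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Blackboard = Dict[str, Any]
# A proposal is (priority, key, value): what a KS wants to write to the blackboard.
Proposal = Tuple[int, str, Any]
KnowledgeSource = Callable[[Blackboard], Optional[Proposal]]


def threat_ks(bb: Blackboard) -> Optional[Proposal]:
    # Hypothetical KS: turns a raw distance into a threat estimate.
    if "enemy_distance" in bb and "threat" not in bb:
        return (2, "threat", "high" if bb["enemy_distance"] < 5 else "low")
    return None


def action_ks(bb: Blackboard) -> Optional[Proposal]:
    # Hypothetical KS: picks an action once a threat estimate is available.
    if "threat" in bb and "action" not in bb:
        return (1, "action", "take_cover" if bb["threat"] == "high" else "advance")
    return None


def arbiter(bb: Blackboard, sources: List[KnowledgeSource], max_steps: int = 10) -> Blackboard:
    # Each step: collect proposals, let the highest-priority one write, repeat.
    for _ in range(max_steps):
        proposals = [p for p in (ks(bb) for ks in sources) if p is not None]
        if not proposals:
            break  # no KS has anything left to contribute
        _, key, value = max(proposals, key=lambda p: p[0])
        bb[key] = value
    return bb
```

The KSs never call each other; they communicate only through the blackboard, which is what lets new analysis modules be added without touching the existing ones.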


The second is Global Workspace Theory (GWT).


The workspace is the topmost memory, where input to the brain arrives first. The target currently in focus is written there, and processors perform various analyses on it.


The third, the Multiple Drafts Model, is a model in which processors cooperate. Below the level of consciousness, various thought processes run side by side and analyze information.


I would like to introduce a model from 2010 that combines these three. It is an evolution of GWT: attention first goes to the focus point that has entered consciousness, and the modules below cooperate to analyze it. This structure is often compared to a theater stage. If working memory is the stage, various events come onto it, and the one receiving the most attention is the actor in the spotlight. The processors are the audience watching the actor. They call out comments and opinions, and the content is rewritten again and again. For example, when I look at the microphone in front of me, many thoughts come to mind, such as "it is made of metal" or "microphones I have used before." In other words, the act of looking at the microphone is not just seeing the microphone in front of me; it is made up of many analyses.



In games, I extend the focus point a little further into multi-attention. If three characters stand in front of you, you cannot concentrate your thinking on only one of them. So I want to place several focus points on the stage and analyze them at the same time, with different weights: for example, something I attend to a lot and something I attend to only a little.
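
One simple way to picture weighted multi-attention is to keep the few most salient targets and share attention among them in proportion to their salience. The Python below is only a sketch of that idea: the salience scores, the `max_foci` limit, and the proportional weighting are assumptions for illustration, not the scheme used in the talk.

```python
from typing import Dict


def allocate_attention(saliences: Dict[str, float], max_foci: int = 3) -> Dict[str, float]:
    """Keep the most salient targets and normalize their weights to sum to 1."""
    # Sort targets from most to least salient and keep only the top few.
    top = sorted(saliences.items(), key=lambda kv: kv[1], reverse=True)[:max_foci]
    total = sum(score for _, score in top)
    if total == 0:
        return {}  # nothing worth attending to
    # Each focus receives a share of attention proportional to its salience.
    return {name: score / total for name, score in top}


weights = allocate_attention(
    {"goblin_a": 6.0, "goblin_b": 3.0, "goblin_c": 1.0, "bird": 0.5}
)
```

Here `weights` holds three foci, with `goblin_a` receiving the largest share and `bird` dropped from the stage entirely.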


As a further extension, I want to parallelize thought itself. I want the same target of attention to be considered from different viewpoints. Several lines of thought are kept about the same target: a sub-stage may complement the thinking on the main stage, or several sub-stage proposals may overturn the main stage's decision. Establishing this kind of cooperation among thoughts is a requirement for the next generation.


The summary looks like this.


Tips 3 looks like this.


So far we have covered the "flow of information," the "knowledge structure," and the "place of thinking." How do they relate to the body? Chapter 4 covers the body.


First, setting the body aside, I would like to review how the world itself is represented as information. As I mentioned earlier, there are information representations at each level of the hierarchy, and various flows of thought.


At each boundary there are two representations: what comes in from the environment, and the world as seen from consciousness. This dual representation also exists inside us. At the boundary between the conscious and the unconscious, there is the representation of the world that rises from the world through the senses and the unconscious into consciousness, and the world as our consciousness sees it.



Let's introduce a multilayer structure here as well.


When recognizing the world, the physical representation is at the bottom, an intermediate representation is in the middle, and a simplified representation is at the top.


Conversely, information flows from the simplified representation down to the physical world. This is how action on the world is composed.




A brief summary looks like the following.


Now let's think about the same thing for the body. In the body, too, there are two representations at each boundary.



In other words, there is the image that comes up from the body ("this is my body") and the image with which we try to move the body. For example, we do not know how many bones we have. The body that rises to consciousness appears as a very simple, symbolized representation, which is why we can make very simple decisions. Conversely, that decision then unfolds into a complex representation on the way down. These functions also matter when making characters, so consider two representations at each boundary. Here too there is a flow of abstraction and a flow of concretization. Consider a physical layer at the bottom, a simplified body representation above it, and a symbolized body representation at the top.


The summary is below.


To make this easier to picture: on the path up to consciousness, the bottom layer is bone animation data. The intermediate representation is, for example, an animation graph, and the simplified representation is, for example, the fact "I am running now." On the other side (right), a simple representation is passed down and turned into a concrete one.


With this way of thinking, the abstract representation of the body (the orange frame) contains bodily sensation, the internal sense of the body, the sense of balance, interoception, proprioception, and so on. Proprioception is the sense of how one's own body is moving right now.


There are also reflexes at each level. At the bottom layer, for example, there is procedural animation, and at the intermediate representation there are conditioned reflexes. The body, too, has this kind of dual representation.


So far I have explained the information representation of the world and that of the body. The model that combines the two ultimately becomes the ground on which decisions are made.


The physical layer has a physical representation of the world and the body, the intermediate layer has an intermediate representation of both, and the simplified layer has a simplified representation of both. Decisions are made on representations in which the body and the world are set relative to each other at each level.


And reflexes occur at each layer.


Chapter 4 is summarized below.


This is Tips 4.


We have now seen the relationship among "world," "body," and "intelligence." So what is the concrete decision-making mechanism?

We surveyed the decision-making algorithms used in digital games; they can be grouped into seven types. There is no time to explain them here, but they are summarized in the magazine WEB+DB PRESS Vol.68.


Information flow in intelligence is not linear but nonlinear, because there is memory in which information is stored. Chapter 6 explains memory.


Memory has been developed very actively in the FPS field since around 2000. Here I will introduce a frame that combines memory with the decision-making mechanism. First, there is the stage, the place where information is held for the time being. On it there is a focus point of attention, which the processors analyze. Working memory is the spot under the spotlight. The memory of each moment is stacked in working memory.




This frame is the model adopted in MIT's architecture since around 2000. It is a blackboard architecture. Information from the virtual world passes through sensors to the sensory system. First, the information obtained from perception is written into working memory. A module called the action system analyzes it and writes the action to take. The navigation system then turns that into actual movement. In other words, the left side is the processors, and the green part on the right is the blackboard, the body of information. The storage area of the blackboard that is written to most frequently is called working memory, and perceptual memories are stacked there. The architecture mines the stacked data, connects the positions, and finds a predicted point.


Step by step, memory about a target is built up first. Once enough has been built, it can be called a flow of memory. For example, when kicking a ball, the ball's position at each moment is recorded, and the position of the ball at the next moment is calculated more or less unconsciously; this is how awareness of it is constructed. Such information is compressed over time and settles as knowledge about the target. This is stored in short-term memory, which is a stable area.
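
The ball example, in which positions are stacked moment by moment and the next position is predicted from them, can be sketched as below. This Python is a minimal illustration under assumptions: the `TargetMemory` class is hypothetical, and straight-line extrapolation from the last two samples stands in for whatever prediction an actual game would use.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Sample = Tuple[float, float, float]  # (time, x, y)


class TargetMemory:
    """Stacks recent observations of one target and extrapolates its position."""

    def __init__(self, capacity: int = 8) -> None:
        # Old samples fall off automatically once capacity is reached.
        self.samples: Deque[Sample] = deque(maxlen=capacity)

    def observe(self, t: float, x: float, y: float) -> None:
        self.samples.append((t, x, y))

    def predict(self, t: float) -> Optional[Tuple[float, float]]:
        if not self.samples:
            return None  # nothing has been observed yet
        if len(self.samples) == 1:
            return self.samples[-1][1:]  # only one sample: assume no movement
        (t0, x0, y0), (t1, x1, y1) = self.samples[-2], self.samples[-1]
        dt = t1 - t0
        if dt == 0:
            return (x1, y1)
        # Estimate velocity from the last two samples and project it forward.
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        return (x1 + vx * (t - t1), y1 + vy * (t - t1))
```

For example, after observing the ball at (0, 0) at time 0 and at (2, 1) at time 1, `predict(2.0)` extrapolates along the same straight line.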





The place where knowledge about various objects is organized is called the temporary memory stack.


Next, I would like to think about recall.


Memory holds knowledge about goblins and about watermelons. All sorts of knowledge are stored there.


When a goblin appears, memories accumulated in the past are recalled. For example, matching finds that one appeared a long time ago. In other words, the moment you see the goblin, you see it as a collection of your various memories of goblins.


Likewise, matching takes place the moment you see any object. Matching against the whole of memory and writing the results can be regarded as equivalent to a processor.


Knowledge about a specific object can be thought of as being held by an expert. In other words, memory and thought can be built within the same model of consciousness.


Finally, I will explain the hierarchical structure of memory.


Memory consists of short-term memory, long-term memory, and fixed memory. When memory is actually used, what matters is the process of recalling the memory that is needed.


Here again a matching process takes place. For example, there is a pear, and the memory of having eaten one long ago is in long-term memory. Recalling that memory means calling it up again in order to use it. In terms of time scale, long-term memory is rewritten quickly compared with fixed memory.


The top is the stage; the bottom is long-term and fixed memory. There are memories of working on the target in working memory, memories called up from short-term memory, and memories called up from fixed memory. When an object appears on the stage, calling up past memories lets you grasp it not just as knowledge of the object in front of you but as a collection of information that includes its past history. So memory is not static; it is active.
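
Recall across the memory layers can be pictured as a lookup that gathers everything known about the object now on the stage. The Python below is a minimal sketch under assumptions: the `MemoryHierarchy` class, its three dictionaries, and exact-name matching are illustrative stand-ins for the matching process described in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MemoryHierarchy:
    # Each layer maps a target name to the facts remembered about it.
    short_term: Dict[str, List[str]] = field(default_factory=dict)
    long_term: Dict[str, List[str]] = field(default_factory=dict)
    fixed: Dict[str, List[str]] = field(default_factory=dict)

    def recall(self, target: str) -> List[Tuple[str, str]]:
        """Collect (layer, fact) pairs for a target, from shallow to deep layers."""
        recalled: List[Tuple[str, str]] = []
        for layer_name in ("short_term", "long_term", "fixed"):
            layer = getattr(self, layer_name)
            for fact in layer.get(target, []):
                recalled.append((layer_name, fact))
        return recalled


memory = MemoryHierarchy(
    short_term={"pear": ["is on the table"]},
    long_term={"pear": ["ate one long ago"]},
    fixed={"pear": ["is a fruit"], "goblin": ["is an enemy"]},
)
```

Calling `memory.recall("pear")` returns the object as a bundle of present and past information, which is the sense in which the talk calls memory active rather than static.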


Tips 6 is as follows.


What we wanted to do was create a framework that can carry all the concepts we raised at the beginning.


As a structure, there are fixed storage, secondary storage, and primary storage, from the top, and then the stage.




As Tips 7, it is important that memory is active. During sleep, information is compressed and written into the deeper layers of memory. While awake, memories are recalled for the target in front of you. In other words, memory is accessed through various KSs, and information is written and read at the same time. Information may also be compressed within the memory layers without ever rising to consciousness; that is what we see as dreams.


In summary, a human or a character is two things at once: a physical entity and an informational entity. There is a consciousness built from the body and a consciousness built from memory. The frame introduced here creates a place where the two overlap and sets the two against each other there. From here we will turn it into a design and implement it.


According to Mr. Miyake, Square Enix is looking for game AI engineers, researchers, and others. Job information is posted on the following site.

Agni's Philosophy FINAL FANTASY REALTIME TECH DEMO
http://www.agnisphilosophy.com/jp/index.html

in Coverage,   Game, Posted by logc_nt