Introducing ``3D-GPT'', a framework that realizes procedural modeling of 3D content using large-scale language models



A research team from Australia and China has developed a framework that combines large-scale language models with Blender , a 2D animation and 3DCG production tool, to create highly accurate 3D content by appropriately interpreting text input by humans in natural language. 3D-GPT ” was announced.

[2310.12945] 3D-GPT: Procedural 3D Modeling with Large Language Models
https://arxiv.org/abs/2310.12945
05

3D-GPT: 3D MODELING WITH LARGE LANGUAGE MODELS
https://chuny1.github.io/3DGPT/3dgpt.html

3D-GPT generates 3D worlds in Blender
https://the-decoder.com/3d-gpt-generates-3d-worlds-in-blender/

Procedural modeling , which generates 3D models and textures based on a set of basic rules, has become a promising option in the pursuit of efficient content creation. However, procedural modeling requires an understanding of rules, algorithms, and parameters, and the problem is that procedural modeling is a heavy burden for human creators.

Therefore, the research team developed a framework called ``3D-GPT'' that uses large-scale language models for instruction-driven 3D modeling. In 3D-GPT, a large-scale language model takes on the role of a ``skilled problem solver,'' dividing the tasks required for 3D modeling into manageable segments, each of which is then executed by the appropriate agent.

3D-GPT mainly consists of three agents: ``task dispatch agent,'' ``conceptualization agent,'' and ``modeling agent.'' The task dispatch agent receives prompts entered by humans, instructs the functions necessary for subsequent processing, and facilitates cooperation between the remaining two agents. The conceptualization agent performs inferences to supplement descriptions not included in the human prompt but necessary for 3D content generation, and the modeling agent performs processes such as generating Python code to call Blender's API.

The researchers explained that by working together, these agents can systematically enhance the description of a scene input by a human and dynamically adapt the text based on subsequent human instructions. .

Below is a video that combines the text actually input into 3D-GPT and the generated 3D content. You can see that the scenes are generated in 3D with fairly high accuracy in both cases.

“The desert, an endless sea of shifting sands, stretched to the horizon, its ripping dunes catching the golden rays of the setting sun, creating an ever-changing landscape of shadows and light. This is a video generated from the text 'The undulating dunes catch the golden light of the setting sun, creating an ever-changing landscape of shadow and light.'

Introducing ``3D-GPT'', a framework that realizes procedural modeling of 3D content using large-scale language models - YouTube


“The lake, serene and glassy, mirrored the cloudless sky above, reflecting the surrounding mountains and graceful flight of a heron, as lily pads floated like emerald jewels upon its tranquil surface. This is a video generated from the text 'It shows the sky, the surrounding mountains, and a heron gracefully flying. On the calm surface of the lake, a lily of the valley floats like an emerald jewel.'

Introducing ``3D-GPT'', a framework that realizes procedural modeling of 3D content using large-scale language models - YouTube


The research team said, 'Our empirical investigation confirms that 3D-GPT not only interprets and executes instructions and delivers reliable results, but also collaborates effectively with human designers. .Furthermore, 3D-GPT integrates seamlessly with Blender, expanding the possibilities of manipulation. Our research highlights the potential of large-scale language models in 3D modeling and supports future advances in scene and animation generation. 'It provides a basic framework for the future.'

in Software,   Science,   Video, Posted by log1h_ik