Dec 21, 2022 12:39:00

OpenAI open-sources 'Point-E', a 3D model generation AI that anyone can download and that generates and displays 3D objects from prompts 600 times faster than previous methods

OpenAI, the research group behind the image generation AI 'DALL-E 2', the text generation AI 'GPT-3', and the dialogue AI 'ChatGPT', has open-sourced 'Point-E', an AI that generates 3D objects from text, and published the source code on GitHub.

GitHub - openai/point-e: Point cloud diffusion for 3D model synthesis
https://github.com/openai/point-e

[2212.08751] Point-E: A System for Generating 3D Point Clouds from Complex Prompts
https://arxiv.org/abs/2212.08751

Point-E mainly consists of two models: one that generates an image from text, and one that generates point cloud data from an image. The text-to-image model has learned the associations between text and visual concepts from a dataset of labeled images, much like image generation AI such as Stable Diffusion and OpenAI's DALL-E 2.
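The two-stage flow described above can be sketched as a simple function composition. This is only an illustration of the data flow, not Point-E's actual code: `text_to_image`, `image_to_point_cloud`, `generate`, and the `Image` class are hypothetical stand-ins that return dummy data, and none of them are names from the openai/point-e repository.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Image:
    """Hypothetical placeholder for a rendered view (a real system holds pixels)."""
    description: str


def text_to_image(prompt: str) -> Image:
    # Stage 1 (hypothetical stub): a text-conditioned image model would run here.
    return Image(description=f"synthetic view of: {prompt}")


def image_to_point_cloud(image: Image, num_points: int = 8) -> List[Point]:
    # Stage 2 (hypothetical stub): an image-conditioned point cloud model would
    # run here. Dummy points along the x axis stand in for its output.
    return [(i / num_points, 0.0, 0.0) for i in range(num_points)]


def generate(prompt: str) -> List[Point]:
    # The text never reaches the point cloud stage directly; only the image does.
    image = text_to_image(prompt)
    return image_to_point_cloud(image)


cloud = generate("a red motorcycle")
print(len(cloud), "points; first point:", cloud[0])
```

The point of the split is that each stage can be trained on the kind of data that is easiest to obtain for it: labeled images for the first, and 3D objects with images for the second.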

The 'model for generating point cloud data from images', meanwhile, is trained on a dataset of 3D objects and images and learns how to translate effectively between the two. Below is the point cloud data (right) actually generated from the image (left).

The 3D objects that Point-E generates are not built from polygons. They are point clouds, that is, 'discrete sets of data points in space', which look like tightly packed particles. For this reason, Point-E also includes 'a model that converts point cloud data to a polygon mesh'.
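To make the difference between the two representations concrete, here is a minimal sketch in plain Python. It is not taken from Point-E: the function `sample_sphere_point_cloud` and the tetrahedron data are illustrative assumptions. A point cloud is just a list of positions, while a polygon mesh also stores faces that say which vertices are connected.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def sample_sphere_point_cloud(num_points: int, seed: int = 0) -> List[Point]:
    """Sample points uniformly on the surface of the unit sphere."""
    rng = random.Random(seed)
    points: List[Point] = []
    for _ in range(num_points):
        z = rng.uniform(-1.0, 1.0)                # uniform height
        theta = rng.uniform(0.0, 2.0 * math.pi)   # uniform angle around the axis
        r = math.sqrt(1.0 - z * z)                # circle radius at that height
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


# Point cloud: positions only, with no connectivity between the points.
sphere_cloud = sample_sphere_point_cloud(1000)

# Polygon mesh: vertices plus triangular faces that index into the vertex list.
tetra_vertices: List[Point] = [
    (1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0),
]
tetra_faces: List[Tuple[int, int, int]] = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

print(len(sphere_cloud), "points;", len(tetra_faces), "mesh faces")
```

Because the point cloud carries no surface information, a separate conversion step is needed before the result can be used where a mesh is expected.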

Because a point cloud can be generated faster than a polygon mesh, Point-E generates 3D objects more than 600 times faster than earlier methods. Its big advantage is that computations that used to take hours on an NVIDIA V100 can now be done in just 1-2 minutes.
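As a quick sanity check, the two figures quoted above are consistent with each other. The sketch below uses only the numbers stated in this article (a 600-fold speedup and a 1-2 minute generation time). The implied baseline is derived arithmetic, not a measured value.

```python
SPEEDUP = 600                 # "600 times faster", as stated in the article
POINT_E_MINUTES = (1, 2)      # "1-2 minutes", as stated in the article

# Baseline time implied by the two figures above, converted to hours.
implied_baseline_hours = tuple(m * SPEEDUP / 60 for m in POINT_E_MINUTES)
print(implied_baseline_hours)  # (10.0, 20.0)
```

A baseline of roughly 10 to 20 hours matches the description of earlier methods taking 'hours' on the same GPU.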

OpenAI just dropped a prototype of 3D DALLE (called “Point-E”). It isn't as good as Google's DreamFusion, but blazing fast! Like ~600x faster to generate. 2D DALLE has already turned the creative world upside down. How will 3D DALLE disrupt games, VR, metaverse, …? pic.twitter.com/Vlibnav9Jc
— Jim Fan (@DrJimFan) December 20, 2022

The drawback is that Point-E cannot capture an object's fine shape and texture, and parts of the shape may come out blocky or distorted. The following is an example where generating point cloud data from an image actually failed.

The part of Point-E that generates point cloud data from text can be tried on Hugging Face, an online AI platform. At the time of writing, however, the processing queue is long, so even the sample prompts prepared in the demo take a long time to generate.

Point E - a Hugging Face Space by openai
https://huggingface.co/spaces/openai/point-e

The tweet below shows the 3D object generated by entering 'a red motorcycle' into the demo.

OpenAI released Point-E, a text-to-3D (point clouds) demo

You can check out an open-source demo for it at https://t.co/ophW1YtCT3 Enjoy! The demo uses the lower-quality but much faster version of the model. pic.twitter.com/sJtlo9ASr6
— Omar Sanseviero (@osanseviero) December 20, 2022