I tried using 'Shap-E', an AI that creates 3D models from text and images, on Google Colaboratory

OpenAI, the developer of the chat AI ChatGPT and the speech recognition AI Whisper, announced 'Shap-E', an AI that creates 3D models, in May 2023. Shap-E is released as open source and anyone can use it, so I tried it out on Google Colaboratory.
shap-e/sample_text_to_3d.ipynb at main · openai/shap-e · GitHub
The following article details what you can do with Shap-E.
OpenAI announces open source AI 'Shap-E' that generates 3D models from text and images - GIGAZINE

First, open Google Drive and click the '+' icon at the right edge of the screen.

Enter 'Colaboratory' in the search field and click the displayed Colaboratory app.

Click 'Install'.

You will be asked for permission, so click 'Continue'.
Select the account to install Colaboratory on.

The installation is now complete. Click 'Finish'.

Click New on the left side of the Google Drive screen.

'Google Colaboratory' has now been added under 'Other', so click it.

When Colaboratory opens, first change the setting to use GPU. Click 'Change runtime type' in the 'Runtime' menu.

Set 'Hardware accelerator' to 'GPU' and click 'Save'.

Python code is entered here. First, clone the Shap-E repository with the following code.
[code]!git clone https://github.com/openai/shap-e[/code]
In Colaboratory, you run code by typing it into a code cell and clicking the play button to its left.

When the execution is completed, the log is displayed below the code.

To enter new code, add a code cell with the '+ Code' button at the top.

Next, move into the cloned folder and install Shap-E together with the libraries it needs.
[code]%cd shap-e
!pip install -e .[/code]
Import the necessary functions from the library with the code below.
[code] import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget[/code]
Use the code below to select the GPU as the device, falling back to the CPU if no GPU is available.
[code]device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')[/code]
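Load the models used to generate the 3D model: 'transmitter' decodes the generated latents into 3D output, 'text300M' is the text-conditional model, and the diffusion settings are read from the 'diffusion' config.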
[code]xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion')) [/code]
Loading took about 2 minutes.

Then generate the 3D model with the code below. 'batch_size' is the number of 3D models to generate, and 'guidance_scale' controls how closely the output follows the prompt. Use 'prompt' to describe the 3D model you want; since I wanted a shark this time, I entered 'a shark'.
[code]batch_size = 1
guidance_scale = 15.0
prompt = 'a shark'
latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)[/code]
With these settings, generation finished in 23 seconds.
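Two of the knobs in the call above can be illustrated in plain Python. This is a minimal sketch, assuming Shap-E's sampler follows the standard classifier-free guidance formula (conditional and unconditional predictions blended by 'guidance_scale') and the Karras et al. (2022) noise schedule driven by 'karras_steps', 'sigma_min' and 'sigma_max'. The exponent rho = 7 is an assumption (it is not passed in the notebook code), and both function names are hypothetical, not part of the shap_e package.

```python
def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional predictions element-wise.

    guidance_scale = 1.0 is mathematically equal to the conditional
    prediction; larger values push the result further toward the prompt.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def karras_sigmas(steps, sigma_min, sigma_max, rho=7.0):
    """Noise levels decreasing from sigma_max to sigma_min (Karras et al. 2022).

    rho=7.0 is an assumed default; the notebook does not set it.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    # Interpolate linearly in sigma**(1/rho) space, then raise back to rho
    return [
        (max_inv + i / (steps - 1) * (min_inv - max_inv)) ** rho
        for i in range(steps)
    ]


if __name__ == "__main__":
    # Toy 3-element "predictions" showing the effect of guidance_scale=15.0
    print(classifier_free_guidance([1.0, 0.0, 0.5], [0.5, 0.0, 0.5], 15.0))
    # Same values as karras_steps, sigma_min and sigma_max in the notebook
    sigmas = karras_sigmas(64, 1e-3, 160)
    print(len(sigmas), sigmas[0], sigmas[-1])
```

The schedule spends more of its 64 steps at small noise levels, which is where fine detail is resolved; a high guidance_scale such as 15.0 trades diversity for closer adherence to the prompt.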

Enter the code below to display the generated 3D model as a rotating GIF animation.
[code]render_mode = 'nerf'  # you can change this to 'stf'
size = 64  # this is the size of the renders; higher values take longer to render
cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    display(gif_widget(images))[/code]
A shark like this was generated.
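The rotating GIF comes from rendering the same latent from a ring of cameras around the object, and the 'size' comment is about cost: in NeRF-style rendering each frame needs roughly one ray per pixel, so doubling size roughly quadruples the work. The sketch below illustrates both ideas in plain Python. It is not how create_pan_cameras is implemented; the helper names and the frame count, radius and height are hypothetical values chosen for illustration.

```python
import math


def pixels_per_frame(size):
    """One ray per pixel in NeRF-style rendering: cost grows with size squared."""
    return size * size


def turntable_positions(num_frames, radius, height):
    """Evenly spaced camera positions on a circle around the origin.

    Each camera would look toward (0, 0, 0); playing the frames in order
    gives a full 360-degree rotation. Values are illustrative only.
    """
    positions = []
    for k in range(num_frames):
        angle = 2.0 * math.pi * k / num_frames
        positions.append((radius * math.cos(angle), radius * math.sin(angle), height))
    return positions


if __name__ == "__main__":
    for size in (64, 128, 256):
        print(size, pixels_per_frame(size))
    for x, y, z in turntable_positions(num_frames=20, radius=4.0, height=1.0):
        print(f"({x:+.2f}, {y:+.2f}, {z:+.2f})")
```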

Use the code below to save the generated 3D model.
[code]from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    t = decode_latent_mesh(xm, latent).tri_mesh()
    # PLY is written in binary mode ('wb'), OBJ as text ('w')
    with open(f'example_mesh_{i}.ply', 'wb') as f:
        t.write_ply(f)
    with open(f'example_mesh_{i}.obj', 'w') as f:
        t.write_obj(f)[/code]
When you run the code, the files 'example_mesh_0.obj' and 'example_mesh_0.ply' appear in the file pane on the left.
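The .obj file is plain text: one 'v x y z' line per vertex and one 'f a b c' line per triangle, with vertex indices starting at 1. As a minimal sketch of that format (not Shap-E's own writer, which may also store extra per-vertex data such as colors), here is a hypothetical helper, write_simple_obj, that writes a single triangle.

```python
import io


def write_simple_obj(f, vertices, faces):
    """Write a triangle mesh as Wavefront OBJ text.

    vertices: list of (x, y, z) floats
    faces: list of (i, j, k) zero-based vertex indices
    OBJ face indices are 1-based, so each index is shifted by one.
    """
    for x, y, z in vertices:
        f.write(f"v {x} {y} {z}\n")
    for i, j, k in faces:
        f.write(f"f {i + 1} {j + 1} {k + 1}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_simple_obj(buf, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])
    print(buf.getvalue())
```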

Right-click each file and select 'Download'.
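Before importing the downloaded .obj file into 3D software, you can quickly check that it actually contains geometry. The sketch below is a hypothetical helper, count_obj_elements, that counts vertex and face records; it assumes the standard OBJ text layout and does not validate anything else.

```python
def count_obj_elements(lines):
    """Return (vertex_count, face_count) for OBJ text lines.

    Only records whose first token is 'v' (vertex position) or 'f' (face)
    are counted; normals ('vn'), texture coordinates ('vt'), comments and
    other records are ignored.
    """
    vertices = faces = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices += 1
        elif parts[0] == "f":
            faces += 1
    return vertices, faces


if __name__ == "__main__":
    # Small inline sample; for a real file, pass an open file object instead
    sample = ["# comment", "v 0 0 0", "v 1 0 0", "v 0 1 0", "vn 0 0 1", "f 1 2 3"]
    print(count_obj_elements(sample))
```

To check the downloaded file, call it as `count_obj_elements(open('example_mesh_0.obj'))`; a result with zero faces suggests the export did not produce a usable mesh.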

After that, simply import the downloaded file into your 3D modeling software. This time I created a 3D model from text, but the Shap-E repository also includes an example of creating a 3D model from an image, so check it out if you are interested.