How to use the 'CLIP interrogator', which analyzes an image generated by the image generation AI 'Stable Diffusion' and displays what kind of prompt (spell) it was likely made from



The AUTOMATIC1111 version of Stable Diffusion web UI is a tool that lets you easily install the image generation AI 'Stable Diffusion', which was released to the public in August 2022, in a local Windows environment and operate it from a user interface (UI) displayed in the browser instead of the command line. Besides generating images, it can compare multiple combinations of the character strings (prompts) entered at generation time in a single run, and generate images under multiple sets of generation conditions at once, so it has enough functions to be called the definitive Stable Diffusion UI. The AUTOMATIC1111 version of Stable Diffusion web UI also includes the 'CLIP interrogator', which analyzes an AI-generated image and displays a prompt for it. This seemed very useful for cases where you see an AI-generated image on an Internet bulletin board or SNS and the prompt used to make it has not been disclosed at all, so I actually tried using it.

GitHub - AUTOMATIC1111/stable-diffusion-webui-feature-showcase: Feature showcase for stable-diffusion-webui
https://github.com/AUTOMATIC1111/stable-diffusion-webui-feature-showcase#clip-interrogator

The following article explains how to install and update the AUTOMATIC1111 version of Stable Diffusion web UI in a local environment or on Google Colaboratory (Google Colab).

Summary of how to install the definitive 'Stable Diffusion web UI (AUTOMATIC1111 version)', which runs the image generation AI 'Stable Diffusion' even on a 4 GB GPU and makes functions such as training on your own art style easy to use on Google Colab or Windows - GIGAZINE



The basic usage of the AUTOMATIC1111 version of Stable Diffusion web UI is summarized in the following article.

Basic usage of 'Stable Diffusion web UI (AUTOMATIC1111 version)', which makes it easy to use 'GFPGAN' to clean up the faces that tend to collapse in images from the image generation AI 'Stable Diffusion' - GIGAZINE



The AUTOMATIC1111 version of Stable Diffusion web UI not only generates images with Stable Diffusion but also has many useful functions for image generation. For example, with 'Prompt matrix' and 'X/Y plot', which are available from the Script function, you can generate images laid out so that the differences caused by changing prompts and parameters are visible at a glance. The following article explains the specific usage.

Summary of how to use 'Prompt matrix' and 'X/Y plot' in 'Stable Diffusion web UI (AUTOMATIC1111 version)' - GIGAZINE



First, I started the AUTOMATIC1111 version of Stable Diffusion web UI.



To check the image prompt with CLIP interrogator, use 'img2img (image to image)' which generates a new image from the loaded image. Click the 'img2img' tab.



Click the 'Image for img2img' column in the left column and select the image you want to check the prompt for, or drag and drop the image directly into the 'Image for img2img' column.



When the image is loaded, the image will be displayed in the 'Image for img2img' column, so click the 'Interrogate' button.



The first time you do this, the required models are downloaded, so wait a few minutes.



Then the prompt 'a woman with purple hair and a necklace on her neck and a necklace on her neck, with a necklace on her neck, by Ilya Kuvshinov' appeared in the input field at the top of the AUTOMATIC1111 version of Stable Diffusion web UI. For some reason it kept insisting that she was wearing a necklace around her neck. The original generation prompt was 'girl with short purple hair, instagram photo, kodak portra, by wlop, ilya kuvshinov, Krenz Cushart, pixiv, zblush sculpt', so the interrogator correctly picked up both the purple hair and the Ilya Kuvshinov art style.

The CLIP interrogator consists of two parts: a 'BLIP model' that generates a description from the image, and a 'CLIP model' that selects words from a list prepared in advance.
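The two-part structure can be sketched roughly as follows. This is only a toy illustration, not the web UI's actual code: the caption stands in for the BLIP output, and the three-dimensional vectors are made-up placeholders for the embeddings a real CLIP model would compute. The function names `cosine` and `build_prompt` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_prompt(caption, image_vec, candidates, top_k=1):
    """Append the top_k candidate phrases most similar to the image.

    caption    -- text assumed to come from the captioning (BLIP) step
    image_vec  -- placeholder for the image embedding
    candidates -- dict mapping phrase -> placeholder text embedding
    """
    ranked = sorted(
        candidates,
        key=lambda phrase: cosine(image_vec, candidates[phrase]),
        reverse=True,
    )
    return ", ".join([caption] + ranked[:top_k])


# Made-up vectors for illustration only; real embeddings have hundreds of dimensions.
image_vec = [0.9, 0.1, 0.3]
candidates = {
    "by Ilya Kuvshinov": [0.8, 0.2, 0.4],
    "oil painting": [0.1, 0.9, 0.2],
    "pixiv": [0.5, 0.5, 0.5],
}

prompt = build_prompt("a woman with purple hair", image_vec, candidates, top_k=2)
print(prompt)
```

The point of the sketch is that the caption and the appended phrases come from different steps, which is why the first part of the result reads like a sentence and the rest like a list of tags.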



However, the CLIP interrogator requires 12 GB or more of VRAM, so it cannot be used on a low-spec GPU without enough VRAM. Since even the free version of Google Colab is said to provide a GPU with at least 12 GB of VRAM, I tried Google Colab to see whether the CLIP interrogator could be used there.

I installed the AUTOMATIC1111 version of Stable Diffusion web UI on Google Colab. The allocated GPU was a Tesla T4 with 16 GB of GDDR6 VRAM.
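If you want to check in advance whether a machine meets the 12 GB figure, one possible approach is to read the total VRAM reported by `nvidia-smi`. The sketch below assumes an NVIDIA GPU with `nvidia-smi` on the PATH, and it assumes that 12 GB means 12 × 1024 MiB; the function names are hypothetical and not part of the web UI.

```python
import subprocess

# Assumption: the article's "12 GB" is interpreted as 12 * 1024 MiB.
REQUIRED_MIB = 12 * 1024


def query_nvidia_smi():
    """Return total VRAM per GPU as text, one value in MiB per line."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_total_vram_mib(nvidia_smi_output):
    """Parse the text returned by query_nvidia_smi() into a list of ints."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def has_enough_vram(nvidia_smi_output, required_mib=REQUIRED_MIB):
    """True if at least one GPU reports required_mib or more of total VRAM."""
    return any(total >= required_mib for total in parse_total_vram_mib(nvidia_smi_output))
```

On a machine with a GPU, `has_enough_vram(query_nvidia_smi())` performs the actual check. Parsing is kept separate from the subprocess call so the logic can be tried with a plain string such as `"15360\n"`.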



After installation, access the generated URL and open the AUTOMATIC1111 version Stable Diffusion web UI.



The operation is the same as in the local version. Click the 'img2img' tab, load the image you want to analyze into the 'Image for img2img' column, and click 'Interrogate'.



Then, a prompt was generated from the image, and it was confirmed that CLIP interrogator can be used with AUTOMATIC 1111 version Stable Diffusion web UI running on Google Colab.



Settings related to CLIP interrogator can be changed from the 'Settings' tab. The part surrounded by the red frame below is the part related to CLIP interrogator.



Each item is as follows.

・Interrogate: keep models in VRAM: If checked, the CLIP interrogator's model data stays in VRAM after a prompt has been analyzed instead of being unloaded. Because the models do not have to be loaded each time, analysis should be correspondingly faster. Intended for users with plenty of VRAM.
・Interrogate: use artists from artists.csv: If checked, the CLIP model uses the default list 'artists.csv'. You can also create an 'interrogate' folder in the same directory as the AUTOMATIC1111 version of Stable Diffusion web UI itself and put in your own text files listing prompts and descriptions. An example text file is published here.
・Interrogate: num_beams for BLIP: The number of beams the BLIP model uses when generating the first part of the prompt. It controls how detailed that first part is: the higher the number, the finer the description.
・Interrogate: minimum description length: The minimum length of the description generated by the BLIP model.
・Interrogate: maximum description length: The maximum length of the description generated by the BLIP model.
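To get a feel for how the three BLIP-related settings interact, here is a toy beam search over a hand-written next-word table. It is a sketch under heavy assumptions: `TOY_MODEL` and its probabilities are invented, lengths are counted in words rather than tokens, and the real BLIP model works on images with a neural network. Only the roles of `num_beams`, `min_length`, and `max_length` are meant to carry over.

```python
import math

# Hypothetical next-word probabilities; "<s>" starts and "</s>" ends a caption.
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"woman": 0.4, "girl": 0.6},
    "the": {"woman": 0.8, "girl": 0.2},
    "woman": {"</s>": 0.4, "with": 0.6},
    "girl": {"</s>": 0.7, "with": 0.3},
    "with": {"hair": 1.0},
    "hair": {"</s>": 1.0},
}


def beam_search(model, num_beams, min_length, max_length):
    """Return the highest-scoring caption found, as a string."""
    beams = [(0.0, ["<s>"])]  # (log-probability, words so far)
    finished = []
    for _ in range(max_length):
        candidates = []
        for score, words in beams:
            generated = len(words) - 1  # do not count "<s>"
            for nxt, prob in model[words[-1]].items():
                new_score = score + math.log(prob)
                if nxt == "</s>":
                    # Ending is only allowed once min_length words exist.
                    if generated >= min_length:
                        finished.append((new_score, words[1:]))
                else:
                    candidates.append((new_score, words + [nxt]))
        # Keep only the num_beams best partial captions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
        if not beams:
            break
    # Captions still open at max_length are cut off and kept as they are.
    finished.extend((score, words[1:]) for score, words in beams)
    best_score, best_words = max(finished, key=lambda c: c[0])
    return " ".join(best_words)


print(beam_search(TOY_MODEL, num_beams=1, min_length=3, max_length=10))
print(beam_search(TOY_MODEL, num_beams=2, min_length=3, max_length=10))
```

In this toy table, a single beam commits early to "a girl" and ends up with a less probable caption once a minimum length is enforced, while two beams keep "the woman" alive long enough to find the more probable one. Larger beam counts explore more alternatives at the cost of more computation.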

・Continued

Summary of simple usage of 'img2img' in 'Stable Diffusion web UI (AUTOMATIC1111 version)', which can automatically generate images with composition and colors similar to the original image and change only a specified part - GIGAZINE


in Review,   Software,   Web Service,   Web Application,   Art, Posted by log1i_yk