Microsoft develops 'Visual ChatGPT', which adds image generation to ChatGPT so that images can be created and edited in a chat



The interactive AI 'ChatGPT' provided by OpenAI performs so well that it has been used in courts and for writing text. However, ChatGPT is an AI developed for conversation and does not have an image generation function. A research team led by Chenfei Wu of Microsoft Research Asia has now announced 'Visual ChatGPT', which adds an image generation function to ChatGPT.

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
(PDF file) https://arxiv.org/pdf/2303.04671.pdf



GitHub - microsoft/visual-chatgpt: Official repo for the paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

https://github.com/microsoft/visual-chatgpt

Microsoft Research Introduces Visual ChatGPT That Incorporates Different Visual Foundation Models Enabling Users To Interact With ChatGPT - MarkTechPost
https://www.marktechpost.com/2023/03/10/microsoft-research-introduces-visual-chatgpt-that-incorporates-different-visual-foundation-models-enabling-users-to-interact-with-chatgpt/

Image generation AI such as Stable Diffusion lets you generate the image you want by entering text and reference images as prompts. However, to make full use of image generation AI, you need to set elements such as the model data, the resolution, and the number of sampling steps appropriately, and you also have to carry out tedious work such as constructing complex prompts.



To address this, Wu et al.'s research team developed an interactive AI called 'Visual ChatGPT' that builds on the existing ChatGPT. Visual ChatGPT can generate and edit images through a chat-style exchange of text prompts.

Wu et al.'s research team connected visual foundation models (VFMs) such as Stable Diffusion and InstructPix2Pix to ChatGPT. To bridge the gap between ChatGPT and the VFMs, they also introduced a prompt manager that does three things: it tells ChatGPT what each VFM can do and specifies its input and output formats, it manages the usage history and priorities of the various VFMs, and it converts visual information such as png images and depth images into a language format that ChatGPT can process.
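The idea of describing each VFM to ChatGPT in words can be pictured with a small sketch. The following Python is not taken from the Visual ChatGPT repository; the `Tool` class, the tool names, the prompt wording, and the sample caption are all hypothetical and only illustrate how tool descriptions, input and output formats, and an image converted to text might be combined into one prompt.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical description of one visual foundation model (VFM)."""
    name: str
    description: str  # what the tool can do
    inputs: str       # expected input format
    outputs: str      # produced output format


# Illustrative entries only; the names and wording are assumptions.
TOOLS = [
    Tool("text_to_image", "draws a new image from a text description",
         "a text description", "path of a png file"),
    Tool("edit_image", "changes an existing image according to an instruction",
         "path of a png file and an instruction", "path of a png file"),
]


def describe_tools(tools):
    """Tell the language model what each tool does and how to call it."""
    return "\n".join(
        f"- {t.name}: {t.description}. Input: {t.inputs}. Output: {t.outputs}."
        for t in tools
    )


def image_to_text(path, caption):
    """Convert visual information into language: a file name plus a caption."""
    return f"Image '{path}' shows: {caption}"


def build_prompt(tools, history, query):
    """Combine tool descriptions, dialogue history and the new query into one prompt."""
    return (
        "You can use the following tools:\n"
        f"{describe_tools(tools)}\n\n"
        "Conversation so far:\n"
        + "\n".join(history)
        + f"\n\nNew request: {query}"
    )


# The caption below is made up for the example.
history = [image_to_text("sofa.png", "a sofa in a living room")]
print(build_prompt(TOOLS, history, "Replace the sofa with a desk."))
```

In this sketch the language model never sees pixels. It only sees file names, captions, and tool descriptions, which is one plausible way to read the 'language format' conversion described above.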

The paper's architecture overview illustrates this with an example. After the user supplies an image of a sofa in Q1, the user asks in Q2, 'Replace the sofa in the image with a desk and make it more watercolor-like.' Upon receiving a query from the user, the prompt manager builds a prompt that contains the system description, the dialogue history, and the VFMs available as tools, and passes it to ChatGPT, which then selects the tools to use.
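The flow for the Q2 request can be sketched roughly as follows. This is a toy illustration and not the project's code: `fake_planner` is a hard-coded stand-in for the decision that ChatGPT would make, and the two tool functions are stubs that only invent output file names instead of running real image models.

```python
def replace_object(image, target, replacement):
    """Stub for an image-editing VFM; returns a made-up output file name."""
    return f"{image.rsplit('.', 1)[0]}_replace_{replacement}.png"


def stylize(image, style):
    """Stub for a style-transfer VFM; returns a made-up output file name."""
    return f"{image.rsplit('.', 1)[0]}_{style}.png"


# Hypothetical tool table; the keys are assumptions for this example.
TOOLS = {"replace": replace_object, "stylize": stylize}


def fake_planner(query):
    """Stand-in for ChatGPT: returns a fixed two-step plan for the Q2 example."""
    return [
        ("replace", {"target": "sofa", "replacement": "desk"}),
        ("stylize", {"style": "watercolor"}),
    ]


def run(query, image, planner=fake_planner):
    """Execute the planned tools in order, feeding each output into the next step."""
    history = [f"Human: {query}"]
    current = image
    for tool_name, args in planner(query):
        current = TOOLS[tool_name](current, **args)
        history.append(f"Tool {tool_name} -> {current}")
    history.append(f"AI: the result is saved as {current}")
    return current, history


result, log = run(
    "Replace the sofa in the image with a desk and make it more watercolor-like.",
    "sofa.png",
)
print(result)
for line in log:
    print(line)
```

The point of the sketch is the chaining: a single request is split into several tool calls, and the intermediate image from one step becomes the input to the next.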



The demo of Visual ChatGPT looks like this. When you type 'Could you generate a cat for me?' into Visual ChatGPT, Visual ChatGPT will instantly generate a cat image.



Furthermore, if you enter 'could you replace the cat to a dog and then remove a book?', an image is generated in which the cat has been replaced with a dog and the book has been removed.



Also, if you ask 'That's cool! Could you generate the canny edge of this image?', the output is an edge-detected dog image.
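For readers unfamiliar with edge detection, the following sketch shows the basic idea on a tiny grayscale grid. It is a simplification and not what Visual ChatGPT runs: it only thresholds the Sobel gradient magnitude, whereas a full Canny detector also smooths the image, thins the edges, and applies hysteresis thresholding. The function name `sobel_edges` and the threshold value are assumptions for this example.

```python
def sobel_edges(img, threshold=1.0):
    """Return a binary edge map for a 2D list of grayscale values.

    Border pixels stay 0 because the 3x3 window does not fit there.
    """
    h, w = len(img), len(img[0])
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical gradient kernel
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += gx_k[dy][dx] * p
                    gy += gy_k[dy][dx] * p
            # Mark the pixel as an edge when the gradient magnitude is large enough.
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


# 5x6 test image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
edge_map = sobel_edges(image)
for row in edge_map:
    print(row)
```

The edge map marks only the columns where the dark and bright halves meet, which is the kind of outline that the next step in the demo uses as a guide for generating a new image.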



Next, if you enter 'Please generate a yellow dog based on the edge-detected dog image', the yellow dog image will be generated as requested.



Tools like Visual ChatGPT are said to lower the barrier to generating images from text and to add interoperability among various AI tools.

Wu et al.'s research team notes a concern that VFM failures and prompt irregularities may lead to unsatisfactory generation results. According to the team, a single self-correction module is needed to check that the results match the user's intent. Because introducing such a module may increase generation time, the team says it will continue to investigate.

The source code of Visual ChatGPT developed by Wu et al. is published on GitHub. Note that access to the ChatGPT API is required to use Visual ChatGPT.

GitHub - microsoft/visual-chatgpt: Official repo for the paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
https://github.com/microsoft/visual-chatgpt

in Software, Posted by log1r_ut