Nov 02, 2024 20:13:00

Introducing the multimodal image generation AI 'OmniGen,' a single model capable of 'pose-specified generation,' 'object replacement in an image,' 'subject-specified generation,' and more

This article, originally posted in Japanese at 20:13 on Nov 02, 2024, may contain some machine-translated parts.

Stable Diffusion, a widely used image generation AI, can handle a variety of tasks through extensions, for example 'combining it with ControlNet for pose extraction and pose-specified generation' or 'combining it with IP-Adapter to generate similar images.' 'OmniGen' is a multimodal image generation AI developed so that a single model can perform a variety of generation tasks without extensions such as ControlNet. On its own, it can perform tasks such as 'image generation,' 'pose extraction,' 'pose-specified generation,' 'object replacement in an image,' and 'subject-specified generation.'

[2409.11340] OmniGen: Unified Image Generation

https://arxiv.org/abs/2409.11340

GitHub - VectorSpaceLab/OmniGen: OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
https://github.com/VectorSpaceLab/OmniGen

Below is a diagram showing the functions of OmniGen. Like a typical image generation AI, it can generate images from text, and it can also 'change the drink in the image,' 'extract the pose of the person in the image,' 'generate an image by specifying a pose,' and 'generate an image of a pair of people by specifying one subject from each of two images.'

Here's an example: An image of a woman sitting on a chair is input, and an image of the same woman waving in a crowd is generated.

It is also possible to pick out the 'man in the red shirt' from an image containing three men and generate a different image of him. Notably, the subject can be specified by a description in the prompt, such as 'man in the red shirt,' rather than by position, such as 'man on the left.'

You can also generate a new image by selecting one subject from each of several separate images. The example below includes a subjective description, 'older woman,' but OmniGen still picks the correct subject.

OmniGen can handle more than just images with people in them. In the following example, it executes the command 'Put the flowers in image 1 into the lightest vase in image 2 and place it on a metal table inside a factory.'

You can try OmniGen yourself at the following link:

OmniGen - a Hugging Face Space by Shitao
https://huggingface.co/spaces/Shitao/OmniGen

In addition, the code required to run OmniGen is available at the following link.

GitHub - VectorSpaceLab/OmniGen: OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
https://github.com/VectorSpaceLab/OmniGen?tab=readme-ov-file
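
As a rough illustration of what running it might look like, here is a minimal Python sketch. It is not taken from this article: the `OmniGenPipeline` class, the `Shitao/OmniGen-v1` model name, the call parameters, and the `<img><|image_N|></img>` placeholder format are assumptions based on the repository's README and should be checked against it. `build_prompt` and `generate` are hypothetical helper names introduced only for this example.

```python
import re


def image_placeholder(index: int) -> str:
    """Return the prompt placeholder for the index-th input image (1-based).

    Assumption: the `<img><|image_N|></img>` format follows the OmniGen
    README; verify it against the repository before relying on it.
    """
    if index < 1:
        raise ValueError("image index is 1-based")
    return f"<img><|image_{index}|></img>"


def build_prompt(instruction: str, num_images: int) -> str:
    """Replace phrases like 'image 1' with placeholders for the input images."""

    def swap(match: re.Match) -> str:
        index = int(match.group(1))
        if not 1 <= index <= num_images:
            raise ValueError(
                f"instruction mentions image {index}, "
                f"but only {num_images} image(s) were given"
            )
        return image_placeholder(index)

    return re.sub(r"\bimage (\d+)\b", swap, instruction)


def generate(instruction: str, image_paths: list[str], out_path: str = "out.png") -> None:
    """Hypothetical wrapper around the pipeline (downloads the model when called)."""
    # Imported here so the helpers above work without the package installed.
    from OmniGen import OmniGenPipeline  # assumed import path, per the README

    pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  # assumed model name
    images = pipe(
        prompt=build_prompt(instruction, len(image_paths)),
        input_images=image_paths,
        height=1024,
        width=1024,
        guidance_scale=2.5,       # assumed defaults; tune as needed
        img_guidance_scale=1.6,
        seed=0,
    )
    images[0].save(out_path)
```

With this sketch, an instruction like the flower-and-vase example above would be passed as `generate("Put the flowers in image 1 into the lightest vase in image 2 ...", ["flowers.png", "vases.png"])`, where the two file names are placeholders for your own images.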


Nov 02, 2024 20:13:00 in Software, Posted by log1o_hf
