Dec 22, 2023 19:00:00

Introducing 'StreamDiffusion', an extremely fast image generation pipeline that can generate over 100 images per second

'StreamDiffusion' has been released. It optimizes the 'pipeline' for real-time image generation, where a pipeline is a series of processing stages such as taking in data from sources, passing data to machine learning models, and adjusting learning patterns.

[2312.12491] StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

https://arxiv.org/abs/2312.12491

GitHub - cumulo-autumn/StreamDiffusion: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
https://github.com/cumulo-autumn/StreamDiffusion/tree/main

Thank you for waiting! Our paper “StreamDiffusion” was published on arXiv today.
The GitHub repository has also been released! It can even output at more than 100fps!
Please check the paper and the repository's README for details! #StreamDiffusion
Paper: https://t.co/4zQKFyPKgj
GitHub: https://t.co/U1ufvRR9cq https://t.co/5hO1UXT4Ya
— Aki Sensei / Aki (@cumulo_autumn) December 21, 2023

According to the authors, while existing diffusion models are good at generating images from text or image prompts, they often fall short when it comes to real-time interaction. These limitations are especially noticeable in scenarios involving 'continuous input', such as the metaverse and live video streaming, so the authors devised a new approach to address the problem.

When generating images with StreamDiffusion on an RTX 4090, Core i9-13900K, and Ubuntu 22.04.3 LTS, Text-to-Image processing with the SD-turbo model reached a three-digit figure of 106.16fps.

A GIF showing how images are generated from text in real time has also been released.

You can also check the image generation process in the video below.

Demo video of 'StreamDiffusion' that generates images in real time - YouTube

StreamDiffusion's features include 'Stream Batch', which replaces the conventional wait-and-interact approach with batch processing to improve data processing efficiency; 'RCFG', which minimizes calculation redundancy; and the 'Stochastic Similarity Filter', which maximizes GPU usage efficiency.
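The batching idea can be pictured with a toy simulation. The sketch below is not StreamDiffusion's code or API: `denoise_batch` and `stream_batch` are hypothetical names, and the 'model' simply increments a step counter. It assumes that frames at different denoising steps are advanced together in one batched call, so that once the pipeline is full, each call finishes one frame.

```python
from collections import deque


def denoise_batch(latents):
    # Hypothetical stand-in for one batched model call:
    # every in-flight frame advances by exactly one denoising step.
    return [(frame_id, step + 1) for frame_id, step in latents]


def stream_batch(frames, num_steps):
    """Simulate a rolling batch of frames at staggered denoising steps.

    Returns the finished frame ids in order and the number of batched calls.
    """
    pending = deque(frames)   # frames that have not entered the pipeline yet
    in_flight = deque()       # (frame_id, steps_done), oldest frame first
    outputs = []
    calls = 0
    while pending or in_flight:
        if pending:
            # A new frame joins the batch at step 0.
            in_flight.append((pending.popleft(), 0))
        in_flight = deque(denoise_batch(list(in_flight)))
        calls += 1
        # The oldest frame is always the furthest along; emit it when done.
        if in_flight and in_flight[0][1] == num_steps:
            outputs.append(in_flight.popleft()[0])
    return outputs, calls


if __name__ == "__main__":
    done, calls = stream_batch(frames=range(5), num_steps=3)
    print(done, calls)
```

In this toy model, five frames with three denoising steps each need 7 batched calls, whereas processing each frame to completion before starting the next would need 15 single-frame calls. Real throughput depends on how well the GPU handles the larger batch.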

The Stochastic Similarity Filter 'reduces the load on the GPU by reducing the conversion process when the frame does not change much from the previous frame.' In the GIF animation that demonstrates the filter's effectiveness, you can see that the GPU usage rate remains low even though images are output at lightning speed.
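A filter of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the project's implementation: it assumes frames are compared by cosine similarity against the last processed frame, and that the chance of skipping rises linearly from 0 at a threshold to 1 for identical frames. The class name, the `threshold` value, and the flat-list frame format are all hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    # Cosine similarity between two equally sized flat pixel vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def skip_probability(similarity, threshold):
    # Assumed mapping: 0 below the threshold, rising linearly to 1 at similarity 1.
    if similarity < threshold:
        return 0.0
    return min(1.0, (similarity - threshold) / (1.0 - threshold))


class SimilarityFilter:
    """Hypothetical filter that probabilistically skips near-duplicate frames."""

    def __init__(self, threshold=0.98, rng=None):
        self.threshold = threshold          # must be strictly below 1.0
        self.rng = rng or random.Random()
        self.reference = None               # last frame that was processed

    def should_skip(self, frame):
        if self.reference is None:
            # The first frame is always processed.
            self.reference = list(frame)
            return False
        similarity = cosine_similarity(frame, self.reference)
        if self.rng.random() < skip_probability(similarity, self.threshold):
            return True                     # caller reuses the previous output
        self.reference = list(frame)        # frame is processed; update reference
        return False
```

A caller would check `should_skip(frame)` before running the model and reuse the previous output whenever it returns `True`. Making the decision probabilistic rather than a hard cutoff means slowly changing input is still sampled from time to time.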