Apple announces PICO, an AI image compression codec that shrinks images to as little as one-third the size while maintaining the same image quality.



Apple has announced PICO, an image compression codec that uses machine learning. Compared to AV1, AV2, VVC, ECM, and JPEG AI, PICO can produce images of the same quality at as little as one-third the data size. Apple also reports a 20% to 40% reduction in bitrate compared to the leading existing machine learning codecs.

What Matters in Practical Learned Image Compression

https://apple.github.io/ml-pico/

An image codec is a mechanism for storing image data such as photographs and illustrations at a smaller size and for restoring it to a form close to its original appearance. Representative image formats include JPEG and PNG, and HEIC is widely used on smartphones. In recent years, still image compression based on AV1 and VVC has also emerged, as has JPEG AI, a learning-based image coding standard.

PICO stands for 'Perceptual Image Codec.' Rather than relying solely on traditional, hand-designed transforms, it is a 'learned codec' that uses a neural network to learn how to compress and restore images. Apple's research team describes PICO as the first learned codec that is both directly optimized for human vision and practical.

Comparison images of PICO are available on the project page. With PICO's average bits per pixel (bpp) fixed at 0.341, you can compare it against codecs such as HiFiC, DCVC-RT, VVC, and BPG using a slider.
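To make the bpp figure concrete: bpp is the average number of compressed bits spent on each pixel, so it converts directly to a file size once the image dimensions are known. The following is a minimal sketch; the 4032x3024 resolution is an assumed example of a roughly 12-megapixel photo and does not come from Apple's material.

```python
def bits_per_pixel(file_size_bytes: float, width: int, height: int) -> float:
    """Average number of compressed bits spent per pixel."""
    return file_size_bytes * 8 / (width * height)


def size_at_bpp(bpp: float, width: int, height: int) -> float:
    """Approximate compressed size in bytes at a given bpp."""
    return bpp * width * height / 8


# Assumed example: a 4032x3024 photo (about 12 megapixels) at 0.341 bpp.
size_bytes = size_at_bpp(0.341, 4032, 3024)
print(f"{size_bytes / 1000:.0f} kB")  # about 520 kB
```

At this rate, an uncompressed 24-bit image of the same dimensions would be roughly 70 times larger, since 24 / 0.341 is about 70.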



Traditional image compression methods have often emphasized how closely the pixels of the reconstructed image match those of the original. However, the image that humans perceive as 'beautiful' is not necessarily the one that is closest to the original pixel by pixel. PICO trains its model by combining pixel-level matching with loss functions that evaluate perceptual quality, GAN-based loss functions, and loss functions that suppress the degradation of small text and visible tile boundaries.
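One common way to combine several training objectives is a weighted sum of the individual loss terms. The sketch below illustrates that idea only. The weighted-sum form, the term names, the weights, and the placeholder values are all assumptions for illustration and are not taken from Apple's paper.

```python
import numpy as np


def mse(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Pixel-wise distortion: mean squared error between two images."""
    return float(np.mean((original - reconstructed) ** 2))


def combined_loss(terms: dict, weights: dict) -> float:
    """Weighted sum of named loss terms (hypothetical weighting scheme)."""
    return sum(weights[name] * value for name, value in terms.items())


# Toy 8x8 RGB image and a noisy "reconstruction" of it.
rng = np.random.default_rng(0)
original = rng.random((8, 8, 3))
reconstructed = np.clip(original + rng.normal(0, 0.05, original.shape), 0, 1)

terms = {
    "pixel": mse(original, reconstructed),
    "perceptual": 0.12,       # placeholder: would come from a perceptual metric
    "gan": 0.40,              # placeholder: would come from a discriminator
    "text_and_tiling": 0.05,  # placeholder: artifact-suppression penalty
}
# Hypothetical weights; in practice these are tuned during training.
weights = {"pixel": 1.0, "perceptual": 0.5, "gan": 0.1, "text_and_tiling": 0.2}

print(combined_loss(terms, weights))
```

Raising the weight of the GAN term would favor realistic-looking texture, while raising the pixel weight would favor fidelity to the original.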

The 'GAN-based loss' used in PICO's training teaches the model to make reconstructed images look more realistic. Because compression cannot perfectly preserve details, using GANs makes fine textures like hair and fabric appear more natural, but it also carries a risk of creating patterns that did not exist in the original image. Apple's paper describes specific measures to mitigate issues such as illegible text and tile-like color unevenness.

Regarding processing speed, Apple reports that PICO can encode a 12-megapixel image on the iPhone 17 Pro Max in as little as 230 milliseconds and decode it in 150 milliseconds. PICO's processing time is still longer than that of conventional codecs like HEIC, which are extensively optimized for devices, but Apple explains that PICO is faster than many high-performance learned codecs when running on a V100 GPU.

The comparison table of PICO and various codecs summarizes the perceptual BD-rate, the encoding and decoding times for a 12-megapixel image, and practical aspects such as rate control and inter-device compatibility, all with PICO as the baseline. The BD-rate is a measure of the average bitrate difference required to achieve the same quality; values such as '27%' and '169%' in the table indicate that the codec required that much more bitrate than PICO.
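The BD-rate figures can be read as follows. The sketch below uses the classic Bjøntegaard approach: fit log-bitrate as a cubic polynomial of quality for each codec, then average the gap between the two curves over the shared quality range. The rate and quality points are made-up sample data, and using a generic quality score on the horizontal axis is an assumption, so this is not a reproduction of Apple's evaluation.

```python
import numpy as np


def bd_rate(rate_ref, quality_ref, rate_test, quality_test) -> float:
    """Average bitrate difference (%) of the test codec vs. the reference at equal quality."""
    # Fit log(rate) as a cubic polynomial of quality for each codec.
    p_ref = np.polyfit(quality_ref, np.log(rate_ref), 3)
    p_test = np.polyfit(quality_test, np.log(rate_test), 3)
    # Integrate both curves over the overlapping quality interval.
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1) * 100)


def relative_bitrate(bd_rate_percent: float) -> float:
    """Bitrate of the baseline as a fraction of a codec that needs bd_rate_percent more."""
    return 1 / (1 + bd_rate_percent / 100)


# Made-up sample curves: the test codec always needs twice the bitrate.
quality = [30.0, 34.0, 38.0, 42.0]
baseline_rates = [0.1, 0.2, 0.4, 0.8]
test_rates = [0.2, 0.4, 0.8, 1.6]
print(round(bd_rate(baseline_rates, quality, test_rates, quality), 2))  # 100.0

print(round(relative_bitrate(169), 2))  # 0.37
print(round(relative_bitrate(27), 2))   # 0.79
```

Read this way, a codec listed at '169%' needs about 2.7 times PICO's bitrate, so PICO uses roughly 37% of its data, which is consistent with the one-third figure. A codec listed at '27%' means PICO uses roughly 79% of its data.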



In terms of practicality, important features include 'rate control,' which allows file size and image quality to be adjusted precisely, and 'inter-device compatibility,' which ensures that encoded images can be decoded correctly on different devices or implementations. In learned codecs, subtle differences in floating-point arithmetic can cause decoding to fail, so PICO is designed so that some processes run deterministically.
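The floating-point issue mentioned above can be seen in a few lines. Floating-point addition is not associative, so two devices that sum the same numbers in a different order can get slightly different results, and an encoder and decoder that disagree on such values can no longer interpret the same bitstream identically. Integer arithmetic does not have this problem. The fixed-point conversion below is one common workaround, shown as an assumption; the article does not describe how PICO achieves this.

```python
def float_sum_orders_agree(values) -> bool:
    """Check whether two summation orders give the same floating-point result."""
    a, b, c = values
    return (a + b) + c == a + (b + c)


def to_fixed(value: float, fractional_bits: int = 16) -> int:
    """Quantize a float to an integer with the given number of fractional bits."""
    return round(value * (1 << fractional_bits))


def fixed_sum_orders_agree(values, fractional_bits: int = 16) -> bool:
    """Same check after converting to fixed-point integers."""
    a, b, c = (to_fixed(v, fractional_bits) for v in values)
    return (a + b) + c == a + (b + c)


values = [0.1, 0.2, 0.3]
print(float_sum_orders_agree(values))  # False: the result depends on the order
print(fixed_sum_orders_agree(values))  # True: integer addition is exact
```

The trade-off is precision: fixed-point values are rounded once up front, but every device then computes exactly the same integers.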

Apple explains that its evaluation used the CLIC 2020 Test, Kodak, and DIV2K datasets and collected a total of 74,925 paired comparisons from 610 evaluators. Evaluators compared a reference image with two reconstructed images and chose the reconstruction they preferred. These preferences were converted into Bayesian Elo scores, which were used to compare the perceptual quality of each codec.
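As an illustration of how pairwise preferences can be turned into Elo-style scores, the sketch below fits a plain maximum-likelihood Bradley-Terry model and expresses the result on the Elo scale. This is a simplified stand-in rather than the Bayesian Elo procedure Apple used, which also incorporates a prior. The codec names and vote counts are made up, and the sketch assumes every codec wins at least one comparison.

```python
import math
from collections import defaultdict


def expected_preference(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_from_comparisons(comparisons, iterations: int = 200) -> dict:
    """Fit Bradley-Terry strengths from (winner, loser) pairs; return centered Elo scores."""
    wins = defaultdict(int)
    games = defaultdict(int)
    codecs = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        codecs.update((winner, loser))

    strength = {c: 1.0 for c in codecs}
    for _ in range(iterations):
        updated = {}
        for c in codecs:
            # Standard minorization-maximization update for Bradley-Terry.
            denom = sum(
                games.get(frozenset((c, other)), 0) / (strength[c] + strength[other])
                for other in codecs
                if other != c
            )
            updated[c] = wins[c] / denom
        total = sum(updated.values())
        strength = {c: s * len(codecs) / total for c, s in updated.items()}

    # Convert to the Elo scale and center the scores around zero.
    logs = {c: 400 * math.log10(s) for c, s in strength.items()}
    mean = sum(logs.values()) / len(logs)
    return {c: v - mean for c, v in logs.items()}


# Made-up votes: each tuple is (preferred codec, other codec).
votes = (
    [("codec_a", "codec_b")] * 7 + [("codec_b", "codec_a")] * 3
    + [("codec_b", "codec_c")] * 6 + [("codec_c", "codec_b")] * 4
    + [("codec_a", "codec_c")] * 8 + [("codec_c", "codec_a")] * 2
)
ratings = elo_from_comparisons(votes)
for codec, score in sorted(ratings.items(), key=lambda item: -item[1]):
    print(codec, round(score))
```

A gap of 400 Elo points corresponds to one image being preferred about 91% of the time, which is what `expected_preference(400, 0)` returns.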

It should be noted that PICO is not a panacea. According to Apple's paper, PICO is optimized for the perceptual quality of natural images, and for very simple synthetic images such as cartoons it may use a higher bitrate than conventional codecs to achieve the same quality.

Apple states that it explored millions of model configurations for PICO to simultaneously optimize perceptual quality and on-device processing time. The research team explains that PICO is an image codec that offers a significantly improved balance of compression, visual quality, and usability compared to conventional codecs and existing machine learning codecs.

in AI, Software, Posted by log1d_ts