May 27, 2019 23:00:00

How does the widely used video compression codec 'H.264/MPEG-4 AVC' dramatically compress huge movies?

H.264/MPEG-4 AVC, part of the MPEG-4 family of video compression standards, is a video codec designed to deliver 'high image quality even at low bit rates'. Since its introduction in 2003, it has been used almost everywhere, including video published on the Internet, video phones, security cameras, and drones. Engineer Sid Bala explains on his blog how H.264/MPEG-4 AVC compresses video and what the technology really does.

H.264 is magic: a technical walkthrough of a remarkable technology.
https://sidbala.com/h-264-is-magic/

Uncompressed raw video is a sequence of dozens of still images per second, like a flip book. Each of R (red), G (green), and B (blue) is expressed in 256 levels, that is, one byte, so one pixel carries three bytes of color information. For example, one second of uncompressed video at 1080p and 60 fps takes 1920 pixels x 1080 pixels x 60 fps x 3 bytes, which is about 373 MB.


At that rate, a 50 GB Blu-ray disc could hold only about 2 minutes and 14 seconds of uncompressed 1080p, 60 fps video, which is not practical. To make video data manageable, its size has to be reduced somehow. There are various ways to do this, and one of the most widely used is H.264/MPEG-4 AVC.
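The two figures above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes).

```python
def raw_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Size of one second of uncompressed video, in bytes."""
    return width * height * fps * bytes_per_pixel


def seconds_that_fit(capacity_bytes, bytes_per_second):
    """How many seconds of video fit into the given capacity."""
    return capacity_bytes / bytes_per_second


rate = raw_bytes_per_second(1920, 1080, 60)       # 1080p at 60 fps, 3 bytes per pixel
duration = seconds_that_fit(50 * 10**9, rate)     # 50 GB disc, decimal GB assumed
minutes, seconds = divmod(round(duration), 60)

print(f"{rate / 10**6:.0f} MB per second")
print(f"{minutes} min {seconds} s on a 50 GB disc")
```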

H.264/MPEG-4 AVC was jointly developed by the Video Coding Experts Group (VCEG) of the International Telecommunication Union's standardization sector (ITU-T) and the Moving Picture Experts Group (MPEG) of ISO/IEC JTC 1. 'H.264' is the name used by the ITU and 'MPEG-4 AVC' is the name used by ISO/IEC, but the two are technically the same standard.

Bala compared a PNG screenshot of the Apple homepage with a 5-second H.264/MPEG-4 AVC video of the same page, captured at the same resolution and 60 fps. The screenshot, a single still image, is 1015 KB, while the video is 175 KB. Although the video consists of 5 seconds x 60 fps = 300 frames, it is about one-sixth the size of the single still image.

H.264/MPEG-4 AVC compression basically involves 'removing unnecessary data.' This is the same as removing unnecessary equipment such as the back seat, audio equipment, and air conditioning from a racing car, says Bala. H.264/MPEG-4 AVC compresses by removing unnecessary parts from the original data, so the original data cannot be reproduced from the compressed data. Therefore, H.264/MPEG-4 AVC is classified as a 'lossy compression method.' In contrast, PNG is a lossless compression method that does not lose image quality, so it can be difficult to reduce the file size even for just one image.

The image below shows an example of how compression is done with H.264/MPEG-4 AVC. In the 'Original' image on the left, the fine speaker grille of the MacBook Pro is clearly visible, whereas in the 'Post-Discard' image on the right, it is crushed and no longer visible. This difference is only noticeable when you zoom in on the image, but the file size after compression is about 7% of the original image. To understand the algorithm that enables such compression, Bala first introduces 'information entropy.'

For example, suppose you flip a coin 10 times and it comes up heads every time. Instead of recording the result as '1st flip: heads, 2nd flip: heads, 3rd flip: heads, 4th flip: heads, 5th flip: heads, 6th flip: heads, 7th flip: heads, 8th flip: heads, 9th flip: heads, 10th flip: heads', you can write '10 flips, all heads'. The information is the same, but the amount of text is much smaller. In other words, redundancy has been removed without changing the information entropy of the data, and this approach is called 'entropy coding'.
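The coin example can be sketched with run-length encoding, one of the simplest ways to remove redundancy without losing anything. Note that this is a stand-in chosen for illustration: H.264's own entropy coders (CAVLC and CABAC) are far more elaborate, and the function names below are hypothetical.

```python
from itertools import groupby


def run_length_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(symbols)]


def run_length_decode(pairs):
    """Expand (symbol, count) pairs back into the original sequence."""
    return [symbol for symbol, count in pairs for _ in range(count)]


flips = ["heads"] * 10                 # ten flips, all heads
encoded = run_length_encode(flips)     # one pair instead of ten entries
decoded = run_length_decode(encoded)   # lossless: identical to the input

print(encoded)
print(decoded == flips)
```

The encoded form is shorter, yet decoding gives back exactly the original sequence, which is what makes this step lossless.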

In addition, the color information of each frame of uncompressed video is originally expressed in RGB, but it can be converted to YUV, which consists of Y (brightness), Cb (blue difference), and Cr (red difference). The human eye is sensitive to changes in brightness but much less sensitive to changes in color. A technique called 'chroma subsampling' exploits this by storing the color difference components at a lower resolution than the brightness, cutting out color information that humans have difficulty perceiving.
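A rough sketch of the idea: convert each pixel to Y, Cb, and Cr, keep Y at full resolution, and average Cb and Cr over each 2 x 2 block (the layout commonly called 4:2:0). The conversion constants are the full-range BT.601 ones used by JPEG, which is an assumption here; the article does not say which matrix or subsampling layout is used. The tiny test image and function names are made up.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion (assumed; the same constants JPEG uses)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Average each 2x2 block of a plane (a list of equal-length rows)."""
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


# Hypothetical 4x4 RGB image with slowly varying color.
width = height = 4
rgb = [[(200, 30 + 10 * x, 40 + 10 * y) for x in range(width)] for y in range(height)]

ycbcr = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in rgb]
y_plane = [[p[0] for p in row] for row in ycbcr]                  # full resolution
cb_plane = subsample_2x2([[p[1] for p in row] for row in ycbcr])  # quarter resolution
cr_plane = subsample_2x2([[p[2] for p in row] for row in ycbcr])  # quarter resolution

samples_before = width * height * 3
samples_after = sum(len(row) for plane in (y_plane, cb_plane, cr_plane) for row in plane)
print(samples_before, "->", samples_after)
```

With this layout the number of stored samples is halved before any other compression step runs, while the brightness plane is untouched.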

Bala also introduces the idea of converting data that varies over space and time, such as color and brightness, into the frequency domain. For example, parts of the MacBook Pro image where color and brightness change sharply, such as the speaker grille, consist mostly of high-frequency components. On the other hand, the desk surface, with its gradual shadow gradation, consists mostly of low-frequency components.

According to Bala, people quickly notice degradation in low-frequency regions, where changes are gradual and uniform, but find it hard to notice slight degradation in high-frequency regions, where changes are rapid. Therefore, by progressively discarding high-frequency information, the data can be compressed while keeping the degradation at a level humans do not notice.
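As a minimal sketch of what 'discarding high-frequency information' means, the code below transforms a row of eight pixels with a discrete cosine transform (DCT), zeroes all but the two lowest-frequency coefficients, and transforms back. The DCT is an assumption made for illustration (H.264 itself uses a DCT-like integer transform on small blocks), and the two pixel rows are invented: one gentle gradient like the desk, one sharply alternating pattern like the speaker grille.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * sum(
            signal[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        ))
    return coeffs


def idct(coeffs):
    """Inverse of dct(): rebuild the signal from its coefficients."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(total)
    return signal


def drop_high_frequencies(coeffs, keep):
    """Keep the `keep` lowest-frequency coefficients and zero the rest."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


gradient = [100, 104, 108, 112, 116, 120, 124, 128]   # gentle shading
grille = [0, 255, 0, 255, 0, 255, 0, 255]             # sharp alternation

errors = {}
for name, row in (("gradient", gradient), ("grille", grille)):
    approx = idct(drop_high_frequencies(dct(row), keep=2))
    errors[name] = max_error(row, approx)
    print(name, "max pixel error after keeping 2 of 8 coefficients:", round(errors[name], 1))
```

The gradient survives almost unchanged because nearly all of its content sits in the low frequencies, while the grille pattern is flattened. An encoder accepts that loss in busy regions because, as described above, it is hard to see there.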

The top row shows the image in the frequency domain, and the bottom row shows the image itself. As you move to the right, more and more high-frequency information (white areas) is discarded. The rightmost image has been compressed to a file size of just 2% of the original image on the left, but at a glance, the degradation is not noticeable. However, this compression method is lossy because the information entropy changes.

Furthermore, H.264/MPEG-4 AVC also compresses video using a technique called inter-frame prediction. For example, in video of a tennis match shot with a fixed camera, the ball and players move around a lot, but the referee, net, and spectators barely move. If the referee, net, and spectators are treated as an unchanging background and only the moving ball and players are encoded anew, the data size can be reduced significantly.

First, H.264/MPEG-4 AVC divides each image into blocks of 16 x 16 pixels. A frame that is encoded in full as a still image is called an I-frame. The frame that follows the I-frame is then predicted from it, and only the difference between that prediction and the actual frame is stored. Since frames can be represented by difference information alone, the data size of the video is greatly reduced.
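The difference idea can be sketched on a toy frame. Here the prediction is simply 'the next frame looks exactly like the previous one', which is a deliberate simplification; the frame contents and function names are invented for illustration, and a real encoder works block by block and transforms the difference before storing it.

```python
def residual(predicted, actual):
    """Per-pixel difference between the actual frame and the prediction."""
    return [[a - p for p, a in zip(p_row, a_row)]
            for p_row, a_row in zip(predicted, actual)]


def reconstruct(predicted, diff):
    """Rebuild the actual frame from the prediction plus the stored difference."""
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(predicted, diff)]


def make_frame(ball_col, rows=6, cols=8):
    """Toy grayscale frame: flat background with one bright 'ball' pixel."""
    frame = [[50] * cols for _ in range(rows)]
    frame[3][ball_col] = 255
    return frame


i_frame = make_frame(ball_col=2)       # stored in full
next_frame = make_frame(ball_col=3)    # the ball has moved one pixel to the right

diff = residual(i_frame, next_frame)   # prediction: nothing changed
nonzero = sum(1 for row in diff for value in row if value != 0)
total = len(diff) * len(diff[0])

print(f"{nonzero} of {total} difference values are non-zero")
print(reconstruct(i_frame, diff) == next_frame)
```

Almost every difference value is zero, so the second frame costs very little to store, and adding the difference back to the prediction recovers it exactly.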

Another important technology in H.264/MPEG-4 AVC is 'motion compensation,' which detects how objects move between neighboring frames, encodes the direction and amount of that movement, and stores it with the frame data, further increasing the compression ratio. H.264/MPEG-4 AVC lets the encoder choose the block size used for motion compensation, making it possible to predict movement in units as small as 4 x 4 pixels. Even after inter-frame prediction and motion compensation, the encoded data still contains redundancy, and this can be compressed further, without loss, using entropy coding.
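One common way to find such movement is block matching: take a block of the current frame and search the previous frame for the position where it fits best. The sketch below uses an exhaustive search with the sum of absolute differences (SAD) as the cost. Both choices are assumptions made for illustration, since the standard does not dictate how an encoder searches, and the frames and function names are invented.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def find_motion_vector(reference, current, top, left, size=4, search=2):
    """Return (cost, dy, dx): the offset into `reference` that best matches
    the block of `current` at (top, left), within +/- `search` pixels."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, get_block(reference, y, x, size))
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best


def make_frame(obj_top, obj_left, size=12):
    """Toy frame: black background with one textured 4x4 object."""
    frame = [[0] * size for _ in range(size)]
    for r in range(4):
        for c in range(4):
            frame[obj_top + r][obj_left + c] = 100 + 10 * r + c
    return frame


reference = make_frame(obj_top=4, obj_left=3)   # previous frame
current = make_frame(obj_top=4, obj_left=5)     # object moved 2 pixels to the right

cost, dy, dx = find_motion_vector(reference, current, top=4, left=5)
print(f"best match at offset dy={dy}, dx={dx} with cost {cost}")
```

The vector points from the current block back to where its content was in the previous frame. A cost of zero means the block can be reproduced from the vector alone, with no difference data left to store.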


According to Bala, the 5-second, 60 fps capture of the top page of Apple's official website had a resolution of 1232 x 1154. Uncompressed, it would take 1232 pixels x 1154 pixels x 60 fps x 3 bytes x 5 seconds, which is about 1280 MB, but encoding it with H.264/MPEG-4 AVC reduced it to 175 KB, roughly 0.01% of that. Bala concludes, 'H.264/MPEG-4 AVC is the culmination of more than 30 years of effort with the aim of reducing data transfer bandwidth, and is a remarkable technology.'
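The same comparison, worked through as arithmetic. This assumes decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes); the variable names are illustrative only.

```python
raw_bytes = 1232 * 1154 * 3 * 60 * 5   # width x height x bytes per pixel x fps x seconds
encoded_bytes = 175 * 1000             # 175 KB, assuming 1 KB = 1000 bytes

ratio = raw_bytes / encoded_bytes
percent = 100 * encoded_bytes / raw_bytes

print(f"raw size: {raw_bytes / 10**6:.0f} MB")
print(f"compression ratio: about {ratio:.0f}:1 ({percent:.3f}% of the raw size)")
```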

A more efficient compression standard, H.265/HEVC, was standardized in 2013 as the successor to H.264/MPEG-4 AVC. However, because H.265/HEVC is covered by patents from many companies, using it requires paying high license fees, and at the time of writing it has not become widespread enough to replace H.264/MPEG-4 AVC. Such licensing issues are one reason Google, which owns YouTube, the world's largest video sharing site, promotes WebM, a royalty-free container format it announced in 2010.


Posted by log1i_yk in Software