How does the widely used video compression codec 'H.264 / MPEG-4 AVC' dramatically compress huge movie files?



H.264 / MPEG-4 AVC, one of the MPEG-4 video compression standards, is a video compression codec designed to deliver high image quality even at low bit rates. Since its introduction in 2003, H.264 / MPEG-4 AVC has been used in all sorts of situations, including movies, video phones, security cameras, drones, and videos published on the Internet. Engineer Sid Bala explains in a blog post how H.264 / MPEG-4 AVC compresses video.

H.264 is magic: a technical walkthrough of a remarkable technology.
https://sidbala.com/h-264-is-magic/

Uncompressed raw video is a series of still images, dozens per second, shown one after another like a flip book. Each pixel's color is expressed in RGB, with R (red), G (green), and B (blue) each taking one of 256 levels (1 byte), so one pixel needs 3 bytes of color information. For example, uncompressed video at 1080p resolution and 60 Hz takes 1920 pixels × 1080 pixels × 60 Hz × 3 bytes, or approximately 373 MB per second.


by Jason Eppink

At that rate, a 50 GB Blu-ray disc could hold only about 2 minutes and 14 seconds of uncompressed 1080p/60 Hz video, which is not practical. To make video data manageable, its size has to be compressed somehow. There are various ways to do this, and one of the most widely used is H.264 / MPEG-4 AVC.
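The arithmetic behind these figures can be checked with a few lines of code. This is a minimal sketch that assumes 8-bit RGB (3 bytes per pixel) and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the function names are hypothetical helpers, not part of any codec.

```python
# Uncompressed 8-bit RGB video: 3 bytes per pixel.
# Decimal units are assumed here (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).

def raw_bytes_per_second(width: int, height: int, fps: int, bytes_per_pixel: int = 3) -> int:
    """Bytes needed to store one second of uncompressed video."""
    return width * height * fps * bytes_per_pixel


def seconds_that_fit(capacity_bytes: int, bytes_per_second: int) -> float:
    """How many seconds of video a storage medium of the given size can hold."""
    return capacity_bytes / bytes_per_second


rate = raw_bytes_per_second(1920, 1080, 60)
print(rate)  # 373248000, i.e. about 373 MB/s

secs = seconds_that_fit(50 * 10**9, rate)  # a 50 GB Blu-ray disc
minutes, seconds = divmod(round(secs), 60)
print(f"{minutes} min {seconds} s")  # 2 min 14 s
```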

H.264 / MPEG-4 AVC was jointly developed by the Video Coding Experts Group (VCEG) of the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG) of ISO/IEC JTC 1. The two names, 'H.264' and 'MPEG-4 AVC', refer to technically identical standards; only the name differs between the ITU side and the ISO/IEC side.

Bala compares a screenshot of Apple's homepage (PNG format) with an H.264 / MPEG-4 AVC movie that captured Apple's homepage for 5 seconds at the same resolution and 60 fps. The screenshot is 1015 KB, while the movie is 175 KB. Even though the movie contains 5 seconds × 60 fps = 300 frames, it is about one-sixth the size of the single still image.

At its core, H.264 / MPEG-4 AVC compression is about 'deleting unnecessary data'. Bala likens it to building a race car by stripping out everything it does not need, such as the back seats, the audio system, and the air conditioning. Because H.264 / MPEG-4 AVC compresses by removing parts of the original data, the original cannot be reproduced exactly from the compressed data. H.264 / MPEG-4 AVC is therefore classified as a lossy (irreversible) compression method. PNG, by contrast, uses lossless compression that does not degrade image quality, which makes it hard to shrink the file much even for a single image.

The following image shows an example of the compression H.264 / MPEG-4 AVC performs. In the 'Original' image on the left, the fine speaker grille of the MacBook Pro is clearly visible, while in the 'Post-Discard' (compressed) image on the right it has been smeared out and has disappeared. The difference is only noticeable when you zoom in, yet the compressed file is about 7% of the size of the original. To explain the algorithm that makes this possible, Bala first introduces the concept of 'information entropy'.



For example, suppose you toss a coin 10 times and it comes up heads every time. You could record this as '1st: heads, 2nd: heads, 3rd: heads, 4th: heads, 5th: heads, 6th: heads, 7th: heads, 8th: heads, 9th: heads, 10th: heads', or simply as 'tossed 10 times, all heads'. Both records carry exactly the same information; the second one just uses fewer characters. In other words, the redundancy of the data has been reduced without changing its information entropy, and this kind of method is called entropy coding.
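The coin-toss idea can be sketched with run-length encoding, one of the simplest lossless schemes that removes redundancy. This is only an illustration of the principle: H.264 / MPEG-4 AVC's actual entropy coders are CAVLC and CABAC, which are considerably more sophisticated. The function names here are hypothetical.

```python
from itertools import groupby


def rle_encode(symbols):
    """Replace each run of identical symbols with a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(symbols)]


def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into the original sequence."""
    return [symbol for symbol, count in pairs for _ in range(count)]


tosses = ["heads"] * 10
encoded = rle_encode(tosses)
print(encoded)  # [('heads', 10)]

# Decoding gives back exactly the original ten results: no information is lost.
assert rle_decode(encoded) == tosses
```

Ten records shrink to a single pair, yet decoding restores every toss, which is what makes this kind of coding lossless.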

In addition, the color information of an uncompressed video frame is normally expressed in RGB, but it can instead be expressed in a YUV-family color space made up of Y (luminance, i.e. brightness), Cb (blue-difference), and Cr (red-difference). Storing the color-difference channels at a reduced resolution is called 'chroma subsampling'. Human eyes are sensitive to changes in brightness but relatively insensitive to changes in color, so chroma subsampling reduces data by cutting color-difference information that humans have difficulty perceiving.
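A minimal sketch of the idea, assuming full-range BT.601 (JPEG-style) conversion coefficients and 4:2:0 subsampling in which each 2×2 block of chroma samples is averaged into one. Real encoders may use other matrices (such as BT.709), limited range, and different filters; the helper names are hypothetical.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Full-range BT.601 (JPEG-style) RGB -> YCbCr conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (height and width assumed even)."""
    h, w = len(plane), len(plane[0])
    return [
        [(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def encode_420(rgb_image):
    """Keep luma (Y) at full resolution, but Cb and Cr at quarter resolution."""
    ys, cbs, crs = [], [], []
    for row in rgb_image:
        converted = [rgb_to_ycbcr(*px) for px in row]
        ys.append([c[0] for c in converted])
        cbs.append([c[1] for c in converted])
        crs.append([c[2] for c in converted])
    return ys, subsample_420(cbs), subsample_420(crs)


# A 4x4 test image: 16 pixels x 3 channels = 48 samples in RGB.
image = [[(255, 0, 0), (250, 5, 5), (0, 0, 255), (5, 5, 250)] for _ in range(4)]
y, cb, cr = encode_420(image)
samples = sum(len(r) for r in y) + sum(len(r) for r in cb) + sum(len(r) for r in cr)
print(samples)  # 24: 16 luma + 4 Cb + 4 Cr, half of the 48 RGB samples
```

Brightness detail is kept for every pixel, while each color-difference value is shared by four pixels, which halves the raw sample count in this scheme.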



Bala also introduces the idea of converting data that varies across space or time, such as color tone or brightness, into the frequency domain. For example, areas like the MacBook Pro's speaker grille, where color tone and brightness change sharply, are 'high-frequency areas' with strong high-frequency components. A desk with a gentle, shadowed gradation, on the other hand, is a 'low-frequency area' dominated by low-frequency components.
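One common way to move pixel values into the frequency domain is the discrete cosine transform (DCT). The sketch below assumes a floating-point, orthonormal 1-D DCT-II on rows of 8 samples; H.264 / MPEG-4 AVC actually uses small integer transforms that approximate the DCT on 2-D blocks. The two sample rows are made-up stand-ins for the desk and the speaker grille.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II: coefficient k measures how much of frequency k is present."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return coeffs


def high_frequency_share(x):
    """Fraction of the non-average (k >= 1) energy found in the upper half of the frequencies."""
    coeffs = dct_ii(x)
    n = len(coeffs)
    ac_energy = sum(c * c for c in coeffs[1:])
    high_energy = sum(c * c for c in coeffs[n // 2:])
    return high_energy / ac_energy


desk_row = [100, 102, 104, 106, 108, 110, 112, 114]  # gentle gradient, like the desk
grille_row = [0, 255, 0, 255, 0, 255, 0, 255]        # rapid alternation, like the grille

print(round(high_frequency_share(desk_row), 3))    # close to 0
print(round(high_frequency_share(grille_row), 3))  # most of the energy
```

The smooth row puts almost all of its variation into the lowest frequency, while the alternating row pushes most of its energy into the highest frequencies.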

According to Bala, people immediately notice degradation in low-frequency areas, where change is slow and uniform, but find it hard to notice some degradation in high-frequency areas, where change is intense. By progressively discarding information in the high-frequency domain, the data is compressed while degradation is kept at a level humans do not notice.

In Bala's figure, the upper row shows the image in the frequency domain and the lower row shows the corresponding picture. Moving to the right, more and more of the high-frequency information (the white parts) is discarded. The right-most image is only 2% of the file size of the left-most original, yet the difference is hard to spot at first glance. However, because the information entropy changes, this scheme is lossy compression.
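Discarding high-frequency information can be sketched by zeroing the upper DCT coefficients and transforming back. This simplification assumes a 1-D orthonormal DCT-II on 8-sample rows and hard truncation; H.264 / MPEG-4 AVC instead quantizes the coefficients of its integer transforms, which discards precision more gradually. The sample rows and helper names are hypothetical.

```python
import math


def _scale(k, n):
    """Normalization factor that makes the DCT-II orthonormal."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct_ii(x):
    """Orthonormal 1-D DCT-II (spatial samples -> frequency coefficients)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct_ii (frequency coefficients -> spatial samples)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def keep_low_frequencies(x, keep):
    """Zero every coefficient from index `keep` upward, then transform back."""
    coeffs = dct_ii(x)
    coeffs[keep:] = [0.0] * (len(coeffs) - keep)
    return idct(coeffs)


def max_error(a, b):
    """Largest per-sample difference between two rows."""
    return max(abs(p - q) for p, q in zip(a, b))


desk_row = [100, 102, 104, 106, 108, 110, 112, 114]
grille_row = [0, 255, 0, 255, 0, 255, 0, 255]

# Keep only 2 of the 8 coefficients (the average and the lowest frequency).
print(max_error(desk_row, keep_low_frequencies(desk_row, 2)))      # under 1 level
print(max_error(grille_row, keep_low_frequencies(grille_row, 2)))  # the pattern is wiped out
```

The gentle gradient survives almost unchanged with a quarter of the coefficients, while the alternating detail is flattened, the same kind of loss as the smeared speaker grille in the MacBook Pro example. The bet is that the eye will not notice it.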



H.264 / MPEG-4 AVC also compresses data using a technique called interframe prediction. Consider a tennis match filmed with a fixed camera: the ball and players move a lot, while the umpire, the net, and the spectators barely move. If the umpire, net, and audience are treated as a single shared background and only the moving ball and players are encoded, the data size can be reduced dramatically.

First, H.264 / MPEG-4 AVC divides an image into blocks of 16 × 16 pixels (macroblocks), and a frame that is encoded on its own as a still image is called an I-frame. The encoder then predicts the frames that follow the I-frame and records only the difference between each prediction and the actual frame. Because those frames can be expressed with difference information alone, the data size of the video can be greatly reduced.
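A minimal sketch of predicting from a previous frame and storing only the difference (the residual). It assumes the simplest possible prediction, "the next frame looks like the previous one", on a tiny grayscale frame with a hypothetical moving ball; real H.264 / MPEG-4 AVC predicts block by block and combines this with motion compensation.

```python
def residual(reference, current):
    """Per-pixel difference between the actual frame and its prediction."""
    return [[c - r for r, c in zip(ref_row, cur_row)]
            for ref_row, cur_row in zip(reference, current)]


def reconstruct(reference, res):
    """Decoder side: prediction plus residual gives back the actual frame."""
    return [[r + d for r, d in zip(ref_row, res_row)]
            for ref_row, res_row in zip(reference, res)]


def nonzero(frame):
    """Count the values that would actually need to be stored."""
    return sum(1 for row in frame for v in row if v != 0)


def frame_with_ball(x, y, size=8):
    """A static gray background (value 50) with a bright 'ball' pixel at (x, y)."""
    frame = [[50] * size for _ in range(size)]
    frame[y][x] = 255
    return frame


i_frame = frame_with_ball(1, 1)     # encoded on its own
next_frame = frame_with_ball(2, 1)  # the ball moved one pixel to the right
res = residual(i_frame, next_frame)

print(nonzero(next_frame), nonzero(res))  # 64 2
assert reconstruct(i_frame, res) == next_frame
```

Only the two pixels where the ball left and arrived differ; the unchanged background contributes zeros, which later stages can compress very efficiently.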

Another important technology in H.264 / MPEG-4 AVC is motion compensation, which further increases the compression ratio by detecting how objects move between preceding and following frames, encoding that movement as a direction and distance (a motion vector), and storing it with the frame. H.264 / MPEG-4 AVC lets the block size used for motion compensation vary, down to a minimum of 4 × 4 pixels. The data produced by interframe prediction and motion compensation still contains redundancy, which entropy coding then compresses further without any loss.
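Finding a motion vector can be sketched as block matching. The standard defines the bitstream and decoding, not how the encoder searches, so this sketch assumes a common textbook choice: an exhaustive search within ±2 pixels, scored by the sum of absolute differences (SAD), using 4 × 4 blocks and whole-pixel vectors only. The frames and function names are hypothetical.

```python
def sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def find_motion_vector(ref, cur, bx, by, size=4, search=2):
    """Exhaustive block matching within +/- `search` pixels.

    Returns ((dx, dy), cost): the current block at (bx, by) is best predicted by
    the reference block whose top-left corner is at (bx + dx, by + dy).
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_vector = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            top, left = by + dy, bx + dx
            if top < 0 or left < 0 or top + size > height or left + size > width:
                continue  # candidate block would fall outside the reference frame
            cost = sad(ref, cur, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    return best_vector, best_cost


def frame_with_object(left, top, size=10):
    """A blank frame with a 4x4 textured object whose top-left corner is at (left, top)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(4):
        for x in range(4):
            frame[top + y][left + x] = 1 + 4 * y + x
    return frame


ref = frame_with_object(2, 2)  # object in the earlier frame
cur = frame_with_object(3, 3)  # same object, moved 1 right and 1 down
vector, cost = find_motion_vector(ref, cur, bx=3, by=3)
print(vector, cost)  # (-1, -1) 0
```

A cost of 0 means the block can be described entirely by "copy the block one pixel up and to the left in the earlier frame", so only the vector needs to be stored instead of 16 pixel values.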


by Jose Nicdao

According to Bala, the 60 fps movie that captured the top page of Apple's official website for just 5 seconds had a resolution of 1232 × 1154. Uncompressed, that video would take 1232 pixels × 1154 pixels × 60 fps × 3 bytes × 5 seconds, or approximately 1280 MB, but encoding it with H.264 / MPEG-4 AVC shrinks it to 175 KB, only about 0.01% of the original. Bala describes H.264 / MPEG-4 AVC as a remarkable technology, the product of more than 30 years of effort to reduce data transfer bandwidth.
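The compression ratio in this example can be reproduced directly. This sketch assumes 3 bytes per pixel and decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes), which match the rounded figures quoted above.

```python
# Uncompressed size of the 5-second, 60 fps capture of Apple's homepage.
width, height, fps, bytes_per_pixel, duration_s = 1232, 1154, 60, 3, 5
raw = width * height * fps * bytes_per_pixel * duration_s
print(raw)  # 1279555200 bytes, about 1280 MB

encoded = 175 * 1000  # the 175 KB H.264 / MPEG-4 AVC file
print(f"{encoded / raw:.4%}")  # 0.0137%, i.e. roughly 0.01% of the original
print(round(raw / encoded))    # 7312, so roughly a 7300:1 reduction
```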

H.265 / HEVC, the successor to H.264 / MPEG-4 AVC, was introduced in 2012 as a standard with even higher compression efficiency. However, because H.265 / HEVC involves patents held by many different companies, using it requires paying high license fees, and as of this writing it has not spread widely enough to replace H.264 / MPEG-4 AVC. Licensing issues like these are also the background to WebM, a royalty-free container format announced in 2010 by Google, owner of YouTube, the world's largest video sharing site.

in Software, Posted by log1i_yk