Where did the 'extra 40 ms' that made the Netflix app freeze come from?
Netflix, a video streaming service, provides apps for a wide variety of smart TV platforms, including Android TV, Apple TV, and Amazon's Fire TV. Because cooperation between Netflix and smart TV vendors is essential to app development, Netflix has a role called 'partner engineer' that supports vendors' app development. John Blair, one of those partner engineers, describes a 'brief video freeze bug' he worked on, from its first appearance to its resolution.
The case of the extra 40 ms | Netflix TechBlog
One day in 2017, Blair was assigned to work on the Netflix app for an Android TV device based on Android 5.0. The four parties involved in the project were a European pay-TV operator launching the device, a company developing the device's firmware, a company developing the device's chip, and Blair himself.
The project went smoothly at first, but during internal testing at the pay-TV operator, video playback turned out to be choppy. The problem reportedly occurred every time a tester launched the Netflix app, played a video, and then returned to the device's own screen. An engineer at the chip company reported that 'the cause is the insufficient audio data transfer speed of Ninja', Netflix's Android TV app, and Blair, feeling pressure from the stakeholders to fix the bug, began investigating the problem.
Blair found it puzzling that Ninja worked fine on other devices and failed only on the device under test, so he read the Ninja source code and began investigating how audio data is transferred. Ninja first stores downloaded data in a buffer inside the app and then passes it to the decoder buffer on the device side. Blair's investigation showed that the transfer from Ninja's buffer to the device's decoder buffer is carried out by Android TV, not by Ninja.
Ninja's maximum frame rate is 60 fps, so each frame has an update interval of about 16.66 ms. In other words, if transferring one frame takes longer than 16.66 ms, the video stutters. However, the transfer on the Android TV side ran every 15 ms, so in theory there was headroom. 'Where is the extra time going?' Blair wondered.
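The headroom argument above is simple arithmetic, and a minimal Python sketch makes it explicit. It assumes, as a simplification not stated in the original post, that each call to the transfer routine moves exactly one frame's worth of data; the function names are hypothetical.

```python
# Illustrative arithmetic only; assumes one frame's worth of data per transfer call.

def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def fits_budget(call_interval_ms: float, fps: float) -> bool:
    """True if one transfer call per interval keeps up with the frame rate."""
    return call_interval_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    budget = frame_budget_ms(60)   # 1000 / 60
    margin = budget - 15           # headroom left by a 15 ms call interval
    print(f"budget at 60 fps: {budget:.2f} ms, margin at 15 ms: {margin:.2f} ms")
    print("15 ms interval keeps up:", fits_budget(15, 60))
```

Under this model a 15 ms interval leaves a little under 2 ms of slack per frame, which is why the stutter looked inexplicable at this stage.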
To investigate further, Blair wrote a script of his own to log the transfers. In the graph visualizing that log, orange is the transfer speed (bytes per millisecond), yellow is the time taken by each call to the transfer routine (milliseconds), and gray is the interval between calls to the transfer routine (milliseconds).
In the right-hand part of the graph, where the video stutters, the interval between calls to the transfer routine is 55 ms, far longer than the normal 15 ms. Because the transfer routine is called less often, the transfer speed (orange) also drops below the level needed for smooth playback.
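Blair's actual script and log format are not shown in the article, so the following is only a sketch of how the three plotted quantities could be derived from a log of transfer calls. The record layout (`TransferCall`), the choice to measure throughput over the gap since the previous call, and the sample values are all assumptions made for illustration; the numbers are not Blair's measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TransferCall:
    """Hypothetical log record for one call to the transfer routine."""
    start_ms: float    # when the call began
    end_ms: float      # when the call returned
    bytes_moved: int   # bytes copied during the call


@dataclass
class Metrics:
    duration_ms: float             # time spent inside the call (yellow)
    interval_ms: Optional[float]   # gap since the previous call began (gray)
    throughput: Optional[float]    # bytes per millisecond over that gap (orange)


def analyze(calls: List[TransferCall]) -> List[Metrics]:
    """Derive per-call duration, call interval, and throughput from the log."""
    results = []
    prev_start = None
    for call in calls:
        duration = call.end_ms - call.start_ms
        interval = None if prev_start is None else call.start_ms - prev_start
        # No throughput for the first call or for a zero-length interval.
        throughput = None if not interval else call.bytes_moved / interval
        results.append(Metrics(duration, interval, throughput))
        prev_start = call.start_ms
    return results


# Invented sample: three calls 15 ms apart, then two calls 55 ms apart.
sample = [
    TransferCall(0, 1, 3000),
    TransferCall(15, 16, 3000),
    TransferCall(30, 31, 3000),
    TransferCall(85, 86, 3000),
    TransferCall(140, 141, 3000),
]

if __name__ == "__main__":
    for m in analyze(sample):
        print(m)
```

With a fixed amount of data per call, stretching the interval from 15 ms to 55 ms cuts the computed throughput to roughly a quarter, which is the kind of drop the orange line shows.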
When Blair reported to the project stakeholders that 'the problem is caused by processing on the Android side', they replied, 'Then please shorten the call interval of the transfer routine.' Blair says he was rescued at this point by a bug in Android 5 that a colleague discovered. Android's buffer transfer routine is called every 15 ms while the app is in the foreground and, with an additional 40 ms wait, every 55 ms while it is in the background. In Android 5, however, a bug left the call interval at 55 ms even after the app had moved back to the foreground. The bug had been fixed in Android 6, so the root cause was a bug in Android 5.0, on which the device was based.
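The behavior described above can be illustrated with a toy model. This is not Android source code: the class, its methods, and the way the bug is represented (a background flag that is never cleared) are hypothetical, chosen only to reproduce the 15 ms and 55 ms intervals mentioned in the article.

```python
FOREGROUND_INTERVAL_MS = 15     # call interval while the app is in the foreground
BACKGROUND_EXTRA_WAIT_MS = 40   # additional wait applied while in the background


class TransferScheduler:
    """Toy model of the transfer-call timing; not the real Android implementation."""

    def __init__(self, has_bug: bool) -> None:
        self.has_bug = has_bug
        self.in_background = False  # what the scheduler believes about the app

    def on_app_moved(self, to_background: bool) -> None:
        if to_background:
            self.in_background = True
        elif not self.has_bug:
            self.in_background = False
        # In the buggy variant, returning to the foreground never clears the flag.

    def next_interval_ms(self) -> int:
        extra = BACKGROUND_EXTRA_WAIT_MS if self.in_background else 0
        return FOREGROUND_INTERVAL_MS + extra


def interval_after_round_trip(has_bug: bool) -> int:
    """Send the app to the background and back, then report the call interval."""
    scheduler = TransferScheduler(has_bug)
    scheduler.on_app_moved(to_background=True)
    scheduler.on_app_moved(to_background=False)
    return scheduler.next_interval_ms()


if __name__ == "__main__":
    print("fixed behavior:", interval_after_round_trip(has_bug=False), "ms")
    print("buggy behavior:", interval_after_round_trip(has_bug=True), "ms")
```

In this model the fixed variant returns to a 15 ms interval after the round trip, while the buggy variant stays at 55 ms, well above the roughly 16.66 ms available per frame at 60 fps.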
Blair described the case as 'the most difficult' one he had handled and said, 'It represents very well what I love about working as a partner engineer: investigating a wide range of systems, working with colleagues, and constantly learning in order to solve unpredictable problems.'
in Software, Web Service, Posted by darkhorse_log