Why is it faster to read and write files with 'mmap'?



Reading and writing files is one of the most frequent operations in software development, and making it faster can greatly improve the performance of the software as a whole.

Alexandra Fedorova, an associate professor at the University of British Columbia, explains why mmap lets a program work with files faster than regular system calls do when reading and writing files.

Why mmap is faster than system calls | by Alexandra (Sasha) Fedorova | Medium
https://sasha-f.medium.com/why-mmap-is-faster-than-system-calls-24718e75ab37

When a user runs a program on an OS, the program works with two kinds of memory area, called 'user space' and 'kernel space'. The program can access user space freely, but it cannot access kernel space directly. Splitting resources into two spaces is excellent from a security point of view, but it is inconvenient for the program, because processing that involves hardware, such as reading and writing files, can only be performed in kernel space.

For that reason, 'system calls' bridge user space and kernel space, so that a program can ask the kernel to do this work on its behalf. For example, to read a file, a program uses the open system call to obtain a file descriptor for the file, then calls the read system call to copy the file data from that descriptor into a buffer, where the program can work with it.
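The open-then-read procedure described above can be sketched in Python, whose os.open, os.read, and os.close functions are thin wrappers over the corresponding system calls. This is a minimal illustration, not code from Fedorova's article; the function name 'read_file_with_syscalls' and the 16KB default block size are assumptions made for the example.

```python
import os
import tempfile


def read_file_with_syscalls(path, block_size=16 * 1024):
    """Read a whole file with open/read/close, one block per read call.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDONLY)  # open system call: returns a file descriptor
    chunks = []
    try:
        while True:
            # read system call: the kernel copies up to block_size bytes
            # into a buffer in user space
            block = os.read(fd, block_size)
            if not block:  # empty bytes object means end of file
                break
            chunks.append(block)
    finally:
        os.close(fd)  # close system call: release the descriptor
    return b"".join(chunks)


if __name__ == "__main__":
    # Create a small scratch file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello mmap" * 1000)
    try:
        data = read_file_with_syscalls(tmp.name)
        print(len(data), "bytes read")
    finally:
        os.unlink(tmp.name)
```

Every pass through the loop is a separate system call, so each block read involves a transition into the kernel and a copy from kernel space into user space.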

That is the usual procedure for working with files on an OS, but files can also be read and written with the mmap system call. mmap maps a file into the virtual memory of the process. Once the file is mapped, the program reads and writes it through the mapped virtual memory addresses, so it does not need to issue another system call for each access.
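A minimal sketch of the same idea using Python's standard mmap module follows. The helper names 'read_block_with_mmap' and 'overwrite_with_mmap' are hypothetical, and the sketch assumes a non-empty file, since an empty file cannot be mapped with a length of 0. Note that slicing a Python mmap object copies the bytes into a new bytes object, which loosely mirrors the copy from the mapped region into the program's own buffer.

```python
import mmap
import os
import tempfile


def read_block_with_mmap(path, offset, length):
    """Map the file and return `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Plain memory access through the mapping, no read call per block.
            return mapped[offset:offset + length]


def overwrite_with_mmap(path, offset, payload):
    """Overwrite bytes in place through a writable mapping.

    The payload must fit inside the existing file; a mapping cannot grow it.
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mapped:
            mapped[offset:offset + len(payload)] = payload
            mapped.flush()  # push the modified pages back to the file


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789" * 100)
    try:
        print(read_block_with_mmap(tmp.name, 10, 5))
        overwrite_with_mmap(tmp.name, 0, b"AB")
        print(read_block_with_mmap(tmp.name, 0, 4))
    finally:
        os.unlink(tmp.name)
```

Setting up the mapping still costs one mmap call, but after that each access is an ordinary memory read or write.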

To compare file access speed between normal system calls and mmap, Fedorova measured sequential and random read speeds for each method at block sizes of 4KB, 8KB, and 16KB. Her graph of sequential read speed when the data is already in the buffer cache shows that reads with mmap, plotted in yellow, are faster.



Random reads show the same pattern as sequential reads: mmap reads faster than normal system calls.
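A comparison of this shape can be sketched as follows. This is not Fedorova's benchmark: the helper names, the 8MB scratch file, and the use of lseek followed by read for positioned reads are all assumptions made for illustration. Python interpreter overhead also dominates at this scale, so the timings it prints should not be expected to reproduce her results.

```python
import mmap
import os
import random
import tempfile
import time


def block_offsets(file_size, block_size, shuffle=False, seed=0):
    """Offsets of every full block; shuffled order gives a random-read pattern."""
    offsets = list(range(0, file_size - block_size + 1, block_size))
    if shuffle:
        random.Random(seed).shuffle(offsets)
    return offsets


def time_syscall_reads(path, offsets, block_size):
    """Read each block with lseek + read; return (bytes read, seconds)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        total = 0
        for off in offsets:
            os.lseek(fd, off, os.SEEK_SET)
            total += len(os.read(fd, block_size))
        return total, time.perf_counter() - start
    finally:
        os.close(fd)


def time_mmap_reads(path, offsets, block_size):
    """Read each block by slicing a mapping; return (bytes read, seconds)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            start = time.perf_counter()
            total = 0
            for off in offsets:
                total += len(mapped[off:off + block_size])
            return total, time.perf_counter() - start


if __name__ == "__main__":
    # A freshly written file is very likely still in the OS cache,
    # which roughly matches the "data in the buffer cache" case.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
    try:
        size = os.path.getsize(tmp.name)
        for kb in (4, 8, 16):
            bs = kb * 1024
            for label, shuffle in (("sequential", False), ("random", True)):
                offs = block_offsets(size, bs, shuffle)
                _, t_sys = time_syscall_reads(tmp.name, offs, bs)
                _, t_map = time_mmap_reads(tmp.name, offs, bs)
                print(f"{kb:>2}KB {label:<10} read: {t_sys:.4f}s  mmap: {t_map:.4f}s")
    finally:
        os.unlink(tmp.name)
```

Both timing functions visit the same offsets in the same order, so any difference comes from the access method rather than the access pattern.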



Looking at where CPU time goes during a sequential read with a block size of 16KB, 'copy_user_enhanced_fast_string', which copies data from kernel space to user space, accounts for about 61% of the execution time of the entire program. About another 15% goes to other functions involved in crossing between user space and kernel space, such as 'do_syscall_64' and 'entry_SYSCALL_64'.



With mmap, about 61% of the time was spent in '__memmove_avx_unaligned_erms', which copies data from the mapped region into the program's buffer. In other words, the difference in read speed comes largely from the difference in efficiency between 'copy_user_enhanced_fast_string', which dominates the processing with normal system calls, and '__memmove_avx_unaligned_erms', which dominates the processing with mmap.



What separates the two in efficiency is whether they use AVX (Advanced Vector Extensions), an instruction set that processes multiple data elements with a single instruction. As its name suggests, '__memmove_avx_unaligned_erms' uses AVX and can make efficient use of memory bandwidth, whereas 'copy_user_enhanced_fast_string' does not use AVX and cannot use the bandwidth fully. Fedorova explains that this is a major reason why mmap is faster than regular system calls for file operations.

The kernel's copy routine does not use AVX because doing so would require saving and restoring the AVX registers on every system call, which would make each transition between user space and kernel space more expensive. 'By replacing regular system calls with mmap, we may be able to make our applications run faster,' Fedorova said.

in Software, Posted by darkhorse_log