After memory-mapping, the process still consumes physical memory when there is a cache
I'm trying to understand mmap. As I understand it, mmap maps a range of virtual addresses onto the page cache, so there is no need to copy data from the page cache into the process's memory, and in the end there is only a single copy of the data on the whole machine.
However, when I mmap a file and read from it, I see memory usage grow by twice the amount I read. Am I interpreting this incorrectly, or is something wrong with my code?
Memory consumption before testing:
Code:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3924        1391        2280          13         251        2292
Swap:             0           0           0
I ran the following Python code:
Code:
import mmap
import os
import time

# file2.db is a 2 GB file
with open("/var/tmp/file2.db", "r") as f:
    with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mm:
        x = mm.read(500000000)
        time.sleep(10000)
Code:
$ python3 mmap_read.py &
Memory consumption after testing:
Code:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3924        1703        1575          13         644        1980
Swap:             0           0           0
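free(1) computes these columns from /proc/meminfo. As a minimal sketch, assuming the definition given in the free(1) man page of recent procps-ng releases (buff/cache = Buffers + Cached + SReclaimable), the buff/cache figure can be reproduced like this; buff_cache_mib is a hypothetical helper name:

```python
import os


def buff_cache_mib(meminfo_text: str) -> int:
    """Approximate free(1)'s buff/cache column, in MiB, from /proc/meminfo text.

    Assumption: buff/cache = Buffers + Cached + SReclaimable, as described in
    the free(1) man page of recent procps-ng releases. /proc/meminfo reports
    these values in kB (KiB).
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # first token is the number
    kib = sum(fields.get(k, 0) for k in ("Buffers", "Cached", "SReclaimable"))
    return kib // 1024


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print("buff/cache:", buff_cache_mib(f.read()), "MiB")
```

Running it before and after the test should track the buff/cache column roughly, not necessarily to the exact MiB.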
I also checked the system calls made by the process, and it looks like there is no data copy:
Code:
$ sudo perf record python3 mmap_read.py & # record syscall
$ sudo perf report
I would expect buff/cache to grow while used stays the same, since the process should be referencing the data in the page cache. Any idea what is going on? Any help is appreciated.
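One way to see which column the data lands in is to look at the process's own counters. On Linux 4.5 and later, /proc/self/status splits resident memory into RssAnon (private anonymous memory, which free counts as used) and RssFile (page-cache pages mapped from a file, or RssShmem if the file lives on tmpfs). A rough sketch, assuming Linux; measure() and its scratch file are hypothetical stand-ins for the script above:

```python
import mmap
import tempfile


def rss_kb() -> dict:
    """RssAnon/RssFile/RssShmem of this process in kB, from /proc/self/status (Linux 4.5+)."""
    fields = {}
    with open("/proc/self/status") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("RssAnon", "RssFile", "RssShmem"):
                fields[key] = int(rest.split()[0])
    return fields


def file_backed(r: dict) -> int:
    # Page-cache pages mapped into the process; on tmpfs they count as RssShmem.
    return r["RssFile"] + r["RssShmem"]


def measure(size_mib: int = 32):
    """Return (file-backed RSS growth, anonymous RSS growth) in kB.

    The scratch file is a hypothetical stand-in for /var/tmp/file2.db.
    """
    size = size_mib * 1024 * 1024
    chunk = b"\xab" * (1024 * 1024)
    with tempfile.TemporaryFile() as f:
        for _ in range(size_mib):
            f.write(chunk)
        f.flush()  # make sure the data is in the kernel before mapping it
        with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mm:
            before = rss_kb()
            # Touch one byte per page: the pages are mapped, not copied.
            for off in range(0, size, mmap.PAGESIZE):
                _ = mm[off]
            touched = rss_kb()
            # mm.read() allocates a new bytes object and copies into it.
            data = mm.read(size)
            copied = rss_kb()
            del data
    return (file_backed(touched) - file_backed(before),
            copied["RssAnon"] - touched["RssAnon"])


if __name__ == "__main__":
    file_kb, anon_kb = measure()
    print(f"file-backed RSS +{file_kb} kB after touching the mapping")
    print(f"anonymous RSS +{anon_kb} kB after mm.read()")
```

The expectation (not a measured result) is that both numbers come out in the same ballpark as the mapped size: touching the mapping grows only the file-backed counter, while mm.read() grows the anonymous one.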
By page cache I mean the cache the Linux kernel keeps after reading file data. For example, when you issue a read syscall, the kernel reads the data from disk, caches it, and then copies it into the process's memory.
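The read() path described here can be sketched in Python: the buffer belongs to the process, and each read(2) copies bytes out of the page cache into it. A minimal sketch; read_copy is a hypothetical helper, and an unbuffered file object is used so that each readinto() corresponds to one read(2) call:

```python
import os
import tempfile


def read_copy(path: str, n: int) -> bytearray:
    """Read up to n bytes of `path` into a buffer owned by this process.

    The kernel brings the data into the page cache if needed, then copies it
    into `buf`, so afterwards the bytes exist twice: in the page cache and in
    this process's memory.
    """
    buf = bytearray(n)                       # private process memory ("used")
    view = memoryview(buf)
    got = 0
    with open(path, "rb", buffering=0) as f:
        while got < n:
            k = f.readinto(view[got:])       # one read(2): page cache -> buf
            if not k:                        # EOF
                break
            got += k
    view.release()                           # drop the export so buf can be resized
    del buf[got:]                            # trim if the file was shorter than n
    return buf


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello page cache")
    try:
        print(read_copy(tmp.name, 1024))     # bytearray(b'hello page cache')
    finally:
        os.unlink(tmp.name)
```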
I agree I'm probably mixing something up, but I have no idea what I'm misunderstanding.
mmap and that cache are two independent things. mmap does not work on (or with) that cache; from this point of view the cache is mostly invisible. (If I understand correctly.)
I figured out what's wrong with my code. mm.read() makes Python allocate a buffer in the process and copy the data from the page cache into that buffer. I wrote another program in C, and it uses the page cache directly.
Code:
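#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("/var/tmp/large", O_RDONLY); // /var/tmp/large is a large file
    if (fd == -1) { perror("open"); return EXIT_FAILURE; }

    struct stat statbuf;
    if (fstat(fd, &statbuf) == -1) { perror("fstat"); close(fd); return EXIT_FAILURE; }
    size_t size = (size_t)statbuf.st_size;
    // size = 4096;

    // Map the file read-only and shared: the mapping is backed directly by
    // page-cache pages, which are faulted in on first access.
    char *region = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); close(fd); return EXIT_FAILURE; }

    // The mapping is not NUL-terminated, so write exactly `size` bytes;
    // printf("%s", region) could read past the end of the mapping.
    fwrite(region, 1, size, stdout);

    munmap(region, size);
    close(fd);
    return EXIT_SUCCESS;
}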
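The copy in the original Python test comes from mm.read(), which returns a new bytes object, so it is not inherent to Python. A minimal sketch of processing a mapped file in place using memoryview slices over the mmap; crc32_mapped is a hypothetical helper and CRC-32 is only an example workload:

```python
import mmap
import os
import tempfile
import zlib


def crc32_mapped(path: str, chunk: int = 1 << 20) -> int:
    """CRC-32 of a file, computed through an mmap without copying it into a bytes object.

    Slices of memoryview(mm) are windows onto the mapped page-cache pages, so
    zlib.crc32 reads them in place. By contrast, mm.read(n) or mm[a:b]
    allocate a new bytes object and copy into it.
    """
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # mmap cannot map an empty file; crc32(b"") == 0
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            view = memoryview(mm)
            crc = 0
            try:
                for start in range(0, len(mm), chunk):
                    crc = zlib.crc32(view[start:start + chunk], crc)
            finally:
                view.release()  # mm.close() raises BufferError while views are alive
    return crc


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello mmap " * 1000)
    try:
        print(hex(crc32_mapped(tmp.name)))
    finally:
        os.unlink(tmp.name)
```

Only the small chunk windows are ever exposed to zlib; nothing the size of the file is allocated on the Python side.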
Assuming none of the file is in the page cache yet in either case:
1. A normal file read pulls the data from disk into kernel memory (the page cache), then copies it from kernel memory into the process's memory.
2. After memory-mapping a file, when the process reads part of the file, the CPU consults the process's page table to see whether that page is mapped. If it is not, a page fault occurs: the kernel reads the data from disk into the page cache and updates the page table to point at that page-cache page. The process then accesses the data through its page table, without a second copy. That's why "used" stays unchanged while buff/cache goes up when we read an mmap-ed file.
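Point 2 can be observed from Python by counting page faults around the first access to each page of a mapping. A minimal sketch, assuming Linux and the resource module's getrusage(); faults_while_touching is a hypothetical helper. The kernel may map several neighboring pages per fault (fault-around), so fewer faults than pages is expected:

```python
import mmap
import os
import resource
import tempfile


def faults_while_touching(path: str):
    """Return (minor, major) page faults taken while reading one byte per page of a mapping.

    Nothing is copied into a buffer: the first access to a page that is not
    yet in the process's page table faults, and the kernel maps the
    page-cache page (reading it from disk first if it is not cached, which
    counts as a major fault).
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        before = resource.getrusage(resource.RUSAGE_SELF)
        for off in range(0, len(mm), mmap.PAGESIZE):
            _ = mm[off]
        after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (256 * mmap.PAGESIZE))
    try:
        print("minor, major:", faults_while_touching(tmp.name))
    finally:
        os.unlink(tmp.name)
```

Since the scratch file was just written, its pages are normally already cached, so mostly minor faults would be expected here; dropping the cache first should turn some of them into major faults.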
If I've misunderstood anything, I'd appreciate it if you could point it out.
All of the memory used by a process (code, data, whatever) can end up in the cache, not only the mmapped parts. The cache is managed by the kernel, not by the process itself.
What you see depends heavily on the load of the system. Also, the kernel will try to use all available RAM for caching if possible (but obviously won't if there is no RAM to spare). So I don't think it is that simple.