After mmap, the process still consumes physical memory even though the file is in the page cache

joe_sleeping · 12-09-2022, 07:10 PM

I'm trying to understand mmap. As I understand it, mmap maps virtual addresses to the page cache, so there is no need to copy data from the page cache into the process's virtual memory, and in the end there is only a single copy of the data on the whole machine.

However, when I mmap a file and read from it, memory usage seems to grow by about twice the amount of data read (both used and buff/cache go up). Am I interpreting this incorrectly, or is something wrong with my code?

Memory consumption before testing:

Code:

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            3924        1391        2280          13         251        2292
Swap:              0           0           0

I ran the following Python code:

Code:

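import mmap
import time

# file2.db is a 2 GB file
with open("/var/tmp/file2.db", "rb") as f:
    with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mm:
        # read the first 500 MB through the mapping (mm.read returns a bytes object)
        x = mm.read(500000000)
        time.sleep(10000)  # keep the process alive while measuring memory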

Code:

$ python3 mmap_read.py &

Memory consumption after testing:

Code:

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            3924        1703        1575          13         644        1980
Swap:              0           0           0

I also profiled the process with perf, and it looks like there is no data copy:

Code:

$ sudo perf record python3 mmap_read.py &   # sample CPU activity (default cpu-clock event)
$ sudo perf report

Result

Code:

Samples: 128  of event 'cpu-clock:pppH', Event count (approx.): 1292929280
Overhead  Command  Shared Object      Symbol
  24.22%  python3  [kernel.kallsyms]  [k] do_user_addr_fault
   4.69%  python3  [kernel.kallsyms]  [k] rmqueue
   3.91%  python3  [kernel.kallsyms]  [k] __add_to_page_cache_locked
   3.91%  python3  [kernel.kallsyms]  [k] charge_memcg
   3.91%  python3  libc.so.6          [.] 0x00000000001a0e81
   3.12%  python3  [kernel.kallsyms]  [k] __lock_text_start
   3.12%  python3  [kernel.kallsyms]  [k] xas_load
   3.12%  python3  libc.so.6          [.] 0x00000000001a0ef0
   2.34%  python3  [kernel.kallsyms]  [k] __mod_lruvec_state
   2.34%  python3  [kernel.kallsyms]  [k] do_anonymous_page
   2.34%  python3  [kernel.kallsyms]  [k] free_unref_page_list
   2.34%  python3  [kernel.kallsyms]  [k] release_pages
   2.34%  python3  libc.so.6          [.] 0x00000000001a0e6f
   1.56%  python3  [kernel.kallsyms]  [k] __cgroup_throttle_swaprate
   1.56%  python3  [kernel.kallsyms]  [k] __mod_node_page_state
   1.56%  python3  [kernel.kallsyms]  [k] filemap_map_pages
   1.56%  python3  [kernel.kallsyms]  [k] pmd_page_vaddr
   1.56%  python3  [kernel.kallsyms]  [k] pmd_pfn
   1.56%  python3  [kernel.kallsyms]  [k] xa_get_order
   1.56%  python3  libc.so.6          [.] 0x00000000001a0e4c
   1.56%  python3  libc.so.6          [.] 0x00000000001a0e7d
   1.56%  python3  libc.so.6          [.] 0x00000000001a0e86
   1.56%  python3  libc.so.6          [.] 0x00000000001a0f47
   0.78%  python3  [kernel.kallsyms]  [k] __bio_add_page
   0.78%  python3  [kernel.kallsyms]  [k] __handle_mm_fault
   0.78%  python3  [kernel.kallsyms]  [k] __mem_cgroup_charge
   0.78%  python3  [kernel.kallsyms]  [k] __page_set_anon_rmap
   0.78%  python3  [kernel.kallsyms]  [k] arch_local_irq_enable
   0.78%  python3  [kernel.kallsyms]  [k] blk_mq_dispatch_rq_list
   0.78%  python3  [kernel.kallsyms]  [k] cgroup_rstat_updated
   0.78%  python3  [kernel.kallsyms]  [k] clear_page_erms
   0.78%  python3  [kernel.kallsyms]  [k] do_set_pte
   0.78%  python3  [kernel.kallsyms]  [k] elv_rqhash_add
   0.78%  python3  [kernel.kallsyms]  [k] finish_task_switch.isra.0
   0.78%  python3  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm
   0.78%  python3  [kernel.kallsyms]  [k] handle_mm_fault
   0.78%  python3  [kernel.kallsyms]  [k] handle_pte_fault
   0.78%  python3  [kernel.kallsyms]  [k] kthread_blkcg
   0.78%  python3  [kernel.kallsyms]  [k] page_counter_try_charge
   0.78%  python3  [kernel.kallsyms]  [k] pmd_val
   0.78%  python3  [kernel.kallsyms]  [k] try_charge_memcg
   0.78%  python3  [kernel.kallsyms]  [k] xas_find
   0.78%  python3  [kernel.kallsyms]  [k] zap_pte_range
   0.78%  python3  libc.so.6          [.] 0x00000000001a0e5a
   0.78%  python3  libc.so.6          [.] 0x00000000001a0e76
   0.78%  python3  libc.so.6          [.] 0x00000000001a0f02
   0.78%  python3  libc.so.6          [.] 0x00000000001a0f07
   0.78%  python3  libc.so.6          [.] 0x00000000001a0f27
   0.78%  python3  libc.so.6          [.] 0x00000000001a0f57
   0.78%  python3  libc.so.6          [.] 0x00000000001a0f5f
   0.78%  python3  python3.10         [.] 0x000000000012161a
   0.78%  python3  python3.10         [.] 0x000000000012d084

I expected buff/cache to grow while used stayed the same, since the process should reference the data directly in the page cache. Any idea what is going on? Any help is appreciated.

pan64 · 12-10-2022, 09:31 AM

I think you are mixing some things up, or at least I don't understand what you mean by page cache. https://www.sobyte.net/post/2022-03/mmap/

joe_sleeping · 12-10-2022, 07:28 PM

Hi pan64, thank you for your reply.

By page cache I mean the cache where the Linux kernel stores file data after reading it. For example, when you issue a read syscall, the kernel reads the data from disk, caches it, and copies it into the process's virtual memory.

I agree that I'm probably mixing something up, but I can't tell what I'm misunderstanding.
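To make the read() path described above concrete, here is a minimal Python sketch (Linux/Unix assumed; the temporary file and its 1 MiB size are made up for the demo). readinto() fills a bytearray that the process owns, so the kernel copies the file data out of the page cache into process memory, even though the freshly written file is already cached.

```python
import os
import tempfile

SIZE = 1024 * 1024  # hypothetical 1 MiB demo file
payload = os.urandom(SIZE)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)  # just written, so the data sits in the page cache
    path = tmp.name

try:
    # Ordinary (anonymous) memory owned by this process.
    buffer = bytearray(SIZE)
    with open(path, "rb", buffering=0) as f:
        # read(2) copies the bytes from the page cache into `buffer`.
        nread = f.readinto(buffer)
    print(f"copied {nread} bytes into a process-owned buffer")
finally:
    os.unlink(path)
```

With mmap no such buffer is needed: the process's page table points at the page-cache pages themselves.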

pan64 · 12-11-2022, 03:15 AM

mmap and that cache are two independent things: mmap does not work on (or with) that cache. From this point of view, the cache is mostly invisible (if I understand correctly).

joe_sleeping · 12-12-2022, 05:40 AM

I figured out what was wrong with my code: mm.read() makes Python allocate a buffer in the process and copy the cached data into that buffer. I wrote a C program instead, and it uses the page cache directly.

Code:

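#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void) {
   int fd = open("/var/tmp/large", O_RDONLY); // /var/tmp/large is a large file
   if (fd == -1) {
      perror("open");
      return EXIT_FAILURE;
   }

   struct stat statbuf;
   if (fstat(fd, &statbuf) == -1) {
      perror("fstat");
      close(fd);
      return EXIT_FAILURE;
   }
   size_t size = (size_t)statbuf.st_size;

   // Map the whole file read-only. For a file mapping, the process's page
   // table points at the page-cache pages, so no private copy is made.
   char *region = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
   if (region == MAP_FAILED) {
      perror("mmap");
      close(fd);
      return EXIT_FAILURE;
   }

   // Write exactly `size` bytes. Unlike printf("%s", region), this does not
   // rely on a NUL terminator, so it cannot read past the end of the mapping.
   // Reading every byte faults each page of the file in.
   fwrite(region, 1, size, stdout);

   munmap(region, size);
   close(fd);
   return 0;
}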

Before running that code:

Code:

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            3924        1221        2204          11         498        2467
Swap:              0           0           0
$ vmtouch /var/tmp/large
           Files: 1
     Directories: 0
  Resident Pages: 30725/488282  120M/1G  6.29%
         Elapsed: 0.008021 seconds

While the program is running, buff/cache goes up and used stays unchanged:

Code:

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            3924        1210         727          11        1985        2477
Swap:              0           0           0

While the program is running, the heap size stays the same, but the Referenced value of the large mapping (i.e. /var/tmp/large) goes up and its Anonymous value stays at 0:

Code:

$ pmap -X <PID>
   Inode    Size     Rss     Pss Referenced  Anonymous Mapping
       0     132       4       4          4          0  [heap]
 1048884 1953128 1953128 1953128    1951100          0   large
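pmap -X takes these columns from /proc/<PID>/smaps, so the same check can be done from inside a process. The Python sketch below is only an illustration under assumptions: Linux, a hypothetical 256-page temporary file instead of /var/tmp/large, and one byte read per page (which faults every page in, like the C program above, but without printing anything). For a read-only shared file mapping, Rss should grow while Anonymous stays at 0, matching the pmap output.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
SIZE = 256 * PAGE  # hypothetical test file of 256 pages


def smaps_fields(path):
    """Return the kB fields of /proc/self/smaps for the mapping of `path`."""
    fields = {}
    in_mapping = False
    with open("/proc/self/smaps") as smaps:
        for line in smaps:
            parts = line.split()
            if not parts:
                continue
            if not parts[0].endswith(":"):
                # Header line: "start-end perms offset dev inode [path]"
                in_mapping = len(parts) >= 6 and parts[5] == path
            elif in_mapping and len(parts) >= 3 and parts[2] == "kB":
                # Field line such as "Rss:   1024 kB"
                fields[parts[0].rstrip(":")] = int(parts[1])
    return fields


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * SIZE)
    path = os.path.realpath(tmp.name)  # /proc shows the resolved path

try:
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mm:
            # Touch one byte per page; indexing returns an int, not a copy.
            total = sum(mm[i] for i in range(0, SIZE, PAGE))
            stats = smaps_fields(path)
finally:
    os.unlink(path)

print("Rss:", stats.get("Rss"), "kB  Anonymous:", stats.get("Anonymous"), "kB")
```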

After the program finishes, the entire file is cached:

Code:

$ vmtouch /var/tmp/large
           Files: 1
     Directories: 0
  Resident Pages: 488282/488282  1G/1G  100%
         Elapsed: 0.014291 seconds
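On the Python side, the copy comes from mm.read(), which returns a new bytes object. A minimal sketch of an alternative, assuming a Unix system and a hypothetical 4 MiB temporary file in place of /var/tmp/file2.db: a memoryview over the mmap object refers to the mapped pages, and slicing a memoryview creates another view rather than a copy. zlib.crc32 is used here only as an example consumer that accepts any buffer; the chunked loop is an illustrative choice, not something from the thread.

```python
import mmap
import os
import tempfile
import zlib

SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB test file
CHUNK = 1024 * 1024

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(SIZE))
    path = tmp.name

try:
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mm:
            # Copy: read() allocates a new bytes object in process memory
            # and copies CHUNK bytes of mapped data into it.
            copied = mm.read(CHUNK)

            # No copy: the memoryview refers to the mapped pages directly,
            # and each slice is another view over the same memory.
            crc = 0
            with memoryview(mm) as view:
                for start in range(0, len(view), CHUNK):
                    crc = zlib.crc32(view[start:start + CHUNK], crc)
            # Leaving the with-block releases the view; an mmap cannot be
            # closed while views of it are still alive (BufferError).

    # Cross-check against an ordinary read() of the whole file.
    with open(path, "rb") as f:
        expected = zlib.crc32(f.read())
finally:
    os.unlink(path)

print(len(copied), crc == expected)
```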

syg00 · 12-12-2022, 05:47 AM

Quote:

Originally Posted by joe_sleeping

Python allocates a buffer in the process & copy the cache into that buffer.

Thanks for the update. I was going to suggest that Python itself was the likely culprit, but having no experience with it, I couldn't really comment.

You might like to look at BPF for finer (more targeted) tracing capabilities.

joe_sleeping · 12-12-2022, 05:48 AM

Hi pan64,

I don't think so. My understanding is as follows.

Assuming the data is not in the page cache in either case:
1. A normal file read pulls the data from disk into kernel memory (the page cache), then copies it from kernel memory into the process's memory.
2. After memory-mapping a file, when the process reads part of it, the MMU looks up the process's page table to see whether that page is mapped. If it is not, a page fault occurs: the kernel reads the data into the page cache (kernel memory) and updates the page table to point at that page-cache page. The process can then access the data through its page table. That's why "used" stays unchanged while buff/cache goes up when we read an mmap-ed file.

If I've misunderstood anything, I'd appreciate it if you could point it out.
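Point 2 can be observed from Python on Linux with resource.getrusage(), whose ru_minflt and ru_majflt counters count page faults serviced without and with I/O. A minimal sketch, assuming a hypothetical temporary file that was just written and is therefore already in the page cache (so the faults should be minor): touching the mapped pages increases the fault count without any read() call. The kernel may map several neighbouring pages on a single fault (fault-around, cf. filemap_map_pages in the perf output), so the count can be lower than the number of pages touched.

```python
import mmap
import os
import resource
import tempfile

PAGE = mmap.PAGESIZE
NPAGES = 1024  # hypothetical file size in pages


def fault_counts():
    """Return (minor, major) page faults of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (NPAGES * PAGE))  # just written, so it is in the page cache
    path = tmp.name

try:
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mm:
            before_min, before_maj = fault_counts()
            # mmap() does not fill in page-table entries up front, so the
            # first access to an unmapped page faults and the kernel maps
            # the page-cache page into the process.
            for offset in range(0, len(mm), PAGE):
                _ = mm[offset]
            after_min, after_maj = fault_counts()
finally:
    os.unlink(path)

minor_faults = after_min - before_min
major_faults = after_maj - before_maj
print(f"minor faults: {minor_faults}, major faults: {major_faults}")
```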

pan64 · 12-12-2022, 06:27 AM

All the memory used by a process (code, data, whatever) can be put into the cache, not only the mmapped parts. The process itself is not responsible for cache usage; the kernel is.
What you see depends heavily on the system load. Also, the kernel will try to use all available RAM for caching if possible (but obviously won't if there is no more RAM to use). So I don't really think it is that simple.