Hunting Down Dirty Memory Pages

I recently had to debug a complex issue with memory page usage in EFL. You will likely never encounter this issue because it’s only relevant for shared library developers, and even then, not always. However, I think it is beneficial for everyone to be familiar with how things work at a lower level, so I decided to write this post.

A few weeks ago I got a report about an increase in private dirty pages from our libraries that essentially caused increased memory consumption for every application linking to EFL. The main culprit was the object system (Eo), which I maintain, so I decided to take a look.

For my first step I performed a manual review which led me to a mistake in related code that I eventually fixed. My fix improved the situation a bit, but the dirty page issue was mostly unchanged, so I investigated more deeply…

Note: unless specifically mentioned otherwise, all of this post assumes Linux on Intel hardware. The details may vary, but the concepts should apply almost everywhere.

Introducing Linux Memory Pages

First, if you are not familiar with the concept, the Wikipedia page on the subject has a good general explanation. Pages are essentially blocks of virtual memory, and are the smallest unit of data the OS manages. The page size is usually 4KiB, and this is the case here.

When an executable is compiled, everything in it is placed into different sections depending on usage. Here’s an example that clang (other compilers may behave differently) places in .rodata, which is “read-only data”:

static const int a = 5;

Another example is executable code (actual instructions), which is placed in .text. Then, the linker decides how to map all of this into actual memory and, thus, into pages. Pages have permissions associated with them: read (R), write (W) and execute (X). For example, for security reasons the stack is marked as RW: you want to be able to read and write to the stack, but it should not be executable, in order to protect against a certain class of attacks. The actual executable code is marked as RX; that is, you can read it and execute it, but not modify it. A nice feature of non-writable pages is that they never change (duh…), so the OS can reuse them and save memory. For example, if your executable size is 2MiB, it’ll be loaded into memory once and won’t be loaded again for every instance of the application. The OS is smart enough to share these pages.

As a side note, Linux also implements Copy-on-Write for RW pages, so even if a page is RW, it may be shared across different instances, assuming the data hasn’t been written to. Pages that can be shared are called clean, and ones that have been written to are called dirty.

There is more to be said about pages, but I’ve covered all that’s needed to investigate my issue, so we will stop here.

The Reported Issue

Now that we know a bit more about pages we can more intelligently discuss the reported issue. The problem was that users of EFL in general, and heavy users of Eo in particular, suddenly had a lot of private dirty pages. This means a lot of memory pages that are mapped from the library itself (in the executable, not allocated at runtime) are being written to, and thus can’t be shared and must be duplicated for each running process: a big issue for heavily used libraries.

RW pages exist for a reason, so these could simply be legitimate usages, though judging by the amount of pages this seemed unlikely. I needed to find what is mapped to these pages, so I started there.

Figuring Out What’s Using the Memory

Unfortunately, while nm is very useful for mapping symbols to memory regions, which makes it easier to tell which symbols are (most likely) mapped to RW pages, it doesn’t, and can’t, indicate which symbols map to dirty pages. Even more unfortunately, I’m not aware of any tool that provides this information (if you know of one, please mention it in the comments!). However, I was able to write a test case in which all the relevant RW pages were dirty, so any memory in them would be relevant.

Note: in order to check which variables got mapped to RW pages I used pmap. Using this tool I was able to see the address range for each page, and by using some debug output I was able to get the addresses of the symbols in question, leading me to figure out which were stored in RW pages. Using pmap is very easy; more on that in the next section.

I already knew which structures were the largest in Eo, so I decided to guess if they were mapped to RW or RO pages. My guess was spot on; I found a few symbols that should be RO but were actually RW. For example:

 static const Efl_Event_Description *_event_desc[] = {
      // SNIP ...

This is a common mistake due to the confusing syntax C uses for const. This is an array of pointers to const Efl_Event_Description. It may look correct at first glance until you realize the array itself is not constant. It should be:

 static const Efl_Event_Description * const _event_desc[] = {
      // SNIP ...

This change saved us a few pages in the more event heavy areas, which is a first step, but the problem was still there, so the search recommenced.

Then, I stumbled upon this:

static const Efl_Class_Description _class_desc = {
    // SNIP ...

This looks innocent. The Efl_Class_Description type is a struct, and const was correctly applied. This should definitely have been RO, but for some reason it was put in an RW page. Seeing this, and other similar structures, I knew I had found what I was looking for; now I just needed to figure out why it was happening.

After thinking about it for a bit, and considering a few different ideas, I suspected it was related to the fact that while these structures were constant, some of the fields were referring to other symbols; in some cases the linker would have to figure out the address at runtime due to relocation, and thus, won’t be able to mark the pages as RO. This could be checked easily with nm, but I didn’t think about it at the time, so I went on investigating by other means. I ended up writing a small contained example to discover if I was right or not.

I ran my example in both gcc and clang. Unfortunately gcc gave me less than optimal results, so I will use clang in my examples.

Checking My Hypothesis

In order to check my hypothesis I wrote a small program (issue.c):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define ALLOC_SIZE (PAGE_SIZE * 1000)

typedef struct
{
   const void *invalidater;
   const char data[ALLOC_SIZE];
} Invalid;

static const char ro[ALLOC_SIZE];
static const Invalid rw = { NULL, { 0 }};

int main(void)
{
   printf("%d\n", (int) getpid());
   printf("%p %p\n", (void *) ro, (void *) &rw); // So they are not optimised out
   scanf("\n"); // Keep the program running
   return 0;
}
This program attempts to allocate two variables:

  • ro: 1000 pages of read only memory.
  • rw: 1000 pages (and one pointer) of read only memory that I suspected was going to be RW.

Upon running it prints its PID and then waits; this allowed me to inspect exactly what’s going on. For that I used pmap (and redacted some of the non-relevant output).

Let’s compile and run our program:

$ clang issue.c
$ ./a.out
0x4006a0 0x7e86a0

And then in another terminal:

$ pmap 11835
11835:   ./a.out
0000000000400000   8004K r-x-- a.out
0000000000dd0000      4K rw--- a.out

As you can see, both variables have been mapped to RO pages (the first mapping). This is what we expected (though it wasn’t the case with GCC), because neither of them relies on anything that’s not known at compile time. This was just a test to see that everything works.

Now, I changed the program to check my hypothesis. I changed the NULL in the declaration of rw to some symbol that may be relocated, for example strlen, and then compiled it and ran it again:

$ clang issue.c
$ ./a.out
0x4006e0 0x7e86e0

Then, in another terminal:

$ pmap 11941
11941:   ./a.out
0000000000400000   8004K r-x-- a.out
0000000000dd0000      4K rw--- a.out

And it still works…

At this point I started to question myself; maybe I was wrong and something else was going on. Then I realized there was still one difference between my test case and the libraries that exhibit the issue: they are libraries, and thus have position independent code. So I tested once more, this time with PIC enabled:

$ clang issue.c -fPIC
$ ./a.out
0x4006d0 0x9e8818

And then in another terminal:

$ pmap 12002
12002:   ./a.out
0000000000400000   4004K r-x-- a.out
00000000009e8000   4004K rw--- a.out

Voila! We managed to replicate the issue.

Verifying with nm

As I mentioned before, this would have been easy to verify with nm, so I’ll also show this for completeness. However, even with nm I would have needed to enable PIC to trigger the issue.

Relevant nm output for the RO (issue not present) case:

$ nm -f sysv ./a.out
rw |07e86e0| r | OBJECT|03e8008| |.rodata

As you can see, rw is put into the .rodata section, that is read-only data.

Relevant nm output for the RW (issue present) case:

$ nm -f sysv ./a.out
rw |09e8818| d | OBJECT|03e8008| |

Here, rw is put into the .data.rel.ro section, which is a section that is read-only only after relocation; this means it’s not really read-only.

My Pages are RO and not RW

I got reports from two people (thanks Daniel Hirt!) that their pmap output looked something like this:

$ pmap 10214
10214:   ./a.out
0000000000400000   4004K r-x-- a.out
00000000009e8000   4004K r---- a.out
0000000000dd1000      4K rw--- a.out

The reason for this is most likely linker differences. One way to verify this is indeed the case:

$ strace ./a.out
... SNIP ...
mprotect(0x9e8000, 4100096, PROT_READ)  = 0
... SNIP ...

As you can see, the program calls mprotect to change the pages (look at the address) to RO. If you read the previous section about nm, you probably saw that rw was put into the “read-only after relocation” section, meaning the dynamic linker was allowed and encouraged to mark the pages read-only after it finished the relocation updates.

As mentioned earlier, I took a short-cut to check for RW pages instead of private dirty pages because I had complete overlap between the two. This short-cut may not work for your case, though the nm output should still give you the information you need.

Solving the Issue

Two solutions come to mind. The first is to separate the “truly constant” values from the relocatable values. So for example, if we had a struct like:

struct {
     int ro;
     void *symbol;
};

It should be split into two separate structs.

Alternatively, I could reconsider parts of the design. Perhaps storing the pointers in the struct is not necessary and it might be enough to pass them to a function. This way the structures won’t be mapped to RW memory; the pointers would only be stored temporarily on the stack before the function invocation, reducing the memory usage.

I have already reduced some memory in Eo users by using the second method. I will soon complement that with the first method to reduce it even further. Preliminary tests show significant reduction in memory usage, so a big win.

Lessons Learned

While I knew the theory behind it, I was surprised to see how code I assumed would be mapped to RO pages ended up in RW pages that get dirty immediately. More specifically, any structure with a pointer to anything, even to a constant string, would end up in RW pages:

static const char * const rw = "test";

Review your libraries to make sure they aren’t wasting memory, and remember that heap memory is not the only memory that can be wasted. Please let me know if you spotted any mistakes or have any suggestions for improvement.

This article was originally posted on Tom’s personal blog, and has been approved to be posted here.

Author: Tom Hacohen

Tom has been a core developer and part of the leading team at SHR (Openmoko), and he is currently a core developer for EFL.