I just completed a series of changes that shrunk the Chrome browser’s on-disk size on Windows by over a megabyte, moved about 500 KB of data from its read/write data segments to its read-only data segments, and reduced its private working set by about 200 KB per-process. The amusing thing about this series of changes is that they consisted entirely of removing const from some places, and adding const to others. Compilers be weird.
I filed several bugs while writing this post. One of them asked VC++ to support extern of constexpr objects, and that was just (on November 7, 2017) closed as fixed.
This task started when I was writing documentation for some tools I’d been using to investigate binary-size regressions on Windows. I pasted in some sample output from one of the tools and was writing an explanation of it when I thought “that’s funny…” I’d noticed that several of the large globals, which looked like they should be constant data, were in the read/write data segment. An abbreviated version of the tool’s output is shown below:
Most executable formats have at least two data segments – one for read/write globals and one for read-only globals. If you have constant data, such as kBrotliDictionary, then it is best to have it in the read-only data segment, which is segment ‘2’ in Chrome’s Windows binaries. However some logically constant data, such as unigram_table, device::UsbIds::vendors_, and blink::serializedCharacterData were in section ‘3’, the read/write data segment.
Putting data in the read-only data segment has a few advantages. It makes it impossible to corrupt the data through stray writes, and it also ensures greater efficiency. The read-only pages are guaranteed to be shared between all processes that load that DLL, which in the case of Chrome can be a lot of processes. And, in some cases (although probably not these) the compiler can use those constant values directly.
Pages in the read/write data segment might end up staying shared, but they might not. These pages are all created as copy-on-write which means that they are shared until written to, at which point they are copied into per-process private pages. So, if a global variable is initialized at run-time it becomes private data. Or, if a global variable is on the same page as another global that is written to then it becomes private data – everything happens at the page granularity (4 KiB).
Private data uses more memory because there is a separate copy for each process, but private pages are also more expensive because they are backed by the pagefile instead of by the image file. This means that when memory gets low they have to be written to the pagefile instead of just being discarded, and paging them back in is more expensive because they tend to get written randomly and therefore get read randomly. For all of these reasons shared pages are strictly better than potentially-private pages.
Adding const is good
So, when my ShowGlobals tool showed that blink::serializedCharacterData was in the read/write data segment, and when investigation showed that it was never modified, I landed a change to add a const modifier, and it dutifully moved to the read-only data segment. Simple enough. A change like this is never a bad idea, but it’s hard to know how much it helps. Because this array is never written to it might end up staying as shared data forever. Or, more likely, if the ends of the array end up sharing pages with global variables that are written to then part of the array will end up turning into per-process private data, wasting 7,748 or 3,652 bytes (the size of the array minus one or two pages in the middle which are guaranteed to stay shared) in each process. Changes like this should help on all platforms, with all toolchains.
Marking your const arrays as const is a good idea, you should do it, and I’m sure some developers don’t know this, but I’m not sure this information alone would have been enough to inspire this blog post. Here is where we veer into peculiar new territory…
Sometimes removing const is even better
The next array that I investigated was unigram_table. This one was peculiar because it was initialized entirely with constant data using struct/array initializer syntax and it was marked as const but it was still being placed in the read/write data segment. This looked like an interesting VC++ compiler quirk so I followed my own instructions for how to reduce and report a compiler bug. I copied the types and the array definition to a stand-alone project and kept reducing it, all the time checking to see if the array was still going in the read/write data segment. I kept reducing until I had this repro, which is small enough to easily fit in a tweet, or one line of a blog post:
const struct T {const int b[999]; } a[] = {{{}}}; int main() {return(size_t)a;}
If you compile this and run ShowGlobals on the PDB it will show that ‘a’ is in section ‘3’, despite being tagged as const. Here are the steps for building and testing the code:
> “%VS140COMNTOOLS%..\..\VC\vcvarsall.bat”
> cl /Zi constbug.cpp
/out:constbug.exe
> ShowGlobals.exe constbug.pdb
Size Section Symbol name
3996 3 a
Having reduced my example to less than 140 characters it was easy to find the trigger. With VC++ (2010, 2015, 2017 RC) if you have a class/struct with a const member variable then any global objects of that type end up in the read/write data segment. Jonathan Caves explains in a comment on my bug report that this happens because the type has a “compiler generated deleted default constructor” (makes sense) which confuses VC++ which incorrectly sees this as a class that needs dynamic initialization.
So, the problem in this case is the const modifier on the ‘b’ member variable. Once that is deleted the array, ironically enough, gets put in read-only memory. Since the object itself is const the removal of const from the member variable doesn’t reduce safety at all, and actually improves it for VC++ builds.
I suspect that the VC++ team will fix this for VS 2017 – it seems like an easy win to me – but I didn’t want to wait that long. So, I started landing changes to remove const on member variables in Chrome where it was causing problems. This was simply a matter of continuing to look through the list of large global variables in the read/write data segment and categorizing them:
- Actually written to – leave alone
- Not written and missing const on the global variable – add it
- Not written and problematic const on a member variable – remove it
Okay, this looks really funny…
So, I worked my way through Chrome’s source code, adding and removing const in a few useful places. For most of my changes the effect was just to move some data from the read/write data segment to the read-only data segment, as expected, but two of the changes did much more. Two of the changes shrunk the .text and .reloc sections. This was great, but it seemed too good to be true. This suggested that VC++ was generating code to initialize some of these arrays – a lot of code.
The most interesting change was removing three const keywords from the declaration of the UnigramEntry struct. Doing this moved 53,064 bytes to the read-only data segment, but it also saved over 364,500 bytes of code, in both chrome.dll and chrome_child.dll. That suggests that VC++ had been quietly creating an initializer function that used an average of almost seven bytes of code to initialized each byte of unigram_table. That couldn’t be. It was too much to hope for that my change would help that much!
So, I launched Chrome under the VS debugger and set a breakpoint at the end of the unigram_table array and VS dutifully stopped there at the beginning of the initializer. I’ve put a cleaned up and simplified excerpt of the assembly language below (I replaced ‘unigram_table’ with ‘u’ to make it fit better):
55 push ebp
8B EC mov ebp,esp
83 25 78 91 43 12 00 and dword [u],0
83 25 7C 91 43 12 00 and dword [u+4],0
83 25 80 91 43 12 00 and dword [u+8],0
83 25 84 91 43 12 00 and dword [u+0Ch],0
C6 05 88 91 43 12 4D mov byte [u+10h],4Dh
C6 05 89 91 43 12 CF mov byte [u+11h],0CFh
C6 05 8A 91 43 12 1D mov byte [u+12h],1Dh
C6 05 8B 91 43 12 1B mov byte [u+13h],1Bh
C7 05 8C 91 43 12 FF 00 00 00 mov dword [u+14h],0FFh
C6 05 90 91 43 12 00 mov byte [u+18h],0
C6 05 91 91 43 12 00 mov byte [u+19h],0
C6 05 92 91 43 12 00 mov byte [u+1Ah],0
C6 05 93 91 43 12 00 mov byte [u+1Bh],0
… 52,040 lines deleted…
c6 05 02 6e 0b 12 6c mov byte [u+cf42h],6Ch
c6 05 03 6e 0b 12 6e mov byte [u+cf43h],6Eh
c6 05 04 6e 0b 12 a2 mov byte [u+cf44h],0A2h
c6 05 05 6e 0b 12 c2 mov byte [u+cf45h],0C2h
c6 05 06 6e 0b 12 80 mov byte [u+cf46h],80h
c6 05 07 6e 0b 12 c4 mov byte [u+cf47h],0C4h
5d pop ebp
c3 ret
The hex numbers along the left are the machine-code bytes and the text to the right of that is the assembly language. After some prologue code the code gets into a routine of writing to the array… one byte at a time… using seven-byte instructions. Well, that explains the size of the initializer.
It is well known that optimizing compilers can write code that is as good or better than human beings in most cases. And yet. Sometimes they don’t. There are so many things that could be better about this function:
- It could not exist. The array is initialized using C array syntax and if it wasn’t for this VC++ code-quality bug with const struct members then there would be no initializer – on other platforms there isn’t
- The writes of zero could be omitted. The array is a global variable being initialized at startup and the contents of memory are already guaranteed to be zero, so every write of zero can be removed
- Data could be written four bytes at a time (ten-byte instruction) instead of one byte at a time (seven-byte instruction)
- The address of the array could be loaded into a register and then dereferenced instead of specifying its address in every. single. instruction. This would make the instructions smaller and would also save two bytes per instruction of relocation data, found in the .reloc segment
You get the idea. The function could be a quarter the size, or it could be omitted entirely. Anyway, with the three const keywords removed the initializer disappeared entirely (it is gone from canary builds already) and the ~364,500 bytes of code and ~105,000 bytes of relocation information have disappeared, from both chrome.dll and chrome_child.dll. The array used to be in the .BSS (zero initialized portion of the read/write data segment) where it occupied no space on disk and it now occupies 53,064 bytes in the read-only data segment, which is why the total disk-size savings is about 416,000 bytes per DLL. This change gave similar but slightly smaller improvements.
Most importantly, the various globals involved in these two changes go from being mostly or completely per-process private data (due to being initialized at run-time) to being shared data, saving an estimated 200 KB of data per process.
Example changes
I started with the largest and most common objects and types in order to get the biggest wins first and I quickly reduced the size of the read/write data segment by about 250 KB, moving over 1,500 global variables to the read-only data segment. It’s easy to get caught up in the game (who me? OCD? I don’t know what you’re talking about) but I have managed to stop even though I know that there are still hundreds more small global variables that I could fix. After a while I hit diminishing returns and it was time to move onto something else. To put it another way, if you’ve always wanted to land a Chromium change I left a few for you. Here are some of the changes landed as part of this project:
Some changes that deleted const:
Some changes that added const:
I was really hoping for conservation-of-const, and there are enough globals missing const that I could yet attain it.
See for yourself
If you want to debug Chrome and see the unigram_table initializer before it goes away then it’s fairly straightforward – you don’t need to be a Chrome developer to do it. Start by executing these two commands:
> “%VS140COMNTOOLS%..\..\VC\vcvarsall.bat”
> devenv /debugexe chrome.exe
Make sure you have Chrome’s symbol server set up in VS per these instructions and then set a breakpoint on this symbol:
`dynamic initializer for ‘unigram_table”
In case wordpress mangles it that’s a back-tick, some magic text, a single quote, the symbol name, and then two single quotes. Make sure you’ve shutdown Chrome completely, and then launch Chrome from VS. VS will download Chrome’s symbols (the magic of symbol servers!) and set a breakpoint on the initializer, if it exists. Handy. If necessary you can switch to assembly language mode (Ctrl+F11) and marvel. If you want to see the source code then just enable source server by checking the box in the Visual Studio debug options.
And that’s how I spent my holidays – for less technical holiday shenanigans read about warm weather snowmen, and the importance of winning Christmas.
Reddit discussion is here, hacker news here, and twitter comments start here.