Efficient Memory Mapping for Terabyte Sparse Files in Java

On Linux, you can create sparse files, where only the pages (of 4 KiB) that are touched use memory or disk space. This allows you to memory map large virtual regions without worrying about wasting memory or disk.

In this program, shown in Figure 1, you can see it reserves 8 TiB (8,192 GiB) of virtual memory.

Figure 1. Test 1: Sparse file
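The reservation step can be sketched with the standard library alone. This is not the program in Figure 1, whose code isn't reproduced here. It assumes Linux, a filesystem that supports sparse files, and a recent JDK, where `RandomAccessFile.setLength` extends a file with a hole instead of allocating blocks. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch, assuming Linux with a sparse-capable filesystem (e.g. ext4, XFS, tmpfs):
// reserve a large file by extending its length without writing any data.
class SparseReserve {
    static final long EIGHT_TIB = 8L << 40; // 8 TiB = 8,192 GiB

    // Extends the file to 'size' bytes; on such filesystems the new region is a hole,
    // so no data blocks are allocated until something is written there.
    static long reserve(Path path, long size) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.setLength(size);
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sparse", ".dat");
        try {
            long length = reserve(file, EIGHT_TIB);
            System.out.println("apparent size: " + (length >> 30) + " GiB");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running `ls -l` and `du -h` against a file reserved this way is a quick way to compare its apparent size with the space actually allocated.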

Tip: x << y means $x \times 2^{y}$, therefore:

1L << 10 = 1 KiB (1024 bytes),

1L << 20 = 1 MiB ($1024^{2}$ bytes),

1L << 30 = 1 GiB ($1024^{3}$ bytes),

1L << 40 = 1 TiB, etc.

Using multiples of 10 for the shift makes them easier to read.

64L << 20 is $64 \times 2^{20} = 64 \times 1024^{2}$ bytes = 64 MiB.
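The shift idiom can be checked directly in Java. This small sketch, with illustrative class and constant names, also shows why the `L` suffix matters. Without it, the shift is done in `int` arithmetic, where only the low five bits of the shift distance are used.

```java
// Sketch: the shift idiom from the tip above. For long values, x << y equals x * 2^y
// as long as the result does not overflow.
class ShiftSizes {
    static final long KIB = 1L << 10; // 1,024 bytes
    static final long MIB = 1L << 20; // 1,024^2 bytes
    static final long GIB = 1L << 30; // 1,024^3 bytes
    static final long TIB = 1L << 40; // 1,024^4 bytes

    public static void main(String[] args) {
        System.out.println("1L << 10 = " + KIB + " bytes");
        System.out.println("64L << 20 = " + (64L << 20) + " bytes = " + (64L << 20) / MIB + " MiB");
        System.out.println("8L << 40 = " + (8L << 40) / GIB + " GiB = " + (8L << 40) / TIB + " TiB");
        // Without the L suffix the shift is done on an int, and only the low 5 bits of the
        // distance are used, so 1 << 40 is really 1 << 8.
        System.out.println("1 << 40 (int) = " + (1 << 40));
    }
}
```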

The virtual memory size of the process above is just over 8,192 GiB, at 8,200.7 GiB, but the RSS (Resident Set Size) is only 122,060 KB, or about 122 MB.

Figure 2. RES for Test 1
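To read the same figures from inside the JVM, a Linux-only sketch can parse the `VmSize` and `VmRSS` lines of `/proc/self/status`, both reported in kB. `top` shows these values as VIRT and RES. The helper name `statusKb` is hypothetical, and `Path.of` assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Linux-only sketch: read this JVM's virtual size (VmSize) and resident set size (VmRSS)
// from /proc/self/status; top shows the same values as VIRT and RES.
class ProcMemory {
    // Returns the value in kB for a key such as "VmSize" or "VmRSS", or -1 if it is absent.
    static long statusKb(List<String> lines, String key) {
        String prefix = key + ":";
        for (String line : lines) {
            if (line.startsWith(prefix)) {
                // Lines look like "VmRSS:\t  122060 kB"
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        List<String> status = Files.readAllLines(Path.of("/proc/self/status"));
        long virtKb = statusKb(status, "VmSize");
        long rssKb = statusKb(status, "VmRSS");
        System.out.printf("virtual = %.1f GiB, RSS = %,d kB%n", virtKb / (1024.0 * 1024), rssKb);
    }
}
```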

On disk, the reported file size is 8 TiB; however, the disk space (and memory) actually used is just 20 KiB.

Figure 3. Disk usage for Test 1

The following test demonstrates the main point of this article more clearly. Again, 8 TiB of virtual memory is reserved, but this time the data is written sparsely: 1,000 integers are written, each one 16L << 10 bytes (16 KiB, or four pages) after the previous one.

Figure 4. Test 2: Sparse file with skipped pages
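A write pattern like Test 2 can be sketched using only the standard library. This is an assumption, not the code in Figure 4. It uses `FileChannel.map`, which caps a single `MappedByteBuffer` at `Integer.MAX_VALUE` bytes, so it maps only the 16,384,000 bytes the writes span rather than 8 TiB. It also assumes Linux and a recent JDK, where `setLength` leaves a hole.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write 1,000 ints with a 16 KiB stride into a memory-mapped file, so only one
// 4 KiB page in every four is touched.
class SparseWrites {
    static final int COUNT = 1000;
    static final int STRIDE = 16 << 10; // 16 KiB = four 4 KiB pages

    static void writeSparse(Path path) throws IOException {
        long mapSize = (long) COUNT * STRIDE; // 1,000 x 16 KiB = 16,384,000 bytes
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.setLength(mapSize); // extend the file with a hole; no data blocks yet
            MappedByteBuffer buf = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, mapSize);
            for (int i = 0; i < COUNT; i++) {
                buf.putInt(i * STRIDE, i); // absolute put at offset i * 16 KiB
            }
            buf.force(); // write the dirty pages back to the file
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sparse-writes", ".dat");
        try {
            writeSparse(file);
            System.out.println("apparent size = " + Files.size(file) + " bytes");
            // The allocated size can be compared with e.g. `du -h <file>` before deleting it.
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```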

The RSS (Resident Set Size) is only 129,272 KB, or about 129 MB, and the disk usage is only 4.0 MiB, which indicates that only touched pages use memory and disk. The writes span 16 KiB × 1,000 = 16,000 KiB (roughly 16 MiB), but only one page in four is touched, so the actual disk usage is 4 KiB × 1,000 = 4,000 KiB, or roughly 4 MiB.

Figure 5. RES for Test 2

Figure 6. Disk usage for Test 2
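The page arithmetic for Test 2 can be made explicit. This sketch assumes 4 KiB pages, as on the test system, and counts the distinct pages hit by the writes. `touchedPages` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: count the distinct pages touched when 'count' values of 'valueSize' bytes are
// written 'stride' bytes apart, starting at offset 0.
class TouchedPages {
    static long touchedPages(int count, long stride, int valueSize, long pageSize) {
        Set<Long> pages = new HashSet<>();
        for (long i = 0; i < count; i++) {
            long start = i * stride;
            pages.add(start / pageSize);                   // page holding the first byte
            pages.add((start + valueSize - 1) / pageSize); // page holding the last byte
        }
        return pages.size();
    }

    public static void main(String[] args) {
        long pageSize = 4L << 10; // 4 KiB, assumed page size
        long stride = 16L << 10;  // 16 KiB, as in Test 2
        long pages = touchedPages(1000, stride, Integer.BYTES, pageSize);
        System.out.println("pages touched: " + pages + " (" + pages * pageSize / 1024 + " KiB)");
        System.out.println("range spanned: " + 1000 * stride / 1024 + " KiB");
    }
}
```

With 1,000 writes at a 16 KiB stride this gives 1,000 pages, i.e. 4,000 KiB (about 3.9 MiB), which is consistent with the roughly 4.0 MiB of disk usage reported, while the writes span 16,000 KiB.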


Mapping a large region of virtual memory up front means we don't need to know in advance how much memory we will need, nor resize the memory mappings while they are in use, and we can still access the data as direct memory without the overhead of system calls.
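One way to get this effect with only the JDK is sketched below. `LazyChunkedMapping` is a hypothetical helper, not Chronicle's API. It reserves the full size of a sparse file once and maps 1 GiB chunks only when they are first accessed, because a single `MappedByteBuffer` can't exceed `Integer.MAX_VALUE` bytes. Once a chunk is mapped, reads and writes to it are plain memory accesses rather than system calls.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Chronicle's API: addresses a large sparse file with long
// offsets by mapping 1 GiB chunks lazily, on first access.
class LazyChunkedMapping implements AutoCloseable {
    private static final long CHUNK = 1L << 30; // 1 GiB per MappedByteBuffer
    private final RandomAccessFile raf;
    private final long capacity;
    private final Map<Long, MappedByteBuffer> chunks = new HashMap<>();

    LazyChunkedMapping(Path path, long capacity) throws IOException {
        this.raf = new RandomAccessFile(path.toFile(), "rw");
        this.capacity = capacity;
        try {
            raf.setLength(capacity); // reserve the whole size once; the file stays sparse
        } catch (IOException e) {
            raf.close();
            throw e;
        }
    }

    private MappedByteBuffer chunkFor(long offset) throws IOException {
        if (offset < 0 || offset + Integer.BYTES > capacity) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        long index = offset / CHUNK;
        MappedByteBuffer buf = chunks.get(index);
        if (buf == null) { // first touch of this chunk: map it now
            long start = index * CHUNK;
            long size = Math.min(CHUNK, capacity - start);
            buf = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, start, size);
            chunks.put(index, buf);
        }
        return buf;
    }

    // Offsets should be 4-byte aligned so an int never straddles two chunks.
    void putInt(long offset, int value) throws IOException {
        chunkFor(offset).putInt((int) (offset % CHUNK), value);
    }

    int getInt(long offset) throws IOException {
        return chunkFor(offset).getInt((int) (offset % CHUNK));
    }

    @Override
    public void close() throws IOException {
        raf.close(); // existing mappings remain valid until they are garbage collected
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lazy-chunks", ".dat");
        try (LazyChunkedMapping m = new LazyChunkedMapping(file, 8L << 40)) { // 8 TiB
            long offset = (5L << 40) + 64; // only the chunk containing this offset is mapped
            m.putInt(offset, 123);
            System.out.println("read back " + m.getInt(offset));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The design trades a map lookup per access for never having to size or remap the file while it is in use; a production version would also need to handle thread safety and values that cross chunk boundaries.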

In short, using virtual memory instead of real memory gives greater flexibility in how we tune our systems. Files that can be pruned lazily make it clear the files won’t be extended.

In sparse memory-mapped files, only the touched pages use disk space. On the system used for the tests in this article, each page holds 4 KiB of data, so writing data sparsely and skipping pages added nothing for the skipped pages; in other words, only the touched pages contributed to memory and disk demand.

Want to Find Out More?

You can contact us to find out more about our services and products.
Or, you can join our LinkedIn Community - 'Chronicle Performance Engineers'.

Peter Lawrey

Peter Lawrey is a Java Champion and Oracle Code One alumnus. Peter likes to inspire developers to improve the craftsmanship of their solutions and his popular blog “Vanilla Java” has had over 4 million views. Peter is the founder and architect of Chronicle Software. He has one of the highest numbers of answers for Java and the JVM on StackOverflow.com (~13K).