Dataflow sanitizer memory mapping question

Hi,

I created an example program to learn how to use the clang dataflow sanitizer and noticed that when the program is compiled with the -fsanitize=dataflow option, the application and its shared libraries are mapped into memory differently. Why is the program relocated to higher addresses, and why do the shared objects appear below the heap? For comparison, here are the memory maps of the same program compiled without and with the dataflow sanitizer option.

Is this necessary for the memory shadowing mechanism used by the sanitizer? I'd really appreciate it if someone could explain the design decision behind these remappings.
Ultimately, I'd like to know whether it's possible to keep the program mapped at its original low addresses without breaking the dataflow sanitizer's assumptions.

Without sanitizer

00400000-00401000 r-xp 00000000 08:01 2491706 /home/frederico/dev/tests/llvm/labelprop/loop
00600000-00601000 r--p 00000000 08:01 2491706 /home/frederico/dev/tests/llvm/labelprop/loop
00601000-00602000 rw-p 00001000 08:01 2491706 /home/frederico/dev/tests/llvm/labelprop/loop
01169000-0118a000 rw-p 00000000 00:00 0 [heap]
7ffb4531d000-7ffb454d9000 r-xp 00000000 08:01 1338668 /lib/x86_64-linux-gnu/libc-2.19.so
7ffb454d9000-7ffb456d8000 ---p 001bc000 08:01 1338668 /lib/x86_64-linux-gnu/libc-2.19.so
7ffb456d8000-7ffb456dc000 r--p 001bb000 08:01 1338668 /lib/x86_64-linux-gnu/libc-2.19.so
7ffb456dc000-7ffb456de000 rw-p 001bf000 08:01 1338668 /lib/x86_64-linux-gnu/libc-2.19.so
7ffb456de000-7ffb456e3000 rw-p 00000000 00:00 0
7ffb456e3000-7ffb45706000 r-xp 00000000 08:01 1338667 /lib/x86_64-linux-gnu/ld-2.19.so
7ffb458e9000-7ffb458ec000 rw-p 00000000 00:00 0
7ffb45902000-7ffb45905000 rw-p 00000000 00:00 0
7ffb45905000-7ffb45906000 r--p 00022000 08:01 1338667 /lib/x86_64-linux-gnu/ld-2.19.so
7ffb45906000-7ffb45907000 rw-p 00023000 08:01 1338667 /lib/x86_64-linux-gnu/ld-2.19.so
7ffb45907000-7ffb45908000 rw-p 00000000 00:00 0
7fff6d64d000-7fff6d66e000 rw-p 00000000 00:00 0 [stack]
7fff6d7fe000-7fff6d800000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

With sanitizer

00010000-200200000000 rw-p 00000000 00:00 0
200200000000-700000008000 ---p 00000000 00:00 0
7f8068983000-7f8068b3f000 r-xp 00000000 08:01 1338668 /lib/x86_64-linux-gnu/libc-2.19.so
7f8068b3f000-7f8068d3e000 ---p 001bc000 08:01 1338668 /lib/x86_64-linux-gnu/libc-2.19.so
7f8068d3e000-7f8068d42000 r--p 001bb000 08:01 1338668 /lib/x86_64-linux-gnu/libc-2.19.so
7f8068d42000-7f8068d44000 rw-p 001bf000 08:01 1338668 /lib/x86_64-linux-gnu/libc-2.19.so
7f8068d44000-7f8068d49000 rw-p 00000000 00:00 0
7f8068d49000-7f8068d5f000 r-xp 00000000 08:01 1314808 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f8068d5f000-7f8068f5e000 ---p 00016000 08:01 1314808 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f8068f5e000-7f8068f5f000 rw-p 00015000 08:01 1314808 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f8068f5f000-7f8068f62000 r-xp 00000000 08:01 1338681 /lib/x86_64-linux-gnu/libdl-2.19.so
7f8068f62000-7f8069161000 ---p 00003000 08:01 1338681 /lib/x86_64-linux-gnu/libdl-2.19.so
7f8069161000-7f8069162000 r--p 00002000 08:01 1338681 /lib/x86_64-linux-gnu/libdl-2.19.so
7f8069162000-7f8069163000 rw-p 00003000 08:01 1338681 /lib/x86_64-linux-gnu/libdl-2.19.so
7f8069163000-7f8069268000 r-xp 00000000 08:01 1338671 /lib/x86_64-linux-gnu/libm-2.19.so
7f8069268000-7f8069467000 ---p 00105000 08:01 1338671 /lib/x86_64-linux-gnu/libm-2.19.so
7f8069467000-7f8069468000 r--p 00104000 08:01 1338671 /lib/x86_64-linux-gnu/libm-2.19.so
7f8069468000-7f8069469000 rw-p 00105000 08:01 1338671 /lib/x86_64-linux-gnu/libm-2.19.so
7f8069469000-7f8069470000 r-xp 00000000 08:01 1338685 /lib/x86_64-linux-gnu/librt-2.19.so
7f8069470000-7f806966f000 ---p 00007000 08:01 1338685 /lib/x86_64-linux-gnu/librt-2.19.so
7f806966f000-7f8069670000 r--p 00006000 08:01 1338685 /lib/x86_64-linux-gnu/librt-2.19.so
7f8069670000-7f8069671000 rw-p 00007000 08:01 1338685 /lib/x86_64-linux-gnu/librt-2.19.so
7f8069671000-7f806968a000 r-xp 00000000 08:01 1338684 /lib/x86_64-linux-gnu/libpthread-2.19.so
7f806968a000-7f8069889000 ---p 00019000 08:01 1338684 /lib/x86_64-linux-gnu/libpthread-2.19.so
7f8069889000-7f806988a000 r--p 00018000 08:01 1338684 /lib/x86_64-linux-gnu/libpthread-2.19.so
7f806988a000-7f806988b000 rw-p 00019000 08:01 1338684 /lib/x86_64-linux-gnu/libpthread-2.19.so
7f806988b000-7f806988f000 rw-p 00000000 00:00 0
7f806988f000-7f80698b2000 r-xp 00000000 08:01 1338667 /lib/x86_64-linux-gnu/ld-2.19.so
7f8069a93000-7f8069a98000 rw-p 00000000 00:00 0
7f8069aab000-7f8069ab1000 rw-p 00000000 00:00 0
7f8069ab1000-7f8069ab2000 r--p 00022000 08:01 1338667 /lib/x86_64-linux-gnu/ld-2.19.so
7f8069ab2000-7f8069ab3000 rw-p 00023000 08:01 1338667 /lib/x86_64-linux-gnu/ld-2.19.so
7f8069ab3000-7f8069ab4000 rw-p 00000000 00:00 0
7f8069ab4000-7f8069ad4000 r-xp 00000000 08:01 2491855 /home/frederico/dev/tests/llvm/labelprop/loop2
7f8069cd3000-7f8069cd4000 r--p 0001f000 08:01 2491855 /home/frederico/dev/tests/llvm/labelprop/loop2
7f8069cd4000-7f8069cd5000 rw-p 00020000 08:01 2491855 /home/frederico/dev/tests/llvm/labelprop/loop2
7f8069cd5000-7f806a86a000 rw-p 00000000 00:00 0
7f806aeec000-7f806af0d000 rw-p 00000000 00:00 0 [heap]
7fff2b930000-7fff2b951000 rw-p 00000000 00:00 0 [stack]
7fff2b9fe000-7fff2ba00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
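To put sizes on the two large anonymous mappings at the bottom of the sanitized map, the lines can be parsed mechanically. This is a minimal Python sketch; `Mapping` and `parse_maps_line` are hypothetical helpers written for this illustration, and it assumes the usual `/proc/<pid>/maps` field order (address range, permissions, offset, device, inode, optional pathname).

```python
from typing import NamedTuple

TIB = 1 << 40  # bytes per tebibyte


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str  # empty string for anonymous mappings

    @property
    def size(self) -> int:
        return self.end - self.start


def parse_maps_line(line: str) -> Mapping:
    """Parse one maps line: range perms offset dev inode [pathname]."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


# The first two lines of the sanitized map.
for line in (
    "00010000-200200000000 rw-p 00000000 00:00 0",
    "200200000000-700000008000 ---p 00000000 00:00 0",
):
    m = parse_maps_line(line)
    print(f"{m.start:#x}-{m.end:#x} {m.perms} {m.size / TIB:.2f} TiB")
```

The read-write mapping comes out at just over 32 TiB, which matches the 32TB figure given later in the thread; the roughly 80 TiB `---p` mapping above it is reserved but inaccessible.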

Thanks,
Fred

+pcc

Thanks for the prompt reply!

Hi,

I created an example program to learn how to use the clang dataflow
sanitizer and noticed that when the program is compiled with the
-fsanitize=dataflow option, the application and its shared libraries are
mapped into memory differently. Why is the program relocated to higher
addresses, and why do the shared objects appear below the heap?

This happens because -fsanitize=dataflow implies -pie, which changes the
mapping in the way you describe.
tsan (-fsanitize=thread) and msan (-fsanitize=memory) behave the same
way.
This is intentional: such a mapping makes the implementation more
efficient.
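One way to confirm that a given build really is position-independent is to read `e_type` from its ELF header: PIE executables (like shared libraries) are `ET_DYN`, while classic fixed-address executables are `ET_EXEC`. A minimal Python sketch, assuming an ELF file on Linux; `elf_type` and `is_position_independent` are illustrative names, not part of any existing tool.

```python
import struct

ET_EXEC = 2  # fixed-address executable (non-PIE)
ET_DYN = 3   # position-independent: PIE executable or shared object


def elf_type(header: bytes) -> int:
    """Return e_type from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field at offset 16 for both 32- and 64-bit ELF.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type


def is_position_independent(path: str) -> bool:
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        pie = is_position_independent(path)
        print(f"{path}: {'ET_DYN (position-independent)' if pie else 'not ET_DYN'}")
```

Run against both binaries from the original post, the sanitized build would be expected to report `ET_DYN`; `readelf -h` shows the same field on its `Type:` line.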

For comparison, I looked at the memory maps of the same program compiled
without and with the dataflow sanitizer option.

Is this necessary for the memory shadowing mechanism used by the
sanitizer? I'd really appreciate it if someone could explain the design
decision behind these remappings.
Ultimately, I'd like to know whether it's possible to keep the program
mapped at its original low addresses without breaking the dataflow
sanitizer's assumptions.

For msan, not using -pie would be a significant performance hit, and dfsan
is rather similar to msan.
pcc, please correct me if I am wrong.
For tsan, this would be less of an issue, but still undesirable.

I understand the performance issues. But say I wanted to disable it to
run some tests: what should I do? (I tried passing -fno-pie to clang, but
it doesn't work, probably because -fsanitize=dataflow forces -pie during
linking.)

You would need to change how dfsan does shadow address mapping. Currently,
dfsan uses the lower 32TB of the process's address space for shadow memory.
If you disable PIE, the binary's code and data will be loaded into that same
part of the address space (e.g. at 0x400000), which will conflict with
shadow memory.
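The conflict can be made concrete by classifying addresses against the layout. The sketch below is a hypothetical Python illustration; the region boundaries are assumptions, taken partly from the sanitized map above (0x10000, 0x200200000000, 0x700000008000) and partly, as an assumption, from the x86_64 layout described in the DataFlowSanitizer design document of that era (the union table starting at 0x200000000000). Newer LLVM releases lay memory out differently.

```python
# Assumed x86_64 dfsan layout of that era; not valid for current LLVM.
LAYOUT = [
    (0x000000000000, 0x000000010000, "reserved by kernel"),
    (0x000000010000, 0x200000000000, "shadow memory"),
    (0x200000000000, 0x200200000000, "union table"),
    (0x200200000000, 0x700000008000, "unused (PROT_NONE)"),
    (0x700000008000, 0x800000000000, "application memory"),
]


def region_of(addr: int) -> str:
    """Return the name of the assumed dfsan region containing addr."""
    for lo, hi, name in LAYOUT:
        if lo <= addr < hi:
            return name
    raise ValueError(f"{addr:#x} is outside the 47-bit user address space")


# Text segment of the non-PIE build vs. the PIE (sanitized) build.
for addr in (0x00400000, 0x7F8069AB4000):
    print(f"{addr:#014x}: {region_of(addr)}")
```

Under these assumptions, the non-PIE text segment at 0x400000 (and the non-PIE heap right after it) falls inside the shadow region, while the PIE text segment at 0x7f8069ab4000 lands in the application range.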

Why does this bother you?

The problem is that I believe this remapping could be interfering with
another component of a project I've been working on, which takes snapshots
of a running application. But I'm not completely sure about it yet.

Does dfsan map 64 terabytes of virtual address space, like msan does?

Yes, and this would be independent of whether dfsan uses PIE.

Peter

Thanks for the explanations!

Correct me if I'm wrong, but this mapping strategy allows shadow locations
to be looked up efficiently by masking and shifting application memory
addresses, right?

Right.
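The masking-and-shifting translation can be sketched as follows. This assumes the x86_64 scheme used by dfsan at the time: clear bits 44 to 46 of the application address (`addr & ~0x700000000000`), then shift left by one because each application byte is assumed to carry a 16-bit label. Treat the mask and the label size as assumptions; later LLVM versions changed both.

```python
APP_MASK = 0x700000000000  # assumed: high bits set in every application address
LABEL_SHIFT = 1            # assumed: 16-bit labels, i.e. 2 shadow bytes per app byte


def shadow_for(addr: int) -> int:
    """Map an application address to its shadow address (assumed scheme)."""
    return (addr & ~APP_MASK) << LABEL_SHIFT


# Lowest application address, a text address and a stack address from the map.
for addr in (0x700000008000, 0x7F8069AB4000, 0x7FFF2B930000):
    print(f"app {addr:#x} -> shadow {shadow_for(addr):#x}")
```

No table lookup is involved: one AND and one shift per access. Under the same assumptions, the whole application range 0x700000008000-0x800000000000 folds onto 0x10000-0x200000000000, which is why that low range must stay free and a non-PIE binary at 0x400000 cannot coexist with it.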

I imagine that changing it is not really a good idea :)
I'd like to find a way around my problem without touching the sanitizer's
logic.

Linux doesn't give you much control over where processes are mapped. As
far as I know, you will either need to keep PIE enabled or change the
mapping strategy; something has to give.

Peter