Hitting kMaxNumChunks

Hello,

We've had a build that hit the following assert:
AddressSanitizer CHECK failed:
/var/lib/jenkins/jenkins/workspace/fst-clang/local/src/llvm/llvm-3.9.0.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_allocator.h:1078
"((idx)) < ((kMaxNumChunks))" (0x40000, 0x40000)

Increasing the limit and recompiling seems like the obvious
workaround, but I'm wondering whether I have better options than that.
Any thoughts?

Thank you,
Frederik

LLVM 3.9 seems pretty old.
Does this happen with trunk?

Hello Kostya,

I see that master has the same value for kMaxNumChunks. Is there
anything in particular that leads you to think I wouldn't run into the
same limit?

Thanks,
Frederik

> Hello Kostya,
>
> I see that master has the same value for kMaxNumChunks. Is there
> anything in particular that leads you to think I wouldn't run into the
> same limit?

No. It's just that I haven't heard anyone else complain recently.
If you have a reproducer that works on trunk, I'll be happy to look at it.

--kcc

FWIW, I upgraded to 5.0.1, and we hit the issue again. We do not
really have a repro available, in the sense that the box needs to
receive traffic for a couple of hours before the build hits the
issue; it is, however, fairly consistent. Is there any
instrumentation we could add to understand the issue better? And in
the meantime, is it safe to just increase the value of kMaxNumChunks?

Thanks,
Frederik

+Aleksey, who has been dealing with the allocator recently.

If you have a "((idx)) < ((kMaxNumChunks))" (0x40000, 0x40000)
check failure, it means that you've allocated (and did not deallocate) 2^18
large heap regions, each *at least* (2^17+1) bytes.
This means that you have live large heap chunks of 2^35 bytes (or more) in
total, which is 32 GB.
Does this sound correct?

If yes, yea, I guess we need to bump kMaxNumChunks
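
(A quick standalone check of that arithmetic, purely illustrative:)

    #include <cstdint>
    #include <cstdio>

    int main() {
      // 2^18 live chunks, each at least 2^17+1 bytes, as described above.
      const uint64_t chunks = UINT64_C(1) << 18;
      const uint64_t min_size = (UINT64_C(1) << 17) + 1;
      const uint64_t total = chunks * min_size;
      // Prints a little over 2^35 bytes, i.e. ~32 GB.
      printf("%llu bytes (~%llu GB)\n",
             (unsigned long long)total,
             (unsigned long long)(total >> 30));
      return 0;
    }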

> +Aleksey, who has been dealing with the allocator recently.
>
> If you have a "((idx)) < ((kMaxNumChunks))" (0x40000, 0x40000)
> check failure, it means that you've allocated (and did not deallocate) 2^18
> large heap regions, each *at least* (2^17+1) bytes.
> This means that you have live large heap chunks of 2^35 bytes (or more) in
> total, which is 32 GB.
> Does this sound correct?

Yes, it does. Our software typically allocates several hundred GB, and
the size of the objects depends on the traffic. They would
realistically reach 2^17+1 bytes.

> If yes, yea, I guess we need to bump kMaxNumChunks

I'll increase the limit to 2^19 for our build, and I'll report the results here.

Thanks,
Frederik

>> +Aleksey, who has been dealing with the allocator recently.
>>
>> If you have a "((idx)) < ((kMaxNumChunks))" (0x40000, 0x40000)
>> check failure, it means that you've allocated (and did not deallocate) 2^18
>> large heap regions, each *at least* (2^17+1) bytes.
>> This means that you have live large heap chunks of 2^35 bytes (or more) in
>> total, which is 32 GB.
>> Does this sound correct?
>
> Yes, it does. Our software typically allocates several hundred GB, and
> the size of the objects depends on the traffic. They would
> realistically reach 2^17+1 bytes.
>
>> If yes, yea, I guess we need to bump kMaxNumChunks
>
> I'll increase the limit to 2^19 for our build, and I'll report the results
> here.

Yes, please do. Thanks!

Hello,

[...]

>> If yes, yea, I guess we need to bump kMaxNumChunks
>
> I'll increase the limit to 2^19 for our build, and I'll report the results
> here.

I ended up increasing the limit to 2^20, because the max allocation
for large objects is around 100 GB on those hosts. With that done, I hit
an issue where adding stacks to the StackDepot became a visible
bottleneck after a few hours. To solve that, I've set
`malloc_context_size=0`, because we're not tracking leaks, and in our
experience the call site in the ASan report carries enough
information to diagnose the issue. With these two changes, the build
is behaving as expected.
If you think it's worth doing, I'm happy to post a patch for the
constant increase.
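
For anyone hitting the same StackDepot bottleneck: the option can be
passed at launch time via the environment, e.g.
ASAN_OPTIONS=malloc_context_size=0 ./your_binary (binary name being an
example), or baked into the target through ASan's standard
default-options hook. A minimal sketch of the latter:

    // Any C++ file linked into the instrumented binary; the hook itself
    // is a standard ASan extension point.
    extern "C" const char *__asan_default_options() {
      return "malloc_context_size=0";
    }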

Thanks,
Frederik

I'm afraid increasing the size of the chunks array for all platforms might not go over well on iOS and Android (look for ASAN_LOW_MEMORY for reference). I'll think of the best way to handle it. Thank you, Frederik!
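
(One possible shape for a platform-aware limit, sketched on the
assumption that low-memory targets keep the current value; illustrative
only, not a committed patch:)

    // Hypothetical: scale the cap by configuration instead of raising it
    // everywhere (modeled on the ASAN_LOW_MEMORY knob mentioned above).
    #if ASAN_LOW_MEMORY
    static const int kMaxNumChunks = 1 << 18;  // keep today's limit
    #else
    static const int kMaxNumChunks = 1 << 20;  // roomier cap for big servers
    #endif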