I think a question was glossed over. Exactly which directions should be inlined…
User callee into user caller (definitely not)
System callee into system caller (yes)
User callee into system caller (maybe?)
System callee into user caller (maybe?)
Perhaps number 3 should be prohibited because then a breakpoint on “my” function would either not get hit, or turn into multiple breakpoints.
Perhaps number 4 should be prohibited because it makes stepping across loop iterations in things like std::transform more difficult.
I deliberately don’t bring up call direction because I don’t think it’s important and I want to keep things simple:
- If the code is user code, it should not be inlined.
- If the code is system code (including non-standard 3rd-party dependencies), it should be inlined.
I think Ben Craig’s point about call direction is important. Or at least, his intuition about what you meant matches my intuition, whereas the words you just used in the paragraph above do not match my intuition about what you had meant in P1832.
Thanks for providing a perspective of which I was unaware.
Take a hypothetical call stack from this example program:
?: std::strncmp(…) ← C standard library function, possibly an intrinsic, inline
?: std::string::operator==(char const*) ← standard library function, inline
8: [](std::string const&) {…} ← lambda, don’t inline
?: std::find_if(…) ← standard library taking lambda, inline
4: contains_sg15 ← user function containing lambda, don’t inline
13: main ← user’s main function, don’t inline
I think whether to inline the lambda is mildly contentious but I think we both agree
it should not be inlined. The real question is whether to inline std::find_if
. I guess
you’re suggesting we don’t inline it. I think – on balance – we probably should.
The verb “inline” doesn’t operate on a node of this graph; it operates on an edge between two nodes. My intuition is that -Og should enable inlining on every edge where both endpoint nodes are in “system code,” and on no other edges. With your above callstack:
Yes inline strncmp into operator== (system into system)
No don’t inline operator== into the user’s lambda (system into user)
No don’t inline the user’s lambda into std::find_if (user into system)
No don’t inline std::find_if into contains_sg15 (system into user)
No don’t inline contains_sg15 into main (user into user)
My intuition is that we don’t want to inline system code into user code because that would mix “not-my-code” into the codegen for “my code.” Imagine the user trying to step through contains_sg15
, and suddenly they’re seeing backward branches and loops. Where did those loops come from? Ah, the compiler inlined a bunch of crap from . That’s not the debugging experience we want (says my intuition). So we should not inline std::find_if
into contains_sg15
.
Either yours or my particular formula for inlining choice improve the situation. That makes me happy. But I’m far from sold on your formula. (I’ll concentrate on find_if
.)
Little of what I imagine as the UX for this feature comes from intuition. It comes from my experience as a C developer, a ‘trad’ C++ developer and as a games developer. Imagine that we ‘degenericize’ the STL and imagine that find_if
and all the other library calls are plain ol’ functions that are defined in a separate TU. The TU is compiled into a .lib file at -O2
and linked to the users’ debug build. Q: What happens now when I step into contains_sg15
? A: Exactly the same behaviour. So there’s a consistency argument to be made. It’s the ‘User Expectation’ from the title of P1832.
Secondly, I wonder if you overestimate just how little a typical game dev cares about being able to step through the loop inside a standard algorithm. For a number of reasons, that code is scary and intimidating for most humble devs. When they step into standard library code by accident, it interrupts their concentration and presents them with an _Uglified mess.
Thirdly, I explicitly simplified my description of find_if
for the sake of argument. But now my simplification gets in the way. If you look at the real find_if
, you’ll see a great many levels of function calls. (I assume they are your nodes.) The first one when you step into find_if
has no for
loop. It is a wrapper to the implementation’s algorithm. The last one before you step into the lambda also contains no for
loop but is a function dedicated to invoking lambdas. So your formula for when to inline library functions will only degrade the experience. Instead, you’d need to say that everything between contains_sg15
and the lambda must not be inlined. At this point, you’re losing a lot of your performance gains which brings me on to…
Finally, because some of these abstractions boil down to very few (sometimes zero!) assembler instructions, any additional assembler you get from relaxing the criteria for inlining can heavily affect run-time performance and binary size. I suspect that on balance, the target audience here won’t want to pay that price.
What about inlining the user’s lambda into std::find_if? I think we have always agreed on this one. The user must be able to set a single breakpoint in the lambda, and run contains_sg15
, and see the breakpoint get hit. If there’s a second copy of the lambda’s code inlined into std::find_if
, then this scenario falls apart. So, we should not inline the user’s lambda into std::find_if
. In fact, the same argument implies that we should never inline user code into anywhere. One line of user source code should continue to map to one point in the compiled binary.
Yes, we always want to facilitate breakpoints etc. in user code.
Let’s step through this program and let’s assume that I only use the “step into”
functionality of the debugger, because that’s an easy way to ‘visit’ an entire
program. Let’s assume find_if
is a single function.
I think you have to consider the workflow for people who use the “disassemble” functionality of the debugger, as well.
I’d be interested to learn more.
Also, I’m not sure how the “step into/ single-step” functionality works, but I would have thought that it just means “continue forward until the current instruction is no longer associated with the same source line of code.” Which means that it would still be possible to “step into” system code, because system code does still have line numbers associated with it.
That’s roughly my understanding, yes. You can test this with forceinline
. There are debugging facilities (including Jeff Trull’s) which I mention in the paper and which help avoid stepping into the ‘wrong’ functions.
I intuit that it should be easy to “step over” calls to library functions; which means that you must ensure that the call to std::find_if
shows up as a call instruction in the binary. If we inline std::find_if
into contains_sg15
, then you can’t “step over” it anymore.
So, are you saying that if we inline find_if
AND we generate debugging symbols, then the debugger’s just going to jump us to find_if
when we step over, right? I’m not sure if that’s what happens. Again, you can test this with forceinline
. I think the debugger’s smart enough to skip over those lines. But I still expect my proposed solution to have some major flaw I didn’t think about such as this!
Thanks,
John