Evaluating the same expression at the same breakpoint gets slower after a certain number of steps

Guilherme_Andrade · July 4, 2019, 6:38pm

I have two breakpoint inside methods that are called every frame (C++ project using Unreal), and every time one of them is reached, I evaluate one expression (I’m being able to reproduce this using an undefined name, say “undefinedVariable”). After a few iterations (usually tens), the time it takes for LLDB to say that name doesn’t exist increases, despite being the same expression, at the same breakpoint and the call stack remaining unchanged.

I’ve noticed that the number of lexical Decl queries and imports conducted by Clang reported in ‘Local metrics’ increase.

They go from:
Number of visible Decl queries by name : 29
Number of lexical Decl queries : 9
Number of imports initiated by LLDB : 15
Number of imports conducted by Clang : 827
Number of Decls completed : 5
Number of records laid out : 2

To:
Number of visible Decl queries by name : 29

Number of lexical Decl queries : 14
Number of imports initiated by LLDB : 15
Number of imports conducted by Clang : 1342
Number of Decls completed : 5
Number of records laid out : 2

Also, the number of SymbolFileDWARF operations in the logs jumps from 366 to 406.

So, I’ve got two questions. 1) Is it safe to say that those extra imports and Decl queries are responsible for the performance loss? 2) Why do they happen?

Thanks!

Jason_Molenda1 · July 8, 2019, 10:33pm

Hm, that's interesting.

I tried running a debug lldb on /bin/ls. then I attached from another lldb. I put a breakpoint on CommandObjectTargetModulesLookup::DoExecute and resumed execution. In the debuggee lldb, I did

tar mod loo -a 0

and auto-repeated return so the same command would be executed over and over.

In the debugger lldb, I tried adding a command to the breakpoint,

br comm add
p does_not_exist
DONE

and continuing - after a couple dozen times, I didn't see a slowdown. I tried adding a breakpoint condition,

br mod -c 'doesNotExist == 1' 1

and continuing, and didn't see a slowdown after a few dozen repetitions.

I'm on macOS using .o file DWARF debugging.

I'm sure there's a bug here, but it may be more specific to the platform and type of debug info that you're using? It could be that lldb is too small of a project to repo this problem too.

jingham · July 8, 2019, 11:03pm

Are any sources for symbols getting added to the process as you go along, like new shared library loads? You are asking for a non-existent name, so every new source of debug info is a new place to look.

It's also possible that we are completing some types during the course of the expression evaluation, and then the newly completed types become themselves new places to look for this non-existent name.

You can see what is getting looked up by turning in the expression log (log enable lldb expr) then looking for the FindExternalVisibleDecls entries.

Jim

Guilherme_Andrade · July 9, 2019, 6:52pm

Thanks for looking into this, Jason. I ended up realizing that it isn’t about the number of evaluations; it is the exact location of the breakpoints that matters.

I’m being able to consistently reproduce this by having two breakpoints (A and B) inside the same function, but B being inside the scope of a for loop that defines a couple extra variables. The first evaluation of ‘undefinedVariable’ at breakpoint A generates the following FindExternalLexicalDecls calls:

FindExternalLexicalDecls[5] on (ASTContext*)000001566104F030 in ‘Type0’ (CXXRecordDecl*)0000015668964438
FindExternalLexicalDecls[6] on (ASTContext*)000001566104F030 in ‘Type6’ (CXXRecordDecl*)0000015668961BA8
FindExternalLexicalDecls[7] on (ASTContext*)000001566104F030 in ‘Type7’ (CXXRecordDecl*)0000015668961A68
FindExternalLexicalDecls[8] on (ASTContext*)000001566104F030 in ‘Type8’ (CXXRecordDecl*)0000015674F56208
FindExternalLexicalDecls[9] on (ASTContext*)000001566104F030 in ‘Type9’ (ClassTemplateSpecializationDecl*)0000015674F56CD0
FindExternalLexicalDecls[10] on (ASTContext*)000001566104F030 in ‘Type10’ (CXXRecordDecl*)0000015668964300
FindExternalLexicalDecls[11] on (ASTContext*)000001566104F030 in ‘Type10’ (CXXRecordDecl*)0000015668964300
FindExternalLexicalDecls[12] on (ASTContext*)000001566104F030 in ‘Type0’ (CXXRecordDecl*)0000015668964438
FindExternalLexicalDecls[13] on (ASTContext*)000001566104F030 in ‘Type11’ (EnumDecl*)0000015674F62778

Then, the evaluation at breakpoint B generates:

FindExternalLexicalDecls[19] on (ASTContext*)000001F1F6275320 in ‘Type0’ (CXXRecordDecl*)000001F1EA798868
FindExternalLexicalDecls[20] on (ASTContext*)000001F1F6275320 in ‘Type1’ (CXXRecordDecl*)000001F1EA799098
FindExternalLexicalDecls[21] on (ASTContext*)000001F1F6275320 in ‘Type2’ (ClassTemplateSpecializationDecl*)000001F1EA7A0D50
FindExternalLexicalDecls[22] on (ASTContext*)000001F1F6275320 in ‘Type3’ (CXXRecordDecl*)000001F1EA79D818
FindExternalLexicalDecls[23] on (ASTContext*)000001F1F6275320 in ‘Type4’ (CXXRecordDecl*)000001F1EA79D548
FindExternalLexicalDecls[24] on (ASTContext*)000001F1F6275320 in ‘Type5’ (CXXRecordDecl*)000001F1EA7A2A08
FindExternalLexicalDecls[25] on (ASTContext*)000001F1F6275320 in ‘Type6’ (CXXRecordDecl*)000001F1EA7BB368
FindExternalLexicalDecls[26] on (ASTContext*)000001F1F6275320 in ‘Type7’ (CXXRecordDecl*)000001F1EA7BB228
FindExternalLexicalDecls[27] on (ASTContext*)000001F1F6275320 in ‘Type8’ (CXXRecordDecl*)000001F1F68B9088
FindExternalLexicalDecls[28] on (ASTContext*)000001F1F6275320 in ‘Type9’ (ClassTemplateSpecializationDecl*)000001F1F68B2AC0
FindExternalLexicalDecls[29] on (ASTContext*)000001F1F6275320 in ‘Type10’ (CXXRecordDecl*)000001F1EA798730
FindExternalLexicalDecls[30] on (ASTContext*)000001F1F6275320 in ‘Type10’ (CXXRecordDecl*)000001F1EA798730
FindExternalLexicalDecls[31] on (ASTContext*)000001F1F6275320 in ‘Type0’ (CXXRecordDecl*)000001F1EA798868
FindExternalLexicalDecls[32] on (ASTContext*)000001F1F6275320 in ‘Type11’ (EnumDecl*)000001F1F68CB3E0
FindExternalLexicalDecls[33] on (ASTContext*)000001F1F6275320 in ‘Type12’ (CXXRecordDecl*)000001F1E2351F48

And the subsequent evaluations at breakpoint A, which are slower, produce:

FindExternalLexicalDecls[91] on (ASTContext*)000001566104F030 in ‘Type0’ (CXXRecordDecl*)000001566895C3B8
FindExternalLexicalDecls[92] on (ASTContext*)000001566104F030 in ‘Type1’ (CXXRecordDecl*)000001566895CBE8
FindExternalLexicalDecls[93] on (ASTContext*)000001566104F030 in ‘Type2’ (ClassTemplateSpecializationDecl*)00000156689678D0
FindExternalLexicalDecls[94] on (ASTContext*)000001566104F030 in ‘Type3’ (CXXRecordDecl*)0000015668964398
FindExternalLexicalDecls[95] on (ASTContext*)000001566104F030 in ‘Type4’ (CXXRecordDecl*)00000156689640C8
FindExternalLexicalDecls[96] on (ASTContext*)000001566104F030 in ‘Type5’ (CXXRecordDecl*)0000015674F608B8
FindExternalLexicalDecls[97] on (ASTContext*)000001566104F030 in ‘Type6’ (CXXRecordDecl*)0000015674F731B8
FindExternalLexicalDecls[98] on (ASTContext*)000001566104F030 in ‘Type7’ (CXXRecordDecl*)0000015674F73078
FindExternalLexicalDecls[99] on (ASTContext*)000001566104F030 in ‘Type8’ (CXXRecordDecl*)0000015674F7B278
FindExternalLexicalDecls[100] on (ASTContext*)000001566104F030 in ‘Type9’ (ClassTemplateSpecializationDecl*)0000015674F78CF0
FindExternalLexicalDecls[101] on (ASTContext*)000001566104F030 in ‘Type10’ (CXXRecordDecl*)000001566895C280
FindExternalLexicalDecls[102] on (ASTContext*)000001566104F030 in ‘Type10’ (CXXRecordDecl*)000001566895C280
FindExternalLexicalDecls[103] on (ASTContext*)000001566104F030 in ‘Type0’ (CXXRecordDecl*)000001566895C3B8
FindExternalLexicalDecls[104] on (ASTContext*)000001566104F030 in ‘Type11’ (EnumDecl*)0000015674F81510

The FindExternalVisibleDecls entries remain the same among all evaluations at breakpoint A.

So, Jim, if symbols are getting added at B or types are being completed, then the extra FindExternalLexicalDecls calls for Type1-5 is intended behavior?

Guilherme_Andrade · July 11, 2019, 4:53pm

Speaking more generally, can evaluating an expression with no side effects affect the outcome of subsequent evaluations? I.e, is it possible for the expression evaluation X to produce different results in the following scenarios?

Scenario 1:
Scenario 2:

Thanks!

jingham · July 11, 2019, 6:24pm

lldb realizes types from DWARF lazily. So for instance, if an expression refers to a pointer to type Foo, we won't necessarily realize the full type of Foo from DWARF to parse that expression. Then if you write a second expression that accesses a member of an object of type Foo, we will realize the full type for Foo. Then if you run the first expression again, the pointer to Foo type in the lldb type system will now point to a realized type of Foo. That should not make any difference, since if we were right the first time that we didn't need to know anything about Foo, it shouldn't matter whether the full type is realized or not.

Similarly, the "expression with no side effects" could have also caused lldb to realize any number of other types. We find names from type information in two ways: looking in the realized types, and looking in the name indexes we read from DWARF or (on systems without accelerator tables) from indexing the DWARF manually. So the expression with no side effects will change the types in the realized types pool. That also "should not matter" because if the expression X was looking up by name, the lookup through the name indexes when the type wasn't realized yet should be equivalent to the lookup in the realized types.

We really have no choice but to realize types lazily in this way, the debugger would just run way too slowly on big projects if we didn't. So the fact that one expression can change the state of the type system for a subsequent expression is a fact of life for lldb. But if the lookup mechanism is working properly, then these changes shouldn't matter. OTOH shouldn't and don't are not quite the same...

Jim

martong · July 12, 2019, 10:08am

Guilherme,

Could you please check if you have any structural differences between the ASTs you see when

you first evaluate your expression at breakpoint A
you evaluate your expression the second time at breakpoint A ?
The AST of the expression evaluator’s context is dumped once a TagDecl is completed, but you need to enable a specific logging: (log enable lldb ast).

I have a theory about the root cause of the slowness you experience, but it requires proof from your test scenario.
There are known problems with the lookup we use in LLDB [1].
If the lookup mechanism cannot find a symbol then clang::ASTImporter will create a new AST node.
Thus I am expecting a growing AST in your case at 2) with duplicated and redundant AST nodes.
Why would the grown AST results in any slowness?
Because the lookup we use is in clang::DeclContext::localUncachedLookup() and it has a fallback branch which I assume is executed in your case:

// Slow case: grovel through the declarations in our chain looking for
// matches.
// FIXME: If we have lazy external declarations, this will not find them!
// FIXME: Should we CollectAllContexts and walk them all here?
for (Decl *D = FirstDecl; D; D = D->getNextDeclInContext()) {
if (auto *ND = dyn_cast<NamedDecl>(D))
if (ND->getDeclName() == Name)
Results.push_back(ND);
}

This for loop does a linear search on the list of the decls of a decl context, which explains the slowing factor as the AST grows.
I wounder if you could help proving this theorem by first checking if the AST grows and if yes then checking if this linear search is executed or not.

[1] If a symbol is in an extern "C" block then the existing lookup fails to find it. I try to fix it in https://reviews.llvm.org/D61333

Thanks,
Gabor

Guilherme_Andrade · July 15, 2019, 5:53pm

Gábor,
Thanks for pointing this out to me. The AST changes - the resulting log increases from 7k lines to 11k. I also verified that the fallback branch is executed. 18k iterations during the first evaluation and 93k afterwards. However, that only results in a couple extra milliseconds slowness (~ 4 ms), whereas the overall performance hit is in the order of hundreds.

Jim,
Thank you for the explanation. I think I understand the lazy approach for types realization, but it is still not clear to me how that could cause the performance to degrade. If we encounter an already realized type, won’t that save us work and make things run faster? Do you know of other points in the code that could be particularly sensitive to the realized types pool size (something like the branch Gábor mentioned)?

Scott_Funkenhauser · August 22, 2019, 2:31pm

Hey Jim,

We just noticed that ‘target.experimental.inject-local-vars’ is true by default. If we disable that experimental the performance for expression evaluation is significantly better.

From the flag description:
“If true, inject local variables explicitly into the expression text. This will fix symbol resolution when there are name collisions between ivars and local variables. But it can make expressions run much more slowly.”

I put together a simple example:
class bar {
public:
int foo = 2;
int Run(int foo) {
return foo + 1;
}
};

Evaluating ‘foo’ when stopped in bar.Run() seems to work as expected. Is there something I’m not capturing in my example? Do you have an example of a name collision that the experimental flag fixes?

Teemperor · August 22, 2019, 3:37pm

The test lldb/packages/Python/lldbsuite/test/lang/cpp/member-and-local-vars-with-same-name/main.cpp
is testing this feature, so you should get a 10 (instead of a correct
12345) when you break in main.cpp:31 in this test and eval "expr a"
while you have this feature disabled. At least for me that's the case.

- Raphael

Topic		Replies	Views
Breakpoint + callback performance ... Can it be faster? LLDB	5	90	August 17, 2016
Breakpoint + callback performance ... Can it be faster? LLDB	8	79	February 13, 2017
[Bug 22315] New: Breakpoint does not hit when evaluating a function call expression LLDB	1	82	January 23, 2015
[RFC] Fast Conditional Breakpoints (FCB) LLDB	23	151	August 23, 2019
[Bug 33164] New: Support conditional breakpoints during expression evaluation LLDB	0	82	May 25, 2017

Evaluating the same expression at the same breakpoint gets slower after a certain number of steps

Related Topics