Tablegen: RegisterInfoEmitter.cpp

Hi,

I just bumped into a bug in this code. The problem was as follows:

I have defined a set of registers with rather similar names including digits.

The code section at

RegisterInfoEmitter::run(){

// Process sub-register sets.

runs and fills the RegisterAliases map.

then,


for (unsigned i = 0, e = Regs.size(); i != e; ++i) {
RegNo[Regs[i].TheDef] = i;
NumAliases += RegisterAliases[Regs[i].TheDef].size();
}

runs. Only, now there are duplicates in the RegisterAliases map for the same Regs[i]-Record.

This lead to duplicate output of the REG_Overlaps lists:

error: redefinition of ‘const unsigned int llvm::::a23g_Overlaps []’
/local/scratch/ejonpan/llvm/build/lib/Target/Hubble/HubbleGenRegisterInfo.inc:2700: error: ‘const unsigned int llvm::::a23g_Overlaps [4]’
previously defined here
/local/scratch/ejonpan/llvm/build/lib/Target/Hubble/HubbleGenRegisterInfo.inc:2732: error: redefinition of ‘const unsigned int llvm::::a6
7g_Overlaps []’
/local/scratch/ejonpan/llvm/build/lib/Target/Hubble/HubbleGenRegisterInfo.inc:2717: error: ‘const unsigned int llvm::::a67g_Overlaps [4]’
previously defined here

This behaved very oddly, as the error disappeared after changing register names from eg a23g to aa23g.

It was the map ordering operator that was the trouble, it seems, as the problem disappeared when I used std::string::compare() instead for
the RegisterAliases map.

struct LessRecord {
bool operator()(const Record *Rec1, const Record *Rec2) const {
return StringRef(Rec1->getName()).compare_numeric(Rec2->getName()) < 0;
}
};

struct LessRecordRegAliases{
bool operator()(const Record *Rec1, const Record *Rec2) const {
return Rec1->getName().compare(Rec2->getName()) < 0;
}
};

The conclusion is that the StringRef::compare_numeric() is not deterministic and should not be used in this context, as the map::operator[] will not find the
object, and insert a duplicate in this case. Note that my reg-names included digits and where very similar.

/Jonas

Thanks for tracking this down.

I believe we have a bug in compare_numeric() causing it to be non-transitive sometimes. It is supposed to provide a total ordering of strings. Can you find the bug?

/jakob

Fixed in r140859.

Note that it wasn’t non-deterministic, just not transitive.

/jakob

Hi,

I understand the idea behind compare_numeric() is to compare strings containing digits in a special way: Do a normal string-compare up to the point where both string elemnts are numerical. Find then an outcome based on the number of consecutive digits in the strings while disregarding the value of the digits, eg a12b < a123.

I guess then this order should hold: a12 == a22 < a1b, for these strings.

Looking at StringRef::compare_numeric(StringRef RHS), the problem is, for a12 and a1b, Data[1]==RHS.Data[1], and the continue is reached. Data[2] is then less than RHS.Data[2], so a12 < a1b. But in the case for a22 and a1b, we get the opposite, since ‘2’!=‘1’, and 22 is more digits than 1. So we get a12 < a1b < a22, which is incorrect, because a12==a22.

My problem was with these registers: a23g, a2g and a3g. When I renamed a23g to aa23g, it worked. Since ‘2’==‘2’, the problem was that a23g<a2g.

I think the fix is to first check for two digits, and move the equality comparison to after this. Then a2g < a3g < a23g:

/// compare_numeric - Compare strings, handle embedded numbers.
int StringRef::compare_numeric(StringRef RHS) const {
for (size_t I = 0, E = min(Length, RHS.Length); I != E; ++I) {
if (ascii_isdigit(Data[I]) && ascii_isdigit(RHS.Data[I])) {
// The longer sequence of numbers is larger. This doesn’t really handle
// prefixed zeros well.
for (size_t J = I+1; J != E+1; ++J) {
bool ld = J < Length && ascii_isdigit(Data[J]);
bool rd = J < RHS.Length && ascii_isdigit(RHS.Data[J]);
if (ld != rd)
return rd ? -1 : 1;
if (!rd)
break;
}
}
if (Data[I] == RHS.Data[I])
continue;
return (unsigned char)Data[I] < (unsigned char)RHS.Data[I] ? -1 : 1;
}
if (Length == RHS.Length)
return 0;
return Length < RHS.Length ? -1 : 1;
}

patch (2.9):
/lib/Support/StringRef.cpp:
48a49,50

if (Data[I] == RHS.Data[I])
continue;
61,62d62
< if (Data[I] == RHS.Data[I])
< continue;

I ran this, and now I have no problems, and the other targets built as well, at least, without problems. I have not run the testsuite.

Jonas

I understand the idea behind compare_numeric() is to compare strings containing digits in a special way: Do a normal string-compare up to the point where both string elemnts are numerical. Find then an outcome based on the number of consecutive digits in the strings while disregarding the value of the digits, eg a12b < a123.

Almost. It is trying to compare embedded sequences of digits as numbers. It does this by first comparing the length of the digit sequence, and then a lexicographical comparison of equal length digit sequences.

I guess then this order should hold: a12 == a22 < a1b, for these strings.

No, a1b < a12 < a22. “a1b” is smaller than “a12” because it has fewer digits. “a12” has the same number of digits as “a22”, but it is lexicographically smaller.

Looking at StringRef::compare_numeric(StringRef RHS), the problem is, for a12 and a1b, Data[1]==RHS.Data[1], and the continue is reached. Data[2] is then less than RHS.Data[2], so a12 < a1b. But in the case for a22 and a1b, we get the opposite, since ‘2’!=‘1’, and 22 is more digits than 1. So we get a12 < a1b < a22, which is incorrect, because a12==a22.

My problem was with these registers: a23g, a2g and a3g. When I renamed a23g to aa23g, it worked. Since ‘2’==‘2’, the problem was that a23g<a2g.

I think the fix is to first check for two digits, and move the equality comparison to after this. Then a2g < a3g < a23g:

/lib/Support/StringRef.cpp:

48a49,50

if (Data[I] == RHS.Data[I])
continue;
61,62d62
< if (Data[I] == RHS.Data[I])
< continue;

I ran this, and now I have no problems, and the other targets built as well, at least, without problems. I have not run the testsuite.

Your fix is correct, but there is one problem: The algorithm is now quadratic in the length of embedded digit sequences.

This is easy to fix, see r140859.

Thanks!

/jakob