Tablegen binary literals

Hi all

Currently tablegen parses binary literals such as 0b011 and immediately turns them into integers internally. As a result, 0b011 is a 2-bit value, because tablegen will happily drop leading zeroes on integers.
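For concreteness, a minimal sketch of the current behavior (record names here are hypothetical):

```tablegen
// Leading zeroes are dropped as soon as the literal becomes an integer,
// so the written width of the literal is lost.
def A { int x = 0b011; }  // stored simply as the integer 3
def B { int y = 0b11; }   // also the integer 3; indistinguishable from A
```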

I propose that we change this and store binary literals with a size. This seems much more natural: when the user writes a 3-bit value, they expect to get a 3-bit value, not a 2-bit value that is silently truncated and later silently zero-extended.

Given this, I would then like to extend bits initializers to accept these new sized binary literals. For example, I would like to be able to write

bits<2> opc;
bits<8> x = { opc, 0b1100, rd{1}, rs{0} };

This would let us write some encodings much more concisely, instead of having to put each initializer on its own line.
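As a sketch (the field and operand names are hypothetical), the same 8-bit encoding written today and under the proposal:

```tablegen
// Today: one 'let' per slice of the encoding.
let Inst{7-6} = opc;
let Inst{5-2} = 0b1100;
let Inst{1}   = rd{1};
let Inst{0}   = rs{0};

// Proposed: a single bits<8> initializer, with the sized literal
// 0b1100 filling four bits at once.
bits<8> Inst = { opc, 0b1100, rd{1}, rs{0} };
```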

I do have patches, which I’m happy to present for review, but I’d like some feedback on the general idea before we get to the low-level details.

Thanks,
Pete

Hi Pete,

Just to clarify, are you proposing two things here? First, that 0b… literals have type bits<n>; and second, that a bits<n> initializer may contain other bits<m> elements, each of which would initialize the next m elements.

I.e. I don’t think we currently accept:

bits<4> x = { opc, opc };

Thanks,
Adam

Hi Adam

> Hi Pete,
>
> Just to clarify, are you proposing two things here? First, that 0b… literals have type bits<n>; and second, that a bits<n> initializer may contain other bits<m> elements, each of which would initialize the next m elements.

Yeah, exactly those 2 things. I have them in separate patches, but I think we only get the benefit from sized binary literals if we also allow them to initialize multiple bits in another bits<n> type.

Thanks,
Pete

Looks like bits<n> is already valid in a bits initializer context; it yields the bottom bit.

def a {
  bits<2> opc = { 0, 1 };
  bits<2> opc2 = { 1, 0 };
  bits<2> oo = { opc, opc2 };
}

is valid and produces:


..
  bits<2> oo = { 1, 0 };
..

Are you aware of this? This may lead to some ambiguity with your proposed extension. (I have no idea whether this behavior is relied on anywhere though.)

Adam

> Yeah, exactly those 2 things. I have them in separate patches, but I
> think we only get the benefit from sized binary literals if we also allow
> them to initialize multiple bits in another bits<n> type.

It also allows type checking for single initializers. I've been caught
out a couple of times when I thought I'd given 17 digits in "let
Inst{16-0} = 0b1001000111011010" or similar.
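Counting it out makes the trap clear (field name as in the example above):

```tablegen
// Inst{16-0} names 17 bits, but the literal below has only 16 binary
// digits. Today it is silently zero-extended to fit; with sized binary
// literals the width mismatch would be diagnosed.
let Inst{16-0} = 0b1001000111011010;
```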

Cheers.

Tim.

> Looks like bits<n> is already valid in a bits initializer context; it
> yields the bottom bit.
>
> def a {
>   bits<2> opc = { 0, 1 };
>   bits<2> opc2 = { 1, 0 };
>   bits<2> oo = { opc, opc2 };
> }
>
> is valid and produces:
>
> ..
>   bits<2> oo = { 1, 0 };
> ..
>
> Are you aware of this? This may lead to some ambiguity with your proposed
> extension. (I have no idea whether this behavior is relied on anywhere
> though.)

I wasn’t. This is horrendous. I traced through the code and it turns out we’ll happily just drop all but the lowest bit in these cases.

I’m going to see what happens if I disallow this, unless of course the variable is a single bit in length.
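A sketch of what that tightened rule might accept and reject (names hypothetical):

```tablegen
def example {
  bit flag    = 1;
  bits<2> opc = { 0, 1 };
  bits<2> ok  = { flag, opc{0} };  // single-bit elements: still fine
  bits<2> bad = { opc, opc };      // would be rejected, rather than silently
                                   // keeping only the low bit of each opc
}
```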

Thanks for the great example!
Pete

> Looks like bits<n> is already valid in a bits initializer context; it
> yields the bottom bit.
>
> def a {
>   bits<2> opc = { 0, 1 };
>   bits<2> opc2 = { 1, 0 };
>   bits<2> oo = { opc, opc2 };
> }
>
> is valid and produces:
>
> ..
>   bits<2> oo = { 1, 0 };
> ..
>
> Are you aware of this? This may lead to some ambiguity with your proposed
> extension. (I have no idea whether this behavior is relied on anywhere
> though.)

I think it would make sense to change this behavior to follow the behavior
that Pete describes. I don't have the code handy, but my guess is that the
code here is requesting that each element in the braces of a bits<n>
initializer be cast to `bit`; in that case, we would want to change that
code to first try casting to bits, giving that preference (and issuing
relevant diagnostics). Pete, is that what your patch does? Can you
attach your patches?

Of course, before changing the behavior, we would probably want to place an
assert on that codepath to catch all existing cases where the behavior
would change.
Also it is important to scan all the docs in docs/TableGen/ and make any
necessary updates (if there is nowhere that needs to be updated, then
documentation needs to be *added* covering this behavior).
Finally, we should post a notice to LLVMdev for out-of-tree maintainers
including information about how to add the necessary assert to check for
places where this behavior would change their .td files.

-- Sean Silva

Yeah, that’s exactly it. Here’s a WIP patch which fixes that particular issue. I’m building the rest of the compiler now to see if anything breaks with it applied.

var-bits.diff (1.2 KB)

0001-Add-a-new-BinaryIntInit-class-to-tablegen.patch (13.5 KB)

0002-Change-the-expression-in-tablegen-to-accept-sized-bi.patch (3.46 KB)

>> Of course, before changing the behavior, we would probably want to place
>> an assert on that codepath to catch all existing cases where the behavior
>> would change.

> Sounds like a good plan. I’ll try that and see what happens.

>> Also it is important to scan all the docs in docs/TableGen/ and make any
>> necessary updates (if there is nowhere that needs to be updated, then
>> documentation needs to be *added* covering this behavior).

> I haven’t done that yet, but good point. Depending on what changes you
> all think are acceptable, I’ll update the documentation appropriately.

>> Finally, we should post a notice to LLVMdev for out-of-tree maintainers
>> including information about how to add the necessary assert to check for
>> places where this behavior would change their .td files.

> Sure thing.

> And here’s the other 2 patches. The first sizes binary literals.

Regarding this patch, does it need to create a whole new Init class? I
would try to avoid that if at all possible. Could we just have 0b0111 be a
shorthand for a bits<4> BitsInit? Or maybe store some extra data in IntInit?
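If I understand the first option correctly, a sketch of the equivalence it implies:

```tablegen
// 0b0111 would simply be sugar for a 4-element BitsInit, so these two
// initializers would mean exactly the same thing:
bits<4> a = 0b0111;
bits<4> b = { 0, 1, 1, 1 };
```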

> The second allows them to initialize multiple bits inside { }. Note that
> I’m only handling certain cases inside the { } initializer. I’m happy to
> refactor the code and handle other cases if necessary.

I don't like the special casing going on in this patch. Could we just
switch BitsInit to store pointers to *both* BitInit and BitsInit (it
actually can already sort of do that since it has a std::vector<Init*>)?

I just feel like this is adding a lot of complexity to TableGen where the
behavior that you want could largely be accomplished by removing complexity.

-- Sean Silva

I was worried that changing IntInit to store a size would be too much of a change to existing behavior. Perhaps that’s wrong, as I only really need to check the size in a few limited circumstances.

Yeah, I think this could be improved. How about instead of

Init *Bit = Vals[i]->convertInitializerTo(BitRecTy::get());

I do

Init *Bit = Vals[i]->convertInitializerTo(BitsRecTy::get());

Or perhaps attempt the Bits case first, and Bit as a fallback, or vice versa?

The problem here is that I don’t necessarily want to allow { } to accept something like ‘-7’, because then I need to decide how many bits that is. So I don’t want to allow arbitrary values to cast to Bits, just those with an exact size. I’ll investigate the scope of this change and see what actually happens. It might turn out ok.
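A sketch of the case I want to rule out (reusing the opc field from the earlier example):

```tablegen
// What should this mean? -7 is ...11111001 in two's complement, and
// without a declared width there is no principled number of bits to
// take, so only elements with an exact size would be accepted.
bits<8> x = { opc, -7 };
```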

Thanks for the feedback.
Pete

>> Regarding this patch, does it need to create a whole new Init class? I
>> would try to avoid that if at all possible. Could we just have 0b0111 be a
>> shorthand for a bits<4> BitsInit? Or maybe store some extra data in IntInit?

> I was worried that changing IntInit to store a size would be too much
> of a change to existing behavior. Perhaps that’s wrong, as I only really
> need to check the size in a few limited circumstances.

I'm personally much more in favor of the first option I suggested (0b0111
should be a bits<4>).

>> The second allows them to initialize multiple bits inside { }. Note
>> that I’m only handling certain cases inside the { } initializer. I’m happy
>> to refactor the code and handle other cases if necessary.

>> I don't like the special casing going on in this patch. Could we just
>> switch BitsInit to store pointers to *both* BitInit and BitsInit (it
>> actually can already sort of do that since it has a std::vector<Init*>)?

> Yeah, I think this could be improved. How about instead of
>
>   Init *Bit = Vals[i]->convertInitializerTo(BitRecTy::get());
>
> I do
>
>   Init *Bit = Vals[i]->convertInitializerTo(BitsRecTy::get());
>
> Or perhaps attempt the Bits case first, and Bit as a fallback, or vice
> versa?

Would the order of these checks cause any behavioral difference (besides
one of the orders failing to produce the desired behavior)? I would hope
not.

> The problem here is that I don’t necessarily want to allow { } to accept
> something like ‘-7’, because then I need to decide how many bits that is.

Well, try placing a printf in `Init *BitsRecTy::convertValue(IntInit *II)`
and see what you find. If it seems like nothing is crucially depending on
this behavior, then it would probably make sense to completely eliminate
this conversion.

-- Sean Silva

>> Yeah, exactly those 2 things. I have them in separate patches, but I
>> think we only get the benefit from sized binary literals if we also allow
>> them to initialize multiple bits in another bits<n> type.

> It also allows type checking for single initializers. I've been caught
> out a couple of times when I thought I'd given 17 digits in "let
> Inst{16-0} = 0b1001000111011010" or similar.

Allowing underscores in the literal seems like it would help alleviate this
somewhat (do we already allow that?). That is what VHDL does ("Underlines
can be used to increase readability and have no impact on the value.").
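For instance, Tim's earlier literal grouped with underscores under the proposed syntax (with a leading zero added to reach the intended 17 digits):

```tablegen
let Inst{16-0} = 0b0_1001_0001_1101_1010;  // 17 digits, easy to count
```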

-- Sean Silva

Yeah, underscores sound like a good idea. I was going to do that independently of all this other work if there are no objections.

Pete

>> Allowing underscores in the literal seems like it would help alleviate
>> this somewhat (do we already allow that?). That is what VHDL does
>> ("Underlines can be used to increase readability and have no impact on the
>> value.").

> Yeah, underscores sound like a good idea. I was going to do that
> independently of all this other work if there are no objections.

Go for it. Seems like a no-brainer (remember to update the docs in relevant
places; we want people to use this feature!).

-- Sean Silva

> Yeah, underscores sound like a good idea. I was going to do that
> independently of all this other work if there are no objections.

Sounds like a solid idea.

Thanks!

-eric