Target data question

According to the "LLVM Assembly Language Reference Manual":

When constructing the data layout for a given target, LLVM starts with
a default set of specifications which are then (possibly) overridden by
the specifications in the datalayout keyword. The default
specifications are given in this list:

    * E - big endian
    * p:32:64:64 - 32-bit pointers with 64-bit alignment

Are these the specifications that are assumed by LLVM tools such as
"opt" when a module doesn't have a target data specification? And
does that mean that "opt", when given a module without a target data
specification, might assume that GEP pointer-to-pointer, 1 should
increment that pointer by eight bytes?
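
To make the arithmetic behind that question concrete, here is a minimal Python sketch. It assumes that indexing over pointer elements advances by the pointer's allocation size, that is, its size rounded up to its ABI alignment. The helper names are made up for illustration and are not LLVM APIs.

```python
def parse_pointer_spec(spec):
    """Parse a 'p:<size>:<abi>:<pref>' layout entry; all values are in bits."""
    kind, size, abi, pref = spec.split(":")
    if kind != "p":
        raise ValueError("not a pointer specification: " + spec)
    return int(size), int(abi), int(pref)


def pointer_stride_bytes(spec):
    """Bytes that 'GEP pointer-to-pointer, 1' would advance (assumed model)."""
    size_bits, abi_bits, _pref_bits = parse_pointer_spec(spec)
    size_bytes = (size_bits + 7) // 8
    abi_bytes = abi_bits // 8
    # Round the size up to the next multiple of the ABI alignment.
    return (size_bytes + abi_bytes - 1) // abi_bytes * abi_bytes


print(pointer_stride_bytes("p:32:64:64"))  # the default quoted from the manual
print(pointer_stride_bytes("p:32:32:32"))  # e.g. a 32-bit x86 style layout
```

Under this model the quoted default gives a stride of 8 bytes, while a layout with 32-bit pointer alignment gives 4, which is the mismatch the question is worried about.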

Unfortunately, yes. See PR4542. Progress has been made recently
though -- the optimizers are now ready. The main things left to do
are to update the documentation and update the testsuite to account
for the change in the meaning of a module without a targetdata string.

Dan

If anyone else was waiting for the answer...

The answer is yes. "opt" uses the same default target data layout
regardless of what host machine it's being run on, and it'll trash any
pointer indexing you're trying to do if you're running on a real
machine whose target data layout is different from the universal
default and your module doesn't have a target data layout specified.

I hardcoded the layout by copying-and-pasting one generated by
llvm-gcc. Now I'm off to dig up how to get the layout string for the
platform you're running on...

So in the near future, the optimizers won't do any target-specific
transformations in the absence of module target data?

Also, has anyone passed a target data string to "opt" with
-targetdata? I'm trying that out now and getting "Too many positional
arguments specified!". I've tried escaping all the dashes in the
target data string... no luck.

As near as when someone steps up to do the work :-). I'm not
actively working on this myself right now.

Dan

It's broken. It's probably easily fixable if you have a good
idea of what the user interface should be.

Dan

I uncovered a few cases where optimizers still crash when the target
data pass isn't registered, and I'll send patches as I can.

If the TargetData pass isn't registered in the global registry,
getPassInfo() returns null.

Now when you add a TargetData pass, it winds up in ImmutablePasses.
Any search through ImmutablePasses assumes that getPassInfo() for
every member returns something other than null. So findAnalysisPass
for *any* analysis pass can crash the system if the TargetData pass is
lurking in the list without being registered.

Since we want to be able to run opt without a TargetData pass, this
will never do. If TargetData is registered globally, any
findAnalysisPass call will find it if there isn't another TargetData
pass in the PassManager. Should TargetData now not be considered an
ImmutablePass? Should findAnalysisPass include a null check on the
getPassInfo of ImmutablePasses?

Never mind, I got confused. Registering a pass doesn't mean that
getAnalysisIfAvailable will return it; it still has to be in the pass
manager's collection. It just means that PassInfo will be available
for it if it's there.

I think...
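
The distinction between "registered" and "in the pass manager's collection" can be shown with a toy model in Python. This is only a sketch of the idea as described above, not LLVM's implementation; the class and method names are hypothetical stand-ins for the global registry, the ImmutablePasses list, getPassInfo, and getAnalysisIfAvailable.

```python
class ToyPassManager:
    """Toy model: a global registry of pass info plus a per-manager pass list."""

    registry = {}  # pass name -> pass info (stands in for the global registry)

    def __init__(self):
        self.immutable_passes = []  # passes actually added to this manager

    @classmethod
    def register(cls, name, description):
        cls.registry[name] = description

    def add(self, name):
        self.immutable_passes.append(name)

    def get_pass_info(self, name):
        # Returns None for an unregistered pass, like a null PassInfo.
        return self.registry.get(name)

    def get_analysis_if_available(self, name):
        # Only passes that were added to this manager can be found.
        for candidate in self.immutable_passes:
            if self.get_pass_info(candidate) is None:
                continue  # the null check asked about above
            if candidate == name:
                return candidate
        return None


ToyPassManager.register("targetdata", "Target Data Layout")
pm = ToyPassManager()
print(pm.get_analysis_if_available("targetdata"))  # registered, never added
pm.add("targetdata")
print(pm.get_analysis_if_available("targetdata"))  # registered and added
```

In this model, registering the pass only makes its info available; the lookup still returns nothing until the pass has been added to the manager.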

Anyway, my present plan of attack is to have a "-defaulttarget" option
with "none", "host", or a target string. If -defaulttarget is not
specified, the behavior of "opt" will be the same as it is presently.
The defaulttarget will be overridden by the Module's target data if it
has some. "none" means that no TargetData pass will be added unless
the Module supplies target data. "host" uses the running host's
TargetData as the default.
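
As a rough sketch of the proposed precedence, in Python rather than opt's own C++: the function name, the use of an empty string to mean "today's built-in defaults", and the host layout value are all assumptions made for illustration.

```python
def resolve_default_target(option, module_layout, host_layout):
    """Return the layout string to build TargetData from, or None for no pass.

    option        -- None (flag absent), "none", "host", or a layout string
    module_layout -- the module's own layout string, "" if it has none
    host_layout   -- hypothetical stand-in for the running host's layout
    """
    if module_layout:
        return module_layout  # the module's target data overrides the option
    if option is None:
        return ""             # flag absent: keep the present built-in defaults
    if option == "none":
        return None           # add no TargetData pass at all
    if option == "host":
        return host_layout
    return option             # an explicit layout string


host = "e-p:32:32:32"  # made-up host layout for the example
print(resolve_default_target("none", "", host))
print(resolve_default_target("host", "", host))
print(resolve_default_target("host", "E-p:64:64:64", host))
```

The last call shows the module's own string winning over the option, which is the part of the proposal most open to debate.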

What do y'all think?

Never mind, I got confused. Registering a pass doesn't mean that
getAnalysisIfAvailable will return it; it still has to be in the pass
manager's collection. It just means that PassInfo will be available
for it if it's there.

I think...

Yes, that's what's intended.

I think it's more intuitive to have command-line information override
Module information. That's how llc works, for example.

Also, is the argument to -defaulttarget a triple, an architecture name,
or a targetdata string? If it's a triple, it'd be nice to be consistent
with llc and call it -mtriple=. For an architecture name, -march=.
If it's a targetdata string, perhaps -targetdata= would be a good name.

(As an aside, I wouldn't object to having llc's options renamed to
remove the leading 'm', as that seems to have been intended to follow
GCC's targeting options, and they aren't the same.)

Dan

The argument to -default-data-layout is a targetdata string.
-no-default-data-layout means that no TargetData pass is added unless
the module supplies a target data string.

llvm-gcc always inserts targetdata. I'm wondering if the code it
generates somehow depends on the assumption that 'opt' is taking its
target data into account. As in, some of it uses absolute offsets and
some of it uses pointer-indexing that gets affected by the targetdata.
Anyway, it seemed safer to take the module's targetdata if it was
built with targetdata included.

Note to self: wait at least 24 hours after soliciting feedback before
sending a patch.

Anyway, after thinking about it, it should always be safe to override
the Module to remove the target data pass, even if it isn't safe to
override the Module to substitute different target data. But I still
think you should at least have the option to supply target data
*without* overriding whatever comes in from the module.

An even better question is: does it *ever* make sense to supply a
blanket default target data string? If no target-data option is
supplied, wouldn't it be better to default to the target data for the
running host? Or would that break existing code and/or tests?

Anyway, after thinking about it, it should always be safe to override
the Module to remove the target data pass, even if it isn't safe to
override the Module to substitute different target data. But I still
think you should at least have the option to supply target data
*without* overriding whatever comes in from the module.

In what situations would this be useful?

Would it make sense to have opt issue an error if the module and the
command-line have incompatible non-empty strings?

An even better question is: does it *ever* make sense to supply a
blanket default target data string? If no target-data option is
supplied, wouldn't it be better to default to the target data for the
running host? Or would that break existing code and/or tests?

It would break existing tests, which currently rely on an empty
targetdata string being interpreted as historical sparc settings.
But tests can be updated; it's more important to figure out how
to make opt useful first.

Dan

In what situations would this be useful?

It is definitely useful to me to tell opt "don't use a target data
string unless the module has one"... then I can feed it modules built
without a target platform, or modules built with the target platform
I'm using, and it'll do the best thing in both cases and break
neither. "Use this string unless the module has a different one"...
the module would only have a different one if it was compiled by
llvm-gcc or some other front-end specifically targeting some other
platform... and modules specifically targeting different platforms
probably won't get thrown together in the same build, folder, or
process. So that may not be as useful as I thought.

In short, I definitely want to keep "no-default-data-layout", but not
necessarily "default-data-layout".

Would it make sense to have opt issue an error if the module and the
command-line have incompatible non-empty strings?

Yes, it would, come to think of it. If llvm-gcc put a target data
string, it generated code under the assumption that 'opt' would not
use a different target data string. Breaking that assumption could be
very bad. (Not having one at all should be safe, since that would
keep opt from messing with existing pointer logic).

It would break existing tests, which currently rely on an empty
targetdata string being interpreted as historical sparc settings.
But tests can be updated; it's more important to figure out how
to make opt useful first.

I was hesitating because fixing the tests and changing opt would have
to be done together in one shot, or else the tests would be broken for
a while. And that seemed like a daunting task.

I think "no-default-data-layout" all by itself gets us where we need
to be to have opt be useful in just about every case I can think of.
And if production code with pointer operations works with an optimizer
that targets a completely different platform and messes with its
pointer operations, it's probably very close to breaking mysteriously
at any rate. So having "no-default-data-layout" be the default would
make production code *more* stable and mainly break those tests, as
far as I can see. How many tests are we talking, and what's the best
way to attack that problem?

And a force-data-layout with an error if the module has a different
one... that option would find its way into my makefiles as the last
step after an application is linked with the target-independent
standard libraries and is ready to be optimized in a single unit,
whole-program style, with optimizations for the install platform
included without having to make my compiler deal with it. And if
modules targeting a different platform find their way into my build
process, I definitely want to bring the build to a screeching halt.
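
A minimal Python sketch of that "force, but fail on conflict" behaviour follows. It assumes "different" means a plain string mismatch, which is a simplification (two differently spelled strings could describe the same layout), and the function and exception names are hypothetical.

```python
class DataLayoutMismatch(Exception):
    """Raised when a module's layout conflicts with the forced one."""


def check_forced_layout(forced_layout, module_layout):
    """Return the layout to use, or raise if the module disagrees.

    A module without a layout string ("") simply takes the forced one.
    """
    if module_layout and module_layout != forced_layout:
        raise DataLayoutMismatch(
            "module layout %r differs from forced layout %r"
            % (module_layout, forced_layout)
        )
    return forced_layout


forced = "e-p:32:32:32"  # made-up layout string for the example
print(check_forced_layout(forced, ""))      # target-independent module
print(check_forced_layout(forced, forced))  # module built for this platform
try:
    check_forced_layout(forced, "E-p:64:64:64")
except DataLayoutMismatch as err:
    print("build stopped:", err)
```

Raising instead of silently picking one side is what would bring a mixed-platform build to a halt.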

I see two approaches to updating the tests so far:

1. Change the tests one by one, having them use the
"no-default-data-layout" flag as they're updated and checked in, then
make "no-default-data-layout" the default when they're all done.

2. Change all the tests to use "default-data-layout" with the Sparc
setting. Then make "no-default-data-layout" the default. Update the
tests one-by-one, taking off the "default-data-layout" setting as
they're updated and checked in. Remove the "default-data-layout"
setting when they're all done, unless someone thinks of a use for it.

If I'm going to attack this, I'll need to get the tests working on my
system first. (I need to do that anyway to work on the struct-return
thing on my list)

In short, I definitely want to keep "no-default-data-layout", but not
necessarily "default-data-layout".

Makes sense.

How many tests are we talking, and what's the best
way to attack that problem?

I don't know offhand, though I don't think it's an overwhelming
number. I'd suggest just trying it.

And a force-data-layout with an error if the module has a different
one... that option would find its way into my makefiles as the last
step after an application is linked with the target-independent
standard libraries and is ready to be optimized in a single unit,
whole-program style, with optimizations for the install platform
included without having to make my compiler deal with it. And if
modules targeting a different platform find their way into my build
process, I definitely want to bring the build to a screeching halt.

Ok. I don't currently have a need for this functionality so I'll
leave it up to you. The main counter-intuitive thing for me is
any option which gets silently ignored when it conflicts with
what's in the module.

Dan