A "backend" is ... ?

I haven't run out of popcorn watching the "Upstreaming PNaCl's
IR simplification passes" thread but one thing struck me that
I would really like clarified. It feels like a tangential
issue so I thought I'd ask it separately.

In the very same message, I see this:

Some background: There are two related use cases for these
IR simplification passes:

1) Simplifying the task of writing a new LLVM backend. This
is Emscripten's use case. The IR simplification passes reduce
the number of cases a backend has to handle, so they would be
useful for anyone else creating a new backend.

If these simplify writing a backend, why wouldn't the patches
include commensurate simplifications to LLVM's backends? That
would both give them an in-tree customer, and more immediate
value to the community and project as a whole.

I'd also like to add:
If these simplify writing a backend, should there be commensurate
changes to any relevant documentation for getting started writing
backends? (we don't have much such documentation though...)

Very much so, yes.

And this...

* Calling conventions lowering: ExpandVarArgs and ExpandByVal
lower varargs and by-value argument passing respectively.
They would be useful for any backend that doesn't want to
implement varargs or by-value calling conventions.

Why wouldn't these be applicable to existing backends? What is
hard about the existing representations?

And this...

* PromoteIntegers legalizes integer types (e.g. i30 is
converted to i32).

Does it split up too-wide integers? Do we really want another
integer legalization framework in LLVM? I am actually interested
in doing (partial) legalization in the IR during lowering
(codegenprep time) in order to simplify the backend, but I don't
think we should develop such a framework independently of the
legalization currently used in the backends.

...all of which clearly presume multiple backends exist;
and yet I then see this:

* Module-level lowering: This implements, at the IR level,
functionality that is traditionally provided by "ld". e.g.
ExpandCtors lowers llvm.global_ctors to the __init_array_start
and __init_array_end symbols that are used by C libraries at
startup.

This doesn't make any sense to me. The IR representation is
strictly simpler. It is trivially lowered in a backend. I don't
understand what this would benefit.

It might be simpler to do in the backend, but I think that the
point is that it is a recurring cost in every backend; in
particular for backends written by people starting out/playing
around with LLVM (i.e. potential future contributors), where
any potential performance loss is acceptable for the sake of
simplifying things.

I don't understand this at all.

We have a *target independent* backend. There is only one, so
there should be no recurring cost.

So we have lots of backends, conjuring up new ones is common
enough that we desire better documentation for doing it, the
goal of simplifying these backends is moderately worthy, and..
"there is only one" backend.

Uh... what?

Yours in ignorance and confusion,
--paulr

Sorry, it is confusing.

We have a target-independent layer in the backend, and then a large number
of targets.

The target-independent layer makes it possible for all of the targets to
share common code where it makes sense. In fact, a large part of
legalization is shared between the targets. However, it is quite complex,
and there have been lots of efforts over the years to simplify
legalization. So even though this code is shared, simplifying it is still of interest.

However, this target-independent logic is hooked *extensively* by each
target. Documenting exactly how to do all of this is what the comments about
documentation refer to. Simplifying (or reducing) this per-target hooking
logic makes writing a new target in the backend significantly easier.
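
To make the layering concrete, here is a minimal C++ sketch of one shared routine that is parameterized by per-target hooks. Every name in it (TargetHooks, Toy32, Toy64, registersNeeded) is hypothetical and chosen for illustration; none of them are LLVM's actual classes or APIs, and real targets answer far more questions than the single one shown.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface: the questions the shared layer asks each target.
// Illustrative only; these are not LLVM's real classes.
struct TargetHooks {
    virtual ~TargetHooks() = default;
    virtual std::string name() const = 0;
    // Widest integer, in bits, that fits in one native register.
    virtual unsigned nativeIntBits() const = 0;
};

// Two made-up targets. Each one only answers the hook questions.
struct Toy32 final : TargetHooks {
    std::string name() const override { return "toy32"; }
    unsigned nativeIntBits() const override { return 32; }
};

struct Toy64 final : TargetHooks {
    std::string name() const override { return "toy64"; }
    unsigned nativeIntBits() const override { return 64; }
};

// Shared, target-independent logic: written once, driven by the hooks.
// Returns how many native registers hold an integer of the given bit width.
unsigned registersNeeded(const TargetHooks &target, unsigned bits) {
    assert(bits > 0 && "integer types have at least one bit");
    const unsigned native = target.nativeIntBits();
    return (bits + native - 1) / native;  // round up to whole registers
}
```

In this sketch, adding a target means writing one more small struct while registersNeeded stays unchanged. The cost per target grows with the number of hooks it has to implement, which is why reducing the hooks makes a new target easier to write.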

Hope that helps some.

So is it fair to say that “writing a new backend” should be understood as “adding a new target”?

With that interpretation, it does make sense.

--paulr

Yes.