LLVM CodeGen Engineer job opening with Apple's compiler team

Hi all,

The LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community.

If you are interested in this position, please send your resume / CV and relevant information to evan.cheng@apple.com

Thanks,

Evan

Job description

The Apple compiler team is seeking an engineer who is strongly motivated to build high-quality, high-performance compilers. We are focused on improving the user experience by reducing compile time and maximizing the execution speed of the code generated for Apple systems. As a key member of the Apple Compiler Team, you will apply your strong background and experience in state-of-the-art compiler technology toward the development of fast, highly optimized compiler products that extract top performance from Apple systems.

You will join a small team of highly motivated senior engineers who build first-class open-source compiler tools and apply them in new and innovative ways.

Required Experience:

  • The ideal candidate will have experience with LLVM, GCC, or other open source / commercial compilers.
  • Strong background in compiler architecture, optimization, code generation and overall design of compilers.
  • Knowledge of and experience with developing compilers for embedded devices is a plus.
  • Familiarity with analyzing generated code for optimization/code generation opportunities.
  • Strong communication and teamwork skills.

I have a code generation question for ARM with VFP and NEON.

I am generating code for the following function as a test:

void FloatingPointTest(float f1, float f2, float f3)
{
     float f4 = f1 * f2;
     if (f4 > f3)
          printf("%f\n",f2);
     else
          printf("%f\n",f3);
}

I have tried compiling with:

  1. -mfloat-abi=softfp and -mfpu=neon
  2. -mfloat-abi=hard and -mfpu=neon
  3. -mfloat-abi=softfp and -mfpu=vfp3
  4. -mfloat-abi=hard and -mfpu=vfp3

When I use --emit-llvm -c flags to generate bitcode, and then use llc to
generate ARM assembler, I have tried supplying these flag variations to
llc:

      5. llc -mattr=+neon
      6. llc -mattr=+vfp3

I am building for armv7-a.

In all cases, I get code that looks pretty much the same; it's like what
is below. However, I am expecting to see instruction-level differences
between the vfp3 and neon versions. When I do the same with gcc 4.2 I do
see differences in the generated code.

Am I mistaken in expecting to see a difference between the NEON and VFP
instructions generated, or is there something else going on here?

thanks,
-David

        .private_extern _FloatingPointTest
        .globl _FloatingPointTest
        .align 2
_FloatingPointTest: @ @FloatingPointTest
@ BB#0: @ %entry
        sub sp, sp, #8
        str lr, [sp, #4]
        str r7, [sp]
        mov r7, sp
        sub sp, sp, #36
        str r0, [r7, #-4]
        vmov s0, r0
        str r1, [r7, #-8]
        vmov s1, r1
        str r2, [r7, #-12]
        vmov s2, r2
        vldr.32 s3, [r7, #-4]
        vldr.32 s4, [r7, #-8]
        vmul.f32 s3, s3, s4
        vstr.32 s3, [r7, #-16]
        vldr.32 s4, [r7, #-12]
        vcmpe.f32 s3, s4
        vmrs apsr_nzcv, fpscr
        vstr.32 s0, [sp, #16]
        vstr.32 s2, [sp, #12]
        vstr.32 s1, [sp, #8]
        ble LBB20_2
@ BB#1: @ %bb
        vldr.32 s0, [r7, #-16]
        ldr r0, LCPI20_0

LPC20_0:
        add r0, pc, r0
        vcvt.f64.f32 d1, s0
        vmov r1, r2, d1
        bl _printf
        str r0, [sp, #4]
        b LBB20_3
LBB20_2: @ %bb1
        vldr.32 s0, [r7, #-12]
        ldr r0, LCPI20_1

LPC20_1:
        add r0, pc, r0
        vcvt.f64.f32 d1, s0
        vmov r1, r2, d1
        bl _printf
        str r0, [sp]
LBB20_3: @ %bb2
@ BB#4: @ %return
        mov sp, r7
        ldr r7, [sp]
        ldr lr, [sp, #4]
        add sp, sp, #8
        bx lr
@ BB#5:
        .align 2
LCPI20_0:
        .long L_.str107-(LPC20_0+8)

        .align 2
LCPI20_1:
        .long L_.str107-(LPC20_1+8)

Hi David,

You could see different instructions (as gcc does, you say), but it's
not necessary.

Your example has only floating point arithmetic, which both VFP3 and
NEON can do, so the final assembly will be similar. If you start using
integer arithmetic, then you can see vector instructions for NEON (if
it's vectorized) and not for VFP3.
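
To make the point about integer arithmetic concrete, here is a minimal sketch of a test function in the same spirit as FloatingPointTest. The name add_i32 is made up for illustration and does not come from the thread. VFP3 has no integer vector arithmetic, so a loop like this is where NEON-only instructions (for example vadd.i32) could show up, provided the compiler actually vectorizes it; whether a given compiler version does so is an assumption, not something this sketch guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test case: element-wise integer add over two arrays.
 * A vectorizing compiler targeting NEON may process several elements
 * per instruction; with only VFP3 available it has to stay scalar. */
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    /* Precondition: all three arrays must be valid. */
    assert(dst != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Compiling this once with -mfpu=neon and once with -mfpu=vfp3 and comparing the assembly is a more telling experiment than the scalar floating-point case.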

All chips (to date) with NEON have VFP3, so it's safe to assume that a
-mfpu=neon will have VFP3, so all the decisions about code generated
for VFP3 can safely be assumed by targets with NEON.

Hope that answers your questions.

cheers,
--renato

Thanks, that helps a lot.

All chips (to date) with NEON have VFP3, so it's safe to assume that a
-mfpu=neon will have VFP3, so all the decisions about code generated
for VFP3 can safely be assumed by targets with NEON.

Just to confirm my understanding, can I correctly say in general that
the llc code generator might blur distinctions between NEON and VFP3
when it can do so safely?

-David

Not exactly. The distinction is clear, it’s just not expressed as an either/or question. Specifically, the code generator considers NEON to be a proper superset of VFP3. So if it has only VFP3, that’s all it will use. If it has NEON, it assumes it also has VFP3 and can use either. There’s not, currently, a way to say “use only NEON instructions; don’t generate any VFP3.”
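
The superset relationship described here can be sketched as a tiny feature model. This is only an illustration under stated assumptions: the names FEAT_VFP3, FEAT_NEON, resolve_features, may_use_vfp3 and may_use_neon are hypothetical and are not LLVM's actual data structures or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only. */
enum { FEAT_VFP3 = 1 << 0, FEAT_NEON = 1 << 1 };

/* Requesting NEON implies VFP3; requesting VFP3 alone adds nothing. */
unsigned resolve_features(unsigned requested)
{
    unsigned features = requested;
    if (features & FEAT_NEON)
        features |= FEAT_VFP3;

    /* Invariant: NEON is never present without VFP3. */
    assert(!(features & FEAT_NEON) || (features & FEAT_VFP3));
    return features;
}

/* Queries an instruction selector might make in this model. */
bool may_use_vfp3(unsigned features) { return (features & FEAT_VFP3) != 0; }
bool may_use_neon(unsigned features) { return (features & FEAT_NEON) != 0; }
```

In this model there is no input that yields NEON without VFP3, which mirrors the absence of a "NEON only" mode described above.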

-Jim

Not exactly. The distinction is clear, it's just not expressed as an
either/or question. Specifically, the code generator considers NEON to be a
proper superset of VFP3. So if it has only VFP3, that's all it will use. If
it has NEON, it assumes it also has VFP3 and can use either.

Indeed.

There's not,
currently, a way to say "use only NEON instructions; don't generate any
VFP3."

Which would be advantageous in some cases, where NEON instructions are
faster than VFP3.

But the way it's done today in LLVM is correct. The output doesn't
have to be different between NEON and VFP3 for VFP3 operations, but it
can be. GCC has some of that knowledge and it's just a matter of time
for LLVM to catch up. ;)

cheers,
--renato

Just out of curiosity: do we output vfpv3-d16 or -d32 (e.g. d16 = tegra2, dove)?
Or doesn't it apply to LLVM (currently)?

Best,
Jan-Simon