Hi Tom,
We would like the following patches to go into release 3.4.1.
Clang:
196750 [AArch64]Add missing pair intrinsics such as: int32_t vminv_s32(int32x2_t a)
196834 [AArch64] Remove q and non-q intrinsic definitions from the NEON scalar reduce pairwise implementation, using an overloaded definition instead.
196835 [AArch64] Refactor the NEON scalar reduce pairwise front-end codegen to remove unnecessary patterns in tablegen.
196836 [AArch64] Refactor the NEON scalar reduce pairwise intrinsics so that they use float/double rather than the vector equivalents when appropriate.
196888 [AArch64 NEON] Support poly128_t and implement relevant intrinsic.
196927 [AArch64] Refactor the Neon vector/scalar floating-point convert implementation. Specifically, reuse the ARM intrinsics when possible.
196931 [AArch64] Refactor the Neon vector/scalar floating-point convert intrinsics so that they use float/double rather than the vector equivalents when appropriate.
196936 [AArch64] Refactor the redundant code in the EmitAArch64ScalarBuiltinExpr() function. No functional change intended.
196966 [AArch64] Overload NEON signed/unsigned integer convert to floating-point LLVM AArch64 intrinsics.
196967 [AArch64] Overload NEON signed/unsigned floating-point convert to fixed-point and fixed-point convert to floating-point LLVM AArch64 intrinsics.
196968 [AArch64] Refactor the NEON signed/unsigned floating-point convert to fixed-point LLVM AArch64 intrinsics to use f32/f64, rather than their vector equivalents.
196969 [AArch64] Refactor the NEON floating-point absolute difference LLVM AArch64 intrinsic to use f32/f64 types, rather than their vector equivalents.
197069 [AArch64] Refactor the NEON scalar floating-point reciprocal estimate, floating-point reciprocal exponent, and floating-point reciprocal square root estimate
197070 [AArch64] Refactor the NEON scalar floating-point reciprocal step and floating-point reciprocal square root step LLVM AArch64 intrinsics to
197071 [AArch64] Add NEON scalar floating-point compare LLVM AArch64 intrinsics that use f32/f64 types, rather than their vector equivalents.
197091 [AArch64] Refactor NEON floating-point Max/Min/Maxnm/Minnm across vector AArch64 intrinsics to use f32 types, rather than their vector equivalents.
197112 [AArch64] Fix Incorrect CHECK message [0-31]+ in test case.
197403 [AArch64] Fix v1fx patterns for Floating-point Multiply Extend and Floating-point Compare to Zero.
197898 [AArch64] The compare to zero intrinsics should be implemented by ‘icmp/fcmp’ and ‘sext’ not ‘zext’. Modify the implementation by replacing zext with sext.
197994 [AArch64] Add some missing test cases for ACLE intrinsics of AArch64 NEON.
198195 [AArch64] For AArch64 Neon, simplify scalar dup by lane0 for fp.
198741 [AArch64] For AArch64, support builtin neon vector type with ‘long’ as base element type.
199866 [AArch64 NEON] Fix a bug about vcles_f32 and vcled_f64.
200114 [AArch64] For AArch64 Neon, fix intrinsics implementation using nested macros.
200470 ARM & AArch64: share the BI__builtin_neon enum defs.
200471 ARM & AArch64: fully share NEON implementation of permutation intrinsics
200472 ARM & AArch64: extend shared NEON implementation to first block.
200524 ARM & AArch64: merge another NEON block completely.
200525 ARM & AArch64: more instructions into common block
200526 ARM & AArch64: move shared vld/vst intrinsics to common implementation.
200527 ARM & AArch64: another block of miscellaneous NEON sharing.
200528 ARM & AArch64: unify the rest of the completely shared NEON implementations
200707 [AArch64] AArch64: use new non-polymorphic crypto intrinsics This was caused by r200708 which enabled the crypto feature for these cores.
200708 ARM: implement support for crypto intrinsics in arm_neon.h
200769 ARM & AArch64: combine implementation of vcaXYZ intrinsics
201112 [AArch64] Fixed vget/vset_lane_f16 implementation
201384 [AArch64] Enable AArch64 NEON by default.
202004 [AArch64] Change int64_t from ‘long long int’ to ‘long int’ for AArch64 target.
LLVM:
196748 [AArch64]Pattern match failures for truncate store and extend load
196749 [AArch64]Add missing pair intrinsics such as: int32_t vminv_s32(int32x2_t a)
196831 [AArch64] Remove q and non-q intrinsic definitions in the NEON scalar reduce pairwise implementation, using an overloaded definition instead.
196832 [AArch64] Refactor NEON scalar reduce pairwise front-end codegen to remove unnecessary patterns in tablegen.
196833 [AArch64] Refactor the NEON scalar reduce pairwise intrinsics, so that they use float/double rather than the vector equivalents when appropriate.
196887 [AArch64 NEON] Support poly128_t and implement relevant intrinsic.
196889 [AArch64 NEON] Replace fpimm with fpz32 for floating compare with zero. This is a small change to be strict. Just want to make the pattern safer.
196926 [AArch64] Refactor the Neon vector/scalar floating-point convert implementation. Specifically, reuse the ARM intrinsics when possible.
196930 [AArch64] Refactor the Neon vector/scalar floating-point convert intrinsics so that they use float/double rather than the vector equivalents when appropriate.
196962 [AArch64] Overload NEON signed/unsigned integer convert to floating-point LLVM AArch64 intrinsics.
196963 [AArch64] Overload NEON signed/unsigned floating-point convert to fixed-point and fixed-point convert to floating-point LLVM AArch64 intrinsics.
196964 [AArch64] Refactor the NEON signed/unsigned floating-point convert to fixed-point LLVM AArch64 intrinsics to use f32/f64, rather than their vector equivalents.
196965 [AArch64] Refactor the NEON floating-point absolute difference LLVM AArch64 intrinsic to use f32/f64 types, rather than their vector equivalents.
196998 [AArch64 NEON] Get instruction BSL matched to VSELECT.
197066 [AArch64] Refactor the NEON scalar floating-point reciprocal estimate, floating-point reciprocal exponent, and floating-point reciprocal square root estimate
197067 [AArch64] Refactor the NEON scalar floating-point reciprocal step and floating-point reciprocal square root step LLVM AArch64 intrinsics to
197068 [AArch64] Add NEON scalar floating-point compare LLVM AArch64 intrinsics that use f32/f64 types, rather than their vector equivalents.
197090 [AArch64] Refactor NEON floating-point Max/Min/Maxnm/Minnm across vector AArch64 intrinsics to use f32 types, rather than their vector equivalents.
197113 Fix Incorrect CHECK message [0-31]+ in test case. In a regular expression, [0-31]+ is equivalent to [0-3]+, not the number from
197135 [AArch64]Fix the problem that AArch64 backend fails to select scalar_to_vector of vector types having more than one element.
197159 [AArch64] Removed unnecessary copy patterns with v1fx types.
197250 [AArch64] Simplify the Neon Scalar3Same patterns for floating-point reciprocal step, floating-point reciprocal square root step, floating-point absolute
197361 [AArch64]Fix the pattern match failure for v1i8/v1i16/v1i32 types. Currently we have such types as legal vector types. The DAG combiner may generate some DAG nodes having such types but we don’t have patterns to match them.
197402 [AArch64] Fix v1fx patterns for Floating-point Multiply Extend and Floating-point Compare to Zero.
197551 [AArch64 NEON] Implement loading vector constant from constant pool.
197897 [AArch64]The compare to zero intrinsics should be implemented by ‘icmp/fcmp’ and ‘sext’ not ‘zext’. Modify the test cases.
197928 [AArch64 NEON] Fixed fused multiply negate add/sub patterns
197929 [AArch64] Check fmul node single use in fused multiply patterns
197966 [AArch64 NEON] Fix a pattern match failure with NEON_VDUP.
197967 [AArch64 NEON] Fix a bug when lowering BUILD_VECTOR.
197969 [AArch64]Add patterns to match normal shift nodes: shl, sra and srl.
197993 Add missing pattern matches to support ACLE intrinsics of AArch64 NEON.
198001 [AArch64]Fix a problem that the register order of fmls/fmla by element is incorrect.
198084 Teach DAGCombiner how to fold a SIGN_EXTEND_INREG of a BUILD_VECTOR of ConstantSDNodes (or UNDEFs) into a simple BUILD_VECTOR.
198188 [AArch64]Fix the problem that can’t select mul of v1i64/v2i64 types.
198190 Fix a bug in DAGcombiner about zero-extend after setcc.
198192 [AArch64]Can’t select shift left 0 of type v1i64
198193 [AArch64]Add code to spill/fill Q register tuples such as QPair/QTriple/QQuad.
198194 For AArch64 Neon, simplify scalar dup by lane0 for fp.
198437 [AArch64][NEON] Added SXTL and SXTL2 instruction aliases
198675 [AArch64 NEON] Fixed incorrect immediate used in BIC instruction.
198682 [AArch64]Add support to copy D tuples such as DPair/DTriple/DQuad and Q tuples such as QPair/QTriple/QQuad. There are no test cases for D tuples, as the original test cases are too large. As the copy of the D tuple is similar to the Q tuple, the correctness can be guaranteed.
198684 [AArch64]Add support to spill/fill D tuples such as DPair/DTriple/DQuad. There are no test cases for D tuples, as the original test cases are too large. As the spill/fill of the D tuple is similar to the Q tuple, the correctness can be guaranteed.
198730 Fix a bug about generating undef operand when optimising shuffle vector and insert element in instruction combine.
198743 [AArch64 NEON] Fix generating incorrect value type of NEON_VDUPLANE when lower build_vector if result value type mismatch with operand
198791 [AArch64][NEON] Added UXTL and UXTL2 instruction aliases
198937 Make sure -use-init-array has intended effect on all AArch64 ELF targets, not just linux.
198941 Silence unused variable warning for non-asserting builds that was introduced in r198937.
199069 [AArch64 NEON] Add more scenarios to use perm instructions when lowering shuffle_vector
199070 [AArch64 NEON] Add missing patterns for bitcast from or to v1f64
199242 [AArch64] Added vselect patterns with float and double types
199296 For AArch64, lowering sext_inreg and generate optimized code by using SXTL.
199369 For ARM, fix assertion failures for some ld/st 3/4 instructions with writeback.
199461 [AArch64]Fix the problem can’t select concat_vectors of two v1i32 types. Also fix the problem can’t select scalar_to_vector from f32 to v2f32/v4f32.
199462 [AArch64 NEON] Custom lower conversion between vector integer and vector floating point if element bit-width doesn’t match.
199463 [AArch64]Fix the problem can’t select f16_to_f32 and f32_to_f16. Also add copy support for FPR16.
199485 [AArch64 NEON] Expand vector for UDIV/SDIV/UREM/SREM/FREM as neon doesn’t support these operations.
199621 [AArch64 NEON] Accept both #0.0 and #0 for comparing with floating point zero in asm parser.
199628 [AArch64 NEON] Fix a bug caused by undef lane when generating VEXT.
199631 Revert r199628: “[AArch64 NEON] Fix a bug caused by undef lane when generating VEXT.”
199791 [AArch64 NEON] Try to generate CONCAT_VECTOR when lowering BUILD_VECTOR or SHUFFLE_VECTOR.
199858 fix some spelling mistakes around ‘ConcatVector’ and ‘ShuffleVector’ in AArch64 backend.
199861 [AArch64]Add CHECK for two test cases testing scalar_to_vector committed in r199461.
199978 [AArch64 NEON] Fix a bug in implementing register copy between FPR16.
200109 [AArch64 NEON] Fix pattern match failed on FP_ROUND from v1f128 to v1f64.
200110 [AArch64 NEON] Add test case for vector FP_ROUND.
200111 [AArch64 NEON] Add patterns for concat_vector on v2i32.
200113 Implement pattern match from v1xx to v1xx for AArch64 Neon.
200119 Improve pattern match from v1i8 to v1i32 for AArch64 Neon.
200179 Revert r199791.
200180 [AArch64 NEON] Try to generate CONCAT_VECTOR when lowering BUILD_VECTOR or SHUFFLE_VECTOR.
200365 [AArch64 NEON] Lower SELECT_CC with vector operand.
200491 [AArch64] Custom lower concat_vector patterns with v4i16, v4i32, v8i8, v8i16, v16i8 types.
200706 AArch64 & ARM: refactor crypto intrinsics to take scalars
200768 ARM & AArch64: merge NEON absolute compare intrinsics
201061 [AArch64]Implement the copy of two FPR8 registers by using FMOVss of two FPR32 registers in copyPhysReg.
201091 [AArch64] Handle aliases of conditional branches without b.pred form.
201287 [AArch64]Add support for spilling FPR8/FPR16.
201298 [AArch64]Fix the problems that can’t select mul/add/sub of v1i8/v1i16/v1i32 types. As these problems are similar to shl/sra/srl, also add patterns for shift nodes.
201381 [AArch64]Fix the assertion failure caused by “v1i1 SETCC” DAG node. As v1i1 is illegal, the type legalizer tries to scalarize such node. But if the operand types of SETCC are legal, the scalarization algorithm will cause an assertion failure.
201385 Enable AArch64 NEON by default.
201395 [AArch64 NEON] Fix a bug to avoid using floating type as condition type in lowering SELECT_CC.
201541 Fix a typo about lowering AArch64 va_copy.
201793 [AArch64] Add support for TargetTransformInfo Analysis.
201841 [AArch64] Add register constraints to avoid generating STLXR and STXR with unpredictable behavior.
202775 [AArch64]Fix improper diagnostics about offset range of load/store instructions.
204304 [ARM]Fix an assertion failure in A15SDOptimizer about DPair reg class by treating DPair as QPR.
204424 [AArch64] Remove .data_region directive from AArch64.
I know the patch list is a bit long, but we have the following reasons:
- Last year, when the 3.4 branch was created, we had not yet had time to complete the AArch64 NEON work, so the 3.4 branch contains the AArch64 NEON implementation at an intermediate stage. The patches I’m requesting are intended to complete the AArch64 NEON feature.
- Several of them are critical fixes for compiler crashes. Our end users really want these fixed in the next release and can’t wait until release 3.5.
- Many of the patches are interleaved and depend on one another, so cherry-picking only some of them could easily introduce bugs.
The patches are listed in chronological order, so it should be easy to apply them to the 3.4 branch in sequence. Only the following patches fail to apply cleanly, and they are easy to fix:
- 200708: Manually add the following line after line 5924 of lib/Basic/Targets.cpp:
    Features["crypto"] = true;
- 201384: Manually add the following two lines after line 7135 of lib/Driver/Tools.cpp:
    else
      Features.push_back("+neon");
- 202004: Insert the change at line 3353 of test/Preprocessor/init.c, remove the part around AARCH64-NETBSD, and also remove the following line:
    // AARCH64:#define ALIGNOF_MAX_ALIGN_T 16
Please simply remove the file “CodeGen/aarch64-neon-crypto.c”, because it has been renamed to CodeGen/neon-crypto.c. I have also attached two monolithic patches for your reference.
To minimize your effort, I have already done some initial testing. It covered the following, and everything passes:
- LLVM regression tests (“make check-all”)
- ARM internal emperor random tests
- Spec2000 tests
Finally, these patches would bring the following to release 3.4.1:
- Complete AArch64 NEON feature:
  - Support for all intrinsics required by ACLE 2.0, with AArch64 NEON enabled by default.
  - Fixes for all pattern-match issues in the AArch64 NEON back end.
- Bug fixes:
  - Change the mapping of the 64-bit integer type int64_t from “long long” to “long”. This potentially affects binary compatibility.
  - Fix a va_copy run-time failure on AArch64.
  - Fix a silent codegen fault for atomic operations (e.g. _sync… intrinsics).
  - Fix an assertion failure in A15SDOptimizer about the DPair reg class by treating DPair as QPR.
  - Fix an ARM back-end ld/st failure for v1i64 vector lists in writeback mode.
Please let me know if you need more information. Thank you in advance for your help!
Thanks,
-Jiangning
release_3_4_1_llvm.patch.tgz (92.7 KB)
release_3_4_1_clang.patch.tgz (51.8 KB)