Apparent optimizer bug on X86_64

Compiling a simple automaton generated by GNU bison, at -O1 or -O2,
produces the following machine code:

1300 /*-----------------------------.
1301 | yyreduce -- Do a reduction. |
1302 `-----------------------------*/
1303 yyreduce:
1304 /* yyn is the number of a rule to reduce with. */
1305 yylen = yyr2[yyn];
   0x0000000000400c14 <rpcalc_parse+628>: mov r15d,r14d
   0x0000000000400c17 <rpcalc_parse+631>: movzx r12d,BYTE PTR [r15+0x4015e2]
   0x0000000000400c1f <rpcalc_parse+639>: mov eax,0x1
   0x0000000000400c24 <rpcalc_parse+644>: mov r13,rax
   0x0000000000400c27 <rpcalc_parse+647>: sub r13,r12
   0x0000000000400c2a <rpcalc_parse+650>: mov eax,r13d // assignment to eax zero-extends into rax

1306
1307 /* If YYLEN is nonzero, implement the default value of the action:
1308 `$$ = $1'.
1309
1310 Otherwise, the following line sets YYVAL to garbage.
1311 This behavior is undocumented and Bison
1312 users should not rely upon it. Assigning to YYVAL
1313 unconditionally makes the parser a bit smaller, and it avoids a
1314 GCC warning that YYVAL may be used uninitialized. */
1315 yyval = yyvsp[1-yylen];
=> 0x0000000000400c2d <rpcalc_parse+653>: movsd xmm0,QWORD PTR [rbx+rax*8]
   0x0000000000400c32 <rpcalc_parse+658>: movsd QWORD PTR [rbp-0x808],xmm0

As far as I understand it, writing to eax zero-extends the value into
rax. However, eax holds the result of "1-yylen", which is negative
whenever yylen is greater than 1, so it should be sign-extended before
rax is used as a 64-bit index. Indexing "in the wrong direction" causes
a segfault at the instruction marked with '=>'.
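
To make the difference concrete, here is a minimal sketch of the two ways a 32-bit result can be widened to 64 bits, and of the byte offset that an address of the form [base+index*8] then uses. The helper names are hypothetical and not taken from the generated parser; the 8-byte slot size is read off the rax*8 scale in the listing.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the bison output. */

/* Widen the low 32 bits the way `mov eax,r13d` does (upper half cleared). */
static int64_t widen_zero(int32_t v) { return (int64_t)(uint32_t)v; }

/* Widen the way `movsxd rax,ecx` does (sign bit copied upward). */
static int64_t widen_sign(int32_t v) { return (int64_t)v; }

/* Byte offset used by an address of the form [base + index*8]. */
static int64_t byte_offset(int64_t index) { return index * 8; }
```

For a rule of length 2, "1-yylen" is -1. byte_offset(widen_sign(-1)) is -8, one slot below yyvsp, while byte_offset(widen_zero(-1)) is 34359738360, which is far outside the parser's value stack.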

Here's the disassembly from -O0, which does a sign extension (movsxd):

1300 /*-----------------------------.
1301 | yyreduce -- Do a reduction. |
1302 `-----------------------------*/
1303 yyreduce:
1304 /* yyn is the number of a rule to reduce with. */
1305 yylen = yyr2[yyn];
   0x0000000000401069 <+1945>: movsxd rax,DWORD PTR [rbp-0x80c]
   0x0000000000401070 <+1952>: movzx ecx,BYTE PTR [rax*1+0x401f0f]
   0x0000000000401078 <+1960>: mov DWORD PTR [rbp-0x824],ecx
   0x000000000040107e <+1966>: mov ecx,0x1

1306
1307 /* If YYLEN is nonzero, implement the default value of the action:
1308 `$$ = $1'.
1309
1310 Otherwise, the following line sets YYVAL to garbage.
1311 This behavior is undocumented and Bison
1312 users should not rely upon it. Assigning to YYVAL
1313 unconditionally makes the parser a bit smaller, and it avoids a
1314 GCC warning that YYVAL may be used uninitialized. */
1315 yyval = yyvsp[1-yylen];
   0x0000000000401083 <+1971>: sub ecx,DWORD PTR [rbp-0x824]
   0x0000000000401089 <+1977>: movsxd rax,ecx
   0x000000000040108c <+1980>: mov rdx,QWORD PTR [rbp-0x800]
   0x0000000000401093 <+1987>: movsd xmm0,QWORD PTR [rdx+rax*8]
   0x0000000000401098 <+1992>: movsd QWORD PTR [rbp-0x820],xmm0

yylen is of type YYSIZE_T, which is a macro that expands to size_t or
'unsigned int'. Perhaps clang/LLVM considers "1-yylen" to be unsigned?
Am I completely off-base?
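
The signedness question can be checked in isolation. The sketch below assumes a target where int and unsigned int are 32 bits wide, as on x86_64 Linux. The function names are hypothetical, and the sketch does not show which type YYSIZE_T actually expands to in this build.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two candidate 32-bit types of yylen. */

/* int arithmetic: 1 - yylen can go negative and is sign-extended. */
static int64_t index_if_signed(int yylen) {
    return (int64_t)(1 - yylen);
}

/* unsigned arithmetic: 1 - yylen wraps modulo 2^32 and stays non-negative
   when widened (assumes a 32-bit unsigned int). */
static int64_t index_if_unsigned(unsigned int yylen) {
    return (int64_t)(1 - yylen);
}
```

With yylen equal to 2, index_if_signed returns -1 and index_if_unsigned returns 4294967295. If yylen were an unsigned int, zero-extension would match what the C source asks for; the movsxd in the -O0 output is what one would expect for a signed 32-bit type.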

This is
clang version 3.0 (trunk 127463)
Target: x86_64-unknown-linux-gnu
Thread model: posix

Csaba

Please file a bug in Bugzilla, attach the complete preprocessed
source, and show the full steps required to reproduce the issue. You
might be right, but it's hard to tell without context.

-Eli

Created http://llvm.org/bugs/show_bug.cgi?id=9512

Csaba