Implement implicit TLS on Windows - need advice

Hi!

LLVM currently does not implement the implicit TLS model on Windows. This model is easy:

- a thread local variable ends up in the .tls section
- to access a thread local variable, you have to do
   (1) load the pointer to the thread local storage array from the TEB.
       On x86_64, this is gs:0x58; on x86 it is fs:0x2C.
   (2) load the pointer to this module's thread local block from that array. In general, the index is stored in the variable _tls_index. For a .exe, _tls_index is always 0.
   (3) load the offset of the variable from the start of the .tls section.
   (4) the thread local variable can now be accessed at the sum of the results of steps 2 and 3.
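
The four steps above can be sketched as a small simulation in portable C. This is only an illustration and not what the compiler emits: fake_teb, MAX_MODULES and tls_address are hypothetical names, and on real Windows the array pointer comes from the TEB and _tls_index is set by the loader.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TEB field at gs:0x58 (x86_64) or
   fs:0x2C (x86): an array with one TLS block pointer per module. */
enum { MAX_MODULES = 4 };

typedef struct {
    void *tls_array[MAX_MODULES];
} fake_teb;

/* Steps 1-4 for an int stored `secrel` bytes from the start of .tls. */
int *tls_address(fake_teb *teb, uint32_t tls_index, uint32_t secrel)
{
    void **array = teb->tls_array;            /* (1) TLS array from the TEB */
    assert(tls_index < MAX_MODULES);
    char *block = (char *)array[tls_index];   /* (2) this module's TLS block */
    return (int *)(block + secrel);           /* (3)+(4) add the .tls offset */
}
```

Two threads would each have their own fake_teb pointing at their own copy of the .tls data, so the same index and offset resolve to different addresses per thread.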

For x86_64, something like the following should be generated for the tls1.ll test case:

(1) mov rdx, qword [gs:abs 58H]
(2) mov ecx, dword [rel _tls_index]
    mov rcx, qword [rdx+rcx*8]
(3) mov eax, .tls$:i
(4) mov eax, dword [rax+rcx]
    ret

(See the PECOFF spec, chapter 5.7, and "Thread Local Storage, part 4: Accessing __declspec(thread) data" on the Nynaeve blog for reference.)

I tried to implement this. With the attached patch tls1.patch, a thread local variable ends up in the .tls section. This looks fine.
With the second patch tls2.patch I try to implement the code sequence. Here I have a couple of questions:

- To get the offset to start of .tls section, I have created a new MachineOperand flag. Is this the right approach? If yes then I need a hint where to implement this in the WinCOFFObjectWriter.
- How can I code the load of variable _tls_index in SelectionDAG? I have some trouble using DAG.getExternalSymbol and then loading the value.

Thanks for your help.

Kai

tls2.patch (4.05 KB)

tls1.patch (2.74 KB)

Thanks for working on this!

The first patch looks fine except that it's emitting to .tls when it
should be .tls$. Also, you need to add tests.

As for the second patch, that's not how MSVC 2010 emits code (and it
needs tests).

thread_local.c:

#ifdef _MSC_VER
#define __thread __declspec(thread)
#endif
__thread int i = 0;

int foo() {
  return i++;
}

thread_local.asm:

PUBLIC _i
_TLS SEGMENT
_i DD 00H
_TLS ENDS
PUBLIC _foo
EXTRN __tls_array:DWORD
EXTRN __tls_index:DWORD
; Function compile flags: /Ogtpy
_TEXT SEGMENT
_foo PROC
; File c:\users\mspencer\projects\llvm-project\test\thread_local.c
; Line 7
  mov eax, DWORD PTR __tls_index
  mov ecx, DWORD PTR fs:__tls_array
  mov ecx, DWORD PTR [ecx+eax*4]
  mov eax, DWORD PTR _i[ecx]
  lea edx, DWORD PTR [eax+1]
  mov DWORD PTR _i[ecx], edx
; Line 8
  ret 0
_foo ENDP
_TEXT ENDS
END

llvm-objdump -d -r -s thread_local.obj:

Disassembly of section .text:
_foo:
       0: a1 00 00 00 00 movl 0, %eax
                               1: IMAGE_REL_I386_DIR32 __tls_index
       5: 64 8b 0d 00 00 00 00 movl %fs:0, %ecx
                               8: IMAGE_REL_I386_DIR32 __tls_array
       c: 8b 0c 81 movl (%ecx,%eax,4), %ecx
       f: 8b 81 00 00 00 00 movl (%ecx), %eax
                              11: IMAGE_REL_I386_SECREL _i
      15: 8d 50 01 leal 1(%eax), %edx
      18: 89 91 00 00 00 00 movl %edx, (%ecx)
                              1a: IMAGE_REL_I386_SECREL _i
      1e: c3 ret

Contents of section .tls$:
0000 00000000 ....
Contents of section .text:
0000 a1000000 00648b0d 00000000 8b0c818b .....d..........
0010 81000000 008d5001 89910000 0000c3 ......P........
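
For reference, the IMAGE_REL_I386_SECREL relocations in the dump ask for the 32-bit offset of _i from the start of its section. Below is a minimal sketch of how such a fixup could be applied, assuming a little-endian 32-bit field whose existing bytes are the addend. apply_secrel32 and the addresses in the note after it are made up for illustration; this is not LLVM or linker code.

```c
#include <assert.h>
#include <stdint.h>

/* Read-modify-write a 32-bit little-endian field: the bytes already there
   are the addend, and the symbol's section-relative offset is added. */
void apply_secrel32(uint8_t *field, uint32_t symbol_va, uint32_t section_va)
{
    assert(symbol_va >= section_va);   /* symbol must lie inside the section */
    uint32_t addend = (uint32_t)field[0]
                    | ((uint32_t)field[1] << 8)
                    | ((uint32_t)field[2] << 16)
                    | ((uint32_t)field[3] << 24);
    uint32_t value = addend + (symbol_va - section_va);
    field[0] = (uint8_t)(value & 0xff);
    field[1] = (uint8_t)((value >> 8) & 0xff);
    field[2] = (uint8_t)((value >> 16) & 0xff);
    field[3] = (uint8_t)((value >> 24) & 0xff);
}
```

Patching the displacement of the bytes 8b 81 00 00 00 00 with a hypothetical symbol at 0x403010 in a section starting at 0x403000 would store 0x10, which is then added to the TLS block pointer in ecx at run time.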

- Michael Spencer

Hi Michael!

Thanks for your answer.

I got a step further: I can now generate code that does not look too bad. And yes, I am aware that test cases are still missing.

Thanks for pointing out that the 32-bit code is a bit different from the 64-bit code. I have a real use case for the 64-bit code, so this is my first target. I added an assert for the 32-bit case. I also changed the name of the section to .tls$.

I still have some questions:

1) In WinCOFFObjectWriter::RecordRelocation I check for the new MCSymbolRefExpr::VK_SECREL. Is this the right approach, or would it be better to create a new fixup kind?

2) Is there a way to lower the code so that an expression like rax+8*rbx is generated by default?

Thank you!

Kai

tls.diff (8.86 KB)
