lldb patches for OpenBSD

Hi,

I have several small fixes for OpenBSD, I am just compiling lldb, not
at all meaning to run it for now. Please give feedback.

1) C99 discourages LONG_LONG_MAX, ULONG_LONG_MAX, and relatives? Can
this be fixed properly by complete replacement or
you would settle for ifndef in the code? I don't know if it exists on
Mac OSX and Linux, so can you please change appropriately? I have
included a ugly ifndef in a DataExtractor.cpp and DNBDataRef.cpp.

2) File.cpp needs <sys/stat.h> because all the permissions like
S_IRUSR, S_IWUSR, S_IROTH, S_IXOTH etc are defined there in OpenBSD
and FreeBSD (both -current or head sources). I don't have any Linux
sources around but Linux knowledgeable people will correct me, if its
needed there, if not we can add ifndef?

3) DT_WHT won't be defined in <sys/dirent.h> on OpenBSD. In fact, it
was specifically removed.

4) return (uint64_t) (pthread_self()); is a poor hack for getting
thread id in Host.cpp. I hope to correct that in with feedback from
more knowledgeable people here on this board. I privately sent a email
to an OpenBSD developer, Matt Dempsky who wrote.

llvm/tools/lldb/source/Host/common/Host.cpp:410: error: cast from
'pthread*' to 'lldb::tid_t' loses precision

"That's because tid_t is a uint32_t, but on 64-bit arches pthread_t is
a pointer, and so it's a 64-bit value. It would probably be better to
change tid_t to a uint64_t. I don't imagine that would be a
controversial change, and it affects all 64-bit non-Apple platforms, I
believe."

Also, is there any interest in convert the Makefiles into CMakeLists.txt?

Thanks,
amit

Index: include/lldb/Host/Config.h

Hi,

I have several small fixes for OpenBSD, I am just compiling lldb, not
at all meaning to run it for now. Please give feedback.

1) C99 discourages LONG_LONG_MAX, ULONG_LONG_MAX, and relatives? Can
this be fixed properly by complete replacement or
you would settle for ifndef in the code? I don't know if it exists on
Mac OSX and Linux, so can you please change appropriately? I have
included a ugly ifndef in a DataExtractor.cpp and DNBDataRef.cpp.

AFAIK LONG_LONG_MAX is a GCC thing, and only available on linux with
certain compiler flags (-D_GNU_SOURCE for example). I would imagine
Clang provides it too for compatibility. I think the solution here is
to just not use it.

I think using ULLONG_MAX and friends makes sense. For platforms that do
not define them we can easily provide our own in a config.h.

2) File.cpp needs <sys/stat.h> because all the permissions like
S_IRUSR, S_IWUSR, S_IROTH, S_IXOTH etc are defined there in OpenBSD
and FreeBSD (both -current or head sources). I don't have any Linux
sources around but Linux knowledgeable people will correct me, if its
needed there, if not we can add ifndef?

Hmm. They are defined there on linux too -- must be getting included
indirectly somehow. I do not see any reason to not just include the
missing header directly in this case; no need to ifdef it AFAICT.

3) DT_WHT won't be defined in <sys/dirent.h> on OpenBSD. In fact, it
was specifically removed.

Interesting. We get the d_type field only when _BSD_SOURCE is defined
on linux. Perhaps we should just be calling lstat(2) here?

4) return (uint64_t) (pthread_self()); is a poor hack for getting
thread id in Host.cpp.

Agreed. But in this case we just have not gotten around to writing the
proper support. For linux we should be calling gettid(2) and have that
factored out into the linux-specific side of the Host implementation.

For OpenBSD perhaps it makes sense to create a new Host subdirectory so
this stuff can get handled as appropriate for the platform? Perhaps one
that can be shared between all {Open,Free,Net}BSD's? I am not too sure
what would make the most sense here.

I hope to correct that in with feedback from
more knowledgeable people here on this board. I privately sent a email
to an OpenBSD developer, Matt Dempsky who wrote.

> llvm/tools/lldb/source/Host/common/Host.cpp:410: error: cast from
> 'pthread*' to 'lldb::tid_t' loses precision

"That's because tid_t is a uint32_t, but on 64-bit arches pthread_t is
a pointer, and so it's a 64-bit value. It would probably be better to
change tid_t to a uint64_t. I don't imagine that would be a
controversial change, and it affects all 64-bit non-Apple platforms, I
believe."

For linux we use 32-bit pids on all platforms I think. But I have no
big worries about growing lldb::tid_t to 64-bits if a platform needs to
stuff a pointer in there. Out of curiosity, how does OpenBSD enumerate
threads?

Also, is there any interest in convert the Makefiles into CMakeLists.txt?

I personally to not know CMake very well. I have used it a few times
but not on a substantial project by any means.

I do not have any flat out objections to using CMake. Eventually we
need a configure-like stage that I was eventually going to get around to
implementing but if that can be done using a new CMake build then that
would be fine with me (I think).

However, the test suite is based on makefiles and I do not have a
hand in that side of things. CMake would need to be a good fit in that
context too.

Also, I think we would need to discuss if CMake can generate suitable XCode
project files since its big advantage is the cross-platform thing. If we
cannot use it as a cross-platform build system then I am not too sure
what advantage is has over good old make.

For linux we use 32-bit pids on all platforms I think.

OpenBSD pid_t's are 32-bit as well.

But I have no
big worries about growing lldb::tid_t to 64-bits if a platform needs to
stuff a pointer in there.

Yes, that's what I had initially recommended to Amit for how to fix
this error. Unless he discovered a reason not to do that, that's what
I'd still recommend.

Out of curiosity, how does OpenBSD enumerate threads?

The currently supported threading library on OpenBSD still uses
userspace threading, and so the pthread_t identifiers are just
pointers to the thread control blocks that get allocated using
malloc(3), and hence might be arbitrary 64-bit pointer values.

There's also a second threading library still undergoing (slow)
development that uses kernel thread scheduling. pthread_t identifiers
here are still pointers to the userspace thread control blocks, but
there's also a newly added system call getthrid() that returns a pid_t
that enumerates the kernel thread id directly. (Obviously this system
call is only useful if the kernel thread library is in use.)

I think using ULLONG_MAX and friends makes sense. For platforms that do
not define them we can easily provide our own in a config.h.

OK, I will resend a diff with LLONG_MAX and ULLONG_MAX only.

2) File.cpp needs <sys/stat.h> because all the permissions like
S_IRUSR, S_IWUSR, S_IROTH, S_IXOTH etc are defined there in OpenBSD
and FreeBSD (both -current or head sources). I don't have any Linux
sources around but Linux knowledgeable people will correct me, if its
needed there, if not we can add ifndef?

Hmm. They are defined there on linux too -- must be getting included
indirectly somehow. I do not see any reason to not just include the
missing header directly in this case; no need to ifdef it AFAICT.

That's great then.

3) DT_WHT won't be defined in <sys/dirent.h> on OpenBSD. In fact, it
was specifically removed.

Interesting. We get the d_type field only when _BSD_SOURCE is defined
on linux. Perhaps we should just be calling lstat(2) here?

Theo's words, I don't know why he didn't elaborate, anyway
http://marc.info/?l=openbsd-tech&m=130085319818542&w=2

lstat() would be good replacement. As it is defined in Open + Free BSD
and Linux.

http://www.freebsd.org/cgi/man.cgi?query=lstat&sektion=2&apropos=0&manpath=FreeBSD+8.2-RELEASE
http://www.openbsd.org/cgi-bin/man.cgi?query=lstat&sektion=2
http://www.linux.com/learn/docs/man/3566-lstat2

Also, is there any interest in convert the Makefiles into CMakeLists.txt?

I personally to not know CMake very well. I have used it a few times
but not on a substantial project by any means.

I do not have any flat out objections to using CMake. Eventually we
need a configure-like stage that I was eventually going to get around to
implementing but if that can be done using a new CMake build then that
would be fine with me (I think).

However, the test suite is based on makefiles and I do not have a
hand in that side of things. CMake would need to be a good fit in that
context too.

Also, I think we would need to discuss if CMake can generate suitable XCode
project files since its big advantage is the cross-platform thing. If we
cannot use it as a cross-platform build system then I am not too sure
what advantage is has over good old make.

CMake is much better than anything to generate XCode project files,
Windows Vistual Studio support, Unix makefiles etc. I have hand
written some CMakeLists.txt files based on llvm/clang CMakeLists.txt
itself, but they will need to be updated by you guys when you make the
commits which add, delete, or rename files. If you follow and update
the CMakeLists.txt religiously, you can get lldb to be compiled on
almost all platforms. Of course the biggest hurdle is the support to
be added inside lldb for BSD's, Windows etc. But with CMake you can
just forget about adding tweaks for all OS to get it to compile. You
can concentrate on development and CMake takes care of compilation.

If you are committed to using CMake, I will send across the files
after polishing it up... They aren't yet done for all the folders yet
but a major chunk is done. I really want to see lldb on OpenBSD, now
that gdb is inaccessible due to GPL v3.

http://www.cmake.org/Wiki/CMake_FAQ#What_is_CMake.3F

Thanks

I just checked in some fixes:

Committed revision 128720.
Committed revision 128721.

Does this take care of all your issues? I will respond about the thread ID one in a separate email.

Greg Clayton

4) return (uint64_t) (pthread_self()); is a poor hack for getting
thread id in Host.cpp. I hope to correct that in with feedback from
more knowledgeable people here on this board. I privately sent a email
to an OpenBSD developer, Matt Dempsky who wrote.

llvm/tools/lldb/source/Host/common/Host.cpp:410: error: cast from
'pthread*' to 'lldb::tid_t' loses precision

"That's because tid_t is a uint32_t, but on 64-bit arches pthread_t is
a pointer, and so it's a 64-bit value. It would probably be better to
change tid_t to a uint64_t. I don't imagine that would be a
controversial change, and it affects all 64-bit non-Apple platforms, I
believe."

Yikes,

It looks like this function is incorrect for non Apple variants:

lldb::tid_t
Host::GetCurrentThreadID()
{
#if defined (__APPLE__)
    return ::mach_thread_self();
#else
    return lldb::tid_t(pthread_self());
#endif
}

A "lldb::tid_t" is a thread ID, not a thread opaque pointer. The "lldb::thread_t" is the already the correct type for the host system's "thread opaque pointer", but that isn't what this function is trying to return.

Dooes linux or BSD have the notion of a thread ID for the current thread that isn't just a pthread_t?

This function is used mostly for logging when you want to log the current process ID and the current thread ID.

Also, is there any interest in convert the Makefiles into CMakeLists.txt?

Is the LLVM build able to use CMakeLists.txt? What are the advantages of using CMakeLists.txt?

At least with OpenBSD's standard userspace threading library right
now, you can't do any better than the (64-bit) pthread_t.

Once we switch to kernel scheduled threads (which everyone's hoping
will happen soon, but has been a slow multiyear project not likely to
be fully ready for a while yet), you can use the OpenBSD-specific
getthrid() system call, which returns a (32-bit) pid_t identifying the
thread within that process.

I imagine Linux and other BSDs with kernel scheduled threads have
similar system calls to getthrid() already, but I'm not familiar with
those details on other OS's. (I'm just an OpenBSD kernel hacker
casually interested in LLVM; I'm not really even an expert on our
threading/scheduling code.)

The LLVM CMake build system is maintained and up to date. This is the one used to compile clang/llvm on Windows for instance (by generating Visual Studio projects), and it can also be used to generate a working llvm/clang Xcode projects (the Xcode projects files included in svn are not up-to date and I don't think it is possible to build something useful using these files).

-- Jean-Daniel

They do great. It will allow the lldb compile to proceed further down
the road :slight_smile:

Also, is there any interest in convert the Makefiles into CMakeLists.txt?

Is the LLVM build able to use CMakeLists.txt? What are the advantages of using CMakeLists.txt?

Yes, LLVM + Clang use CMakeLists.txt ***today*** and the devs over
there religiously maintain them. I see the CMakeLists.txt changing all
the time when I do an update.

CMake has different generators to facilitate building on different
OSs. So it generates Makefiles with the -g Unix Makefiles option or it
can generate Microsoft Visual Studio .sln files, or Apple Xcode or
Codeblocks or Eclipse or whatever you care to write it in.

http://www.cmake.org/Wiki/CMake_Generator_Specific_Information

Here on OpenBSD or Linux, I would use CMake to generate Makefiles and
just type gmake or make, and the compile would be the usual way. You
would use CMake to generate XCode and open it in your IDE... If you
have any bugs you report to those guys and they fix it.

In fact, they invented CMake for just such a cross platform project.
If you have any doubts ask cfe-dev@ for help.

Thanks

A "lldb::tid_t" is a thread ID, not a thread opaque pointer. The
"lldb::thread_t" is the already the correct type for the host system's
"thread opaque pointer", but that isn't what this function is trying
to return.

Dooes linux or BSD have the notion of a thread ID for the current
thread that isn't just a pthread_t?

Linux has gettid(2), so yes.

This function is used mostly for logging when you want to log the
current process ID and the current thread ID.

Just some thoughts:

Platforms that use user-space thread library implementations can provide
some Host code that maps pthread_t pointers (or whatever) via a hash to
useful 32-bit numeric id's. We could do that in Host::ThreadCreate for
example. It really then becomes an issue for the Host to figure out how
to make those ID's useful from a logging/user perspective.

But without a good way to map/log those ID's, doing a simple truncation
on a pointer is probably just fine *most* of the time since those last
bits are going to be the unique ones anyway.

So even though the lldb::tid_t(pthread_self()) is "wrong", it still
does what it needs to do in this case; give a per-thread-unique integer
identifier.

So I think the main question is: does having the full 64 bit pointer as
an ID give useful info to the developers using user-space threads? I
mean, is there some tool (top, ps, etc) or function that one can use to
map those values into something meaningful, or is uniqueness the only
thing that matters?

A "lldb::tid_t" is a thread ID, not a thread opaque pointer. The
"lldb::thread_t" is the already the correct type for the host system's
"thread opaque pointer", but that isn't what this function is trying
to return.

Dooes linux or BSD have the notion of a thread ID for the current
thread that isn't just a pthread_t?

Linux has gettid(2), so yes.

This function is used mostly for logging when you want to log the
current process ID and the current thread ID.

Just some thoughts:

Platforms that use user-space thread library implementations can provide
some Host code that maps pthread_t pointers (or whatever) via a hash to
useful 32-bit numeric id's. We could do that in Host::ThreadCreate for
example. It really then becomes an issue for the Host to figure out how
to make those ID's useful from a logging/user perspective.

We do track some thread info already in LLDB for any thread that is launched through the Host::ThreadCreate() function (thread name for platforms that don't support it). If you have any ideas on extra stuff you would want to track, let me know.

But without a good way to map/log those ID's, doing a simple truncation
on a pointer is probably just fine *most* of the time since those last
bits are going to be the unique ones anyway.

Agreed.

So even though the lldb::tid_t(pthread_self()) is "wrong", it still
does what it needs to do in this case; give a per-thread-unique integer
identifier.

Yep. In our case we have functions that can use the thread ID to do thread type calls in the debugger, so we like to show the IDs that could be used to make calls if possible. Also if there are crash loggers or sample tools that show a list of threads, it is nice to be able to see the same IDs in all places.

So I think the main question is: does having the full 64 bit pointer as
an ID give useful info to the developers using user-space threads? I
mean, is there some tool (top, ps, etc) or function that one can use to
map those values into something meaningful, or is uniqueness the only
thing that matters?

We currently don't use the thread IDs for the host threads in the current process for anything. If we do, we would use the thread handle which is already a pthread_t for all of us unix based OS's so we could add a:

static lldb::thread_t Host::GetCurrentThread();

call for those situations.

So this is currently functional and there is no immediate need for change unless there are other debug related tools where you would want to correlate the threads IDs with.

Greg Clayton

Greg Clayton

Is there a reason not to just make the thread ID a 64-bit value and be
done with it? Seems like doing anything else is over-engineering the
problem.

That would be fine. We will need to track down any printf statements (or similar Printf stream calls) that use thread IDs and make sure to specify:

"0x%4.4llx"

instead of our current default of:

"0x%4.4x"

gdb has some functions like "paddr_nz" which take the basic address types and converts them to whatever string looks good for that type. Might be nice to do something like that instead of hard-coding format strings in logs everywhere and then having to go change them when we end up needing to change the type...

Jim

I was not trying to bike-shed. Just understand the problem.

If there is "good information" in the upper 32-bits then we should
display it in our logs to save developer/human time when debugging.

On linux we currently have a ProcessMessage class (might replace it in
time, but no matter). I actually do care about its size (it has a tid_t
in it) since it is passed around by value. When processing internal
events we could be generating thousands of these guys per second so
efficiency does matter -- not by much, but perhaps a little.

So these types of things are worth thinking about.