[RFC] Identifying access to errno

Hello,

On some systems (Linux/glibc, for example), some libm math functions (like cos(double)) might set errno. It is important that we model this, in general, to prevent miscompilation (we would not, for example, want to reorder a call to cos in between a call to open and a call to perror). However, almost no code in the wild checks errno after calls to libm math functions, and this errno-setting behavior prevents vectorization and other useful loop optimizations, CSE, etc. Also, currently, the scalar llvm.<libm function> intrinsics are subtly broken on systems where the underlying libm functions may set errno, because the intrinsics are readonly, and may be implemented by calls to the libm function (which might set errno), exposing us to reordering problems (as in the example above).
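
To make the open/cos/perror hazard concrete, here is a minimal C sketch. It assumes a POSIX system where open() on a path that does not exist fails with ENOENT. errno_setting_math_call is a hypothetical stand-in for a libm routine that writes errno (it is not a real library function), used so the sketch does not depend on any particular libm's behavior.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical stand-in for a libm call that reports a domain error
 * through errno. It is not part of any real library. */
static double errno_setting_math_call(double x)
{
    errno = EDOM;
    return x;
}

/* Returns the errno value a later perror() would see if the math call
 * were (wrongly) moved between the failing open() and the perror(). */
int errno_seen_after_reordering(const char *missing_path)
{
    int fd = open(missing_path, O_RDONLY);
    assert(fd == -1);                    /* the sketch expects open() to fail */
    (void)errno_setting_math_call(1.0);  /* the misplaced call clobbers errno */
    return errno;
}

/* Same calls in source order: errno is read before the math call runs. */
int errno_seen_in_source_order(const char *missing_path)
{
    int fd = open(missing_path, O_RDONLY);
    assert(fd == -1);
    int saved = errno;                   /* what perror() should report */
    (void)errno_setting_math_call(1.0);
    return saved;
}
```

For a path that does not exist, the first function yields EDOM and the second yields ENOENT, which is the value perror should have reported.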

I think that we could do a better job if we modeled this more explicitly. In theory errno could be anything (because POSIX allows errno to be a macro "of type int"), but in practice, I think the following rules will do:

1. Assume that all unknown external functions might read/write errno
2. Assume that all i32 pointers might point to errno (although we might be able to do better by somehow leveraging TBAA for "int"?)
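
As a small illustration of why rule 2 is needed, the C sketch below stores to errno through an ordinary int pointer. The function names are made up for this example. Inside store_through_int_pointer nothing distinguishes a pointer to errno from a pointer to any other int.

```c
#include <assert.h>
#include <errno.h>

/* Stores through a plain int pointer. The callee cannot tell whether p
 * refers to errno or to some unrelated int. */
void store_through_int_pointer(int *p, int value)
{
    assert(p != 0);
    *p = value;
}

/* Passes the address of errno, so the store above modifies errno. */
int errno_after_aliased_store(void)
{
    errno = 0;
    store_through_int_pointer(&errno, ERANGE);
    return errno;
}
```

Taking &errno is valid however errno is defined, because it must expand to a modifiable lvalue of type int.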

Here's a quick overview of how errno is implemented on different systems:

- Traditional POSIX (not thread safe):

   extern int errno;

- DragonFly libc:

   extern __thread int errno;

- Android libc:

   extern volatile int* __errno(void);
   #define errno (*__errno())

- Darwin libc:

   int * __error(void);
   #define errno (*__error())

- musl libc:

   extern int *__errno_location(void);
   #define errno (*__errno_location())
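
The function-plus-macro pattern used by several of these libcs can be sketched in isolation as follows. The names my_errno_location, my_errno, and my_errno_storage are hypothetical and belong to no libc, and the sketch assumes a C11 compiler for _Thread_local.

```c
#include <assert.h>

/* Hypothetical per-thread storage, standing in for the libc's own. */
static _Thread_local int my_errno_storage;

/* Hypothetical accessor in the style of __errno_location(). */
int *my_errno_location(void)
{
    return &my_errno_storage;
}

/* The macro makes every use of my_errno a call plus a dereference. */
#define my_errno (*my_errno_location())

int set_and_read_back(int code)
{
    my_errno = code;  /* expands to *my_errno_location() = code */
    assert(&my_errno == my_errno_location());
    return *my_errno_location();
}
```

With this pattern, each access to the variable appears to the optimizer as a call to an external function followed by a load or store through the returned int pointer, which is where rules 1 and 2 come in.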

Does anyone see any problems with making stronger (type-based) assumptions re: errno (and, thus, on what things may alias with calls to errno-setting-libm functions)? Also, I don't know if we support LTO-ing the libc implementation, and if that could cause any problems with this? What if, for globals, we insisted that the global be named "errno"?

Thanks again,
Hal

> On some systems (Linux/glibc, for example), some libm math functions (like
> cos(double)) might set errno. It is important that we model this, in
> general, to prevent miscompilation (we would not, for example, want to
> reorder a call to cos in between a call to open and a call to perror).
> However, almost no code in the wild checks errno after calls to libm math
> functions, and this errno-setting behavior prevents vectorization and other
> useful loop optimizations, CSE, etc. Also, currently, the scalar
> llvm.<libm function> intrinsics are subtly broken on systems where the
> underlying libm functions may set errno, because the intrinsics are
> readonly, and may be implemented by calls to the libm function (which might
> set errno), exposing us to reordering problems (as in the example above).

Hi Hal,

I'm confused. On one hand you're proposing that we stop reordering libm calls
because they might set errno (I agree with this), but on the other hand
you're saying that nobody cares and that this prevents optimizations (not
sure I agree with this).

> 1. Assume that all unknown external functions might read/write errno
>
> 2. Assume that all i32 pointers might point to errno (although we might be
> able to do better by somehow leveraging TBAA for "int"?)

Something like "MayBeErr", "IsErr", "IsntErr".

> Does anyone see any problems with making stronger (type-based) assumptions
> re: errno (and, thus, on what things may alias with calls to
> errno-setting-libm functions)?

I don't, but I'm trying to think of a way to disable it if we know it's
"ok". Maybe -unsafe-math or something similar could disable this pass,
because it is expensive and will impact generated code.

> What if, for globals, we insisted that the global be named "errno"?

I wouldn't be surprised if there was a system where the global error is not
errno. Windows maybe?

--renato

From: "Renato Golin" <renato.golin@linaro.org>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM" <llvmdev@cs.uiuc.edu>
Sent: Saturday, November 23, 2013 10:53:09 AM
Subject: Re: [LLVMdev] [RFC] Identifying access to errno

> On some systems (Linux/glibc, for example), some libm math functions
> (like cos(double)) might set errno. It is important that we model
> this, in general, to prevent miscompilation (we would not, for
> example, want to reorder a call to cos in between a call to open and
> a call to perror). However, almost no code in the wild checks errno
> after calls to libm math functions, and this errno-setting behavior
> prevents vectorization and other useful loop optimizations, CSE,
> etc. Also, currently, the scalar llvm.<libm function> intrinsics are
> subtly broken on systems where the underlying libm functions may set
> errno, because the intrinsics are readonly, and may be implemented
> by calls to the libm function (which might set errno), exposing us
> to reordering problems (as in the example above).
>
> Hi Hal,
>
> I'm confused. On one hand you're proposing that we stop reordering libm
> calls because they might set errno (I agree with this), but on the
> other hand you're saying that nobody cares and that this prevents
> optimizations (not sure I agree with this).

What I'm saying is that very few people actually check errno after libm calls, and so we're often preventing vectorization for no good reason. However, obviously we still need to prove that no errno access is occurring if we want to vectorize (unfortunately, our ability to do this may be limited outside of an LTO context -- but under fast-math or with some pragma, etc. we may be able to change the default assumptions). In short, I'm proposing that we both:

1. Be more strict to prevent unwanted reorderings (by actually modeling that these functions may *write* to errno).

2. Improve our modeling of errno so that we can ignore said writes (safely) when we know that the value of errno is unused. A setting like -fno-math-errno should not "remove" the modeling of these writes, just declare our disinterest in the resulting value.

But what I'm trying to establish here is: how can we recognize possible errno access, so that explicitly modeling the writes to errno does not unduly pessimize the surrounding code?
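
The distinction between the two cases can be sketched in C as follows. half_or_edom is a hypothetical stand-in for an errno-setting libm routine, and the two callers are illustrative only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an errno-setting libm routine: halves its
 * argument and reports negative input as a domain error. */
static double half_or_edom(double x)
{
    if (x < 0.0) {
        errno = EDOM;
        return 0.0;
    }
    return x / 2.0;
}

/* Common case: this function never reads errno. If its callers do not
 * read errno either (or the user has declared disinterest), the errno
 * writes are dead and need not constrain loop transformations. */
double sum_halves(const double *xs, int n)
{
    assert(n >= 0);
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += half_or_edom(xs[i]);
    return total;
}

/* Rare case: errno is read after each call, so every write must be
 * kept, and kept in order relative to the read. */
int count_domain_errors(const double *xs, int n)
{
    assert(n >= 0);
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        errno = 0;
        (void)half_or_edom(xs[i]);
        if (errno == EDOM)
            ++errors;
    }
    return errors;
}
```

The writes are modeled identically in both callers. Only the second makes their value observable, which is the information an analysis would need in order to treat the first loop as safe to transform.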

> 1. Assume that all unknown external functions might read/write errno
>
> 2. Assume that all i32 pointers might point to errno (although we
> might be able to do better by somehow leveraging TBAA for "int"?)
>
> Something like "MayBeErr", "IsErr", "IsntErr".

On what?

> Does anyone see any problems with making stronger (type-based)
> assumptions re: errno (and, thus, on what things may alias with
> calls to errno-setting-libm functions)?
>
> I don't, but I'm trying to think of a way to disable it if we know
> it's "ok". Maybe -unsafe-math or something similar could disable
> this pass, because it is expensive and will impact generated code.

We already run IPO/FunctionAttrs, and if we're conservative about escape, it would not add any additional expense. (If we do top-down propagation for static functions, or more generally for LTO, then that could add overhead.)

> What if, for globals, we insisted that the global be named "errno"?
>
> I wouldn't be surprised if there was a system where the global error
> is not errno. Windows maybe?

Okay. We should check.

Thanks again,
Hal

A Windows program can have two different global error numbers: Win32 and
C-Runtime flavor.

The Win32 flavor is exposed via GetLastError/SetLastError[Ex].
The CRT flavor (errno) is currently exposed via a macro that expands to
(*_errno()).

Oh, and I forgot a third “_doserrno” for which no amount of documentation lends itself to a consistent description of its behavior.

Ah! It was too simple... ;)

--renato

From: "David Majnemer" <david.majnemer@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Renato Golin" <renato.golin@linaro.org>, "LLVM" <llvmdev@cs.uiuc.edu>
Sent: Saturday, November 23, 2013 2:37:52 PM
Subject: Re: [LLVMdev] [RFC] Identifying access to errno

> Oh, and I forgot a third "_doserrno" for which no amount of
> documentation lends itself to a consistent description of its
> behavior.

Are all of these things possibly set by cos(double) and friends?

-Hal

Math functions are special! :)

In error cases, they call a function called _matherr. This function will
typically set only errno. However, programs which link dynamically against
the MSVCRT are permitted to replace the function with whatever they want
(this is documented behavior!); their _matherr might forward its arguments
to "fire_the_missiles".
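
The consequence for modeling can be sketched with a replaceable hook in C. Everything here is hypothetical: the hook type, the function names, and the counter are invented for illustration, and the real _matherr has a different signature that is not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hook type, loosely modeled on a replaceable error handler. */
typedef void (*math_error_hook)(double bad_argument);

static int side_effect_count;  /* stands in for arbitrary side effects */

/* Typical behavior: only errno is written. */
static void default_hook(double bad_argument)
{
    (void)bad_argument;
    errno = EDOM;
}

/* A user replacement may touch anything and may leave errno alone. */
static void replacement_hook(double bad_argument)
{
    (void)bad_argument;
    ++side_effect_count;
}

static math_error_hook current_hook = default_hook;

void install_replacement_hook(void) { current_hook = replacement_hook; }
int observed_side_effects(void) { return side_effect_count; }

/* Hypothetical math routine that routes domain errors through the hook. */
double checked_half(double x)
{
    assert(current_hook != 0);
    if (x < 0.0) {
        current_hook(x);
        return 0.0;
    }
    return x / 2.0;
}
```

Once the hook is replaced, a failing call no longer writes errno but does modify other state, so on such a system "may write errno" would not be a sufficient description of the math routine's effects.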

Sounds like System V's matherr(3):
http://www.freebsd.org/cgi/man.cgi?query=matherr&manpath=SunOS+5.10

Perhaps it's worth noting (though Hal's original message hinted at it) that
post-C99 math functions need not use errno; they are permitted to instead
have a separate set of error flags, which the user can promise to ignore by
way of a standard #pragma.
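
For reference, C99 lets a program ask which reporting mechanism the implementation uses through the math_errhandling macro in <math.h>. The sketch below only queries that macro. The function names are made up for this example, and the result depends on the platform and on compiler flags.

```c
#include <assert.h>
#include <math.h>

/* Nonzero if the math functions report errors by setting errno. */
int reports_via_errno(void)
{
    /* C99 fixes the values of the two selector macros. */
    assert(MATH_ERRNO == 1 && MATH_ERREXCEPT == 2);
    return (math_errhandling & MATH_ERRNO) != 0;
}

/* Nonzero if the math functions report errors by raising
 * floating-point exception flags. */
int reports_via_fp_exceptions(void)
{
    return (math_errhandling & MATH_ERREXCEPT) != 0;
}
```

A conforming implementation is expected to report at least one of the two mechanisms, although some toolchains define math_errhandling differently under options such as fast-math.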