Vectorization with fast-math on irregular ISA sub-sets

Folks,

I'm now looking at https://llvm.org/bugs/show_bug.cgi?id=16274, which
seems to have some support in the vectorizer, but not as we need for
this particular case. I may have missed something obvious, please let
me know if there is a better way.

As you already know, ARM has two FP instruction sets: VFP and NEON.
VFP operates on individual FP registers, while NEON is a full SIMD
unit. The problem is that NEON is not IEEE compliant for FP
operations, while VFP is.

Even if the target has NEON and the user has asked for it to be used,
without -ffast-math and related arguments, we simply can't produce
NEON instructions for FP operations. Different operations may have
different non-compliance issues (Inf, denormals, etc.) and I haven't
yet investigated the full picture, but it's safe to start by blocking
*all* FP operations on NEON when *any* FP restrictions are in place.
We can expand for better support later, when the infrastructure is in
place.

As far as I could see, -ffast-math is handled in the vectorizer, but
as an all-or-nothing switch, which is not what we want. So, I thought
about two ways we could go about doing this:

1. The pragmatic way

Add a cost "TCC_Impossible = AllOnes" to TCC and on ARM's cost model,
check if fast-math is checked on FP ALU operations and return that if
false. So, VFP costs would be less than NEON costs divided by their
widths.

This would make any vectorization beyond VFP instructions impossible
if fast-math is not chosen, while still using VFP instructions in the
loop, making it slightly faster.

I'm sceptical about introducing the TCC_Impossible cost, as it seems
like a dirty trick. I'm open to better solutions.
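
To make option 1 concrete, here is a minimal standalone sketch;
TCC_Impossible and the simplified query are hypothetical names, not
the actual TTI API:

  // Standalone model of option 1 (illustrative, not LLVM's real API).
  #include <cstdint>
  #include <limits>

  static const uint64_t TCC_Basic = 1;
  static const uint64_t TCC_Impossible =
      std::numeric_limits<uint64_t>::max(); // "AllOnes"

  // Cost of an FP ALU operation, given the vector width
  // (1 == scalar/VFP, >1 == NEON) and whether fast-math is enabled.
  uint64_t getFPArithmeticCost(unsigned VectorWidth, bool FastMath) {
    if (VectorWidth > 1 && !FastMath)
      return TCC_Impossible; // non-compliant NEON FP: never profitable
    return TCC_Basic;        // VFP, or NEON under fast-math
  }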

2. The thorough way

Add a flag in TableGen on vector instructions indicating IEEE
compliance for the different levels of support, and add a "fall-back"
VFP instruction to each of them (either in TableGen or TTI).

In the vectorizer, when costing FP ALU operations, add a check on
fast-math && IEEE conformance. If it fails, look up the fall-back
instruction's width and compute the cost as the fall-back's cost *
Width/FallBackWidth.

In the back-end, when emitting vector instructions, perform the same
check and unroll the NEON instructions into the equivalent VFP ones,
by consulting each instruction's fall-back.
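
A sketch of that cost rule, with every name hypothetical (the
TableGen flag is modelled here as a plain boolean):

  // Standalone model of option 2's cost computation (illustrative).
  #include <cstdint>

  struct VectorOpInfo {
    bool IEEECompliant;     // would come from the TableGen flag
    unsigned Width;         // vector lanes of the NEON instruction
    uint64_t Cost;          // native vector cost
    uint64_t FallBackCost;  // cost of the VFP fall-back instruction
    unsigned FallBackWidth; // lanes the fall-back covers (usually 1)
  };

  uint64_t getFPOpCost(const VectorOpInfo &Op, bool FastMath) {
    if (FastMath || Op.IEEECompliant)
      return Op.Cost;
    // Strict FP on a non-compliant instruction: price it as the
    // fall-back unrolled across the vector width.
    return Op.FallBackCost * (Op.Width / Op.FallBackWidth);
  }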

This approach has the benefit of validating IEEE compliance at the
instruction level, thus working for any other "vectorizer" out there,
including out-of-tree ones (though this benefit is very limited).

But it can also change code that it shouldn't, like inline asm or
intrinsics. I have no solution to this particular problem.

Any thoughts?

cheers,
--renato

Hi Renato,

I think it’s important to distinguish between the loop vectorizer and already-existing vector IR + the SLP vectorizer here.

The loop vectorizer does indeed require -ffast-math, but the IEEE-nonconformant transforms it does are far greater than using an ISA which may FTZ. It needs -ffast-math because any FP reductions necessarily have their execution order shuffled, due to executing some of them in parallel and reducing to scalar at the end. Therefore the LV doesn’t need to be changed - it will only work when “fast” is given and will only emit “fast” vector instructions.
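
To make the reordering concrete, here is a minimal sketch
(illustrative only) of what 4-wide vectorization does to a reduction's
association order:

  // Why FP reductions need fast-math: the vector version reassociates.
  float sum_scalar(const float *a, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
      s = s + a[i]; // (((a[0] + a[1]) + a[2]) + a[3]) + ...
    return s;
  }

  float sum_vectorized(const float *a, int n) { // assume n % 4 == 0
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; i += 4)
      for (int l = 0; l < 4; ++l)
        lane[l] += a[i + l]; // four independent partial sums
    // Horizontal reduction at the end: a different association order,
    // so the result can differ from sum_scalar in the last ULPs.
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
  }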

The SLP vectoriser however should theoretically take non-fast scalars and produce non-fast vectors. Similarly people will hand-write vector IR, or generate it from other frontends.

Because of this, I think it’s important that we shouldn’t change the semantics of the IR currently. Making vector IR targeting ARM produce scalar instructions unless a modifier is given will undoubtedly cause problems down the line with frontends being out of sync or not being updated. Even worse, the symptom of this would just be “LLVM produces poor code for ARM” / “LLVM’s vector codegen is terrible for ARM” - performance errata and not conformance. That’s why I think changing to a full-strict-by-default approach would be bad for the project. It would also violate the principle of least surprise - I wrote vector instructions and picked a vector ISA… but they’re being scalarized?

My experience is that the number of people who care about full IEEE compatibility on ARMv7 hardware is limited, and the set of people who care about exact ULP constraints even more limited. I think we absolutely should make a solution that solves PR16274, but I think it would have to be opt-in, not opt-out.

James

> The loop vectorizer does indeed require -ffast-math, but the IEEE-nonconformant transforms it does are far greater than using an ISA which may FTZ. It needs -ffast-math because any FP reductions necessarily have their execution order shuffled, due to executing some of them in parallel and reducing to scalar at the end. Therefore the LV doesn’t need to be changed - it will only work when “fast” is given and will only emit “fast” vector instructions.

Good point. This seems to be a much more rigorous definition in the
new 2008 standard. Right now, the loop vectorizer produces vector code
without -ffast-math. Are you saying we should disable it altogether
for all architectures that claim to follow the new standard?

Inner loops can be "vectorized" by SLP using only VFP instructions.

The implementation seems to have moved to Inst->hasUnsafeAlgebra(), so
we may need to return false in the legalization phase if the flag is
omitted and any instruction has unsafe algebra.
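
A minimal sketch of such a legality check; canVectorizeFPMath and the
SIMDIsIEEE parameter are invented names, while hasUnsafeAlgebra() is
the real query on FP operations:

  // Hypothetical legality check: refuse to vectorize strict FP math
  // when the target's SIMD unit is not IEEE compliant.
  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/IR/Operator.h"
  using namespace llvm;

  static bool canVectorizeFPMath(const Loop &L, bool SIMDIsIEEE) {
    if (SIMDIsIEEE)
      return true; // compliant SIMD: no restriction needed
    for (BasicBlock *BB : L.blocks())
      for (Instruction &I : *BB)
        if (auto *FPOp = dyn_cast<FPMathOperator>(&I))
          if (!FPOp->hasUnsafeAlgebra())
            return false; // strict FP op: FTZ could change its result
    return true;
  }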

> The SLP vectoriser however should theoretically take non-fast scalars and produce non-fast vectors. Similarly people will hand-write vector IR, or generate it from other frontends.

We can't guarantee the semantics of the unsafe-math flag in any IR
that was not generated by a front-end which knows about it. So, it
follows that we'll stop vectorizing their basic blocks, and there
could be some outcry. We need some general consensus if that's what
people want. I don't think we do.

> Because of this, I think it’s important that we shouldn’t change the semantics of the IR currently. Making vector IR targeting ARM produce scalar instructions unless a modifier is given will undoubtedly cause problems down the line with frontends being out of sync or not being updated. Even worse, the symptom of this would just be “LLVM produces poor code for ARM” / “LLVM’s vector codegen is terrible for ARM” - performance errata and not conformance. That’s why I think changing to a full-strict-by-default approach would be bad for the project.
> It would also violate the principle of least surprise - I wrote vector instructions and picked a vector ISA… but they’re being scalarized?

Right, this argues against marking instructions as unsafe by default
(ie. my second option). If that's so, I agree with you that it's not
trivial and may create more problems than it solves.

Hand-written IR, inline ASM and intrinsics should remain what they
are. So 16274 is probably a "won't fix"?

> My experience is that the number of people who care about full IEEE compatibility on ARMv7 hardware is limited, and the set of people who care about exact ULP constraints even more limited. I think we absolutely should make a solution that solves PR16274, but I think it would have to be opt-in, not opt-out.

And I'm guessing this is related to SLP and others. If so, I agree.

So,

For 16275, the fix is to disable loop vect. for no-fast-math + hasUnsafeAlgebra.

For 16274, disabling NEON emission in SLP would be one way, but we
must avoid any fiddling with inline asm and intrinsics, so I don't
think we should be doing that in any generic way. Certainly not
related to the example, from IR to instruction.

Makes sense?

--renato

> Sorry, on phone so cherry picking what I reply to:
>
>> For 16275, the fix is to disable loop vect. for no-fast-math + hasUnsafeAlgebra.
>
> Do you think there is a set of people that care about IEEE accuracy in so far that they don't want FTZ, but *are* happy to reassociate FP operations? That seems fairly niche to me?

I agree. FZ is usually relatively benign (it only causes major problems when programs expect x != y to imply that x - y != 0, an axiom of floating-point that’s broken in FZ). Re-association more frequently causes significant instability.

I think it’s reasonable for unsafeAlgebra to imply "FZ is an allowed mode”.

– Steve

No. But I also don't want to disable the vectorizer for integer
arithmetic. I'm guessing hasUnsafeAlgebra is not just for FZ but also
NaNs and Infs, so disabling the vectorization of loops that have any
of those unless safe-math is chosen seems simple enough to me.

cheers,
--renato

The conditions in which the LV kicks in are different for FP and integer loops. The LV always kicks in for non-FP loops AFAIK

From: "James Molloy" <James.Molloy@arm.com>
To: "Renato Golin" <renato.golin@linaro.org>
Cc: "Nadav Rotem" <nrotem@apple.com>, "Arnold Schwaighofer" <aschwaighofer@apple.com>, "Hal Finkel"
<hfinkel@anl.gov>, "LLVM Dev" <llvm-dev@lists.llvm.org>, "nd" <nd@arm.com>
Sent: Monday, February 8, 2016 3:35:26 PM
Subject: Re: Vectorization with fast-math on irregular ISA sub-sets

> The conditions in which the LV kicks in are different for FP and
> integer loops. The LV always kicks in for non-FP loops AFAIK

Yes, and generically speaking, it does for FP loops as well (except, as has been noted, when there are FP reductions).

It seems like we need two things here:

1. Use our backend fast-math flags during instruction selection to scalarize vector instructions that don't have the right allowances (on targets where that's necessary)

2. Update the TTI cost model interfaces to take fast-math flags so that all vectorizers can make appropriate decisions
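
For (2), the shape of the change might look like this standalone
sketch; the signature and names are illustrative, not the real TTI
interface:

  // Model of a TTI-style cost query that takes fast-math flags.
  #include <cstdint>

  struct FastMathFlags { bool UnsafeAlgebra = false; };

  uint64_t getArithmeticInstrCost(bool IsFP, unsigned VectorWidth,
                                  FastMathFlags FMF, bool SIMDIsIEEE) {
    const uint64_t ScalarCost = 1;
    if (IsFP && VectorWidth > 1 && !FMF.UnsafeAlgebra && !SIMDIsIEEE)
      // Strict FP would be scalarized on this target, so charge one
      // scalar op per lane instead of a single vector op.
      return ScalarCost * VectorWidth;
    return ScalarCost;
  }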

-Hal

From: "Stephen Canon via llvm-dev" <llvm-dev@lists.llvm.org>
To: "James Molloy" <James.Molloy@arm.com>
Cc: "LLVM Dev" <llvm-dev@lists.llvm.org>, "nd" <nd@arm.com>
Sent: Monday, February 8, 2016 1:44:23 PM
Subject: Re: [llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

>
> Sorry, on phone so cherry picking what I reply to :
>
>>
>> For 16275, the fix is to disable loop vect. for no-fast-math +
>> hasUnsafeAlgebra.
>
> Do you think there is a set of people that care about IEEE accuracy
> in so far that they don't want FTZ, but *are* happy to reassociate
> FP operations? That seems fairly niche to me?

I agree. FZ is usually relatively benign (it only causes major
problems when programs expect x != y to imply that x - y != 0, an
axiom of floating-point that’s broken in FZ). Re-association more
frequently causes significant instability.

I think it’s reasonable for unsafeAlgebra to imply "FZ is an allowed
mode”.

FWIW, as currently formulated, our unsafeAlgebra flag implies all others:

  void setUnsafeAlgebra() {
    Flags |= UnsafeAlgebra;
    setNoNaNs();
    setNoInfs();
    setNoSignedZeros();
    setAllowReciprocal();
  }
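
So a small illustrative check against that same class (the methods
shown are the real FastMathFlags API):

  #include "llvm/IR/Operator.h"
  #include <cassert>

  void example() {
    llvm::FastMathFlags FMF;
    FMF.setUnsafeAlgebra();
    // One unsafe-algebra bit implies all the other relaxations.
    assert(FMF.noNaNs() && FMF.noInfs() && FMF.noSignedZeros() &&
           FMF.allowReciprocal());
  }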

-Hal

> Yes, and generically speaking, it does for FP loops as well (except, as has been noted, when there are FP reductions).

Right, and I think that's the problem, since a series of FP inductions
could converge to a different value in NEON or VFP, basically acting
like an n-wise reduction. Since we can't (yet?) prove there isn't a
series of operations with the same data, we have to treat them as
unsafe for non-IEEE FP operations.

> It seems like we need two things here:
>
> 1. Use our backend fast-math flags during instruction selection to scalarize vector instructions that don't have the right allowances (on targets where that's necessary)
> 2. Update the TTI cost model interfaces to take fast-math flags so that all vectorizers can make appropriate decisions

I think this is exactly the opposite of what James is saying, and I
have to agree with him, since this would scalarise everything.

If the scalarisation is in IR, then any NEON intrinsic in C code will
get wrongly scalarised. Builtins can be lowered to either plain IR
operations or target builtins, and the back-end has no way of knowing
the origin.

If the scalarisation is lower down, then we risk also changing inline
ASM snippets, which is even worse.

James' idea on this one is to have an additional flag to *enable* such
scalarisation when the user really cares about strict compliance, which
I also think is a better idea than making that the default behaviour.

cheers,
--renato

From: "Renato Golin" <renato.golin@linaro.org>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "James Molloy" <James.Molloy@arm.com>, "Nadav Rotem" <nrotem@apple.com>, "Arnold Schwaighofer"
<aschwaighofer@apple.com>, "LLVM Dev" <llvm-dev@lists.llvm.org>, "nd" <nd@arm.com>
Sent: Tuesday, February 9, 2016 3:38:20 AM
Subject: Re: Vectorization with fast-math on irregular ISA sub-sets

>> Yes, and generically speaking, it does for FP loops as well
>> (except, as has been noted, when there are FP reductions).

> Right, and I think that's the problem, since a series of FP inductions
> could converge to a different value in NEON or VFP, basically acting
> like an n-wise reduction. Since we can't (yet?) prove there isn't a
> series of operations with the same data, we have to treat them as
> unsafe for non-IEEE FP operations.

>> It seems like we need two things here:
>>
>> 1. Use our backend fast-math flags during instruction selection to
>> scalarize vector instructions that don't have the right
>> allowances (on targets where that's necessary)
>> 2. Update the TTI cost model interfaces to take fast-math flags so
>> that all vectorizers can make appropriate decisions

> I think this is exactly the opposite of what James is saying, and I
> have to agree with him, since this would scalarise everything.

No, it just means that the intrinsics need to set the appropriate fast-math flags on the instructions generated. This might require some frontend enablement work, so be it.

There might be a slight issue with legacy IR bitcode, but if that's going to be a problem in practice, we can design some scheme to let auto-upgrade do the right thing.

> If the scalarisation is in IR, then any NEON intrinsic in C code will
> get wrongly scalarised. Builtins can be lowered to either plain IR
> operations or target builtins, and the back-end has no way of knowing
> the origin.
>
> If the scalarisation is lower down, then we risk also changing inline
> ASM snippets, which is even worse.

Yes, but we don't do that, so that's not a practical concern.

> James' idea on this one is to have an additional flag to *enable* such
> scalarisation when the user really cares about strict compliance, which
> I also think is a better idea than making that the default behaviour.

The --stop-pretending-to-be-IEEE-compliant-when-not-really flag? ;-) I don't think that's a good idea.

To be fair, our IR language reference does not actually say that our floating-point arithmetic is IEEE compliant, but it is implied, and frontends depend on this fact. We really should not change the IR floating-point semantics contract over this. It might require some user education, but that's much better than producing subtly-wrong results.

We have a pass-feedback mechanism; I think it would be very useful if compiling with -Rpass-missed=loop-vectorize and/or -Rpass-analysis=loop-vectorize helpfully informed users that compiling with -ffast-math and/or -ffinite-math-only and -fno-signed-zeros would allow the loop to be vectorized for the targeted hardware.

-Hal

>> If the scalarisation is in IR, then any NEON intrinsic in C code will
>> get wrongly scalarised. Builtins can be lowered to either plain IR
>> operations or target builtins, and the back-end has no way of knowing
>> the origin.
>>
>> If the scalarisation is lower down, then we risk also changing inline
>> ASM snippets, which is even worse.

> Yes, but we don't do that, so that's not a practical concern.

The IR scalarisation is, though.

> To be fair, our IR language reference does not actually say that our floating-point arithmetic is IEEE compliant, but it is implied, and frontends depend on this fact. We really should not change the IR floating-point semantics contract over this. It might require some user education, but that's much better than producing subtly-wrong results.

But we lower a NEON intrinsic into plain IR instructions.

If I got it right, the current "fast" attribute is "may use non IEEE
compliant", emphasis on the *may*.

As a user, I'd be really angry if I used "float32x4_t vaddq_f32
(float32x4_t, float32x4_t)" and the compiler emitted four scalar
VADD.F32 instructions on S registers.

Right now, Clang lowers:
  vaddq_f32 (a, b);

to:
  %add.i = fadd <4 x float> %a, %b

which lowers (correctly) to:
  vadd.f32 q0, q0, q1

If, OTOH, "fast" means "*must* select the fastest", then we may get
away with using it.

So, your proposal seems to be that, while lowering NEON intrinsics,
Clang *always* emit the "fast" attribute for all FP operations, and
that such scalarisation phase would split *all* non-fast FP operations
if the target has non-IEEE-754 compliant SIMD.

James' proposal is to not vectorise loops if an IEEE-754 compliant SIMD
unit is not available, and to only generate VFP instructions in the SLP
vectoriser. If we're not generating the large vector operations in the
first place, why would we need to scalarise them?

If we do vectorise to SIMD and then later scalarise, wouldn't that
change the cost model? Wouldn't it be harder to predict performance
gains, given that our cost model is only approximate and very
empirical?

Other front-ends should produce "valid" (target-specific) IR in the
first place, no? Hand generated broken IR is not something we wish to
support either, I believe.

> We have a pass-feedback mechanism; I think it would be very useful if compiling with -Rpass-missed=loop-vectorize and/or -Rpass-analysis=loop-vectorize helpfully informed users that compiling with -ffast-math and/or -ffinite-math-only and -fno-signed-zeros would allow the loop to be vectorized for the targeted hardware.

That works for optimisations, not for intrinsics. Since we use the
same intermediate representation for both, we can't assume anything.

cheers,
--renato

From: "Renato Golin" <renato.golin@linaro.org>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "James Molloy" <James.Molloy@arm.com>, "Nadav Rotem" <nrotem@apple.com>, "Arnold Schwaighofer"
<aschwaighofer@apple.com>, "LLVM Dev" <llvm-dev@lists.llvm.org>, "nd" <nd@arm.com>
Sent: Wednesday, February 10, 2016 8:30:50 AM
Subject: Re: Vectorization with fast-math on irregular ISA sub-sets

>>> If the scalarisation is in IR, then any NEON intrinsic in C code
>>> will get wrongly scalarised. Builtins can be lowered to either plain
>>> IR operations or target builtins, and the back-end has no way of
>>> knowing the origin.
>>>
>>> If the scalarisation is lower down, then we risk also changing
>>> inline ASM snippets, which is even worse.
>
>> Yes, but we don't do that, so that's not a practical concern.

> The IR scalarisation is, though.

>> To be fair, our IR language reference does not actually say that
>> our floating-point arithmetic is IEEE compliant, but it is
>> implied, and frontends depend on this fact. We really should not
>> change the IR floating-point semantics contract over this. It
>> might require some user education, but that's much better than
>> producing subtly-wrong results.

> But we lower a NEON intrinsic into plain IR instructions.
>
> If I got it right, the current "fast" attribute is "may use non IEEE
> compliant", emphasis on the *may*.
>
> As a user, I'd be really angry if I used "float32x4_t vaddq_f32
> (float32x4_t, float32x4_t)" and the compiler emitted four scalar
> VADD.F32 instructions on S registers.
>
> Right now, Clang lowers:
>   vaddq_f32 (a, b);
>
> to:
>   %add.i = fadd <4 x float> %a, %b
>
> which lowers (correctly) to:
>   vadd.f32 q0, q0, q1
>
> If, OTOH, "fast" means "*must* select the fastest", then we may get
> away with using it.
>
> So, your proposal seems to be that, while lowering NEON intrinsics,
> Clang *always* emit the "fast" attribute for all FP operations, and
> that such scalarisation phase would split *all* non-fast FP
> operations if the target has non-IEEE-754 compliant SIMD.

To be clear, I'm recommending that you add flags like nnan, ninf and nsz. However, I think that I've changed my mind: This won't work for the intrinsics. The flags are defined as:

  nsz
  No Signed Zeros - Allow optimizations to treat the sign of a zero argument or result as insignificant.

  nnan
  No NaNs - Allow optimizations to assume the arguments and result are not NaN. Such optimizations are required to retain defined behavior over NaNs, but the value of the result is undefined.

  ninf
  No Infs - Allow optimizations to assume the arguments and result are not +/-Inf. Such optimizations are required to retain defined behavior over +/-Inf, but the value of the result is undefined.

and this is not right for the intrinsics-generated IR. The problem is that, for intrinsics, the users get to assume the exact semantics provided by the underlying machine instructions. By using intrinsics, the user is not telling the compiler it can do arbitrary things with the sign bit on zeros and all of the bits when given a NaN/Inf input. Rather, the user expects very specific (non-IEEE) behavior.

I think we have two options here:

1. Lower these intrinsics into target-level intrinsics

2. Add flags (or something like that) that indicate the alternate non-IEEE semantics that ARM actually provides.

I suspect that (1) will cause performance regressions (since we don't optimize the intrinsics as well as the generic IR we previously generated), so we should investigate (2).

> James' proposal is to not vectorise loops if an IEEE-754 compliant
> SIMD unit is not available, and to only generate VFP instructions in
> the SLP vectoriser. If we're not generating the large vector
> operations in the first place, why would we need to scalarise them?

We should indeed let the cost model reflect the scalarization cost in cases where we need IEEE semantics.

> If we do vectorise to SIMD and then later scalarise, wouldn't that
> change the cost model? Wouldn't it be harder to predict performance
> gains, given that our cost model is only approximate and very
> empirical?

We'd need to pass the fast-math flags to the cost model so that we'd get costs back that depended on whether or not we could actually use the vector instructions.

-Hal

> Rather, the user expects very specific (non-IEEE) behavior.

Precisely! :-)

> I think we have two options here:
>
> 1. Lower these intrinsics into target-level intrinsics

That's not an option for the reasons you outline (performance), but
also because it would explode the number of intrinsics we have to
deal with, making the IR *very* opaque and hard to work with.

> 2. Add flags (or something like that) that indicate the alternate non-IEEE semantics that ARM actually provides.

That's my idea, but I want to think about it only when we really need
to. Adding new flags always leads us to hard choices, and backwards
compatibility will be a problem here.

> We'd need to pass the fast-math flags to the cost model so that we'd get costs back that depended on whether or not we could actually use the vector instructions.

Indeed, that's the only way. But I foresee the cost model at least
doubling in complexity for those unfortunate targets. Right now, we
use heuristics to map the costs of casts, shuffles and memory
operations that normally disappear, but when loops can mix NEON, VFP
and scalar code in the same objects, how the back-end will emit those
pseudo-operations will be anyone's guess.

In that sense, James' suggestion to create a flag for strict IEEE
semantics, ruling SIMD FP out entirely, is an easy intermediate step.

cheers,
--renato

Hal,

I had a read of the ARM ARM on VFP and SIMD FP semantics, and my
analysis is that NEON's only problem is its flush-to-zero behaviour,
which is non-compliant.

NEON deals with NaNs and Infs in the way the standard specifies and
should not cause any concern to us. But we don't seem to have a flag
specifically for denormals, so I think using UnsafeMath is the
safest option for now.

>   nsz
>   No Signed Zeros - Allow optimizations to treat the sign of a zero argument or result as insignificant.

In both VFP and NEON, zero signs are significant. In NEON, the
flush-to-zero's zero will have the same sign as the input denormal.

>   nnan
>   No NaNs - Allow optimizations to assume the arguments and result are not NaN. Such optimizations are required to retain defined behavior over NaNs, but the value of the result is undefined.

Both VFP and NEON treat NaNs as the standard requires, ie. NaN op x = NaN.

>   ninf
>   No Infs - Allow optimizations to assume the arguments and result are not +/-Inf. Such optimizations are required to retain defined behavior over +/-Inf, but the value of the result is undefined.

Same here. Operations with Inf generate Inf or NaNs on both units.

The flush-to-zero behaviour has an effect on both NaNs and Infs, since
it happens before the operation. So a denormal operation with an Inf in
VFP will not generate a NaN, while in NEON the denormal will be flushed
to zero first, thus generating a NaN.
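
A small worked example of that interaction (illustrative; on an
IEEE-compliant host the FTZ result is simulated by flushing by hand):

  #include <cstdio>
  #include <limits>

  int main() {
    float denorm = std::numeric_limits<float>::denorm_min();
    float inf = std::numeric_limits<float>::infinity();
    // IEEE (VFP) semantics: denormal * inf == inf, no NaN involved.
    std::printf("IEEE (VFP): %f\n", denorm * inf);
    // Flush-to-zero (NEON): the denormal becomes +0.0 first, and
    // 0 * inf is NaN under the rules applied afterwards.
    float flushed = 0.0f; // what FTZ turns the denormal into
    std::printf("FTZ (NEON): %f\n", flushed * inf);
  }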

James, is that a correct assessment?

cheers,
--renato

Our processor also has some issues regarding the handling of denormals - scalar and vector - and we ran into a related problem only a few days ago.

The v3.8 compiler has done a lot of good work on optimisations for floating-point math, but ironically one of them broke our implementation of 'nextafterf'. The desired code fragment (FP32) is:

  float xAbs = fabsf(x);

since we know our instruction for this does not handle denormals and the algorithm is sensitive to correct denormal handling, the code was written to avoid the issue as follows:

  float xAbs = __builtin_astype(__builtin_astype(x, unsigned) & 0x7FFFFFFF, float);

But the v3.8 FP optimiser now recognises this pattern and replaces it with an ISD::FABS node, breaking our workaround :-) It's a great optimisation and I have no problem with its correctness, but I was thinking that perhaps I should extend the target information interface to allow a target to say that it does not support denormals, so that this and possibly other optimisations could be suppressed in a target-dependent way.

Overall the new FP32 optimisation patterns appear to have yielded a small but not insignificant performance advantage over v3.7.1, though it is still early days for my complete measurements.

  MartinO

Hi Martin,

So, I have a patch that right now is a big hammer:
* Targets declare whether their SIMD unit is IEEE compliant or not
(instead of fine-grained choices about which parts are).
* Any FP arithmetic / cast operation with UnsafeAlgebra will trigger
a "potentially unsafe" flag in the vectorizer.
* In the end, if the SIMD unit is not IEEE compliant and there are any
potentially unsafe operations, avoid that loop.

I just need to create some more tests to submit.

The problems I can see in your case are:
* Both scalar and vector units have problems with denormals, so my
isSIMDIEEE() is not enough.
   - To fix this, you can add isVFPIEEE(), but we may find a better solution?
* Your optimisation is basic-block based, not loop based, so we'd
have to add the same check to SLP.
   - SLP deals with both SIMD and VFP units, so we would need the
additional flag anyway.
   - This will be my next step.
* Other passes already have access to the TTI, so they can use those
flags to avoid strength reduction, combine, etc. in those cases.

I don't think we need to create a fine grained solution right now,
since we don't have examples with different behaviour.

Would that work for you?

cheers,
--renato

This is fine Renato.

I worked around the local issue by using an instruction intrinsic so that the pattern would be invisible to this optimisation, and my thoughts on raising this to the TargetTransformInfo level are still not well formed. I was actually quite impressed with the new optimisation; it handled the situation cleverly and perfectly.

A coarse-grained solution is fine, and it is always possible to handle this in custom lowering for ISD::FABS, which could check a target-specific flag to see if it should do the "safe thing" or the "fast thing".

Thanks for the feedback,

  MartinO

Hi,

> James, is that a correct assessment?

Yes, it is also my belief that the only way ARMv7 NEON differs from IEEE754 is lack of denormal support.

James

- ARMv7 NEON ignores the rounding mode set in bits 23:22 of FPSCR and always uses round to nearest.
- ARMv7 NEON ignores the trap enable bits (15:8) in FPSCR and always uses default exception handling.

As with denormal support, the issue at hand is not so much that these differ from IEEE 754 as it is that they differ from the behavior of the scalar (VFP) arithmetic.

- Steve