[RFC] Intrinsic naming convention (words with dots)

Hi everyone,

We seem to have allowed our documented target-independent intrinsics to acquire a somewhat haphazard naming system, and I think we should standardize on one convention. All of the intrinsics have 'llvm.' as a prefix, and some also have an additional prefix ('llvm.dbg.', 'llvm.eh.', 'llvm.experimental.', etc.), but after that we lose consistency. When there is just a single word (or acronym) everything is fine, but the way we join multiple words (or acronyms) falls into three categories:

1. No separator (e.g. @llvm.readcyclecounter)
2. Using '.' as a separator (e.g. @llvm.sadd.with.overflow)
3. Using '_' as a separator (e.g. @llvm.read_register)

I propose that we standardize on (2) -- words with dots -- as it seems to have a plurality of more-recent intrinsics (and I think it is easy to read, as is (3)). Thoughts?
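As a rough illustration of the three categories above, here is a minimal Python sketch that classifies a name by the separator used after its namespace prefix. The helper names (`strip_prefix`, `separator_style`) and the prefix list are assumptions made for this example only; nothing like this exists in LLVM itself.

```python
# Namespace prefixes mentioned in this message, longest first so that
# "llvm.experimental." is stripped before plain "llvm.".
PREFIXES = ("llvm.experimental.", "llvm.dbg.", "llvm.eh.", "llvm.")


def strip_prefix(name):
    """Drop a leading '@' and the longest matching namespace prefix."""
    name = name.lstrip("@")
    for prefix in PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


def separator_style(name):
    """Classify how the part after the namespace prefix joins its words."""
    rest = strip_prefix(name)
    has_dot = "." in rest
    has_underscore = "_" in rest
    if has_dot and has_underscore:
        return "mixed"
    if has_dot:
        return "dots"
    if has_underscore:
        return "underscores"
    # Either a single word or several words run together; telling these
    # apart would need a word list, which this sketch does not attempt.
    return "none"


for example in ("@llvm.readcyclecounter",
                "@llvm.sadd.with.overflow",
                "@llvm.read_register"):
    print(example, "->", separator_style(example))
```

Note that the sketch cannot tell a single word from several words written without a separator, so the "none" bucket still needs a human eye.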

Although this is somewhat subjective, here's our current set of intrinsics with multiple words (or acronyms) by these categories. I'm excluding here externally-defined terms (e.g. llvm.va_start):

No separators (except for the initial namespace prefix):

@llvm.gcroot
@llvm.gcread
@llvm.gcwrite

@llvm.experimental.stackmap
@llvm.experimental.patchpoint

@llvm.experimental.gc.statepoint

@llvm.returnaddress
@llvm.frameaddress

@llvm.localescape
@llvm.localrecover

@llvm.stacksave
@llvm.stackrestore

@llvm.pcmarker
@llvm.readcyclecounter

@llvm.bitreverse

@llvm.eh.begincatch
@llvm.eh.endcatch

@llvm.eh.padparam

@llvm.stackprotector
@llvm.stackprotectorcheck
@llvm.objectsize

@llvm.donothing

Words with dots:

@llvm.sadd.with.overflow
@llvm.uadd.with.overflow
@llvm.ssub.with.overflow
@llvm.usub.with.overflow
@llvm.smul.with.overflow
@llvm.umul.with.overflow

@llvm.convert.to.fp16
@llvm.convert.from.fp16

@llvm.eh.typeid.for

@llvm.init.trampoline
@llvm.adjust.trampoline

@llvm.masked.load
@llvm.masked.store

@llvm.masked.gather
@llvm.masked.scatter

@llvm.lifetime.start
@llvm.lifetime.end

@llvm.invariant.start
@llvm.invariant.end
@llvm.invariant.group.barrier

@llvm.var.annotation
@llvm.ptr.annotation

@llvm.bitset.test

Words with underscores (except for the initial namespace prefix):

@llvm.read_register
@llvm.write_register

@llvm.clear_cache

@llvm.instrprof_increment
@llvm.instrprof_value_profile

Thanks again,
Hal

SGTM.

Thanks!

-eric

How about using dots to separate "contexts" and underscores to separate words, e.g.

llvm.gc.* -- stuff related to GC
llvm.gc.read
llvm.gc.do_something_else

-Krzysztof
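The "dots for contexts, underscores for words" idea above can be sketched as a small parser. This is only an illustration in Python: `split_name` is a hypothetical helper, and it assumes that every dot-separated component except the last one is a context.

```python
def split_name(name):
    """Split an intrinsic name into (contexts, words).

    Assumption for this sketch: all dot-separated components except the
    last are contexts (namespaces), and the last component holds the
    words of the name, joined by underscores.
    """
    parts = name.lstrip("@").split(".")
    contexts = parts[:-1]
    words = parts[-1].split("_")
    return contexts, words


for example in ("llvm.gc.read", "llvm.gc.do_something_else"):
    contexts, words = split_name(example)
    print(example, "-> contexts:", contexts, "words:", words)
```

Under this scheme a name is unambiguous to take apart, which is not the case when dots serve as both namespace and word separators.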

From: "Eric Christopher" <echristo@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Tuesday, December 1, 2015 11:03:05 AM
Subject: Re: [llvm-dev] [RFC] Intrinsic naming convention (words with dots)

SGTM.

Thanks!

Follow-up question: Once we decide on a convention, should we:

1. Just document it, leave existing things as-is, but make all new intrinsics comply with the convention.

2. Update all existing intrinsic names to follow the naming convention (with auto-upgrade for bitcode as necessary).

3. If we do (2), does that constitute an ABI break at the C level unless special provisions are made?

Thanks again,
Hal
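To make option (2) more concrete, an auto-upgrade step could be driven by a table of old and new names. The Python sketch below is purely illustrative: the new names in `RENAMES` are hypothetical (no renaming has been agreed in this thread), `upgrade_name` is an invented helper, and LLVM's real auto-upgrade logic is written in C++ and is not reproduced here.

```python
# Hypothetical old -> new renames under a "words with dots" convention.
# None of these new names has been agreed on; they only illustrate the idea.
RENAMES = {
    "llvm.read_register": "llvm.read.register",
    "llvm.write_register": "llvm.write.register",
    "llvm.clear_cache": "llvm.clear.cache",
}


def upgrade_name(name):
    """Return the upgraded name, keeping any trailing type suffix.

    A name is upgraded if it equals an old name or extends it with a
    dot-separated suffix such as ".i64"; anything else is returned as-is.
    """
    for old, new in RENAMES.items():
        if name == old or name.startswith(old + "."):
            return new + name[len(old):]
    return name


for example in ("llvm.read_register.i64",
                "llvm.clear_cache",
                "llvm.lifetime.start"):
    print(example, "->", upgrade_name(example))
```

Names that already follow the convention pass through unchanged, so applying the upgrade twice gives the same result as applying it once.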

Krzysztof Parzyszek via llvm-dev <llvm-dev@lists.llvm.org> writes:

That's what I thought also when reading the proposal. I always thought of dots in intrinsic names as namespace separators, which doesn't always conform with actual usage.

-Manuel

My concern with this proposal is that the process that generates the C++ enum values transforms dots into underscores. Mixing dots and underscores in the IR seems really bad because there are then multiple possible IR values for any given C++ value. I’d much prefer that we remove the existing uses of underscores and make it explicit that dot in IR means underscore in C++.

David
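The ambiguity described above is easy to demonstrate. The Python sketch below assumes the mapping is simply "drop the 'llvm.' prefix and turn every dot into an underscore", as the message describes; it is not the actual generator, and `enum_name`, `find_collisions`, and the name `llvm.read.register` are made up for the example.

```python
from collections import defaultdict


def enum_name(ir_name):
    """Assumed mapping: drop 'llvm.' and turn each '.' into '_'."""
    if ir_name.startswith("llvm."):
        ir_name = ir_name[len("llvm."):]
    return ir_name.replace(".", "_")


def find_collisions(ir_names):
    """Group IR names by enum name and keep groups with more than one."""
    groups = defaultdict(list)
    for name in ir_names:
        groups[enum_name(name)].append(name)
    return {enum: names for enum, names in groups.items() if len(names) > 1}


names = [
    "llvm.read_register",   # existing underscore-style name
    "llvm.read.register",   # hypothetical dot-style spelling
    "llvm.sadd.with.overflow",
]
print(find_collisions(names))
```

Because the mapping is many-to-one once underscores are allowed in IR names, the C++ value alone cannot tell you which IR spelling was meant.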

So then that leaves camelCase for words in a context…

This idea SGTM, using `.` as a namespace (and otherwise using `_`).

I’m fine with “words with dots” or “dots are namespaces and underscores separate parts of words”. If you’re really on board with doing the autoupgrade logic from the old names, then I slightly prefer dots for namespaces.

-Chris

This proposal - dots as namespaces, underscore for words - would be my preferred scheme, but I really don't have much of a strong preference. Any reasonable scheme which is documented and consistent works for me.

Philip

Yep :)

And if it autoupgrades then it’s fairly easy to change anyhow.

-eric

Dear Hal,

The current rule for an intrinsic, IIRC, is llvm.<str> where <str> is some arbitrary name that is allowed within an LLVM function name.

While I can understand the desire for consistency, I think what you suggest is a purely aesthetic change with no real value. If you want to spend your time on aesthetics, that's fine with me, but you're introducing changes to the LLVM assembler, disassembler, and documentation to do it. It may also cause issues with in-tree and out-of-tree test suites that grep for intrinsic names in LLVM assembly output (not sure how many tests do that, but it's possible).

Personally, I wouldn't spend my time on it, but that's just me.

FWIW,

John Criswell
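For anyone wanting to gauge the test-suite impact mentioned above, a quick scan for intrinsic names in textual IR or CHECK lines might look like the following Python sketch. The regular expression is an approximation of LLVM identifier syntax, and `intrinsics_with_underscores` and the sample text are assumptions for illustration.

```python
import re

# Approximate pattern for a global name starting with "@llvm."; it stops
# at the first character that cannot appear in an unquoted identifier.
INTRINSIC_RE = re.compile(r"@(llvm\.[A-Za-z0-9_.$-]+)")


def intrinsics_with_underscores(ir_text):
    """Return the sorted, de-duplicated intrinsic names containing '_'."""
    found = set(INTRINSIC_RE.findall(ir_text))
    return sorted(name for name in found if "_" in name)


sample = """
declare i64 @llvm.readcyclecounter()
declare void @llvm.clear_cache(i8*, i8*)
; CHECK: call i64 @llvm.read_register.i64(metadata !0)
"""
print(intrinsics_with_underscores(sample))
```

Running something like this over in-tree and out-of-tree tests would give a rough count of the lines a rename would touch.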