Determining if ObjCIvarDecl is in multi-declaration

Suppose I have an Objective-C interface:

@interface C {
int x, y;
}
@end

I parse it, and have an ObjCIvarDecl for x. How can I determine that x is in a multi-declaration? I could check which ObjCIvarDecls’ locations overlap, but is there an easier way?

Thanks in advance,

e

There is no way that I know of. Locations won’t help either, because there can be newlines between the declarators. For properties, I intended to preserve such syntactic information, but was told it is unnecessary at this time.

  • fariborz

What else do you need to know? What action would you take based on whether it was a multi declaration or not?

Capturing this information is quite straight-forward, but it is good to have the whole picture in mind before doing something like this.

-Chris

I’m removing an instance variable. In the current code, if it’s a multi-decl, I just give up, which is what spawned this question. Ideally, I’d like to be able to remove an inst var from a multi-decl as easily as from a single decl.

Thanks,

e

Hi

I am using clang/llvm to implement a C compiler for an 8-bit microprocessor. I apologize for the lengthy email, but I think it is necessary to give some background on what I’m doing before asking my questions.

Stack operations are not practical on this device because the memory is composed of small pieces that are not contiguous in address space. So we have to sacrifice recursion and play some tricks in llvm to implement things like reentrancy, function pointers, varargs, etc.

At a high level:

  1. Function frames are statically allocated at link time and they contain the local variables, input arguments, return address, etc.

  2. Auto variables behave like static local variables except for initialization (which must take place in the function preamble)

  3. Parameters behave like local variables except for initialization (which must take place in the caller)

  4. Global and static local variables behave no differently than in standard C

Accordingly, I need to modify the front-end to generate the llvm-bitcode such that correct storage class/name-mangles are selected for variables.

I modified CGDecl.cpp in the two functions below and so far everything works fine:

CodeGenFunction::EmitParmDecl() {same as in EmitStaticBlockVarDecl(); except for initialization; also mangle the name differently}

CodeGenFunction::EmitLocalBlockVarDecl() {same as EmitStaticBlockVarDecl(); except generate the initialization code; also mangle the name differently}

Questions:

  1. Do I need to do any special housekeeping, or are the above modifications enough?

  2. These modifications are at odds with how most microprocessors work, but they are useful for many small devices (including the one I am working on). How should I go about incorporating these changes into the clang code base?

  3. How can I acquire access privilege to be able to checkin to the clang code base?

Thanks

Ali.

(Resend, this time with the list on the CC list.)

Hi

I am using clang/llvm to implement a c compiler for an 8-bit microprocessor.
I apologize for the lengthy email, but I think it is necessary to give a
background of what I'm doing and then ask my questions.

Stack operations are not practical on this device because the memory is
composed of small pieces that are not contiguous in address space. So we
have to sacrifice recursion and play some tricks in llvm to implement things
like reentrancy, function pointers, varargs, etc.

Hmm, so you're statically allocating everything, at the
expense of not being able to support recursion? Interesting. I don't
think there are any existing LLVM ports to an architecture that doesn't
have a stack.

I don't think it's possible to enforce a "no recursion" rule using a
conventional C linker, so I guess you'd have to be careful when
writing code for an architecture like this. (I guess you could use a
runtime check so the program doesn't silently fail.)

At a high level:

1) Function frames are statically allocated at link time and they contain
the local variables, input arguments, return address, etc.

2) Auto variables behave like static local variables except for
initialization (which must take place in the function preamble)

3) Parameters behave like local variables except for initialization (which
must take place in the caller)

4) Global and Static local variables behave no different than standard c

Right... makes sense.

Accordingly, I need to modify the front-end to generate the llvm-bitcode
such that correct storage class/name-mangles are selected for variables.

I modified CGDecl.cpp in the two functions below and so far everything works
fine:

CodeGenFunction::EmitParmDecl() {same as in EmitStaticBlockVarDecl(); except
for initialization; also mangle the name differently}

CodeGenFunction::EmitLocalBlockVarDecl() {same as EmitStaticBlockVarDecl();
except generate the initialization code; also mangle the name differently}

I doubt this is really the best approach for doing this because of the
way LLVM works... the optimizers are much more effective when local
variables are allocas. Also, this doesn't get rid of all the stuff
that would normally be allocated on the stack: sometimes, registers
have to be spilled onto the stack.

Have you actually tried writing an LLVM backend for your processor
yet? It should be straightforward to add alloca support once you have
everything else working.

2) These modifications are in contrast with most microprocessors but are
useful for many small devices (including the one I am working on), how
should I go about incorporating these changes in the clang code base?

If you have patches, just send them to this list (just use "svn diff").

3) How can I acquire access privilege to be able to checkin to the clang
code base?

Don't worry about it; someone else can commit your patches for you.

-Eli

Let me summarize...
(patches)
Ok, that makes sense to have the patches applied by someone else,
however, my patches will surely break other people's work. I'm just
curious, how are you going to merge them? Conditional compiling?
Commandline flags?

(Enforcing no recursion)
We are also implementing our own linker. In fact not only the recursion
thing, but also many other features can't work without a custom linker;
I can go over some details separately if you are interested.
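
For illustration, a link-time recursion check of the kind discussed here can be sketched as a cycle search over the call graph. This is a minimal standalone sketch, not code from the linker described in this message: `CallGraph`, `reachesCycle`, and `hasRecursion` are hypothetical names, and calls through function pointers are assumed to have been resolved into explicit edges already.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search; reaching a function that is still InProgress means
// the current call chain loops back on itself.
bool reachesCycle(const CallGraph &graph, const std::string &fn,
                  std::map<std::string, Mark> &marks) {
  Mark &mark = marks[fn]; // new entries start as Mark::Unvisited
  if (mark == Mark::InProgress)
    return true;
  if (mark == Mark::Done)
    return false;
  mark = Mark::InProgress;
  auto it = graph.find(fn);
  if (it != graph.end())
    for (const std::string &callee : it->second)
      if (reachesCycle(graph, callee, marks))
        return true;
  mark = Mark::Done;
  return false;
}

// True if any function can (directly or indirectly) call itself.
bool hasRecursion(const CallGraph &graph) {
  std::map<std::string, Mark> marks;
  for (const auto &entry : graph)
    if (reachesCycle(graph, entry.first, marks))
      return true;
  return false;
}
```

A linker built along these lines could reject the program when `hasRecursion` returns true, instead of letting two activations silently share one static frame.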

(llvm backend)
We have the code generation in llvm, although it is not complete, it is
just enough to prove the concept. The register spilling and other stack
dependent things will be handled in an unconventional way on the
function frame (the memory locations will be reserved at link time). The
linker can even decide if it can reuse (overlay) some of these frames
depending on what the call graph looks like.
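
One way to picture the frame overlay idea is the sketch below. It is an assumption about how such a layout could work, not the actual linker: `FunctionInfo`, `FrameGraph`, `place`, and `layoutFrames` are hypothetical names. Each function's frame starts after the deepest frame of any of its callers, so functions that are never active at the same time can share addresses. The call graph is assumed to be acyclic.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct FunctionInfo {
  // Bytes needed for locals, arguments, return address.
  unsigned frameSize;
  // Functions this one may call.
  std::vector<std::string> callees;
};

using FrameGraph = std::map<std::string, FunctionInfo>;
using Offsets = std::map<std::string, unsigned>;

// Places `name` at `base` or deeper, then pushes its callees past its frame.
void place(const FrameGraph &graph, const std::string &name, unsigned base,
           std::vector<std::string> &active, Offsets &offsets) {
  // A function already on the active call path would mean recursion.
  assert(std::find(active.begin(), active.end(), name) == active.end() &&
         "static frames cannot support recursion");
  auto found = offsets.find(name);
  if (found != offsets.end() && found->second >= base)
    return; // already placed at least this deep
  offsets[name] = base;
  const FunctionInfo &info = graph.at(name);
  active.push_back(name);
  for (const std::string &callee : info.callees)
    place(graph, callee, base + info.frameSize, active, offsets);
  active.pop_back();
}

// Computes a frame offset for every function reachable from `entry`.
Offsets layoutFrames(const FrameGraph &graph, const std::string &entry) {
  Offsets offsets;
  std::vector<std::string> active;
  place(graph, entry, 0, active, offsets);
  return offsets;
}
```

With a 10-byte `main` that calls `a` (4 bytes) and `b` (6 bytes), both `a` and `b` are placed at offset 10 and overlay each other; a function called by both of them is placed after the larger of the two.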

(alloca)
We tried code generation in llvm by replacing alloca with global
addresses however, (1) the name of local variable is not as easily
available, (2)I don't feel quite comfortable with having the storage
class of variable change at the last phase of translation. It sounds
more reasonable to get it right in the front-end to begin with.

Regards,
Ali

Let me summarize...
(patches)
Ok, that makes sense to have the patches applied by someone else,
however, my patches will surely break other people's work. I'm just
curious, how are you going to merge them? Conditional compiling?
Commandline flags?

You don't need to worry about it. Constantly remerging patchsets to
trunk is routine for anyone doing serious work in a large project like
this.

(Enforcing no recursion)
We are also implementing our own linker. In fact not only the recursion
thing, but also many other features can't work without a custom linker;
I can go over some details separately if you are interested.

Oh, okay. Makes sense.

(llvm backend)
We have the code generation in llvm, although it is not complete, it is
just enough to prove the concept. The register spilling and other stack
dependent things will be handled in an unconventional way on the
function frame (the memory locations will be reserved at link time). The
linker can even decide if it can reuse (overlay) some of these frames
depending on what the call graph looks like.

(alloca)
We tried code generation in llvm by replacing alloca with global
addresses however,
(1) the name of local variable is not as easily
available,

Why does this matter? I can't imagine that it's important to preserve
the names of local variables.

(2)I don't feel quite comfortable with having the storage
class of variable change at the last phase of translation. It sounds
more reasonable to get it right in the front-end to begin with.

Hmm, my idea was that you'd allocate the memory for the alloca's along
with the space for spills and whatnot. I'm actually a bit surprised it
doesn't just work without any special changes; fixed-size alloca's
should be part of the stack layout, and therefore transparent to the
target. That said, I don't know LLVM codegen that well. I'd suggest
sending an email to LLVMdev if you're having trouble making it work.

If you really have to get rid of the allocas before codegen, I'd
suggest writing an llvm pass that converts the allocas to globals. It
shouldn't be very hard; all you have to do is for every alloca, create
an equivalent global, then replace all references to the alloca with
references to the global. It doesn't seem appropriate to attempt to
eliminate all allocas in the front-end; it limits your potential
flexibility with respect to other languages targeting LLVM, it makes
the code more difficult to optimize, and in theory an optimization
pass could introduce allocas (although I don't know of any such pass).

-Eli

Okay, so you need more information, or some amount of fuzziness to do this. Specifically, if you have:

int *X, ***Y[10];

Even if you know that X/Y are multi-declaration, you’ll have to have some way of handling the *'s and the array suffix. Because types are uniqued, they don’t have location information. Handling this sort of thing will require some amount of fuzzy editing: could fuzzy editing be used to scan for the comma also?

-Chris

Incidentally, you can also reuse the Lexer to do this for you (relex the tokens), so you don’t have to use strchr on the input buffer. If you do use strchr, however, you can get a char* for a SourceLocation with SourceManager::getCharacterData(SourceLocation Loc).

-Chris

Let me summarize...
(patches)
Ok, that makes sense to have the patches applied by someone else,
however, my patches will surely break other people's work. I'm just
curious, how are you going to merge them? Conditional compiling?
Commandline flags?

The best way to do this is to add a target to clang for your architecture, and conditionalize the codegen aspects on your target being active. This means you'll be able to compile with 'clang t.c -arch myarch'.

(alloca)
We tried code generation in llvm by replacing alloca with global
addresses however, (1) the name of local variable is not as easily
available, (2)I don't feel quite comfortable with having the storage
class of variable change at the last phase of translation. It sounds
more reasonable to get it right in the front-end to begin with.

Using global variables should work reasonably well. Just make sure they are marked as 'internal'. The 'globalopt' pass will do SROA and other optimizations on them to help eliminate them. I imagine that there are improvements we could also make, but there is nothing "in principle" that would prevent this from working.

-Chris

I'm not sure what you mean by fuzzy editing. I'll try using the lexer to determine if there is a comma between the type and the semicolon.

Thanks,

e

Ok, so my lexing code kinda works, except that commas can also appear in instance variables that are function pointers. And probably other wacky c corners that I don't know about.
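
The function-pointer problem can be sidestepped by only counting commas that sit outside any bracket nesting. The following is a minimal standalone sketch of that idea working on raw text rather than on clang tokens. `countTopLevelCommas` and `isMultiDeclaration` are hypothetical helpers, and the sketch assumes the text contains no comments, string literals, or macros that hide commas or brackets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts commas that are not nested inside (), [] or {}, stopping at the
// first ';' that is itself outside any nesting.
std::size_t countTopLevelCommas(const std::string &decl) {
  std::size_t commas = 0;
  int depth = 0;
  for (char c : decl) {
    switch (c) {
    case '(': case '[': case '{':
      ++depth;
      break;
    case ')': case ']': case '}':
      --depth;
      assert(depth >= 0 && "unbalanced closing bracket");
      break;
    case ',':
      if (depth == 0)
        ++commas; // separates two declarators
      break;
    case ';':
      if (depth == 0)
        return commas; // end of the declaration
      break;
    default:
      break;
    }
  }
  return commas;
}

// A declaration with at least one top-level comma declares several names.
bool isMultiDeclaration(const std::string &decl) {
  return countTopLevelCommas(decl) > 0;
}
```

Under these assumptions, `int x,y;` counts as a multi-declaration while `void (*fp)(int, char);` does not, because its only comma is inside the parameter list.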

Anyway, a related question: can I get the start and end source location for the instance variable statement, not just the instance variable name itself? E.g., the whole SourceRange of "int a;", not just of "a".

Thanks,

e

Sure, in a function, the decl will be wrapped in a DeclStmt, and the DeclStmt itself will have a range for the whole declaration.

-Chris

Thank you for your reply,
Attached please find the patch I made from my latest update of clang
from Friday. This is my first time to submit patch to this group, so
please let me know if there is any problem.

Thanks
Ali

patch-svn.txt (6.9 KB)

I created the patch for my target specific modifications and sent it to
cfe-commits. Since this is my first time to send a patch I don't know if
I have submitted my changes to the right place or not and of course what
is the turnaround time.
I also have attached it to this email just in case.
Please let me know if I have to do it differently.

Thanks
Ali

patch-svn.txt (6.9 KB)

I created the patch for my target specific modifications and sent it to
cfe-commits. Since this is my first time to send a patch I don't know if
I have submitted my changes to the right place or not and of course what
is the turnaround time.

You did exactly the right thing. I've been bogged down with other things lately and haven't had much time to stay on top of clang. This will hopefully be fixed next week; I apologize for the delay.

I also have attached it to this email just in case.
Please let me know if I have to do it differently.

The patch looks great. Some specific comments:

    /// getPointerWidth - Return the width of pointers on this target, for the
    /// specified address space. FIXME: implement correctly.
- uint64_t getPointerWidth(unsigned AddrSpace) const { return 32; }
- uint64_t getPointerAlign(unsigned AddrSpace) const { return 32; }
+ virtual uint64_t getPointerWidth(unsigned AddrSpace) const { return 32; }
+ virtual uint64_t getPointerAlign(unsigned AddrSpace) const { return 32; }

    /// getIntWidth/Align - Return the size of 'signed int' and 'unsigned int' for
    /// this target, in bits.
- unsigned getIntWidth() const { return 32; } // FIXME
- unsigned getIntAlign() const { return 32; } // FIXME
+ virtual unsigned getIntWidth() const { return 32; } // FIXME
+ virtual unsigned getIntAlign() const { return 32; } // FIXME

Instead of making these virtual, please add instance variables for these, the same way double and wchar are handled. You can also remove the FIXMEs. Thanks for doing this.
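
As a rough illustration of the instance-variable approach, here is a simplified standalone sketch. `TargetInfoSketch` and `SmallTargetSketch` are hypothetical stand-ins, not the real clang classes, and the 16-bit values are illustrative only. The accessors stay non-virtual; a target changes the answers by assigning the members in its constructor.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a target description (hypothetical).
class TargetInfoSketch {
protected:
  // Widths and alignments in bits; subclasses overwrite these as needed.
  unsigned PointerWidth, PointerAlign;
  unsigned IntWidth, IntAlign;

public:
  TargetInfoSketch()
      : PointerWidth(32), PointerAlign(32), IntWidth(32), IntAlign(32) {}

  // Non-virtual accessors that simply read the instance variables.
  std::uint64_t getPointerWidth(unsigned /*AddrSpace*/) const {
    return PointerWidth;
  }
  std::uint64_t getPointerAlign(unsigned /*AddrSpace*/) const {
    return PointerAlign;
  }
  unsigned getIntWidth() const { return IntWidth; }
  unsigned getIntAlign() const { return IntAlign; }
};

// A small target only has to set the members in its constructor.
class SmallTargetSketch : public TargetInfoSketch {
public:
  SmallTargetSketch() {
    PointerWidth = 16;
    PointerAlign = 8;
    IntWidth = 16;
    IntAlign = 8;
  }
};

// Example client code: works for any target without virtual dispatch.
unsigned intSizeInBytes(const TargetInfoSketch &Target) {
  assert(Target.getIntWidth() % 8 == 0 && "int must be whole bytes");
  return Target.getIntWidth() / 8;
}
```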

+++ lib/Basic/Targets.cpp (working copy)
@@ -863,6 +863,28 @@
+class PIC16TargetInfo : public TargetInfo{
+public:
+ virtual const char *getVAListDeclaration() const { return "";}
+ virtual const char *getClobbers() const {return "";}
+ virtual const char *getTargetPrefix() const {return "";}
+ virtual void getGCCRegNames(const char * const *&Names, unsigned &NumNames) const {}
+ virtual bool validateAsmConstraint(char c, TargetInfo::ConstraintInfo &info) const {return true;}
+ virtual void getGCCRegAliases(const GCCRegAlias *&Aliases, unsigned &NumAliases) const {}
+};
+}

Please make sure the code fits in 80 columns.

+++ lib/CodeGen/CGDecl.cpp (working copy)
@@ -15,6 +15,7 @@

+ if (strncmp (this->Target.getTargetTriple(), "pic16-", 6) == 0) {
+ const llvm::Type *LTy = CGM.getTypes().ConvertTypeForMem(Ty);

The preferred way to do a target check like this is to add some new property to TargetInfo with an accessor like "Target.useGlobalsForAutomaticVariables()" or something like that. PIC16 can return true, all other targets return false.

Can you just use the code path for static variables to handle the LLVM IR emission? That would avoid duplicating the code.
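
A minimal standalone sketch of the suggested property follows. Only the accessor name `useGlobalsForAutomaticVariables()` comes from the comment above; `TargetSketch`, `PIC16Sketch`, `LocalStorage`, and `storageForAutoVariable` are hypothetical stand-ins, not the real clang classes. The point is that code generation asks the target a question instead of comparing triple strings.

```cpp
// Simplified stand-in for the target description (hypothetical).
class TargetSketch {
public:
  virtual ~TargetSketch() {}
  // Most targets keep automatic variables on the stack.
  virtual bool useGlobalsForAutomaticVariables() const { return false; }
};

// A stackless target opts in by overriding the property.
class PIC16Sketch : public TargetSketch {
public:
  bool useGlobalsForAutomaticVariables() const override { return true; }
};

enum class LocalStorage { StackSlot, StaticGlobal };

// Stand-in for the decision made when emitting a local variable: targets
// that opt in are routed to the same path as static variables.
LocalStorage storageForAutoVariable(const TargetSketch &Target) {
  if (Target.useGlobalsForAutomaticVariables())
    return LocalStorage::StaticGlobal;
  return LocalStorage::StackSlot;
}
```

Routing the opted-in case to the existing static-variable path keeps the emission logic in one place, so only the initialization and name mangling differ per target.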

-Chris

Thank you Chris,
I'll apply the changes and resubmit a new patch.

Regards.
Ali