Type annotations for libclang, Python bindings

Hi there o/

I have recently used the libclang python bindings to write a static code analysis tool. In the course of that project, I noticed that there are no annotations for the Python bindings.

I personally find type annotations very useful: they help you detect many kinds of errors via type checkers (mostly the stuff Python is bad at due to its lack of static typing), and they also improve auto-completion, suggestions and hints in your IDE.

Because of this, I’m currently working on adding typing to the point that cindex.py passes a strict mypy check. However, there are a couple of points that I’m unsure about.
For one thing, I didn’t find any information about which versions of Python the bindings are supposed to support, so I’m not sure whether it’s even possible to use type hints, or which features from the typing module I’m allowed to use.
Some places need deeper changes than just a simple type annotation to make the type checker accept them, so I’m also wondering if there’s enough interest in this for someone to actually review a change that gets cindex.py to pass a strict type-check. Alternatively, one could annotate only the actual “outward” interface of the module, which would need fewer changes… but anyone using the module might still get a bunch of typing errors from all the places that weren’t touched.
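
To give an idea of what “deeper changes than just a simple type annotation” can mean for ctypes-based code, here is a minimal sketch. `SourcePoint` and its fields are hypothetical and not taken from cindex.py. My understanding is that a type checker sees ctypes structure field access as untyped, so a strict check tends to want an explicit conversion rather than just a return annotation:

```python
import ctypes
from typing import Tuple


class SourcePoint(ctypes.Structure):
    """Hypothetical C struct for illustration; not a real libclang type."""

    _fields_ = [("line", ctypes.c_uint), ("column", ctypes.c_uint)]

    @property
    def position(self) -> Tuple[int, int]:
        # Convert explicitly so the returned tuple is made of plain ints,
        # instead of relying on what the checker infers for field access.
        return (int(self.line), int(self.column))


point = SourcePoint(3, 7)
print(point.position)
```

Annotating only the outward interface would mean adding the `-> Tuple[int, int]` part and leaving internals like the field access untouched.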

Anyway, I just want to make sure I don’t put too much work into something that ends up useless because it’s not compatible or something.


At one point, we made a push to move our Python to Python 3, but that was for internal scripts. The libclang Python bindings are a bit different in that those are closer to “user-facing” than our internal stuff. I’d say we should aim for Python 2 and Python 3 support if possible, but bias towards Python 3 if necessary. (No idea how others feel about it, though.)

I think the trick will be finding someone interested and qualified to review those changes. Hopefully someone can volunteer on this thread (I know enough about libclang to help, but I don’t know enough about Python to feel comfortable signing off on the changes).

Python 2 has been EOL’ed for 3 years; I think it’s completely reasonable to stop supporting it.


I wouldn’t be opposed, though I wonder whether EOL != out of common use in this case.

Ubuntu has dropped it as of 22.10, and Debian has dropped it entirely for all future releases (starting with this year’s 12 “bookworm”), including from unstable itself. Fedora still seems to have it in 38 and Rawhide, and openSUSE still seems to have it in 15.4 Leap and Tumbleweed.

Please just let Python 2 die. It’s not a good use of people’s scarce time to support it. Typing especially (benefits pointed out in the OP) has made huge strides in Python in the last few years, and is incompatible with Python 2.


As far as I can tell, Python 3.6 is safe to use. There was an RFC about 3.8, but it seems that consensus hasn’t been reached yet.

I have rather extensive experience with both typing and mypy, so I can review the patches.


It sounds to me like the consensus position is to not bother supporting Python 2 and only support Python 3. SGTM, thank you all for the discussion!


Thank you for that offer!

Thanks a lot for all the helpful feedback here! And especially to @Endill for offering to review.
I’ll make sure to test my changes against Python 3.6 specifically, and hope I can provide a patch sometime soon.
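
For reference, a small sketch of annotations that stay within what Python 3.6 accepts. The names (`_kind_names`, `kind_name`, `known_kinds`) are made up for illustration and are not from cindex.py:

```python
from typing import Dict, List, Optional

# Variable annotations (PEP 526) are available from Python 3.6 onward.
_kind_names: Dict[int, str] = {1: "struct", 2: "function"}  # hypothetical table


def kind_name(kind_id: int) -> Optional[str]:
    """Return the name for a kind id, or None if the id is unknown."""
    return _kind_names.get(kind_id)


def known_kinds(ids: List[int]) -> List[str]:
    """Return the names of the ids present in the table, in input order."""
    names: List[str] = []
    for kind_id in ids:
        name = kind_name(kind_id)
        if name is not None:
            names.append(name)
    return names


print(known_kinds([2, 9, 1]))
```

Newer spellings would not work on 3.6: `from __future__ import annotations` needs 3.7, `typing.Literal` and `typing.Protocol` need 3.8, `list[int]` needs 3.9, and `int | None` needs 3.10.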

Hi all, I have made some progress on removing Python 2 and adding Python 3 type annotations. I’m not sure if it is ready to be merged upstream yet, but it is available for anyone looking for extra type annotations.

Hi all, I opened an issue to implement type annotations. Please check it out and add any extra thoughts over there!

I just opened a PR for this, that type annotates cindex.py to the point where it passes a strict mypy check. I’d be happy about any feedback.

@Endill sorry about the late reply, but if you’d still like to review this, you can find the PR here:

Sorry @DeinAlptraum, I wasn’t trying to step on your toes here. I thought you may have moved on from this project, so I tried to push this forward.

Given that backwards compatibility is one of the selling points of libclang (compared to clang AST…) and type annotations are difficult to make backwards compatible, I have been adding CI to check this.

I also documented all type annotation features we can and can’t use for version 3.7 here.

I currently have a conflicting incremental PR [libclang/python] Bump minimum compatibility to Python 3.6 by linux4life798 · Pull Request #77228 · llvm/llvm-project · GitHub up for review. Would you consider waiting for it to be reviewed/submitted and rebasing?

Best,

Craig


Hi Craig,

No, that’s totally fair. I’ve been working on-and-off on this for months and wasn’t sure I’d complete it myself tbh. I remember seeing your message on this thread but then half-forgot about it.
What you say seems reasonable, so I’ve set my PR back to draft for now. Thanks for your work!

Best,
Jannick


The upgrade to Python 3.8 is finally through, and my PR is open for review.

@Endill if you’re still up for this (and ofc others in this thread as well), I would appreciate a review/feedback: [libclang/python] Add strict typing to clang Python bindings (#76664) by DeinAlptraum · Pull Request #78114 · llvm/llvm-project · GitHub

I know this is pretty big. If there are any questions, or anything I can do to make the review easier, please do send me a message.

Have folk considered using pybind11 for Clang’s Python bindings? This would give us type hints for free and would get round some of the ctypes-induced incompatibility issues mentioned in the existing python bindings tests.

MLIR already uses pybind11 in mlir/lib/Bindings/Python/MainModule.cpp

To my understanding, pybind11 aims at providing Python wrappers for C++ interfaces, but libclang is a C interface already. So its Python bindings are much thinner and less complicated by design. I don’t think pybind11 would improve anything.
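
As a rough illustration of how thin a ctypes binding over a C function can be, here is a minimal sketch. It assumes CPython and uses `Py_GetVersion` from the interpreter’s own C API via `ctypes.pythonapi` purely as a stand-in, so that it runs without libclang installed. `python_version` is a hypothetical wrapper name, and none of this is taken from cindex.py:

```python
import ctypes

# Declare the C signature: const char *Py_GetVersion(void)
_py_get_version = ctypes.pythonapi.Py_GetVersion
_py_get_version.argtypes = []
_py_get_version.restype = ctypes.c_char_p


def python_version() -> str:
    """Typed wrapper: the annotation lives on the Python side."""
    raw = _py_get_version()  # bytes at runtime, because restype is c_char_p
    return bytes(raw).decode("utf-8")


print(python_version())
```

The type hint here is written by hand on the wrapper, which is the kind of work the annotation effort in this thread is about.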

Worth noting that LLDB has been using SWIG for their Python bindings, and I don’t think that rewriting that in pybind11 would make it more maintainable, due to the enormous API surface LLDB provides. The fact that SWIG can’t provide type hints (yet) is unfortunate, though.

An assortment of binding technologies in our community is nothing new, and agreeing on a single technology doesn’t seem important to me.
