Would a spreadsheet be a good project for LLVM?

I am thinking about writing a new open source spreadsheet application, since I think the existing spreadsheet applications (Microsoft Excel, LibreOffice Calc, etc.) lack some features that I would really like to use. I would like the spreadsheet to recalculate very fast, and I wondered whether it would make sense to use LLVM to calculate the cell values quickly. Each cell of a spreadsheet contains an expression, much like an expression in any programming language. The big difference is that the order in which the expressions are evaluated is governed by each cell's dependencies on other cells, which together form a directed acyclic graph (DAG). Ideally, the strings that the user enters into the cells would be converted into a representation that can be recalculated very quickly. I was thinking that machine code generated by LLVM might be a good target.
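To make the evaluation-order point concrete, here is a minimal Python sketch, assuming each cell is stored as a formula function plus the list of cells it reads (the `cells` table and the `recalculate` function are hypothetical names, not taken from any existing spreadsheet). It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) so that every cell is evaluated only after the cells it depends on; a circular reference would raise `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical cell table: name -> (formula, names of the cells it reads).
# Each formula receives the dict of already-computed values.
cells = {
    "A1": (lambda v: 0.05, []),                         # interest rate
    "A2": (lambda v: 1000.0, []),                       # principal
    "B1": (lambda v: v["A2"] * v["A1"], ["A1", "A2"]),  # one year's interest
    "B2": (lambda v: v["A2"] + v["B1"], ["A2", "B1"]),  # balance after a year
}

def recalculate(cells):
    """Evaluate every cell after all of the cells it depends on."""
    graph = {name: deps for name, (_, deps) in cells.items()}
    values = {}
    # static_order() yields each node after its predecessors (here: its inputs)
    # and raises graphlib.CycleError on a circular reference.
    for name in TopologicalSorter(graph).static_order():
        formula, _ = cells[name]
        values[name] = formula(values)
    return values

values = recalculate(cells)
print(values["B2"])  # 1050.0
```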

Overall, is this a good use of LLVM?

Is there existing open source code for compiling expressions using LLVM that you would recommend for this project?

Any suggestions or concerns about this approach?

Is anyone interested in helping out?

Thanks

Jason

This sounds like an interesting project, but I’m far from convinced that LLVM will help much:

I have not TRIED to solve this, but my guess is that, when recalculating a large spreadsheet, the majority of the time is not actually spent calculating the values of the cells, but in inferring the dependencies and the other “work out what needs to be (re-)calculated” tasks. And whilst LLVM can probably help with that, I’m far from convinced that it will speed up the things you actually want to speed up.

Compilers (I’m including LLVM in “compilers” for this discussion) are really best at dealing with things where you repeat the exact same thing over and over. In a spreadsheet, the common case is that you calculate all the fields ONCE, then only recalculate based on changes. Of course, some changes affect EVERYTHING: if you change the base interest rate in a spreadsheet that calculates bank rates and the earnings on various bank accounts, then ALL your cells will depend on it in some way, and everything gets recalculated. And whilst the dependency graph DOESN’T change (except when you change the formulae in the cells), each cell is still typically evaluated only once per change. So you’d end up spending a lot of time compiling the spreadsheet code, and probably lose time rather than gain on the competing solutions.
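A rough sketch of the "only recalculate based on changes" strategy, assuming the same kind of hypothetical cell table (the names `recalc_all`, `dependents_of` and `recalc_dirty` are illustrative only): find every cell that transitively reads the changed cell, then re-evaluate just those cells in dependency order. Changing a rarely used input touches only a couple of cells, while changing the shared interest rate dirties most of the sheet, which is the case described above.

```python
from graphlib import TopologicalSorter

# Hypothetical sheet: name -> (formula, names of the cells it reads).
cells = {
    "rate":  (lambda v: 0.05, []),
    "acct1": (lambda v: 1000.0, []),
    "acct2": (lambda v: 2500.0, []),
    "fee":   (lambda v: 10.0, []),
    "earn1": (lambda v: v["acct1"] * v["rate"], ["acct1", "rate"]),
    "earn2": (lambda v: v["acct2"] * v["rate"], ["acct2", "rate"]),
    "net2":  (lambda v: v["earn2"] - v["fee"], ["earn2", "fee"]),
}

def recalc_all(cells):
    """Full recalculation in dependency order."""
    graph = {name: deps for name, (_, deps) in cells.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        values[name] = cells[name][0](values)
    return values

def dependents_of(cells, changed):
    """Every cell that transitively reads `changed`, including `changed` itself."""
    readers = {name: [] for name in cells}  # reverse edges: input -> cells reading it
    for name, (_, deps) in cells.items():
        for dep in deps:
            readers[dep].append(name)
    dirty, stack = set(), [changed]
    while stack:
        name = stack.pop()
        if name not in dirty:
            dirty.add(name)
            stack.extend(readers[name])
    return dirty

def recalc_dirty(cells, values, changed):
    """Re-evaluate only the dirty cells, in dependency order; return how many ran."""
    dirty = dependents_of(cells, changed)
    # Clean inputs keep their cached values, so only edges between dirty cells matter.
    graph = {name: [d for d in cells[name][1] if d in dirty] for name in dirty}
    for name in TopologicalSorter(graph).static_order():
        values[name] = cells[name][0](values)
    return len(dirty)

values = recalc_all(cells)
cells["fee"] = (lambda v: 12.0, [])
print(recalc_dirty(cells, values, "fee"))   # 2: fee, net2
cells["rate"] = (lambda v: 0.04, [])
print(recalc_dirty(cells, values, "rate"))  # 4: rate, earn1, earn2, net2
```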

So my feeling, without testing it, is that LLVM probably won’t give you a huge amount of benefit in the general case of spreadsheets.

If it were my project, I think it would be worth writing something that simulates the general principle of spreadsheet calculation and, first of all, figuring out where the majority of the time is spent. I don’t think this experiment needs full support for all the different data types, functions, etc., just the dependency tracking and (basic) formula support.
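One way to start such an experiment is to time the "work out the order" step separately from the actual arithmetic. The sketch below uses a hypothetical synthetic sheet in Python (each cell reads up to two earlier cells, and every formula is the same trivial arithmetic); parsing formulas to discover references is not modeled. The absolute numbers from Python will not carry over to a C or C++ prototype, but the same split of timers can.

```python
import random
import time
from graphlib import TopologicalSorter

def make_sheet(n, seed=0):
    """Hypothetical synthetic sheet: cell i reads up to two earlier cells."""
    rng = random.Random(seed)
    deps = {0: []}
    for i in range(1, n):
        deps[i] = rng.sample(range(i), k=min(2, i))
    return deps

def profile(deps):
    """Time the ordering step and the evaluation step separately."""
    t0 = time.perf_counter()
    order = list(TopologicalSorter(deps).static_order())  # "what, and in what order"
    t1 = time.perf_counter()
    values = {}
    for cell in order:  # the actual calculation: a trivial formula per cell
        values[cell] = 1.0 + 0.5 * sum(values[d] for d in deps[cell])
    t2 = time.perf_counter()
    return values, t1 - t0, t2 - t1

values, t_order, t_eval = profile(make_sheet(100_000))
print(f"ordering: {t_order:.3f}s  evaluation: {t_eval:.3f}s")
```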

No, I can’t back this up with any facts; these are just my thoughts. Hopefully it’s of some help, and I haven’t just rambled away completely aimlessly.

Mats,

Thanks for your feedback. I guess I should elaborate on one of the features I would like to implement. In today's spreadsheets, if you want to crunch a lot of data, you usually put the data in rows, put expressions in cells to the right of the data, and repeat those cells for every row. I would like to create a spreadsheet system that can process the same quantity of data but has only a single instance of those cells with expressions. In other words, I do expect the same expressions to be applied over and over to the data, perhaps thousands of times (or more).
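The "one formula, many rows" idea can be illustrated without LLVM: parse the formula text once into a tree of Python closures, then call the result for each row, so the parsing cost is paid once and amortized across all rows. This is closure compilation in Python, a stand-in for JIT compilation rather than anything LLVM-specific. `compile_formula` and the row-as-dict layout are assumptions for this sketch, and only `+ - * /`, unary minus, column names and numeric constants are supported.

```python
import ast
import operator

# Arithmetic operators a formula may use (an assumption for this sketch).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def compile_formula(text):
    """Parse a formula such as 'price * qty - discount' once and return a
    closure that evaluates it for one row (a dict of column values)."""
    def build(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            op, left, right = _OPS[type(node.op)], build(node.left), build(node.right)
            return lambda row: op(left(row), right(row))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            inner = build(node.operand)
            return lambda row: -inner(row)
        if isinstance(node, ast.Name):
            col = node.id
            return lambda row: row[col]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            value = node.value
            return lambda row: value
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    # The parse and tree walk happen here, once per formula, not once per row.
    return build(ast.parse(text, mode="eval").body)

# A single formula cell, applied to every data row.
rows = [{"price": 2.5, "qty": 4, "discount": 1.0},
        {"price": 10.0, "qty": 3, "discount": 5.0}]
line_total = compile_formula("price * qty - discount")
print([line_total(r) for r in rows])  # [9.0, 25.0]
```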

Would LLVM be a good fit now?

Thanks again,

Jason

LLVM would be a good fit for a Lotus Improv / Quantrix Modeler style spreadsheet.

David

For what it’s worth, the latest version of LibreOffice (possibly the released version by now) has a substantial speed improvement via OpenCL support, if it is enabled. You might want to test it before writing a compiler layer; even writing a plugin that offloads your expected patterns to OpenCL might be far, far easier.

So, I’m not sure exactly what you’d do differently from a regular spreadsheet application, but I still think that the majority of the time is not spent in the actual calculation, but in the “figure out what to do, and in what order” step. That is exactly the same task LLVM would do if you produced dependent expressions for it to compile/optimise, with the extra work of producing LLVM-IR and the extra work inside LLVM to optimise and generate code. I’d have thought that this is not the ideal solution. I could be wrong.

Again, I would model what you are planning to do, implement it in C or C++, and see where the time goes for a relatively simple [simple in terms of completeness and support, not in terms of the amount of data] prototype. Once you know where the time is spent, if it looks like a compiler (JIT?) would help, then implement that and see if it really helps. Building something that produces LLVM-IR is not terribly hard, so once you have your “basic spreadsheet functionality”, modifying it to compile the expressions into LLVM-IR and then JIT them wouldn’t be that much work. I’m always optimistic about these things, but I’d say that if you understand how to write a simple spreadsheet app, a week or two would do for the basic test version, and another week or two to see if LLVM helps…
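As a rough illustration of how small the "produce LLVM-IR" step can be, this Python sketch turns an arithmetic formula into textual LLVM IR for a function that takes one `double` per referenced column. It only emits the text; assembling and JIT-compiling it (for example with LLVM's ORC JIT) is a separate step not shown here. The function name `formula_to_llvm_ir`, the `@cell_formula` symbol and the `%col.` parameter prefix are hypothetical choices. Constants are written in LLVM's hexadecimal double form, because the IR assembler rejects decimal constants that are not exactly representable in binary (such as 1.3).

```python
import ast
import struct

# LLVM IR floating-point instructions for the supported Python operators.
_IR_OPS = {ast.Add: "fadd", ast.Sub: "fsub", ast.Mult: "fmul", ast.Div: "fdiv"}

def _double_literal(x):
    """Hexadecimal LLVM IR constant holding the IEEE-754 bits of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", float(x)))[0]
    return f"0x{bits:016X}"

def formula_to_llvm_ir(text, fn_name="cell_formula"):
    """Translate e.g. 'price * qty - discount' into a textual LLVM IR function
    taking one double per referenced column (in order of first use)."""
    params, body = [], []

    def emit(node):
        # Returns the IR operand (SSA name or constant) holding node's value.
        if isinstance(node, ast.BinOp) and type(node.op) in _IR_OPS:
            lhs, rhs = emit(node.left), emit(node.right)
            tmp = f"%t{len(body)}"  # one new SSA temporary per instruction
            body.append(f"  {tmp} = {_IR_OPS[type(node.op)]} double {lhs}, {rhs}")
            return tmp
        if isinstance(node, ast.Name):
            if node.id not in params:
                params.append(node.id)
            return f"%col.{node.id}"
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return _double_literal(node.value)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    result = emit(ast.parse(text, mode="eval").body)
    args = ", ".join(f"double %col.{p}" for p in params)
    return "\n".join([f"define double @{fn_name}({args}) {{", "entry:",
                      *body, f"  ret double {result}", "}"])

ir = formula_to_llvm_ir("price * qty - discount")
print(ir)
# define double @cell_formula(double %col.price, double %col.qty, double %col.discount) {
# entry:
#   %t0 = fmul double %col.price, %col.qty
#   %t1 = fsub double %t0, %col.discount
#   ret double %t1
# }
```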

My guess, however, is that this will only really benefit in a small number of cases, if at all.

Out of curiosity, have you tried Gnumeric? It's quite fast.