I'm currently working on a query engine based on LLVM and I'm looking for the best approach to map our own generic data model onto that infrastructure.
When I run a query, let's say I have athletes and want to apply a filter like this:
age > 18 && result.speed > 15
Here is my data model: a generic data container plus a data schema that describes the 'shape' of the data. In pseudo-code:
struct Data
{
    std::vector<double>      doubleValues;
    std::vector<int>         intValues;
    std::vector<std::string> stringValues;
    std::vector<Data>        children;
};

struct DataSchema; // forward declaration, used by DataProperty

enum class DataPropertyType
{
    DPT_DOUBLE,
    DPT_INT,
    DPT_TEXT,
    DPT_CHILDREN,
};

struct DataProperty
{
    DataPropertyType type;
    std::string      name;
    int              index;              // slot in the matching vector of Data
    DataSchema*      children = nullptr; // sub-schema, set when type == DPT_CHILDREN
};

struct DataSchema
{
    std::vector<DataProperty> properties;
};
//------------------------------
// Construct the schema
//------------------------------
DataSchema schPerformance;
schPerformance.properties.push_back( DataProperty {DataPropertyType::DPT_DOUBLE, "speed", 0} );

DataSchema schPerson;
schPerson.properties.push_back( DataProperty {DataPropertyType::DPT_INT, "age", 0} );
schPerson.properties.push_back( DataProperty {DataPropertyType::DPT_CHILDREN, "result", 0, &schPerformance} );
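To make the intent concrete, here is a sketch of the part that runs once per query, at compile time: resolving a dotted path such as `result.speed` against the schema down to a chain of integer indices plus a leaf type (the `resolvePath` and `findProperty` helpers are my own illustration, not part of the model above):

```cpp
#include <string>
#include <vector>

struct DataSchema;

enum class DataPropertyType { DPT_DOUBLE, DPT_INT, DPT_TEXT, DPT_CHILDREN };

struct DataProperty
{
    DataPropertyType type;
    std::string      name;
    int              index;
    DataSchema*      children = nullptr;
};

struct DataSchema { std::vector<DataProperty> properties; };

// Resolve one path segment by name; this runs once per query, never per row.
const DataProperty* findProperty(const DataSchema& schema, const std::string& name)
{
    for (const DataProperty& p : schema.properties)
        if (p.name == name) return &p;
    return nullptr;
}

// Resolve e.g. {"result", "speed"} into the index chain {0, 0} plus the
// leaf type; those integers then become constants in the generated code.
bool resolvePath(const DataSchema& schema, const std::vector<std::string>& path,
                 std::vector<int>& indices, DataPropertyType& leafType)
{
    const DataSchema* cur = &schema;
    for (size_t i = 0; i < path.size(); ++i)
    {
        const DataProperty* p = findProperty(*cur, path[i]);
        if (!p) return false;
        indices.push_back(p->index);
        leafType = p->type;
        if (p->type == DataPropertyType::DPT_CHILDREN) cur = p->children;
        else if (i + 1 != path.size()) return false; // cannot descend into a scalar
    }
    return true;
}
```
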
//------------------------------
// Construct one item
//------------------------------
Data performance;
performance.doubleValues.push_back( 20.0 ); // speed

Data person;
person.intValues.push_back( 20 );           // age
person.children.push_back( performance );   // result
What I want is to use the existing data schema, which is fully known before query execution, so that the LLVM JIT can address my structures directly; this is all about performance. I want to avoid any dynamic schema lookup or introspection at filter time. The idea is to use the indices and type information from the DataSchema during code generation, so the generated code reads each value at a hard-coded position.
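In other words, for `age > 18 && result.speed > 15` the JIT'd predicate should be roughly equivalent to this hand-specialized function (a minimal copy of `Data` for illustration; the hard-coded indices come straight from the schema above):

```cpp
#include <vector>

// Minimal copy of the Data container from above.
struct Data
{
    std::vector<double> doubleValues;
    std::vector<int>    intValues;
    std::vector<Data>   children;
};

// Hand-written equivalent of what the JIT should emit for
// "age > 18 && result.speed > 15": every vector index is a
// compile-time constant taken from the schema; no name lookup
// and no type dispatch at run time.
bool filterPerson(const Data& person)
{
    return person.intValues[0] > 18                    // age          (schema index 0)
        && person.children[0].doubleValues[0] > 15.0;  // result.speed (indices 0, 0)
}
```
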
Then I have a list of Data items I wish to filter, say:
//------------------------------
// Example of use
//------------------------------
std::vector<Data> datas;
// ... populate datas

std::vector<Data> result;
for (Data& data : datas)
{
    if ( query.filter(data) ) result.push_back(data);
}
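For the hot loop, my understanding is that the usual shape (e.g. with LLVM's ORC JIT) is to compile the predicate to a C-ABI function, resolve its address once, and drive the loop through a plain function pointer. A sketch, where a regular function stands in for the JIT'd symbol:

```cpp
#include <vector>

struct Data
{
    std::vector<int> intValues; // minimal Data for the sketch
};

// The generated code is exposed as a C-ABI function; after JIT
// compilation you resolve its address once into this pointer type.
using FilterFn = bool (*)(const Data*);

// Stand-in for the JIT'd predicate "age > 18".
bool jittedAgeFilter(const Data* d)
{
    return d->intValues[0] > 18;
}

// The hot loop only ever sees a raw function pointer:
// no virtual dispatch and no schema lookup per row.
std::vector<Data> runFilter(const std::vector<Data>& datas, FilterFn filter)
{
    std::vector<Data> result;
    for (const Data& data : datas)
        if (filter(&data)) result.push_back(data);
    return result;
}
```
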
Do you know a way, ideally the best way, to implement such a strategy?
Thanks!