In MLIR, we can define an int variable like this:
%a = constant 10 : i32
Does it support a string variable (or char*) defined in a similar way? For example:
%s = constant "I am a string" : string
The standard-dialect constant operation does not support strings and there is no builtin string type (see Builtin Dialect - MLIR), but you can define a string type in your dialect and create an operation that constructs a value of that type. For instance, the TensorFlow dialect has a string type: tensorflow/tf_types.def at 104959a3051b4df05fb380588e9ef517b1a422e2 · tensorflow/tensorflow · GitHub (though it's got a few extra layers of macros and such in the definition). https://mlir.llvm.org/docs/Tutorials/DefiningAttributesAndTypes/ explains how to create your own type.
N.B.: The OpaqueType in the builtin dialect is represented as a string literal, but I don't think that's what you want.
Thank you, @gcmn, I appreciate the information.
One more question: I want to call a runtime function in MLIR: call @getString() {stringvalue = "aaaa"} : () -> (). The runtime function is implemented in C as void getString() {...}. Is there a way to pass the string from MLIR to C, i.e., can I access the stringvalue attribute in the C implementation?
Thank you in advance!
Hi @rqtian, I recently faced the same problem: how to represent values of type "string" in MLIR and how to pass them to pre-compiled C functions at run-time. Note that I'm still a learner in MLIR and LLVM, so the solution might not be perfect (and I'd indeed appreciate feedback from other users). In my case, making it work at all was the main goal, not making it work efficiently. Here is a sketch of how I did it:
String type
As @gcmn suggests, I defined a string type in my own dialect, as simple as:
def String : MyDialect_Type<"String"> {
let summary = "string";
}
How to carry strings through the IR
One option is to attach a StringAttr to your operation, as in your comment above (but for that, you don't even need a string type). Another option is to create a kind of StringConstantOp in your dialect, which is mostly similar to the existing ConstantOp, but has a StringAttr and is of result type String (the one defined above).
Passing a string known at compile-time to a run-time C function:
Chapter 6 of the Toy tutorial (Chapter 6: Lowering to LLVM and CodeGeneration - MLIR and llvm-project/LowerToLLVM.cpp at main · llvm/llvm-project · GitHub) shows how to rewrite a custom operation (toy::PrintOp) to a call to the printf C function during the lowering to the LLVM dialect. The trick is to pass the string as a !llvm.ptr<i8>, which corresponds to a char* in C.
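To make the char* correspondence concrete, here is a minimal C++ sketch of the runtime side. It assumes the runtime function is changed to take the string as a parameter (the getString() in the question takes none). The names getStringLength, kGlobalStr, and callRuntime are hypothetical; kGlobalStr only stands in for the global that a lowering would emit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical runtime function with C linkage, so that a lowered call can
// refer to it by its unmangled name. It receives the string as a plain
// pointer to a null-terminated byte array.
extern "C" std::size_t getStringLength(const char *s) {
  assert(s != nullptr && "caller must pass a valid pointer");
  return std::strlen(s);
}

// Stand-in for the global constant a lowering would create: the characters
// followed by a terminating '\0'.
static const char kGlobalStr[] = "aaaa";

// Stand-in for the lowered call site: take the address of the first element
// of the global and pass it to the runtime function.
std::size_t callRuntime() {
  const char *firstElement = &kGlobalStr[0];
  return getStringLength(firstElement);
}
```

The point of the sketch is only that the C side never sees an attribute: it sees a pointer argument, so the string has to become a value before the call.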
Approach A) In the IR, you can get an mlir::Value of type mlir::LLVM::LLVMPointerType that points to your string using mlir::LLVM::createGlobalString(). This creates a global storing your string, gets its address, and computes a pointer to its first element. The source code of the Toy tutorial seems to do something similar to this function (see getOrCreateGlobalString in the file mentioned above), but implements it itself. Note that you need to specify a name for the global string.
Approach B) Another option, which does not require the specification of a name, could be to create a buffer using the mlir::LLVM::AllocaOp with type !llvm.ptr<i8> and the size of your string. The result of this operation is the pointer you can pass to your function call. To store your string in this buffer, you could copy over the characters of your string one-by-one using mlir::LLVM::GEPOp and mlir::LLVM::StoreOp (make sure to append a \0 at the end if required).
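The sequence of operations in approach B can be modelled in plain C++. This is only a toy model of what the emitted IR would do at run-time, not MLIR builder code. The function name materializeString is hypothetical, and a std::vector stands in for the stack buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of approach B: reserve size + 1 bytes (the "alloca"), write each
// character at its offset (one "GEP" plus one "store" per character), and
// finally append the terminator.
std::vector<char> materializeString(const std::string &text) {
  std::vector<char> buffer(text.size() + 1); // one extra byte for '\0'
  char *base = buffer.data();                // pointer the alloca would return
  for (std::size_t i = 0; i < text.size(); ++i) {
    char *slot = base + i; // address computation, like a GEP with index i
    *slot = text[i];       // store of a single character
  }
  *(base + text.size()) = '\0'; // terminator so that C code can find the end
  assert(buffer.back() == '\0');
  return buffer;
}
```

Note the cost this makes visible: one address computation and one store per character, which is why approach B favors simplicity over efficiency.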
If you chose to attach your string as an attribute to your operation, you need to lower it according to either approach A or B, plus a call to your C function.
If you chose to create that StringConstantOp, you need to lower it using either approach A or B, and the C function call is separate from it. Finally, in your lowering pass to the LLVM dialect, you need to add a type conversion of your String type to !llvm.ptr<i8>.
Hope that helps.
As stated above, I'd be happy about feedback on this solution. I personally chose approach B in combination with the StringConstantOp. With approach A, I wasn't sure how to choose the names of the globals. There could be many such globals in my case, so I would have just used a counter, but wasn't sure if it must be thread-safe etc. (Remember, my main goal was to make it work at all...)
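For the counter idea, a minimal sketch could look like the following. The helper name nextGlobalName and the "str" prefix are hypothetical, and the atomic counter is just one way to stay safe if the helper were ever called from several threads.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical helper: returns "<prefix>_0", "<prefix>_1", ... An atomic
// counter keeps the numbering race-free across threads.
std::string nextGlobalName(const std::string &prefix = "str") {
  assert(!prefix.empty() && "a symbol name needs a non-empty prefix");
  static std::atomic<unsigned> counter{0};
  return prefix + "_" + std::to_string(counter.fetch_add(1));
}
```

A counter like this only guarantees that the generated names differ from each other. It does not check against symbols that already exist in the module, so a collision with a hand-written name is still possible.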
Great! I have the same problem. I am trying to define a new type in the Toy dialect that can emit string literals to MLIR.
Once I finish, I will share my code with you.
Thank you so much @pdamme, this is super helpful information. I will try your suggestions and share my experience.
FYI, the LLVM dialect has string globals - 'llvm' Dialect - MLIR.
Is there a standard way to create a unique symbol name for these globals?
SymbolTable::insert will autorename on collision.
How does insertion of operations into the symbol table work?
When I create two GlobalOps with the same name (e.g. using createGlobalString), then I get: error: redefinition of symbol named '...'.
Why are they not autorenamed?
Because you need to create a SymbolTable instance and call .insert on it after creating the op, like here https://github.com/llvm/llvm-project/blob/9d4896f50e441ea5b9e8ae78ebe328e006cb6b67/mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp#L46-L52. Symbol handling is orthogonal to op creation; there may be cases that actually want duplicate symbols to exist, e.g., to merge them in a later pass. See also Symbols and Symbol Tables - MLIR.
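The difference between "creating an op" and "inserting it into a symbol table" can be illustrated with a toy model in plain C++. This is not MLIR's SymbolTable: the class ToySymbolTable is hypothetical, and the "_<n>" suffix scheme is an assumption chosen for the sketch.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a symbol table: it only tracks the names in use.
class ToySymbolTable {
public:
  // Registers `name`. On collision, appends "_<n>" with the smallest n that
  // makes the name unique. Returns the name that was actually used.
  std::string insert(const std::string &name) {
    assert(!name.empty() && "symbols need a name");
    std::string candidate = name;
    for (unsigned n = 0; names.count(candidate) != 0; ++n)
      candidate = name + "_" + std::to_string(n);
    names.insert(candidate);
    return candidate;
  }

  bool contains(const std::string &name) const {
    return names.count(name) != 0;
  }

private:
  std::set<std::string> names;
};
```

Renaming can only happen inside insert, because only the table knows which names are taken. Code that creates ops without going through the table has no opportunity to rename anything, which matches the redefinition error described above.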
When creating my SymbolTable(moduleOp) instance, I run into an assertion: expected region to contain uniquely named symbol operations.
My pass which uses the SymbolTable to create a global string is run within an OpConversionPattern inside of an OperationPass<ModuleOp>. The symbols which are reported to be duplicate belong to FunctionOp, which indeed seem to get duplicated during this pass.
Could this be an artifact of other lowerings happening at the same time (LoopToStd, MemRefToLLVM, StdToLLVM)? Is creating a SymbolTable not safe within an OpConversionPattern? But your example seems to do it in the same way.
Any ideas?
SymbolTable does not know about OpConversionPattern and vice versa. The situation is a bit complex here, but your diagnosis looks right. Inside conversion patterns, and when used with the dialect conversion infrastructure (i.e. applyPartial/FullConversion), replacing an operation with another one does not delete the old operation immediately. Instead, the new operation is inserted next to the original one. This is necessary for several reasons, in particular for the conversion to be reversible and for type conversion purposes. Depending on the entire conversion being successful or not, either the original or the replacement operation will be actually erased at the end of the conversion process. As a result, functions with the same name may co-exist in the module when you construct a symbol table inside a pattern in case another pattern has previously "replaced" functions, which is what likely happens in your case.
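The delayed erasure described here can be modelled with a few lines of plain C++. This is a toy model, not the dialect conversion infrastructure: ToyOp, replaceOp, finalize, and hasUniqueSymbols are all hypothetical names, and only the successful-conversion path is sketched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct ToyOp {
  explicit ToyOp(std::string name) : symbolName(std::move(name)) {}
  std::string symbolName;
  bool pendingErase = false; // replaced, but still physically in the module
};

using ToyModule = std::vector<ToyOp>;

// "Replace" the op at `index`: the old op is only marked, and the new op is
// inserted right next to it under the same symbol name.
void replaceOp(ToyModule &module, std::size_t index) {
  assert(index < module.size());
  module[index].pendingErase = true;
  ToyOp replacement(module[index].symbolName);
  module.insert(module.begin() + index + 1, replacement);
}

// End of a successful conversion: only now are the replaced ops dropped.
void finalize(ToyModule &module) {
  module.erase(std::remove_if(module.begin(), module.end(),
                              [](const ToyOp &op) { return op.pendingErase; }),
               module.end());
}

// The uniqueness check a symbol table would rely on when it is constructed.
bool hasUniqueSymbols(const ToyModule &module) {
  std::set<std::string> seen;
  for (const ToyOp &op : module)
    if (!seen.insert(op.symbolName).second)
      return false;
  return true;
}
```

Between replaceOp and finalize the module holds two ops with the same name, so a uniqueness check made at that moment fails even though the final module is fine.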
I don't immediately have a good suggestion for you on how to proceed. One possibility is to create a SymbolTable instance before running the conversion, pass a reference to the table to all relevant patterns, and use it to update the table in all of them. This will lead to newly created functions having different names than the original functions, and you'll need some cleanup to rename them back after the conversion completes.
Another possibility is to split the conversion into two separate calls to the infrastructure: one that converts functions and another that produces symbols. The IR can be temporarily invalid within one pass; the difficulty here is correctly setting up the operation legality in the conversion targets for both calls.
Hope this helps.
Thanks, that helps a lot!
I created a SymbolTable within the LoweringPass.runOnOperation() and it seems to work (https://github.com/tali/sclang/commit/32b6b23).
How are the patterns applied within this pass?
The matchAndRewrite function is const. I assume this is meant to guarantee that it can be called concurrently from separate threads. Is MLIR doing this? Does it work when the pattern contains a reference to the SymbolTable, which may then be changed concurrently through several patterns?
Roughly, blocks are ordered topologically and their operations are traversed in textual order; operation regions are visited recursively. There is no parallelization at the pattern level AFAIK and I don't think matchAndRewrite is const for parallelism purposes.
MLIR does run function passes in parallel though, and one must not modify anything above the individual function (e.g., the parent module) in such passes.
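As a side note on the const question, plain C++ already shows why const alone says little about concurrency: a const member function can still modify state it reaches through a reference member. The sketch below uses hypothetical stand-ins (SharedNames, ToyPattern) rather than MLIR classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical shared state, e.g. the names of the globals created so far.
struct SharedNames {
  std::vector<std::string> names;
};

// Shaped like a rewrite pattern: the rewrite method is const, but the
// pattern holds a reference to state that lives outside of it.
class ToyPattern {
public:
  explicit ToyPattern(SharedNames &shared) : shared(shared) {}

  // const protects the pattern's own members (here: the reference itself).
  // The object behind the reference can still be modified.
  void matchAndRewrite(const std::string &name) const {
    assert(!name.empty());
    shared.names.push_back(name);
  }

private:
  SharedNames &shared;
};
```

If two threads called matchAndRewrite on patterns sharing one SharedNames, they would race on the vector. Holding a reference to a shared table is therefore only safe as long as the patterns are applied sequentially, as described above.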