Does MLIR support string (or char*) type?

rqtian · July 20, 2021, 2:10pm

In MLIR, we can define an int variable like this:

%a = constant 10 : i32

Does it support a string variable (or char*) defined in a similar way? For example:

%s = constant "I am a string" : string

gcmn · July 20, 2021, 3:49pm

The standard-dialect constant operation does not support strings and there is no builtin string type (see Builtin Dialect - MLIR), but you can define a string type in your dialect and create an operation that would construct a value of that type. For instance, the TensorFlow dialect has a string type: tensorflow/tf_types.def at 104959a3051b4df05fb380588e9ef517b1a422e2 · tensorflow/tensorflow · GitHub (though it’s got a few extra layers of macros and such in the definition). https://mlir.llvm.org/docs/Tutorials/DefiningAttributesAndTypes/ explains how to create your own type.

N.B.: The OpaqueType in the builtin dialect is represented as a string literal, but I don’t think that’s what you want.

rqtian · July 20, 2021, 4:08pm

Thank you @gcmn, I appreciate the information.

One more question: I want to call a runtime function from MLIR: call @getString() {stringvalue = "aaaa"} : () -> (). The runtime function is implemented in C as void getString() {...}. Is there a way to pass the string from MLIR to C, i.e. can I read the stringvalue attribute in the C implementation?

Thank you in advance!

pdamme · July 22, 2021, 9:21pm

Hi @rqtian, I recently faced the same problem: How to represent values of type “string” in MLIR and how to pass them to pre-compiled C functions at run-time. Note that I’m still a learner in MLIR and LLVM, so the solution might not be perfect (and I’d indeed appreciate feedback from other users). In my case, making it work at all was the main goal, not making it work efficiently. Here is a sketch of how I did it:

String type

As @gcmn suggests, I defined a string type in my own dialect, as simple as:


def String : MyDialect_Type<"String"> {
  let summary = "string";
}

How to carry strings through the IR

One option is to attach a StringAttr to your operation, as in your comment above (but for that, you don’t even need a string type). Another option is to create a kind of StringConstantOp in your dialect, which is mostly similar to the existing ConstantOp, but has a StringAttr and is of result type String (the one defined above).

Passing a string known at compile-time to a run-time C function:

Chapter 6 of the Toy tutorial (Chapter 6: Lowering to LLVM and CodeGeneration - MLIR and llvm-project/LowerToLLVM.cpp at main · llvm/llvm-project · GitHub) shows how to rewrite a custom operation (toy::PrintOp) to a call to the printf C function during the lowering to the LLVM dialect. The trick is to pass the string as a !llvm.ptr<i8>, which corresponds to a char* in C.
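On the C side, the runtime function then simply receives a char*. Here is a minimal sketch, assuming the runtime is compiled as C++ and exports a C symbol. The parameter is an assumption (the getString in the question takes no arguments), and lastStringLength is a hypothetical helper that only makes the effect of a call observable.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical bookkeeping so the effect of a call can be inspected.
static std::size_t lastLength = 0;

// extern "C" keeps the symbol name unmangled, so a call to @getString
// in the lowered IR can be resolved against this definition.
// Assumption: the lowering passes one NUL-terminated i8 pointer.
extern "C" void getString(const char *value) {
  lastLength = std::strlen(value);
  std::printf("got string: %s\n", value);
}

// Hypothetical accessor, used only for illustration.
std::size_t lastStringLength() { return lastLength; }
```

On the MLIR side, the declaration of @getString would then need a matching signature with one pointer argument instead of () -> ().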

Approach A) In the IR, you can get an mlir::Value holding an mlir::LLVM::LLVMPointer to your string using mlir::LLVM::createGlobalString(). This creates a global storing your string, gets the address of it, and calculates a pointer to the first element of it. The source code of the Toy tutorial seems to do something similar to this function (see getOrCreateGlobalString in the file mentioned above), but implements it itself. Note that you need to specify a name for the global string.

Approach B) Another option, which does not require the specification of a name, could be to create a buffer using the mlir::LLVM::AllocaOp with type !llvm.ptr<i8> and the size of your string. The result of this operation is the pointer you can pass to your function call. To store your string in this buffer, you could copy over the characters of your string one-by-one using mlir::LLVM::GEPOp and mlir::LLVM::StoreOp (make sure to append a \0 at the end if required).

If you chose to attach your string as an attribute to your operation, you need to lower it according to either approach A or B, plus a call to your C function.

If you chose to create that StringConstantOp, you need to lower it using either approach A or B, and the C function call is separate from it. Finally, in your lowering pass to the LLVM dialect, you need to add a type conversion of your String type to !llvm.ptr<i8>.

Hope that helps.

As stated above, I’d be happy about feedback on this solution. I personally chose approach B in combination with the StringConstantOp. With approach A, I wasn’t sure how to choose the names of the globals. There could be many such globals in my case, so I would have just used a counter, but wasn’t sure if it must be thread-safe etc. (Remember, my main goal was to make it work at all…)

MATRIXKOO · July 23, 2021, 10:56am

Great! I have the same problem. I am trying to define a new type in the Toy dialect that can emit a string literal to MLIR.
Once I finish, I will share my code with you.

rqtian · July 27, 2021, 3:16pm

Thank you so much @pdamme, this is super helpful information. I will try your suggestions and share my experience.

ftynse · July 28, 2021, 10:45am

FYI, the LLVM dialect has string globals - 'llvm' Dialect - MLIR.

tali · September 8, 2021, 7:14am

Is there a standard way to create a unique symbol name for these globals?

ftynse · September 8, 2021, 8:20am

SymbolTable::insert will autorename on collision.
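To illustrate what renaming on collision means, here is a toy model in plain C++. It is not mlir::SymbolTable: ToySymbolTable is a hypothetical class, and the "_N" suffix scheme is an assumption chosen for the sketch.

```cpp
#include <set>
#include <string>

// Toy stand-in for a symbol table; NOT mlir::SymbolTable.
class ToySymbolTable {
public:
  // Registers `name`, appending a numeric suffix until it is unique.
  // Returns the name that was actually registered.
  std::string insert(const std::string &name) {
    std::string candidate = name;
    unsigned counter = 0;
    while (!names.insert(candidate).second)
      candidate = name + "_" + std::to_string(counter++);
    return candidate;
  }

private:
  std::set<std::string> names;
};
```

The caller has to use the returned name from then on, because it may differ from the one that was requested.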

tali · September 9, 2021, 8:06am

How does insertion of operations into the symbol table work?
When I create two GlobalOps with the same name (e.g. using createGlobalString), then I get: error: redefinition of symbol named '...'.
Why are they not autorenamed?

ftynse · September 9, 2021, 8:24am

Because you need to create a SymbolTable instance and call .insert on it after creating the op, like here https://github.com/llvm/llvm-project/blob/9d4896f50e441ea5b9e8ae78ebe328e006cb6b67/mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp#L46-L52. Symbol handling is orthogonal to op creation; there may be cases that actually want duplicate symbols to exist, e.g., to merge them in a later pass. See also Symbols and Symbol Tables - MLIR.

tali · September 13, 2021, 7:45pm

When creating my SymbolTable(moduleOp) instance, I run into an assertion: expected region to contain uniquely named symbol operations.
My pass which uses the SymbolTable to create a global string is run within an OpConversionPattern inside of an OperationPass<ModuleOp>. The symbols which are reported to be duplicate belong to FunctionOp, which indeed seem to get duplicated during this pass.
Could this be an artifact of other lowerings happening at the same time (LoopToStd, MemRefToLLVM, StdToLLVM)? Is creating a SymbolTable not safe within an OpConversionPattern? But your example seems to do it in the same way.
Any ideas?

ftynse · September 13, 2021, 8:42pm

SymbolTable does not know about OpConversionPattern and vice versa. The situation is a bit complex here, but your diagnosis looks right. Inside conversion patterns, and when used with the dialect conversion infrastructure (i.e. applyPartial/FullConversion), replacing an operation with another one does not delete the old operation immediately. Instead, the new operation is inserted next to the original one. This is necessary for several reasons, in particular for the conversion to be reversible and for type conversion purposes. Depending on the entire conversion being successful or not, either the original or the replacement operation will be actually erased at the end of the conversion process. As a result, functions with the same name may co-exist in the module when you construct a symbol table inside a pattern in case another pattern has previously “replaced” functions, which is what likely happens in your case.

I don’t immediately have a good suggestion for you on how to proceed. One possibility is to create a SymbolTable instance before running the conversion, pass a reference to the table to all relevant patterns, and use it to update the table in all of them. This will lead to newly created functions having different names than original functions and you’ll need some cleanup to rename them back after the conversion completes.
Another possibility is to split the conversion into two separate calls to the infrastructure: one that converts functions and another that produces symbols. The IR can be temporarily invalid within one pass; the difficulty here is correctly setting up the operation legality in the conversion targets for both calls.

Hope this helps.

tali · September 13, 2021, 9:07pm

Thanks, that helps a lot!

tali · September 14, 2021, 9:38am

I created a SymbolTable within the LoweringPass.runOnOperation() and it seems to work.
(https://github.com/tali/sclang/commit/32b6b23).

How are the patterns applied within this pass?
The matchAndRewrite function is const. This way it is guaranteed that it can be called concurrently from separate threads. Is MLIR doing this? Does it work when the pattern contains a reference to the SymbolTable, which may then be changed concurrently through several patterns?

ftynse · September 14, 2021, 9:58am

Roughly, blocks are ordered topologically and their operations are traversed in textual order, operation regions are visited recursively. There is no parallelization at the pattern level AFAIK and I don’t think matchAndRewrite is const for parallelism purposes.

MLIR does run function passes in parallel though, and one must not modify anything above the individual function (e.g., the parent module) in such passes.
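On the const question: in C++, a const member function may still modify an object that the class only refers to, which is why a pattern can hold a reference to a shared table and update it from matchAndRewrite. Here is a minimal sketch with hypothetical types (SharedNames and ToyPattern are not MLIR classes; they only mirror the shape of the setup discussed above).

```cpp
#include <string>
#include <vector>

// Hypothetical shared state, standing in for the symbol table.
struct SharedNames {
  std::vector<std::string> names;
};

// Hypothetical pattern; only mirrors the shape of a conversion pattern.
class ToyPattern {
public:
  explicit ToyPattern(SharedNames &shared) : shared(shared) {}

  // const applies to the pattern object itself. The referenced table is a
  // separate object, so modifying it here compiles and is well-defined.
  bool matchAndRewrite(const std::string &opName) const {
    shared.names.push_back(opName);
    return true;
  }

private:
  SharedNames &shared;
};
```

This kind of unsynchronized sharing is only safe as long as the patterns are applied sequentially, which matches the description of the traversal given above.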
