Introduction
Streaming libraries and abstractions are becoming more popular as languages and tools adopt such concepts. Furthermore, some high-level synthesis tools provide library support for streaming code, which can then be lowered to hardware.
In the spirit of MLIR and CIRCT, we anticipate that providing a higher-level abstraction in the form of a dialect simplifies further implementation efforts by offering a uniform interface.
Over the last few months, I have been working on a stream dialect whose goal is to be lowered to hardware. @mortbopet told me that there was some interest in the CIRCT community (especially from @stephenneuendorffer) in adding such abstractions for HLS flows.
I'm trying to provide a brief description of my current approach to collect some feedback. If there is indeed interest in such an abstraction, I would gladly work towards upstreaming this dialect.
My current implementation is here: GitHub - Dinistro/circt-stream: A stream to RTL compiler based on MLIR and CIRCT
Types
The stream dialect introduces a single type that defines a stream by its element type. An element can either be an integer or a tuple of element types.
Examples:
!stream.stream<i64>
!stream.stream<tuple<i32, tuple<i8, i64>>>
I have not yet experimented with floats, but supporting them will mainly depend on the lower levels.
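The recursive element grammar described above (an element is an integer or a tuple of elements) can be modeled in a few lines of Python. This is only an illustrative sketch: the class and function names (IntType, TupleType, render, stream_type) are hypothetical and are not part of the dialect's implementation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class IntType:
    width: int  # bit width, e.g. 32 for i32


@dataclass(frozen=True)
class TupleType:
    elements: Tuple["ElementType", ...]  # nested element types


# An element is either an integer or a tuple of elements.
ElementType = Union[IntType, TupleType]


def render(t: ElementType) -> str:
    """Print an element type in MLIR-like syntax."""
    if isinstance(t, IntType):
        return f"i{t.width}"
    return "tuple<" + ", ".join(render(e) for e in t.elements) + ">"


def stream_type(t: ElementType) -> str:
    """Wrap an element type in the stream type."""
    return f"!stream.stream<{render(t)}>"


nested = TupleType((IntType(32), TupleType((IntType(8), IntType(64)))))
print(stream_type(IntType(64)))
print(stream_type(nested))
```

The two printed strings correspond to the two example types listed above.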
Operations
There are two different kinds of operations:
- A set of operations that work directly with streams. These operations all consume and produce a variable number of streams.
- Auxiliary operations that help to work with elements of the stream, e.g., packing or unpacking tuples, yielding elements, etc.
So far, the stream dialect supports the following set of stream operations: map, filter, reduce, and create.
The first three expect regions that define the computation to be performed on each stream element. Note that the region arguments differ depending on the operation and the element types of the streams passed in.
Example:
%res = stream.map(%in) : (!stream.stream<i32>) -> !stream.stream<i32> {
^0(%val : i32):
  %0 = arith.constant 1 : i32
  %r = arith.addi %0, %val : i32
  stream.yield %r : i32
}
The create operation produces a stream from a fixed set of values and is thus mainly used for integration testing.
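To make the intended semantics of the four stream operations more concrete, here is a rough software model using Python generators. It is a sketch under assumptions, not the dialect's definition: the function names are hypothetical, and the explicit initial value passed to the reduce model is an assumption made for illustration.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def create(values: Iterable[T]) -> Iterator[T]:
    """Model of create: a stream built from a fixed set of values."""
    yield from values


def stream_map(fn: Callable[[T], U], stream: Iterable[T]) -> Iterator[U]:
    """Model of map: apply the region to every element."""
    for v in stream:
        yield fn(v)


def stream_filter(pred: Callable[[T], bool], stream: Iterable[T]) -> Iterator[T]:
    """Model of filter: forward only elements the region accepts."""
    for v in stream:
        if pred(v):
            yield v


def stream_reduce(fn: Callable[[U, T], U], init: U, stream: Iterable[T]) -> Iterator[U]:
    """Model of reduce: emit one result once the input stream has ended."""
    acc = init  # assumption: the accumulator starts from an explicit initial value
    for v in stream:
        acc = fn(acc, v)
    yield acc


# Same computation as the map example above: add 1 to every element.
incremented = list(stream_map(lambda val: 1 + val, create([1, 2, 3])))
evens = list(stream_filter(lambda val: val % 2 == 0, create([1, 2, 3, 4])))
total = list(stream_reduce(lambda acc, val: acc + val, 0, create([1, 2, 3])))
```

In this model, map preserves the number of elements, filter may drop some, and reduce collapses the whole stream into a single-element stream.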
Lowering
One natural lowering target for the streaming abstraction is the handshake dialect.
The handshake dialect is somewhat stable, and the StandardToHandshake pass can be reused to lower the regions of the operations.
The streaming abstraction can be lowered to a task-pipelined handshake representation.
Each stream becomes a handshaked value of the element type, and all the operations defined on this stream are applied directly to these values.
Note that certain operations like filter and reduce might not produce an output for each incoming tuple, and thus, they terminate some of the tasks.
End-of-Stream signal
Some operations, e.g., reduce, only produce a result when the incoming stream terminates.
To allow such behavior, each stream provides an EOS signal upon lowering, which is asserted once the stream ends. The lowering packs the element value and the EOS signal into one tuple to ensure only one handshake mechanism is emitted for each stream.
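The effect of packing the value and the EOS signal into one tuple can be illustrated with a small Python model. Note the assumptions: the function names are hypothetical, and the model assumes that EOS travels as a separate final token with a placeholder value, which may differ from how the actual lowering encodes it.

```python
from typing import Callable, Iterable, Iterator, Tuple

Token = Tuple[int, bool]  # (element value, EOS flag)


def pack_with_eos(values: Iterable[int], placeholder: int = 0) -> Iterator[Token]:
    """Turn plain values into (value, eos) tokens.

    Assumption: EOS is sent as one extra token after the last element.
    """
    for v in values:
        yield (v, False)
    yield (placeholder, True)


def lowered_filter(pred: Callable[[int], bool], tokens: Iterable[Token]) -> Iterator[Token]:
    """Drop rejected elements, but always forward the EOS token."""
    for value, eos in tokens:
        if eos or pred(value):
            yield (value, eos)


def lowered_reduce(fn: Callable[[int, int], int], init: int,
                   tokens: Iterable[Token]) -> Iterator[Token]:
    """Accumulate silently and emit the result only when EOS arrives."""
    acc = init
    for value, eos in tokens:
        if eos:
            yield (acc, False)   # the single reduction result
            yield (value, True)  # EOS of the output stream
            return
        acc = fn(acc, value)


filtered = list(lowered_filter(lambda v: v % 2 == 0, pack_with_eos([1, 2, 3, 4])))
reduced = list(lowered_reduce(lambda a, v: a + v, 0, pack_with_eos([1, 2, 3])))
```

Because the flag rides along with the value, each operation in this model only ever looks at a single token sequence per stream, which mirrors the goal of emitting one handshake mechanism per stream.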
Open Questions
Memory
For some tasks, an operation needs access to memory. Modeling this in the stream dialect is not too difficult, but lowering it to handshake is non-trivial. This memory will require an initialization phase, e.g., on reset, but handshake has no notion of such a thing.
I'm looking forward to your feedback and questions. The code base contains many more examples for the different operators described here.