Map
Mapping is one of the most common operations: it transforms each incoming item into exactly one output item.
For example, a vertex which converts incoming String elements to Integer values can simply be written as:
Vertex mapper = dag.newVertex("toInteger", Processors.map((String s) -> Integer.valueOf(s)));
Filter
Filtering is another common operation: it applies a predicate to each incoming item and passes on only the items for which the predicate evaluates to true.
A vertex which filters out odd numbers could be written as follows:
Vertex filter = dag.newVertex("filterOddNumbers", Processors.filter((Integer s) -> s % 2 == 0));
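To make the semantics of the two vertices above concrete, the following is a minimal plain-Java sketch that applies the same two lambdas to an in-memory list. It uses java.util.stream rather than the Jet API, so it says nothing about how Jet schedules or distributes the work; the class name MapFilterSketch and the method parseAndKeepEven are hypothetical names chosen for this illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-Java illustration only; this is NOT Jet API.
// MapFilterSketch and parseAndKeepEven are hypothetical names.
class MapFilterSketch {

    // Same logic as the lambda given to the "toInteger" vertex.
    static final Function<String, Integer> TO_INTEGER = s -> Integer.valueOf(s);

    // Same logic as the lambda given to the "filterOddNumbers" vertex:
    // the predicate is true for even numbers, so odd numbers are dropped.
    static final Predicate<Integer> IS_EVEN = n -> n % 2 == 0;

    // Map every item to exactly one output item, then keep only those
    // items for which the predicate evaluates to true.
    static List<Integer> parseAndKeepEven(List<String> input) {
        return input.stream()
                    .map(TO_INTEGER)
                    .filter(IS_EVEN)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseAndKeepEven(List.of("1", "2", "3", "4"))); // prints [2, 4]
    }
}
```

Note that the predicate describes the items to keep, not the items to discard, which is why a vertex named "filterOddNumbers" tests for evenness.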
FlatMap
FlatMap is a generalization of Map: it maps each incoming item to zero or more output items. The function passed to the flatMap processor must return a Traverser for each incoming item. The processor then emits all the values from that Traverser before moving on to the next item.
A vertex which will split incoming lines into words can be expressed as follows:
Vertex splitWords = dag.newVertex("splitWords",
        Processors.flatMap((String line) -> Traversers.traverseArray(line.split("\\W+"))));
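The "zero or more" behaviour can be sketched in plain Java as follows. This is an illustration with java.util.stream, not Jet code: a Stream stands in for the Traverser, and the names FlatMapSketch, words and splitWords are hypothetical. As an extra step that is not in the vertex above, the sketch also drops empty strings, because String.split can return a leading empty string when a line starts with a non-word character.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java illustration only; this is NOT Jet API.
// FlatMapSketch, words and splitWords are hypothetical names.
class FlatMapSketch {

    // Maps one line to zero or more words. The Stream plays the role
    // that a Traverser plays in the flatMap processor.
    static Stream<String> words(String line) {
        return Arrays.stream(line.split("\\W+"))
                     .filter(w -> !w.isEmpty()); // assumption: empty tokens are unwanted
    }

    // All words of one line are emitted before the next line is processed,
    // so the output preserves the order of the input lines.
    static List<String> splitWords(List<String> lines) {
        return lines.stream()
                    .flatMap(FlatMapSketch::words)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The second line contains no word characters and contributes zero items.
        System.out.println(splitWords(List.of("to be", "...", "or not")));
    }
}
```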
Aggregation
The central problem in distributed computation is grouping data by a time window and/or a grouping key and then aggregating the data of each group. As explained in the Hazelcast Jet 101 section, aggregation can take place in a single stage or in two stages, and there are separate variants for batch and stream jobs.
The complete matrix of factories for aggregator vertices is presented in the following table:
| | single-stage | stage 1/2 | stage 2/2 |
|---|---|---|---|
| batch, no grouping | aggregate() | accumulate() | combine() |
| batch, group by key | aggregateByKey() | accumulateByKey() | combineByKey() |
| stream, group by key and aligned window | aggregateToSlidingWindow() | accumulateByFrame() | combineToSlidingWindow() |
| stream, group by key and session window | aggregateToSessionWindow() | N/A | N/A |
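The difference between the single-stage and two-stage columns can be pictured with a small plain-Java sketch of a per-key count. It is only a model of the idea: the methods below borrow the names from the "batch, group by key" row for readability, but they are hypothetical stand-ins written for this example, not the Jet factories, and the two lists stand in for the data held by two cluster members.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of single-stage vs. two-stage aggregation.
// NOT Jet API: all names here are hypothetical stand-ins.
class TwoStageCountSketch {

    // Stage 1/2: fold the items of ONE partition into a partial count per key.
    static Map<String, Long> accumulateByKey(List<String> partition) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : partition) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // Stage 2/2: merge the partial counts of all partitions into the final result.
    static Map<String, Long> combineByKey(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }

    // Single stage: every item is counted in one place. For a simple count
    // this is the same fold as stage 1/2, applied to all of the data.
    static Map<String, Long> aggregateByKey(List<String> allItems) {
        return accumulateByKey(allItems);
    }

    public static void main(String[] args) {
        List<String> memberA = List.of("a", "b", "a");
        List<String> memberB = List.of("b", "c");

        Map<String, Long> twoStage = combineByKey(
                List.of(accumulateByKey(memberA), accumulateByKey(memberB)));
        Map<String, Long> singleStage = aggregateByKey(List.of("a", "b", "a", "b", "c"));

        System.out.println(twoStage.equals(singleStage)); // prints true
    }
}
```

In this model the two-stage variant passes one partial result per key between the stages instead of every raw item, which is the usual motivation for splitting an aggregation in two.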