com.hazelcast.jet.pipeline (Hazelcast Root 5.3.8 API)

Interface Summary
Interface	Description
BatchSource<T>	A finite source of data for a Jet pipeline.
BatchStage<T>	A stage in a distributed computation `pipeline` that will observe a finite amount of data (a batch).
BatchStageWithKey<T,K>	An intermediate step while constructing a group-and-aggregate batch pipeline stage.
GeneralStage<T>	The common aspect of `batch` and `stream` pipeline stages, defining those operations that apply to both.
GeneralStageWithKey<T,K>	An intermediate step when constructing a group-and-aggregate pipeline stage.
Pipeline	Models a distributed computation job using an analogy with a system of interconnected water pipes.
Sink<T>	A data sink in a Jet pipeline.
SinkStage	A pipeline stage that doesn't allow any downstream stages to be attached to it.
SourceBuilder.SourceBuffer<T>	The buffer object that the `fillBufferFn` gets on each call.
SourceBuilder.TimestampedSourceBuffer<T>	The buffer object that the `fillBufferFn` gets on each call.
Stage	The basic element of a Jet `pipeline`, represents a computation step.
StageWithKeyAndWindow<T,K>	Represents an intermediate step in the construction of a pipeline stage that performs a windowed group-and-aggregate operation.
StageWithWindow<T>	Represents an intermediate step in the construction of a pipeline stage that performs a windowed aggregate operation.
StreamSource<T>	An infinite source of data for a Jet pipeline.
StreamSourceStage<T>	A source stage in a distributed computation `pipeline` that will observe an unbounded amount of data (i.e., an event stream).
StreamStage<T>	A stage in a distributed computation `pipeline` that will observe an unbounded amount of data (i.e., an event stream).
StreamStageWithKey<T,K>	An intermediate step while constructing a pipeline transform that involves a grouping key, such as windowed group-and-aggregate.

Class Summary
Class	Description
AggregateBuilder<R0>	Offers a step-by-step API to build a pipeline stage that co-aggregates the data from several input stages.
AggregateBuilder1<T0>	Offers a step-by-step API to build a pipeline stage that co-aggregates the data from several input stages.
DataConnectionRef	Represents a reference to the data connection, used with `Sources.jdbc(DataConnectionRef, ToResultSetFunction, FunctionEx)`.
FileSinkBuilder<T>	See `Sinks.filesBuilder(java.lang.String)`.
FileSourceBuilder	Builder for a file source which reads lines from files in a directory (but not its subdirectories) and emits output object created by `mapOutputFn`
GeneralHashJoinBuilder<T0>	Offers a step-by-step fluent API to build a hash-join pipeline stage.
GroupAggregateBuilder<K,R0>	Offers a step-by-step API to build a pipeline stage that co-groups and aggregates the data from several input stages.
GroupAggregateBuilder1<T0,K>	Offers a step-by-step API to build a pipeline stage that co-groups and aggregates the data from several input stages.
HashJoinBuilder<T0>	Offers a step-by-step fluent API to build a hash-join pipeline stage.
JdbcSinkBuilder<T>	See `Sinks.jdbcBuilder()`.
JmsSinkBuilder<T>	See `Sinks.jmsQueueBuilder(com.hazelcast.function.SupplierEx<javax.jms.ConnectionFactory>)` or `Sinks.jmsTopicBuilder(com.hazelcast.function.SupplierEx<javax.jms.ConnectionFactory>)`.
JmsSourceBuilder	See `Sources.jmsQueueBuilder(com.hazelcast.function.SupplierEx<? extends javax.jms.ConnectionFactory>)` or `Sources.jmsTopicBuilder(com.hazelcast.function.SupplierEx<? extends javax.jms.ConnectionFactory>)`.
JoinClause<K,T0,T1,T1_OUT>	Specifies how to join an enriching stream to the primary stream in a `hash-join` operation.
ServiceFactories	Utility class with methods that create several useful `service factories`.
ServiceFactory<C,S>	A holder of functions needed to create and destroy a service object used in pipeline transforms such as `stage.mapUsingService()`.
SessionWindowDefinition	Represents the definition of a session window.
SinkBuilder<C,T>	See `SinkBuilder.sinkBuilder(String, FunctionEx)`.
Sinks	Contains factory methods for various types of pipeline sinks.
SlidingWindowDefinition	Represents the definition of a sliding window.
SourceBuilder<C>	Top-level class for Jet custom source builders.
Sources	Contains factory methods for various types of pipeline sources.
StreamHashJoinBuilder<T0>	Offers a step-by-step fluent API to build a hash-join pipeline stage.
WindowAggregateBuilder<R0>	Offers a step-by-step fluent API to build a pipeline stage that performs a windowed co-aggregation of the data from several input stages.
WindowAggregateBuilder1<T0>	Offers a step-by-step fluent API to build a pipeline stage that performs a windowed co-aggregation of the data from several input stages.
WindowDefinition	The definition of the window for a windowed aggregation operation.
WindowGroupAggregateBuilder<K,R0>	Offers a step-by-step API to build a pipeline stage that performs a windowed co-grouping and aggregation of the data from several input stages.
WindowGroupAggregateBuilder1<T0,K>	Offers a step-by-step API to build a pipeline stage that performs a windowed co-grouping and aggregation of the data from several input stages.

Enum Summary
Enum Description

JournalInitialPosition
When passed to an IMap/ICache Event Journal source, specifies which event to start from.

Enum Summary
Enum	Description
JournalInitialPosition	When passed to an IMap/ICache Event Journal source, specifies which event to start from.

Package com.hazelcast.jet.pipeline Description

The Pipeline API is Jet's high-level API to build and execute distributed computation jobs. It models the computation using an analogy with a system of interconnected water pipes. The data flows from the pipeline's sources to its sinks. Pipes can bifurcate and merge, but there can't be any closed loops (cycles).

The basic element is a pipeline stage which can be attached to one or more other stages, both in the upstream and the downstream direction. A pipeline accepts the data coming from its upstream stages, transforms it, and directs the resulting data to its downstream stages.

Kinds of transformation performed by pipeline stages

Basic

Basic transformations have a single upstream pipeline and statelessly transform individual items in it. Examples are map, filter, and flatMap.

Grouping and aggregation

The aggregate*() transformations perform an aggregate operation on a set of items. You can call stage.groupingKey() to group the items by a key and then Jet will aggregate each group separately. For stream stages you must specify a stage.window() which will transform the infinite stream into a series of finite windows. If you specify more than one input stage for the aggregation (using stage.aggregate2(), stage.aggregate3() or stage.aggregateBuilder(), the data from all streams will be combined into the aggregation result. The AggregateOperation you supply must define a separate accumulate primitive for each contributing stream. Refer to its Javadoc for further details.

Hash-Join

The hash-join is a joining transform designed for the use case of data enrichment with static data. It is an asymmetric join that joins the enriching stage(s) to the primary stage. The enriching stages must be batch stages — they must represent finite datasets. The primary stage may be either a batch or a stream stage.

You must provide a separate pair of functions for each of the enriching stages: one to extract the key from the primary item and one to extract it from the enriching item. For example, you can join a Trade with a Broker on trade.getBrokerId() == broker.getId() and a Product on trade.getProductId() == product.getId(), and all this can happen in a single hash-join transform.

The hash-join transform is optimized for throughput — each cluster member materializes a local copy of all the enriching data, stored in hashtables (hence the name). It consumes the enriching streams in full before ingesting any data from the primary stream.

The output of hashJoin is just like an SQL left outer join: for each primary item there are N output items, one for each matching item in the enriching set. If an enriching set doesn't have a matching item, the output will have a null instead of the enriching item.

If you need SQL inner join, then you can use the specialised innerHashJoin function, in which for each primary item with at least one match, there are N output items, one for each matching item in the enriching set. If an enriching set doesn't have a matching item, there will be no records with the given primary item. In this case the output function's arguments are always non-null.

The join also allows duplicate keys on both enriching and primary inputs: the output is a cartesian product of all the matching entries.

Example:

 +------------------------+-----------------+---------------------------+
 |     Primary input      | Enriching input |          Output           |
 +------------------------+-----------------+---------------------------+
 | Trade{ticker=AA,amt=1} | Ticker{id=AA}   | Tuple2{                   |
 | Trade{ticker=BB,amt=2} | Ticker{id=BB}   |   Trade{ticker=AA,amt=1}, |
 | Trade{ticker=AA,amt=3} |                 |   Ticker{id=AA}           |
 |                        |                 | }                         |
 |                        |                 | Tuple2{                   |
 |                        |                 |   Trade{ticker=BB,amt=2}, |
 |                        |                 |   Ticker{id=BB}           |
 |                        |                 | }                         |
 |                        |                 | Tuple2{                   |
 |                        |                 |   Trade{ticker=AA,amt=3}, |
 |                        |                 |   Ticker{id=AA}           |
 |                        |                 | }                         |
 +------------------------+-----------------+---------------------------+

Since:: Jet 3.0