com.hazelcast.jet (hazelcast-jet-root 0.5 API)

Interface Summary
Interface	Description
ComputeStage<E>	Represents a stage in a distributed computation `pipeline`.
JetInstance	Represents either an instance of a Jet server node or a Jet client instance that connects to a remote cluster.
Job	A Jet computation job created from a `DAG`.
Pipeline	Models a distributed computation job using an analogy with a system of interconnected water pipes.
Sink<E>	A transform which accepts an input stream and produces no output streams.
SinkStage	A pipeline stage that doesn't allow any downstream stages to be attached to it.
Source<E>	A transform that takes no input streams and produces an output stream.
Stage	The basic element of a Jet `pipeline`.
Transform	Represents the data transformation performed by a pipeline `Stage`.
Traverser<T>	Traverses a potentially infinite sequence of non-`null` items.

Class Summary
Class	Description
CoGroupBuilder<K,E0>	Offers a step-by-step fluent API to build a co-grouping pipeline stage by adding any number of contributing stages.
HashJoinBuilder<E0>	Offers a step-by-step fluent API to build a hash-join pipeline stage.
HdfsSinks	Factories of Apache Hadoop HDFS sinks.
HdfsSources	Contains factory methods for Apache Hadoop HDFS sources.
Jet	Entry point to the Jet product.
JoinClause<K,E0,E1,E1_OUT>	Specifies how to join an enriching stream to the primary stream in a `hash-join` operation.
KafkaSinks	Contains factory methods for Apache Kafka sinks.
KafkaSources	Contains factory methods for Apache Kafka sources.
Sinks	Contains factory methods for various types of pipeline sinks.
Sources	Contains factory methods for various types of pipeline sources.
Traversers	Utility class with several `Traverser`s useful in `Processor` implementations.
Util	Miscellaneous utility methods useful in DAG building logic.

Exception Summary
Exception Description

JetException
Base Jet exception.

Exception Summary
Exception	Description
JetException	Base Jet exception.

Package com.hazelcast.jet Description

The Pipeline API is Jet's high-level API to build and execute distributed computation jobs. It models the computation using an analogy with a system of interconnected water pipes. The data flows from the pipeline's sources to its sinks. Pipes can bifurcate and merge, but there can't be any closed loops (cycles).

The basic element is a pipeline stage which can be attached to one or more other stages, both in the upstream and the downstream direction. A stage accepts the data coming from its upstream stages, transforms it, and directs the resulting data to its downstream stages.

Kinds of transformation performed by pipeline stages

Basic

Basic transformations have a single upstream stage and statelessly transform individual items in it. Examples are map, filter, and flatMap.

Grouping and aggregation

The groupBy transformation groups items by key and performs an aggregate operation on each group. It outputs the results of the aggregate operation, one for each observed distinct key.

The coGroup transformation groups items by key in several streams at once and performs an aggregate operation on all groups that share the same key, separately for each key. It outputs the results of the aggregate operation, one for each observed distinct key.

Hash-join

Hash-join is a special kind of joining transform, specifically tailored to the use case of data enrichment. It is an asymmetrical join that distinguishes the primary upstream stage from the enriching stages. The source for an enriching stage is most typically a key-value store (such as a Hazelcast IMap). Its data stream must be finite and each item must have a distinct join key. The primary stage, on the other hand, may be infinite and contain duplicate keys.

For each of the enriching stages there is a separate pair of functions to extract the joining key on both sides. For example, a Trade can be joined with both a Broker on trade.getBrokerId() == broker.getId() and a Product on trade.getProductId() == product.getId(), and all this can happen in a single hash-join transform.

Implementationally, the hash-join transform is optimized for throughput so that each computing member has a local copy of all the enriching data, stored in hashtables (hence the name). The enriching streams are consumed in full before ingesting any data from the primary stream.