public interface GeneralStage<T> extends Stage
T - the type of items coming out of this stage

The common aspect of batch and stream pipeline stages, defining those operations that apply to both.
 Unless specified otherwise, all functions passed to methods of this interface must be stateless and cooperative.
| Modifier and Type | Field | Description |
|---|---|---|
| static int | DEFAULT_MAX_CONCURRENT_OPS | Default value for max concurrent operations. |
| static boolean | DEFAULT_PRESERVE_ORDER | Default value for preserve order. |
| Modifier and Type | Method | Description |
|---|---|---|
| StreamStage<T> | addTimestamps(ToLongFunctionEx<? super T> timestampFn, long allowedLag) | Adds a timestamp to each item in the stream using the supplied function and specifies the allowed amount of disorder between them. |
| <R> GeneralStage<R> | customTransform(String stageName, ProcessorMetaSupplier procSupplier) | Attaches a stage with a custom transform based on the provided supplier of Core API Processors. |
| <R> GeneralStage<R> | customTransform(String stageName, ProcessorSupplier procSupplier) | Attaches a stage with a custom transform based on the provided supplier of Core API Processors. |
| <R> GeneralStage<R> | customTransform(String stageName, SupplierEx<Processor> procSupplier) | Attaches a stage with a custom transform based on the provided supplier of Core API Processors. |
| GeneralStage<T> | filter(PredicateEx<T> filterFn) | Attaches a filtering stage which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. |
| <S> GeneralStage<T> | filterStateful(SupplierEx<? extends S> createFn, BiPredicateEx<? super S,? super T> filterFn) | Attaches a stage that performs a stateful filtering operation. |
| <S> GeneralStage<T> | filterUsingService(ServiceFactory<?,S> serviceFactory, BiPredicateEx<? super S,? super T> filterFn) | Attaches a filtering stage which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. |
| <R> GeneralStage<R> | flatMap(FunctionEx<? super T,? extends Traverser<R>> flatMapFn) | Attaches a flat-mapping stage which applies the supplied function to each input item independently and emits all the items from the Traverser it returns. |
| <S,R> GeneralStage<R> | flatMapStateful(SupplierEx<? extends S> createFn, BiFunctionEx<? super S,? super T,? extends Traverser<R>> flatMapFn) | Attaches a stage that performs a stateful flat-mapping operation. |
| <S,R> GeneralStage<R> | flatMapUsingService(ServiceFactory<?,S> serviceFactory, BiFunctionEx<? super S,? super T,? extends Traverser<R>> flatMapFn) | Attaches a flat-mapping stage which applies the supplied function to each input item independently and emits all items from the Traverser it returns as the output items. |
| <K> GeneralStageWithKey<T,K> | groupingKey(FunctionEx<? super T,? extends K> keyFn) | Specifies the function that will extract a key from the items in the associated pipeline stage. |
| <K,T1_IN,T1,R> GeneralStage<R> | hashJoin(BatchStage<T1_IN> stage1, JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, BiFunctionEx<T,T1,R> mapToOutputFn) | Attaches to both this and the supplied stage a hash-joining stage and returns it. |
| <K1,K2,T1_IN,T2_IN,T1,T2,R> GeneralStage<R> | hashJoin2(BatchStage<T1_IN> stage1, JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, BatchStage<T2_IN> stage2, JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, TriFunction<T,T1,T2,R> mapToOutputFn) | Attaches to this and the two supplied stages a hash-joining stage and returns it. |
| GeneralHashJoinBuilder<T> | hashJoinBuilder() | Returns a fluent API builder object to construct a hash join operation with any number of contributing stages. |
| <K,T1_IN,T1,R> GeneralStage<R> | innerHashJoin(BatchStage<T1_IN> stage1, JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, BiFunctionEx<T,T1,R> mapToOutputFn) | Attaches to both this and the supplied stage an inner hash-joining stage and returns it. |
| <K1,K2,T1_IN,T2_IN,T1,T2,R> GeneralStage<R> | innerHashJoin2(BatchStage<T1_IN> stage1, JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, BatchStage<T2_IN> stage2, JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, TriFunction<T,T1,T2,R> mapToOutputFn) | Attaches to this and the two supplied stages an inner hash-joining stage and returns it. |
| <R> GeneralStage<R> | map(FunctionEx<? super T,? extends R> mapFn) | Attaches a mapping stage which applies the given function to each input item independently and emits the function's result as the output item. |
| <S,R> GeneralStage<R> | mapStateful(SupplierEx<? extends S> createFn, BiFunctionEx<? super S,? super T,? extends R> mapFn) | Attaches a stage that performs a stateful mapping operation. |
| default <K,V,R> GeneralStage<R> | mapUsingIMap(IMap<K,V> iMap, FunctionEx<? super T,? extends K> lookupKeyFn, BiFunctionEx<? super T,? super V,? extends R> mapFn) | Attaches a mapping stage where for each item a lookup in the supplied IMap is performed and the result of the lookup is merged with the item and emitted. |
| default <K,V,R> GeneralStage<R> | mapUsingIMap(String mapName, FunctionEx<? super T,? extends K> lookupKeyFn, BiFunctionEx<? super T,? super V,? extends R> mapFn) | Attaches a mapping stage where for each item a lookup in the IMap with the supplied name is performed and the result of the lookup is merged with the item and emitted. |
| default <K,V,R> GeneralStage<R> | mapUsingReplicatedMap(ReplicatedMap<K,V> replicatedMap, FunctionEx<? super T,? extends K> lookupKeyFn, BiFunctionEx<? super T,? super V,? extends R> mapFn) | Attaches a mapping stage where for each item a lookup in the supplied ReplicatedMap is performed and the result of the lookup is merged with the item and emitted. |
| default <K,V,R> GeneralStage<R> | mapUsingReplicatedMap(String mapName, FunctionEx<? super T,? extends K> lookupKeyFn, BiFunctionEx<? super T,? super V,? extends R> mapFn) | Attaches a mapping stage where for each item a lookup in the ReplicatedMap with the supplied name is performed and the result of the lookup is merged with the item and emitted. |
| <S,R> GeneralStage<R> | mapUsingService(ServiceFactory<?,S> serviceFactory, BiFunctionEx<? super S,? super T,? extends R> mapFn) | Attaches a mapping stage which applies the supplied function to each input item independently and emits the function's result as the output item. |
| default <S,R> GeneralStage<R> | mapUsingServiceAsync(ServiceFactory<?,S> serviceFactory, BiFunctionEx<? super S,? super T,? extends CompletableFuture<R>> mapAsyncFn) | Asynchronous version of mapUsingService(ServiceFactory, BiFunctionEx): the mapAsyncFn returns a CompletableFuture<R> instead of just R. |
| <S,R> GeneralStage<R> | mapUsingServiceAsync(ServiceFactory<?,S> serviceFactory, int maxConcurrentOps, boolean preserveOrder, BiFunctionEx<? super S,? super T,? extends CompletableFuture<R>> mapAsyncFn) | Asynchronous version of mapUsingService(ServiceFactory, BiFunctionEx): the mapAsyncFn returns a CompletableFuture<R> instead of just R. |
| <S,R> GeneralStage<R> | mapUsingServiceAsyncBatched(ServiceFactory<?,S> serviceFactory, int maxBatchSize, BiFunctionEx<? super S,? super List<T>,? extends CompletableFuture<List<R>>> mapAsyncFn) | Batched version of mapUsingServiceAsync(ServiceFactory, BiFunctionEx): mapAsyncFn takes a list of input items and returns a CompletableFuture<List<R>>. |
| default GeneralStage<T> | peek() | Adds a peeking layer to this compute stage which logs its output. |
| default GeneralStage<T> | peek(FunctionEx<? super T,? extends CharSequence> toStringFn) | Adds a peeking layer to this compute stage which logs its output. |
| GeneralStage<T> | peek(PredicateEx<? super T> shouldLogFn, FunctionEx<? super T,? extends CharSequence> toStringFn) | Attaches a peeking stage which logs this stage's output and passes it through without transformation. |
| GeneralStage<T> | rebalance() | Returns a new stage that applies data rebalancing to the output of this stage. |
| <K> GeneralStage<T> | rebalance(FunctionEx<? super T,? extends K> keyFn) | Returns a new stage that applies data rebalancing to the output of this stage. |
| default <A,R> GeneralStage<R> | rollingAggregate(AggregateOperation1<? super T,A,? extends R> aggrOp) | Attaches a rolling aggregation stage. |
| GeneralStage<T> | setLocalParallelism(int localParallelism) | Sets the preferred local parallelism (number of processors per Jet cluster member) this stage will configure its DAG vertices with. |
| GeneralStage<T> | setName(String name) | Overrides the default name of the stage with the name you choose and returns the stage. |
| SinkStage | writeTo(Sink<? super T> sink) | Attaches a sink stage, one that accepts data but doesn't emit any. |
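Methods inherited from interface Stage: getPipeline, name

static final int DEFAULT_MAX_CONCURRENT_OPS
Default value for max concurrent operations.

static final boolean DEFAULT_PRESERVE_ORDER
Default value for preserve order.

@Nonnull <R> GeneralStage<R> map(@Nonnull FunctionEx<? super T,? extends R> mapFn)
Attaches a mapping stage which applies the given function to each input item independently and emits the function's result as the output item. If the result is null, it emits nothing. Therefore, this stage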
 can be used to implement filtering semantics as well.
 This sample takes a stream of names and outputs the names in lowercase:
 stage.map(name -> name.toLowerCase(Locale.ROOT))
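 R - the result type of the mapping function
 mapFn - a mapping function. It must be stateless and cooperative.

@Nonnull GeneralStage<T> filter(@Nonnull PredicateEx<T> filterFn)
Attaches a filtering stage which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it.
 This sample removes empty strings from the stream:
 stage.filter(name -> !name.isEmpty())
 filterFn - a filter predicate function. It must be stateless and cooperative.

@Nonnull <R> GeneralStage<R> flatMap(@Nonnull FunctionEx<? super T,? extends Traverser<R>> flatMapFn)
Attaches a flat-mapping stage which applies the supplied function to each input item independently and emits all the items from the Traverser it returns. The traverser must be null-terminated.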
 This sample takes a stream of sentences and outputs a stream of individual words in them:
 stage.flatMap(sentence -> traverseArray(sentence.split("\\W+")))
 R - the type of items in the result's traversers
 flatMapFn - a flatmapping function, whose result type is Jet's Traverser. It must not return a null traverser, but can return an empty traverser. It must be stateless and cooperative.

@Nonnull <S,R> GeneralStage<R> mapStateful(@Nonnull SupplierEx<? extends S> createFn, @Nonnull BiFunctionEx<? super S,? super T,? extends R> mapFn)
Attaches a stage that performs a stateful mapping operation. createFn returns the object that holds the state. Jet passes this
 object along with each input item to mapFn, which can update
 the object's state. The state object will be included in the state
 snapshot, so it survives job restarts. For this reason it must be
 serializable.
 If you want to return the state variable from mapFn,
 then the return value must be a copy of state variable to avoid
 situations in which the result of mapFn is modified
 after being emitted or where the state is modified by downstream processors.
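
 The aliasing problem described above can be illustrated without Jet at all. The following is a minimal plain-Java sketch, not Jet API: the class StateCopySketch and both methods are hypothetical, and a mutable list stands in for the state object. It contrasts emitting the live state with emitting a copy of it.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (hypothetical, not Jet API) of emitting state vs. a copy of it.
final class StateCopySketch {

    // Emits the live state object: every emitted element aliases the same list,
    // so later updates silently change results that were already emitted.
    static List<List<String>> emitAliased(List<String> input) {
        List<String> state = new ArrayList<>();
        List<List<String>> emitted = new ArrayList<>();
        for (String item : input) {
            state.add(item);
            emitted.add(state);
        }
        return emitted;
    }

    // Emits a defensive copy: each emitted element is a stable snapshot.
    static List<List<String>> emitCopied(List<String> input) {
        List<String> state = new ArrayList<>();
        List<List<String>> emitted = new ArrayList<>();
        for (String item : input) {
            state.add(item);
            emitted.add(new ArrayList<>(state));
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(emitAliased(List.of("a", "b"))); // [[a, b], [a, b]]
        System.out.println(emitCopied(List.of("a", "b")));  // [[a], [a, b]]
    }
}
```

 In the aliased variant the first emitted result changes after it was emitted, which is the situation the copy requirement is meant to prevent.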
 
 This sample takes a stream of long numbers representing request
 latencies, computes the cumulative latency of all requests so far, and
 starts emitting alarm messages when the cumulative latency crosses a
 "bad behavior" threshold:
 
 StreamStage<Long> latencyAlarms = latencies.mapStateful(
         LongAccumulator::new,
         (sum, latency) -> {
             sum.add(latency);
             long cumulativeLatency = sum.get();
             return (cumulativeLatency <= LATENCY_THRESHOLD)
                     ? null
                     : cumulativeLatency;
         }
 );
 See also: latencies.rollingAggregate(summing()).
 S - type of the state object
 R - type of the result
 createFn - function that returns the state object. It must be stateless and cooperative.
 mapFn - function that receives the state object and the input item and outputs the result item. It may modify the state object. It must be stateless and cooperative.

@Nonnull <S> GeneralStage<T> filterStateful(@Nonnull SupplierEx<? extends S> createFn, @Nonnull BiPredicateEx<? super S,? super T> filterFn)
Attaches a stage that performs a stateful filtering operation. createFn returns the object that holds the state. Jet passes this
 object along with each input item to filterFn, which can update
 the object's state. The state object will be included in the state
 snapshot, so it survives job restarts. For this reason it must be
 serializable.
 This sample decimates the input (throws out every 10th item):
 GeneralStage<String> decimated = input.filterStateful(
         LongAccumulator::new,
         (counter, item) -> {
             counter.add(1);
             return counter.get() % 10 != 0;
         }
 );
 S - type of the state object
 createFn - function that returns the state object. It must be stateless and cooperative.
 filterFn - function that receives the state object and the input item and produces the boolean result. It may modify the state object. It must be stateless and cooperative.

@Nonnull <S,R> GeneralStage<R> flatMapStateful(@Nonnull SupplierEx<? extends S> createFn, @Nonnull BiFunctionEx<? super S,? super T,? extends Traverser<R>> flatMapFn)
Attaches a stage that performs a stateful flat-mapping operation. createFn returns the object that holds the state. Jet passes this
 object along with each input item to flatMapFn, which can update
 the object's state. The state object will be included in the state
 snapshot, so it survives job restarts. For this reason it must be
 serializable.
 If you want to return the state variable from flatMapFn, then the
 return value must be a copy of state variable to avoid situations in
 which the result of flatMapFn is modified after being emitted or
 where the state is modified by downstream processors.
 
This sample inserts a punctuation mark (a special string) after every 10th input string:
 GeneralStage<String> punctuated = input.flatMapStateful(
         LongAccumulator::new,
         (counter, item) -> {
             counter.add(1);
             return counter.get() % 10 == 0
                     ? Traversers.traverseItems("punctuation", item)
                     : Traversers.singleton(item);
         }
 );
 S - type of the state object
 R - type of the result
 createFn - function that returns the state object. It must be stateless and cooperative.
 flatMapFn - function that receives the state object and the input item and outputs the result items. It may modify the state object. It must not return a null traverser, but can return an empty traverser. It must be stateless and cooperative.

@Nonnull default <A,R> GeneralStage<R> rollingAggregate(@Nonnull AggregateOperation1<? super T,A,? extends R> aggrOp)
Attaches a rolling aggregation stage that uses an AggregateOperation. It passes each input item to
 the accumulator and outputs the current result of aggregation (as
 returned by the export primitive).
 Sample usage:
 stage.rollingAggregate(AggregateOperations.summing())
 For example, if the input is {2, 7, 8, -5}, the output will be
 {2, 9, 17, 12}.
 This stage is fault-tolerant and saves its state to the snapshot.
 NOTE: since the output for each item depends on all
 the previous items, this operation cannot be parallelized. Jet will
 perform it on a single member, single-threaded. Jet also supports
 keyed rolling aggregation
 which it can parallelize by partitioning.
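
 The accumulate-then-export behaviour described above can be modelled in a few lines of plain Java. This is only a sketch of the semantics, not Jet code: RollingSumSketch and rollingSum are hypothetical names, and a local long stands in for the accumulator of a summing operation.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (hypothetical, not Jet API) of a rolling sum.
final class RollingSumSketch {

    // After each input item, accumulate it and emit the current aggregation result.
    static List<Long> rollingSum(List<Long> input) {
        List<Long> output = new ArrayList<>();
        long sum = 0;            // the accumulator, i.e. the stage's state
        for (long item : input) {
            sum += item;         // accumulate the item
            output.add(sum);     // export the current result
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(rollingSum(List.of(2L, 7L, 8L, -5L))); // [2, 9, 17, 12]
    }
}
```

 Because each output depends on the running total, the loop is inherently sequential, which mirrors the note above about the operation not being parallelizable without a grouping key.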
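 R - result type of the aggregate operation
 aggrOp - the aggregate operation to do the aggregation

@Nonnull <S,R> GeneralStage<R> mapUsingService(@Nonnull ServiceFactory<?,S> serviceFactory, @Nonnull BiFunctionEx<? super S,? super T,? extends R> mapFn)
Attaches a mapping stage which applies the supplied function to each input item independently and emits the function's result as the output item. The mapping function receives another parameter, the service object, which Jet will create using the supplied serviceFactory.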
 
 If the mapping result is null, it emits nothing. Therefore, this
 stage can be used to implement filtering semantics as well.
 
 This sample takes a stream of stock items and sets the detail
 field on them by looking up from a registry:
 
 stage.mapUsingService(
     ServiceFactories.sharedService(ctx -> new ItemDetailRegistry(ctx.hazelcastInstance())),
     (reg, item) -> item.setDetail(reg.fetchDetail(item))
 )
 S - type of service object
 R - the result type of the mapping function
 serviceFactory - the service factory
 mapFn - a mapping function. It must be stateless. It must be cooperative, if the service is cooperative.

@Nonnull default <S,R> GeneralStage<R> mapUsingServiceAsync(@Nonnull ServiceFactory<?,S> serviceFactory, @Nonnull BiFunctionEx<? super S,? super T,? extends CompletableFuture<R>> mapAsyncFn)
Asynchronous version of mapUsingService(ServiceFactory, BiFunctionEx): the mapAsyncFn
 returns a CompletableFuture<R> instead of just R.
 This overload uses default values for the extra parameters: the maximum number of concurrent async operations per processor is limited to 4, and the order of input items is preserved.
The function can return a null future or the future can return a null result: in both cases it will act just like a filter.
The latency of the async call will add to the total latency of the output.
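
 The filtering behaviour of null futures and null results can be sketched in plain Java. This is a simplified model rather than Jet's implementation: AsyncMapSketch, mapAsyncOrdered and lookup are hypothetical names, and the sketch ignores the limit on concurrent operations and simply waits for the futures in input order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Plain-Java model (hypothetical, not Jet API) of order-preserving async mapping.
final class AsyncMapSketch {

    static <T, R> List<R> mapAsyncOrdered(List<T> input,
                                          Function<T, CompletableFuture<R>> mapAsyncFn) {
        // Start all async calls first, remembering them in input order.
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (T item : input) {
            CompletableFuture<R> future = mapAsyncFn.apply(item);
            if (future != null) {          // a null future acts like a filter
                pending.add(future);
            }
        }
        // Collect results in input order; a null result is dropped as well.
        List<R> output = new ArrayList<>();
        for (CompletableFuture<R> future : pending) {
            R result = future.join();
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }

    // Hypothetical async lookup: 2 yields a null future, 4 yields a null result.
    static CompletableFuture<Integer> lookup(int n) {
        if (n == 2) {
            return null;
        }
        if (n == 4) {
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.supplyAsync(() -> n * 10);
    }

    public static void main(String[] args) {
        System.out.println(mapAsyncOrdered(List.of(1, 2, 3, 4, 5), AsyncMapSketch::lookup));
        // [10, 30, 50]
    }
}
```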
 This sample takes a stream of stock items and sets the detail
 field on them by looking up from a registry:
 
 stage.mapUsingServiceAsync(
     ServiceFactories.sharedService(ctx -> new ItemDetailRegistry(ctx.hazelcastInstance())),
     (reg, item) -> reg.fetchDetailAsync(item)
                       .thenApply(detail -> item.setDetail(detail))
 )
 S - type of service object
 R - the future result type of the mapping function
 serviceFactory - the service factory
 mapAsyncFn - a mapping function. Can map to null (return a null future). It must be stateless and cooperative.

@Nonnull <S,R> GeneralStage<R> mapUsingServiceAsync(@Nonnull ServiceFactory<?,S> serviceFactory, int maxConcurrentOps, boolean preserveOrder, @Nonnull BiFunctionEx<? super S,? super T,? extends CompletableFuture<R>> mapAsyncFn)
Asynchronous version of mapUsingService(ServiceFactory, BiFunctionEx): the mapAsyncFn
 returns a CompletableFuture<R> instead of just R.
 The function can return a null future or the future can return a null result: in both cases it will act just like a filter.
The latency of the async call will add to the total latency of the output.
 This sample takes a stream of stock items and sets the detail
 field on them by looking up from a registry:
 
 stage.mapUsingServiceAsync(
     ServiceFactories.sharedService(ctx -> new ItemDetailRegistry(ctx.hazelcastInstance())),
     (reg, item) -> reg.fetchDetailAsync(item)
                       .thenApply(detail -> item.setDetail(detail))
 )
 S - type of service object
 R - the future result type of the mapping function
 serviceFactory - the service factory
 maxConcurrentOps - maximum number of concurrent async operations per processor
 preserveOrder - whether the ordering of the input items should be preserved
 mapAsyncFn - a mapping function. Can map to null (return a null future). It must be stateless and cooperative.

@Nonnull <S,R> GeneralStage<R> mapUsingServiceAsyncBatched(@Nonnull ServiceFactory<?,S> serviceFactory, int maxBatchSize, @Nonnull BiFunctionEx<? super S,? super List<T>,? extends CompletableFuture<List<R>>> mapAsyncFn)
Batched version of mapUsingServiceAsync(ServiceFactory, BiFunctionEx): mapAsyncFn takes
 a list of input items and returns a CompletableFuture<List<R>>.
 The size of the input list is limited by the given maxBatchSize.
 The number of in-flight batches being completed asynchronously is limited, and this mapping operation always preserves the order of input elements.
 This transform can perform filtering by putting null elements into
 the output list.
 
The latency of the async call will add to the total latency of the output.
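
 The batching contract above (input lists capped at maxBatchSize, null elements in the output list acting as a filter, order preserved) can be modelled in plain Java. This is a sketch under simplifying assumptions, not Jet's implementation: BatchedMapSketch, mapBatched and lengths are hypothetical names, batches are cut as fixed-size chunks of an in-memory list, and each batch is awaited before the next one starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Plain-Java model (hypothetical, not Jet API) of batched async mapping.
final class BatchedMapSketch {

    static <T, R> List<R> mapBatched(List<T> input, int maxBatchSize,
                                     Function<List<T>, CompletableFuture<List<R>>> mapAsyncFn) {
        List<R> output = new ArrayList<>();
        for (int from = 0; from < input.size(); from += maxBatchSize) {
            int to = Math.min(from + maxBatchSize, input.size());
            List<T> batch = input.subList(from, to);       // at most maxBatchSize items
            List<R> results = mapAsyncFn.apply(batch).join();
            for (R result : results) {
                if (result != null) {                      // null elements are filtered out
                    output.add(result);
                }
            }
        }
        return output;
    }

    // Hypothetical batch function: maps each string to its length, empty strings to null.
    static CompletableFuture<List<Integer>> lengths(List<String> batch) {
        List<Integer> result = new ArrayList<>();
        for (String s : batch) {
            result.add(s.isEmpty() ? null : s.length());
        }
        return CompletableFuture.completedFuture(result);
    }

    public static void main(String[] args) {
        System.out.println(mapBatched(List.of("a", "bb", "", "ccc"), 3, BatchedMapSketch::lengths));
        // [1, 2, 3]
    }
}
```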
 This sample takes a stream of stock items and sets the detail
 field on them by performing batched lookups from a registry. The max
 size of the items to lookup is specified as 100:
 
 stage.mapUsingServiceAsyncBatched(
     ServiceFactories.sharedService(ctx -> new ItemDetailRegistry(ctx.hazelcastInstance())),
     100,
     (reg, itemList) -> reg
             .fetchDetailsAsync(itemList)
             .thenApply(detailList -> {
                 for (int i = 0; i < itemList.size(); i++) {
                     itemList.get(i).setDetail(detailList.get(i));
                 }
                 // Return the enriched items so the future completes with a list of results.
                 return itemList;
             })
 )
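 S - type of service object
 R - the future result type of the mapping function
 serviceFactory - the service factory
 maxBatchSize - max size of the input list
 mapAsyncFn - a mapping function. It must be stateless and cooperative.

@Nonnull <S> GeneralStage<T> filterUsingService(@Nonnull ServiceFactory<?,S> serviceFactory, @Nonnull BiPredicateEx<? super S,? super T> filterFn)
Attaches a filtering stage which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. The predicate function receives another parameter, the service object, which Jet will create using the supplied serviceFactory.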
 This sample takes a stream of photos, uses an image classifier to reason about their contents, and keeps only photos of cats:
 photos.filterUsingService(
     ServiceFactories.sharedService(ctx -> new ImageClassifier(ctx.hazelcastInstance())),
     (classifier, photo) -> classifier.classify(photo).equals("cat")
 )
 S - type of service object
 serviceFactory - the service factory
 filterFn - a filter predicate function. It must be stateless and cooperative.

@Nonnull <S,R> GeneralStage<R> flatMapUsingService(@Nonnull ServiceFactory<?,S> serviceFactory, @Nonnull BiFunctionEx<? super S,? super T,? extends Traverser<R>> flatMapFn)
Attaches a flat-mapping stage which applies the supplied function to each input item independently and emits all items from the Traverser it returns as the output items. The traverser must be
 null-terminated. The mapping function receives another
 parameter, the service object, which Jet will create using the supplied
 serviceFactory.
 This sample takes a stream of products and outputs an "exploded" stream of all the parts that go into making them:
 StreamStage<Part> parts = products.flatMapUsingService(
     ServiceFactories.sharedService(ctx -> new PartRegistryCtx()),
     (registry, product) -> Traversers.traverseIterable(
                                registry.fetchParts(product))
 );
 S - type of service object
 R - the type of items in the result's traversers
 serviceFactory - the service factory
 flatMapFn - a flatmapping function, whose result type is Jet's Traverser. It must not return a null traverser, but can return an empty traverser. It must be stateless and cooperative.

@Nonnull default <K,V,R> GeneralStage<R> mapUsingReplicatedMap(@Nonnull String mapName, @Nonnull FunctionEx<? super T,? extends K> lookupKeyFn, @Nonnull BiFunctionEx<? super T,? super V,? extends R> mapFn)
Attaches a mapping stage where for each item a lookup in the ReplicatedMap with the supplied name is performed and the result of the
 lookup is merged with the item and emitted.
 
 If the result of the mapping is null, it emits nothing.
 Therefore, this stage can be used to implement filtering semantics as
 well.
 
The mapping logic is equivalent to:
 K key = lookupKeyFn.apply(item);
 V value = replicatedMap.get(key);
 return mapFn.apply(item, value);
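
 The three lines above, together with the rule that a null mapping result emits nothing, can be exercised with an ordinary java.util.Map. This is a plain-Java sketch of the semantics only: MapLookupSketch, mapUsingMap and demo are hypothetical names, and a local Map stands in for the distributed map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Plain-Java model (hypothetical, not Jet API) of lookup-based enrichment.
final class MapLookupSketch {

    static <T, K, V, R> List<R> mapUsingMap(List<T> items, Map<K, V> map,
                                            Function<T, K> lookupKeyFn,
                                            BiFunction<T, V, R> mapFn) {
        List<R> output = new ArrayList<>();
        for (T item : items) {
            K key = lookupKeyFn.apply(item);
            V value = map.get(key);              // null when the key is absent
            R result = mapFn.apply(item, value);
            if (result != null) {                // a null result emits nothing
                output.add(result);
            }
        }
        return output;
    }

    // Looks up each word by its length; words without a match are filtered out.
    static List<String> demo() {
        Map<Integer, String> names = Map.of(5, "five");
        return mapUsingMap(
                List.of("apple", "kiwi", "pear"),
                names,
                String::length,
                (word, name) -> name == null ? null : word + ":" + name);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [apple:five]
    }
}
```

 Note that mapFn is still called when the lookup finds nothing, so it must be prepared to receive a null value.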
 This sample takes a stream of stock items and sets the detail
 field on them by looking up from a registry:
 
 items.mapUsingReplicatedMap(
     "enriching-map",
     item -> item.getDetailId(),
     (Item item, ItemDetail detail) -> item.setDetail(detail)
 )
 K - type of the key in the ReplicatedMap
 V - type of the value in the ReplicatedMap
 R - type of the output item
 mapName - name of the ReplicatedMap
 lookupKeyFn - a function which returns the key to look up in the map. Must not return null. It must be stateless and cooperative.
 mapFn - the mapping function. It must be stateless and cooperative.

@Nonnull default <K,V,R> GeneralStage<R> mapUsingReplicatedMap(@Nonnull ReplicatedMap<K,V> replicatedMap, @Nonnull FunctionEx<? super T,? extends K> lookupKeyFn, @Nonnull BiFunctionEx<? super T,? super V,? extends R> mapFn)
Attaches a mapping stage where for each item a lookup in the supplied ReplicatedMap is performed and the result of the lookup is
 merged with the item and emitted.
 
 If the result of the mapping is null, it emits nothing.
 Therefore, this stage can be used to implement filtering semantics as
 well.
 
The mapping logic is equivalent to:
 K key = lookupKeyFn.apply(item);
 V value = replicatedMap.get(key);
 return mapFn.apply(item, value);
 This sample takes a stream of stock items and sets the detail
 field on them by looking up from a registry:
 
 items.mapUsingReplicatedMap(
     enrichingMap,
     item -> item.getDetailId(),
     (item, detail) -> item.setDetail(detail)
 )
 K - type of the key in the ReplicatedMap
 V - type of the value in the ReplicatedMap
 R - type of the output item
 replicatedMap - the ReplicatedMap to lookup from
 lookupKeyFn - a function which returns the key to look up in the map. Must not return null. It must be stateless and cooperative.
 mapFn - the mapping function. It must be stateless and cooperative.

@Nonnull default <K,V,R> GeneralStage<R> mapUsingIMap(@Nonnull String mapName, @Nonnull FunctionEx<? super T,? extends K> lookupKeyFn, @Nonnull BiFunctionEx<? super T,? super V,? extends R> mapFn)
Attaches a mapping stage where for each item a lookup in the IMap
 with the supplied name is performed and the result of the lookup is
 merged with the item and emitted.
 
 If the result of the mapping is null, it emits nothing.
 Therefore, this stage can be used to implement filtering semantics as
 well.
 
The mapping logic is equivalent to:
 K key = lookupKeyFn.apply(item);
 V value = map.get(key);
 return mapFn.apply(item, value);
 This sample takes a stream of stock items and sets the detail
 field on them by looking up from a registry:
 
 items.mapUsingIMap(
     "enriching-map",
     item -> item.getDetailId(),
     (Item item, ItemDetail detail) -> item.setDetail(detail)
 )
 See GeneralStageWithKey.mapUsingIMap(java.lang.String, com.hazelcast.function.BiFunctionEx<? super T, ? super V, ? extends R>) for a partitioned
 version of this operation.
 K - type of the key in the IMap
 V - type of the value in the IMap
 R - type of the output item
 mapName - name of the IMap
 lookupKeyFn - a function which returns the key to look up in the map. Must not return null. It must be stateless and cooperative.
 mapFn - the mapping function. It must be stateless and cooperative.

@Nonnull default <K,V,R> GeneralStage<R> mapUsingIMap(@Nonnull IMap<K,V> iMap, @Nonnull FunctionEx<? super T,? extends K> lookupKeyFn, @Nonnull BiFunctionEx<? super T,? super V,? extends R> mapFn)
Attaches a mapping stage where for each item a lookup in the supplied IMap is performed and the result of the lookup is merged with
 the item and emitted.
 
 If the result of the mapping is null, it emits nothing.
 Therefore, this stage can be used to implement filtering semantics as
 well.
 
The mapping logic is equivalent to:
 K key = lookupKeyFn.apply(item);
 V value = map.get(key);
 return mapFn.apply(item, value);
 This sample takes a stream of stock items and sets the detail
 field on them by looking up from a registry:
 
 items.mapUsingIMap(
     enrichingMap,
     item -> item.getDetailId(),
     (item, detail) -> item.setDetail(detail)
 )
 See GeneralStageWithKey.mapUsingIMap(java.lang.String, com.hazelcast.function.BiFunctionEx<? super T, ? super V, ? extends R>) for a partitioned
 version of this operation.
 K - type of the key in the IMap
 V - type of the value in the IMap
 R - type of the output item
 iMap - the IMap to lookup from
 lookupKeyFn - a function which returns the key to look up in the map. Must not return null. It must be stateless and cooperative.
 mapFn - the mapping function. It must be stateless and cooperative.

@Nonnull <K,T1_IN,T1,R> GeneralStage<R> hashJoin(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull BiFunctionEx<T,T1,R> mapToOutputFn)
Attaches to both this and the supplied stage a hash-joining stage and returns it. See the package javadoc for a detailed description of the hash-join transform.
 
 This sample joins a stream of users to a stream of countries and outputs
 a stream of users with the country field set:
 
 // Types of the input stages:
 BatchStage<User> users;
 BatchStage<Map.Entry<Long, Country>> idAndCountry;
 users.hashJoin(
     idAndCountry,
     JoinClause.joinMapEntries(User::getCountryId),
     (user, country) -> user.setCountry(country)
 )
 
 This operation is subject to memory limits. See JetConfig.setMaxProcessorAccumulatedRecords(long) for more
 information.
 K - the type of the join key
 T1_IN - the type of stage1 items
 T1 - the result type of projection on stage1 items
 R - the resulting output type
 stage1 - the stage to hash-join with this one
 joinClause1 - specifies how to join the two streams
 mapToOutputFn - function to map the joined items to the output value. It must be stateless and cooperative.

@Nonnull <K,T1_IN,T1,R> GeneralStage<R> innerHashJoin(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull BiFunctionEx<T,T1,R> mapToOutputFn)
Attaches to both this and the supplied stage an inner hash-joining stage and returns it. See the package javadoc for a detailed description of the hash-join transform.
 
 This sample joins a stream of users to a stream of countries and outputs
 a stream of users with the country field set:
 
 // Types of the input stages:
 BatchStage<User> users;
 BatchStage<Map.Entry<Long, Country>> idAndCountry;
 users.innerHashJoin(
     idAndCountry,
     JoinClause.joinMapEntries(User::getCountryId),
     (user, country) -> user.setCountry(country)
 )
 
 This method is similar to hashJoin(BatchStage, JoinClause, BiFunctionEx), but it guarantees
 that both input items will be non-null. Nulls will be filtered out
 before reaching mapToOutputFn.
 
 This operation is subject to memory limits. See JetConfig.setMaxProcessorAccumulatedRecords(long) for more
 information.
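
 The difference between the two join flavours can be sketched in plain Java. This is a simplified model, not Jet's implementation: HashJoinSketch, join and demo are hypothetical names, an in-memory Map plays the role of the hash table built from the enriching stage, and the sketch assumes (as the text above implies) that the non-inner variant passes null to mapToOutputFn when no match exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Plain-Java model (hypothetical, not Jet API) of hash join vs. inner hash join.
final class HashJoinSketch {

    static <T, K, T1, R> List<R> join(List<T> primary, Map<K, T1> lookupTable,
                                      Function<T, K> keyFn,
                                      BiFunction<T, T1, R> mapToOutputFn,
                                      boolean inner) {
        List<R> output = new ArrayList<>();
        for (T item : primary) {
            T1 match = lookupTable.get(keyFn.apply(item));
            if (inner && match == null) {
                continue;                 // inner join: unmatched items never reach mapToOutputFn
            }
            output.add(mapToOutputFn.apply(item, match));
        }
        return output;
    }

    // Users are modelled as (name, countryId) entries; country 9 has no match.
    static List<String> demo(boolean inner) {
        List<Map.Entry<String, Long>> users =
                List.of(Map.entry("alice", 1L), Map.entry("bob", 9L));
        Map<Long, String> countries = Map.of(1L, "NO");
        return join(users, countries,
                user -> user.getValue(),
                (user, country) -> user.getKey() + "/" + country,
                inner);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // [alice/NO, bob/null]
        System.out.println(demo(true));  // [alice/NO]
    }
}
```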
 K - the type of the join key
 T1_IN - the type of stage1 items
 T1 - the result type of projection on stage1 items
 R - the resulting output type
 stage1 - the stage to hash-join with this one
 joinClause1 - specifies how to join the two streams
 mapToOutputFn - function to map the joined items to the output value. It must be stateless and cooperative.

@Nonnull <K1,K2,T1_IN,T2_IN,T1,T2,R> GeneralStage<R> hashJoin2(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull BatchStage<T2_IN> stage2, @Nonnull JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, @Nonnull TriFunction<T,T1,T2,R> mapToOutputFn)
Attaches to this and the two supplied stages a hash-joining stage and returns it. See the package javadoc for a detailed description of the hash-join transform.
 
 This sample joins a stream of users to streams of countries and
 companies, and outputs a stream of users with the country and
 company fields set:
 
 // Types of the input stages:
 BatchStage<User> users;
 BatchStage<Map.Entry<Long, Country>> idAndCountry;
 BatchStage<Map.Entry<Long, Company>> idAndCompany;
 users.hashJoin2(
     idAndCountry, JoinClause.joinMapEntries(User::getCountryId),
     idAndCompany, JoinClause.joinMapEntries(User::getCompanyId),
     (user, country, company) -> user.setCountry(country).setCompany(company)
 )
 
 This operation is subject to memory limits. See JetConfig.setMaxProcessorAccumulatedRecords(long) for more
 information.
K1 - the type of key for stage1
T1_IN - the type of stage1 items
T1 - the result type of projection of stage1 items
K2 - the type of key for stage2
T2_IN - the type of stage2 items
T2 - the result type of projection of stage2 items
R - the resulting output type
stage1 - the first stage to join
joinClause1 - specifies how to join with stage1
stage2 - the second stage to join
joinClause2 - specifies how to join with stage2
mapToOutputFn - function to map the joined items to the output value. It must be stateless and cooperative.

@Nonnull <K1,K2,T1_IN,T2_IN,T1,T2,R> GeneralStage<R> innerHashJoin2(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull BatchStage<T2_IN> stage2, @Nonnull JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, @Nonnull TriFunction<T,T1,T2,R> mapToOutputFn)
See the package javadoc for a detailed description of the hash-join transform.
 
 This sample joins a stream of users to streams of countries and
 companies, and outputs a stream of users with the country and
 company fields set:
 
 // Types of the input stages:
 BatchStage<User> users;
 BatchStage<Map.Entry<Long, Country>> idAndCountry;
 BatchStage<Map.Entry<Long, Company>> idAndCompany;
 users.innerHashJoin2(
     idAndCountry, JoinClause.joinMapEntries(User::getCountryId),
     idAndCompany, JoinClause.joinMapEntries(User::getCompanyId),
     (user, country, company) -> user.setCountry(country).setCompany(company)
 )
 
 This operation is subject to memory limits. See JetConfig.setMaxProcessorAccumulatedRecords(long) for more
 information.
 
 This method is similar to hashJoin2(BatchStage, JoinClause, BatchStage, JoinClause, TriFunction), but it guarantees
 that all input items will be non-null. Nulls are filtered out
 before reaching mapToOutputFn.
K1 - the type of key for stage1
T1_IN - the type of stage1 items
T1 - the result type of projection of stage1 items
K2 - the type of key for stage2
T2_IN - the type of stage2 items
T2 - the result type of projection of stage2 items
R - the resulting output type
stage1 - the first stage to join
joinClause1 - specifies how to join with stage1
stage2 - the second stage to join
joinClause2 - specifies how to join with stage2
mapToOutputFn - function to map the joined items to the output value. It must be stateless and cooperative.

@Nonnull GeneralHashJoinBuilder<T> hashJoinBuilder()
Where they suffice, prefer the direct stage.hashJoinN(...) calls because they offer
 more static type safety.
 
 This sample joins a stream of users to streams of countries and
 companies, and outputs a stream of users with the country and
 company fields set:
 
 // Types of the input stages:
 StreamStage<User> users;
 BatchStage<Map.Entry<Long, Country>> idAndCountry;
 BatchStage<Map.Entry<Long, Company>> idAndCompany;
 StreamHashJoinBuilder<User> builder = users.hashJoinBuilder();
 Tag<Country> tCountry = builder.add(idAndCountry,
         JoinClause.joinMapEntries(User::getCountryId));
 Tag<Company> tCompany = builder.add(idAndCompany,
         JoinClause.joinMapEntries(User::getCompanyId));
 StreamStage<User> joined = builder.build((user, itemsByTag) ->
         user.setCountry(itemsByTag.get(tCountry)).setCompany(itemsByTag.get(tCompany)));
 
 This operation is subject to memory limits. See JetConfig.setMaxProcessorAccumulatedRecords(long) for more
 information.
@Nonnull <K> GeneralStageWithKey<T,K> groupingKey(@Nonnull FunctionEx<? super T,? extends K> keyFn)
Sample usage:
 users.groupingKey(User::getId)
 
 Note: make sure the extracted key is non-null; otherwise the job
 will fail. Also make sure that it implements equals() and
 hashCode().
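
 Why the key needs equals() and hashCode() can be seen with an ordinary HashMap, used here as a local stand-in for keyed grouping. This is a minimal sketch with hypothetical classes (GroupingKeySketch, BadKey, GoodKey); it does not model how Jet actually partitions or serializes keys.

```java
import java.util.HashMap;
import java.util.Map;

// Local analogy for keyed grouping; all names here are hypothetical.
class GroupingKeySketch {

    // No equals()/hashCode(): two instances with the same id are different keys.
    static final class BadKey {
        final long id;
        BadKey(long id) { this.id = id; }
    }

    // Value-based equals()/hashCode(): instances with the same id are one key.
    static final class GoodKey {
        final long id;
        GoodKey(long id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }
        @Override public int hashCode() {
            return Long.hashCode(id);
        }
    }

    // Counts how many distinct groups the given keys fall into.
    static int countGroups(Object... keys) {
        Map<Object, Integer> counts = new HashMap<>();
        for (Object key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts.size();
    }

    public static void main(String[] args) {
        System.out.println(countGroups(new BadKey(1), new BadKey(1)));   // 2
        System.out.println(countGroups(new GoodKey(1), new GoodKey(1))); // 1
    }
}
```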
K - type of the key
keyFn - function that extracts the grouping key. It must be stateless and cooperative.

@Nonnull GeneralStage<T> rebalance()
To implement rebalancing, Jet uses a distributed unicast data routing pattern on the DAG edge from this stage's vertex to the next one. It routes the data in a round-robin fashion, sending each item to the next member (member list includes the local one as well). If a given member's queue is overloaded and applying backpressure, it skips it and retries in the next round. With this scheme you get perfectly balanced item counts on each member under light load, but under heavier load it favors throughput: if the network becomes a bottleneck, most data may stay local.
These are some basic invariants:
stage.rebalance().groupingKey(keyFn).aggregate(...): here Jet
     removes the first (local) aggregation vertex and goes straight to
     distributed aggregation without combining. The data is rebalanced
     through partitioning.
 stage.rebalance().aggregate(...): in this case the second vertex
     is non-parallelizable and must execute on a single member. Therefore Jet
     keeps both vertices and applies rebalancing before the first one.
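
 The round-robin routing with backpressure skipping described above can be sketched in plain Java. This is a simplified model, not Jet's code: the class RoundRobinSketch is hypothetical, each member's queue is reduced to a fixed capacity that never drains, and a full queue stands in for "applying backpressure".

```java
// Simplified model of round-robin routing that skips members under
// backpressure; the class and its parameters are hypothetical.
class RoundRobinSketch {

    // Returns how many items each member received. capacity[i] bounds the
    // queue of member i; queues never drain in this sketch.
    static int[] route(int itemCount, int[] capacity) {
        int[] received = new int[capacity.length];
        int next = 0; // round-robin cursor
        for (int item = 0; item < itemCount; item++) {
            boolean delivered = false;
            // Try each member at most once, starting at the cursor.
            for (int tries = 0; tries < capacity.length && !delivered; tries++) {
                int member = (next + tries) % capacity.length;
                if (received[member] < capacity[member]) {
                    received[member]++;
                    next = (member + 1) % capacity.length;
                    delivered = true;
                }
            }
            if (!delivered) {
                break; // every queue is full: the sender stalls
            }
        }
        return received;
    }

    public static void main(String[] args) {
        // Light load: perfectly balanced item counts.
        System.out.println(java.util.Arrays.toString(route(9, new int[]{100, 100, 100}))); // [3, 3, 3]
        // Member 1 is overloaded: it is skipped and the others absorb its share.
        System.out.println(java.util.Arrays.toString(route(9, new int[]{100, 1, 100})));   // [4, 1, 4]
    }
}
```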
 @Nonnull <K> GeneralStage<T> rebalance(@Nonnull FunctionEx<? super T,? extends K> keyFn)
With partitioned rebalancing, you supply your own function that decides (indirectly) where to send each data item. Jet first applies your partition key function to the data item and then its own partitioning function to the key. The result is that all items with the same key go to the same Jet processor and different keys are distributed pseudo-randomly across the processors.
Compared to non-partitioned balancing, partitioned balancing enforces the same data distribution across members regardless of any bottlenecks. If a given member is overloaded and applies backpressure, Jet doesn't reroute the data to other members, but propagates the backpressure to the upstream. If you choose a partitioning key that has a skewed distribution (some keys being much more frequent), this will result in an imbalanced data flow.
These are some basic invariants:
stage.rebalance(rebalanceKeyFn).groupingKey(groupKeyFn).aggregate(...):
     here Jet removes the first (local) aggregation vertex and goes straight
     to distributed aggregation without combining. Grouped aggregation
     requires the data to be partitioned by the grouping key and therefore
     Jet must ignore the rebalancing key you supplied. We recommend that you
     remove it and use the parameterless stage.rebalance()
     because the end result is identical.
 stage.rebalance(rebalanceKeyFn).aggregate(...): in this case the second vertex
     is non-parallelizable and must execute on a single member. Therefore Jet
     keeps both vertices and applies partitioned rebalancing before the first
     one.
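
 The two-step routing (your key function, then a partitioning function over the key) and the effect of a skewed key can be sketched as follows. This is only an illustration: the class PartitionedRoutingSketch is hypothetical, and the modulo over hashCode() is an assumed stand-in for Jet's own partitioning function, which the text above does not specify.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustration of key-based routing; the hashing scheme is an assumption.
class PartitionedRoutingSketch {

    // Returns how many items each processor received.
    static <T, K> int[] route(List<T> items, Function<T, K> keyFn, int processorCount) {
        int[] received = new int[processorCount];
        for (T item : items) {
            K key = keyFn.apply(item);                                    // user-supplied key function
            int target = Math.floorMod(key.hashCode(), processorCount);   // stand-in partitioning function
            received[target]++;
        }
        return received;
    }

    public static void main(String[] args) {
        // Key 7 is much more frequent than key 8, so one processor gets most of the data.
        List<Integer> items = List.of(7, 7, 7, 7, 8);
        System.out.println(Arrays.toString(route(items, i -> i, 4))); // [1, 0, 0, 4]
    }
}
```

 Equal keys always land on the same processor, and nothing reroutes items away from the busy one, which is the imbalance the description warns about.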
 K - type of the key
keyFn - the partitioning key function. It must be stateless and cooperative.

@Nonnull StreamStage<T> addTimestamps(@Nonnull ToLongFunctionEx<? super T> timestampFn, long allowedLag)
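Adds a timestamp to each item in the stream using the supplied function
 and specifies the allowed amount of disorder between them. The
 allowedLag parameter controls by how much an item's timestamp can be
 lower than the highest one observed so far. If it is even lower, Jet
 will drop the item as being "too late".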
 
 For example, if the sequence of the timestamps is [1,4,3,2] and
 you configured the allowed lag as 1, Jet will let through the
 event with timestamp 3, but it will drop the last one (timestamp
 2).
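
 The rule in this example can be replayed with a few lines of plain Java. This is a sketch of the rule as stated above (an item passes if its timestamp is no more than allowedLag below the highest one seen so far), not of Jet's watermark machinery; the class AllowedLagSketch and the method dropLate are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Replays the allowed-lag rule on a fixed sequence; names are hypothetical.
class AllowedLagSketch {

    // Returns the timestamps that are let through, in arrival order.
    // Overflow at extreme long values is ignored in this sketch.
    static List<Long> dropLate(long[] timestamps, long allowedLag) {
        List<Long> passed = new ArrayList<>();
        long top = Long.MIN_VALUE; // highest timestamp observed so far
        for (long ts : timestamps) {
            top = Math.max(top, ts);
            if (ts >= top - allowedLag) {
                passed.add(ts);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        System.out.println(dropLate(new long[]{1, 4, 3, 2}, 1)); // [1, 4, 3]
    }
}
```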
 
The amount of lag you configure strongly influences the latency of Jet's output. Jet cannot finalize the window until it knows it has observed all the events belonging to it, and the more lag it must tolerate, the longer it will have to wait for possible latecomers. On the other hand, if you don't allow enough lag, you risk failing to account for data that came in after the results were already emitted.
Sample usage:
 events.addTimestamps(Event::getTimestamp, 1000)
 
 Note: This method adds the timestamps after the source has emitted
 the items. When timestamps are added at this point, source partitions
 won't be coalesced properly and will be treated as a single stream. The
 allowed lag will need to cover the additional disorder introduced by
 merging the streams. The streams are merged in an unpredictable order.
 For example, after the job has been suspended for a long time, there
 can be a very recent event in partition1 and a very old event in
 partition2. If partition1 happens to be merged first, the recent
 event could render the old one late if the allowed lag is not large
 enough.
 To add timestamps in source, use withTimestamps().
 
 Warning: make sure the property you access in timestampFn
 isn't null; a null would fail the job. Also make sure there are no
 nonsensical values such as -1, MIN_VALUE, 2100-01-01 etc. Jet treats
 those as real timestamps, and they can cause unspecified behaviour.
timestampFn - a function that returns the timestamp for each item, typically in milliseconds. It must be stateless and cooperative.
allowedLag - the allowed lag behind the top observed timestamp. Time unit is the same as the unit used by timestampFn
IllegalArgumentException - if this stage already has timestamps

@Nonnull SinkStage writeTo(@Nonnull Sink<? super T> sink)
 You cannot reuse the sink in other writeTo calls. If you want to
 write multiple stages to the same sink, use Pipeline.writeTo(Sink, GeneralStage, GeneralStage, GeneralStage...).
 This is more efficient than creating a new sink each time.
@Nonnull GeneralStage<T> peek(@Nonnull PredicateEx<? super T> shouldLogFn, @Nonnull FunctionEx<? super T,? extends CharSequence> toStringFn)
For each item, the stage uses the shouldLogFn predicate to see whether to log the item.
 If so, it uses toStringFn to get the item's string representation
 and logs it at the INFO level to the log category
 com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>.
 
 Note that peek after a rebalance(FunctionEx) operation is not supported.
 
Sample usage:
 users.peek(
     user -> user.getName().length() > 100,
     User::getName
 )
 shouldLogFn - a function to filter the logged items. You can use alwaysTrue() as a pass-through filter when you don't need any filtering. It must be stateless and cooperative.
toStringFn - a function that returns a string representation of the item. It must be stateless and cooperative.
See Also: peek(FunctionEx), peek()

@Nonnull default GeneralStage<T> peek(@Nonnull FunctionEx<? super T,? extends CharSequence> toStringFn)
For each item, the stage uses toStringFn to get a string representation of the item
 and logs it at the INFO level to the log category
 com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>.
 
 Note that peek after a rebalance(FunctionEx) operation is not supported.
 
Sample usage:
 users.peek(User::getName)
 toStringFn - a function that returns a string representation of the item. It must be stateless and cooperative.
See Also: peek(PredicateEx, FunctionEx), peek()

@Nonnull default GeneralStage<T> peek()
For each item, the stage logs the result of its toString()
 method at the INFO level to the log category com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>.
 The stage logs each item on the cluster member that outputs it. Its
 primary purpose is for development use, when running Jet on a local
 machine.
 
 Note that peek after rebalance(FunctionEx) is not supported.
See Also: peek(PredicateEx, FunctionEx), peek(FunctionEx)

@Nonnull <R> GeneralStage<R> customTransform(@Nonnull String stageName, @Nonnull SupplierEx<Processor> procSupplier)
Attaches a stage with a custom transform based on the provided supplier of Core API Processors.
 Note that the type parameter of the returned stage is inferred from the call site and not propagated from the processor that will produce the result, so there is no actual type safety provided.
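
 What "inferred from the call site" means in practice can be shown with a plain-Java analogy. The sketch below is not Jet code: InferredTypeSketch and its customTransform method are hypothetical stand-ins that mimic a generic method whose type parameter R is chosen by the caller and never checked against what is actually produced.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogy for an unchecked, call-site-inferred type parameter.
class InferredTypeSketch {

    // R is whatever the caller asks for; nothing ties it to the real items.
    @SuppressWarnings("unchecked")
    static <R> List<R> customTransform(List<Object> producedItems) {
        return (List<R>) (List<?>) producedItems;
    }

    // Returns true if using the mistyped result fails at runtime.
    static boolean failsAtUse() {
        List<Object> produced = new ArrayList<>();
        produced.add(42); // the "processor" really emits an Integer

        List<String> typed = customTransform(produced); // compiles without complaint
        try {
            int length = typed.get(0).length(); // the hidden cast to String fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtUse()); // true
    }
}
```

 The mistake surfaces only where an item is first used as the declared type, which can be far from the line that declared it.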
R - the type of the output items
stageName - a human-readable name for the custom stage
procSupplier - the supplier of processors

@Nonnull <R> GeneralStage<R> customTransform(@Nonnull String stageName, @Nonnull ProcessorSupplier procSupplier)
Attaches a stage with a custom transform based on the provided supplier of Core API Processors.
 Note that the type parameter of the returned stage is inferred from the call site and not propagated from the processor that will produce the result, so there is no actual type safety provided.
R - the type of the output items
stageName - a human-readable name for the custom stage
procSupplier - the supplier of processors

@Nonnull <R> GeneralStage<R> customTransform(@Nonnull String stageName, @Nonnull ProcessorMetaSupplier procSupplier)
Attaches a stage with a custom transform based on the provided supplier of Core API Processors.
 Note that the type parameter of the returned stage is inferred from the call site and not propagated from the processor that will produce the result, so there is no actual type safety provided.
R - the type of the output items
stageName - a human-readable name for the custom stage
procSupplier - the supplier of processors

@Nonnull GeneralStage<T> setLocalParallelism(int localParallelism)
While most stages are backed by one vertex, there are exceptions. If a stage uses two vertices, each of them will have the given local parallelism, so in total there will be twice as many processors per member.
By default, Jet figures out the value itself: it determines the vertex's local parallelism during job initialization from the global default and the processor meta-supplier's preferred value.
Specified by: setLocalParallelism in interface Stage

Copyright © 2024 Hazelcast, Inc. All rights reserved.