Type Parameters:
T - the type of items coming out of this stage

public interface StreamStage<T> extends GeneralStage<T>

A pipeline stage that will observe an unbounded amount of data (i.e., an
event stream). It accepts input from its upstream stages (if any) and
passes its output to its downstream stages.

Modifier and Type | Method and Description |
---|---|
<R> StreamStage<R> | customTransform(String stageName, DistributedSupplier<Processor> procSupplier): Attaches a stage with a custom transform based on the provided supplier of Core API Processors. |
StreamStage<T> | filter(DistributedPredicate<T> filterFn): Attaches a filtering stage which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. |
<C> StreamStage<T> | filterUsingContext(ContextFactory<C> contextFactory, DistributedBiPredicate<? super C,? super T> filterFn): Attaches a filtering stage which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. |
<R> StreamStage<R> | flatMap(DistributedFunction<? super T,? extends Traverser<? extends R>> flatMapFn): Attaches a flat-mapping stage which applies the supplied function to each input item independently and emits all the items from the Traverser it returns. |
<C,R> StreamStage<R> | flatMapUsingContext(ContextFactory<C> contextFactory, DistributedBiFunction<? super C,? super T,? extends Traverser<? extends R>> flatMapFn): Attaches a flat-mapping stage which applies the supplied function to each input item independently and emits all items from the Traverser it returns as the output items. |
<K> StreamStageWithKey<T,K> | groupingKey(DistributedFunction<? super T,? extends K> keyFn): Specifies the function that will extract a key from the items in the associated pipeline stage. |
<K,T1_IN,T1,R> StreamStage<R> | hashJoin(BatchStage<T1_IN> stage1, JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, DistributedBiFunction<T,T1,R> mapToOutputFn): Attaches to both this and the supplied stage a hash-joining stage and returns it. |
<K1,K2,T1_IN,T2_IN,T1,T2,R> StreamStage<R> | hashJoin2(BatchStage<T1_IN> stage1, JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, BatchStage<T2_IN> stage2, JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, DistributedTriFunction<T,T1,T2,R> mapToOutputFn): Attaches to this and the two supplied stages a hash-joining stage and returns it. |
default StreamHashJoinBuilder<T> | hashJoinBuilder(): Returns a fluent API builder object to construct a hash join operation with any number of contributing stages. |
<R> StreamStage<R> | map(DistributedFunction<? super T,? extends R> mapFn): Attaches a mapping stage which applies the supplied function to each input item independently and emits the function's result as the output item. |
<C,R> StreamStage<R> | mapUsingContext(ContextFactory<C> contextFactory, DistributedBiFunction<? super C,? super T,? extends R> mapFn): Attaches a mapping stage which applies the supplied function to each input item independently and emits the function's result as the output item. |
default <K,V,R> StreamStage<R> | mapUsingIMap(IMap<K,V> iMap, DistributedBiFunction<? super IMap<K,V>,? super T,? extends R> mapFn): Attaches a mapUsingContext stage where the context is a Hazelcast IMap. |
default <K,V,R> StreamStage<R> | mapUsingIMap(String mapName, DistributedBiFunction<? super IMap<K,V>,? super T,? extends R> mapFn): Attaches a mapUsingContext stage where the context is a Hazelcast IMap with the supplied name. |
default <K,V,R> StreamStage<R> | mapUsingReplicatedMap(ReplicatedMap<K,V> replicatedMap, DistributedBiFunction<? super ReplicatedMap<K,V>,? super T,? extends R> mapFn): Attaches a mapUsingContext stage where the context is a Hazelcast ReplicatedMap. |
default <K,V,R> StreamStage<R> | mapUsingReplicatedMap(String mapName, DistributedBiFunction<? super ReplicatedMap<K,V>,? super T,? extends R> mapFn): Attaches a mapUsingContext stage where the context is a Hazelcast ReplicatedMap with the supplied name. |
StreamStage<T> | merge(StreamStage<? extends T> other): Attaches a stage that emits all the items from this stage as well as all the items from the supplied stage. |
default StreamStage<T> | peek(): Adds a peeking layer to this compute stage which logs its output. |
default StreamStage<T> | peek(DistributedFunction<? super T,? extends CharSequence> toStringFn): Adds a peeking layer to this compute stage which logs its output. |
StreamStage<T> | peek(DistributedPredicate<? super T> shouldLogFn, DistributedFunction<? super T,? extends CharSequence> toStringFn): Attaches a peeking stage which logs this stage's output and passes it through without transformation. |
<R> StreamStage<R> | rollingAggregate(AggregateOperation1<? super T,?,? extends R> aggrOp): Attaches a rolling aggregation stage. |
StreamStage<T> | setLocalParallelism(int localParallelism): Sets the preferred local parallelism (number of processors per Jet cluster member) this stage will configure its DAG vertices with. |
StreamStage<T> | setName(String name): Overrides the default name of the stage with the name you choose and returns the stage. |
StageWithWindow<T> | window(WindowDefinition wDef): Adds the given window definition to this stage, as the first step in the construction of a pipeline stage that performs windowed aggregation. |
Methods inherited from interface GeneralStage: addTimestamps, addTimestamps, drainTo
Methods inherited from interface Stage: getPipeline, name
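
@Nonnull StageWithWindow<T> window(WindowDefinition wDef)
Adds the given window definition to this stage, as the first step in the
construction of a pipeline stage that performs windowed aggregation.
See Also:
factory methods in WindowDefinition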
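
@Nonnull StreamStage<T> merge(@Nonnull StreamStage<? extends T> other)
Attaches a stage that emits all the items from this stage as well as all
the items from the supplied stage.
Parameters:
other - the other stage whose data to merge into this one

@Nonnull <K> StreamStageWithKey<T,K> groupingKey(@Nonnull DistributedFunction<? super T,? extends K> keyFn)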
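Description copied from interface: GeneralStage
Specifies the function that will extract a key from the items in the
associated pipeline stage. The extracted key should implement equals()
and hashCode().
Specified by:
groupingKey in interface GeneralStage<T>
Type Parameters:
K - type of the key
Parameters:
keyFn - function that extracts the grouping key

@Nonnull <R> StreamStage<R> map(@Nonnull DistributedFunction<? super T,? extends R> mapFn)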
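Description copied from interface: GeneralStage
Attaches a mapping stage which applies the supplied function to each
input item independently and emits the function's result as the output
item. If the result is null, it emits nothing. Therefore this stage can
be used to implement filtering semantics as well.
Specified by:
map in interface GeneralStage<T>
Type Parameters:
R - the result type of the mapping function
Parameters:
mapFn - a stateless mapping function

@Nonnull StreamStage<T> filter(@Nonnull DistributedPredicate<T> filterFn)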
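Description copied from interface: GeneralStage
Attaches a filtering stage which applies the provided predicate function
to each input item to decide whether to pass the item to the output or
to discard it.
Specified by:
filter in interface GeneralStage<T>
Parameters:
filterFn - a stateless filter predicate function

@Nonnull <R> StreamStage<R> flatMap(@Nonnull DistributedFunction<? super T,? extends Traverser<? extends R>> flatMapFn)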
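Description copied from interface: GeneralStage
Attaches a flat-mapping stage which applies the supplied function to
each input item independently and emits all the items from the
Traverser it returns. The traverser must be null-terminated.
Specified by:
flatMap in interface GeneralStage<T>
Type Parameters:
R - the type of items in the result's traversers
Parameters:
flatMapFn - a stateless flat-mapping function, whose result type is Jet's Traverser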
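
The "null-terminated" contract can be illustrated with a plain-Java model that does not use Jet at all. In the sketch below, SimpleTraverser, traverse and flatMap are hypothetical stand-ins invented for illustration; the only thing taken from the description above is that a traverser signals exhaustion by returning null, and that every item it yields is emitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class FlatMapSketch {

    // Hypothetical stand-in for Jet's Traverser: next() returns null when exhausted.
    public interface SimpleTraverser<T> {
        T next();
    }

    // Wraps an Iterable so that exhaustion is signalled by null (null-terminated).
    public static <T> SimpleTraverser<T> traverse(Iterable<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Applies flatMapFn to each item independently and emits everything
    // the returned traverser yields before its terminating null.
    public static <T, R> List<R> flatMap(
            List<T> input,
            Function<? super T, ? extends SimpleTraverser<? extends R>> flatMapFn) {
        List<R> output = new ArrayList<>();
        for (T item : input) {
            SimpleTraverser<? extends R> traverser = flatMapFn.apply(item);
            for (R r; (r = traverser.next()) != null; ) {
                output.add(r);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "c");
        List<String> words = flatMap(lines, line -> traverse(Arrays.asList(line.split(" "))));
        System.out.println(words); // prints [a, b, c]
    }
}
```

Because null marks the end of a traverser, a traverser in this model cannot carry null as a regular item.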
@Nonnull <C,R> StreamStage<R> mapUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiFunction<? super C,? super T,? extends R> mapFn)
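Description copied from interface: GeneralStage
Attaches a mapping stage which applies the supplied function to each
input item independently and emits the function's result as the output
item. The mapping function receives another parameter, the context
object, which Jet will create using the supplied contextFactory.

If the mapping result is null, it emits nothing. Therefore this stage
can be used to implement filtering semantics as well.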
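
As a rough illustration of these semantics, here is a plain-Java model that does not use Jet. The mapUsingContext helper below is hypothetical, and java.util.function.Supplier stands in for ContextFactory. The sketch creates the context once and hands it to the mapping function with every item; how many context instances Jet actually creates is not modeled. A null result drops the item.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

public class MapUsingContextSketch {

    // Creates the context, then passes it to mapFn along with every item.
    // A null mapping result emits nothing, which gives filtering semantics.
    public static <C, T, R> List<R> mapUsingContext(
            Supplier<C> contextFactory, // hypothetical stand-in for ContextFactory<C>
            List<T> input,
            BiFunction<? super C, ? super T, ? extends R> mapFn) {
        C context = contextFactory.get();
        List<R> output = new ArrayList<>();
        for (T item : input) {
            R result = mapFn.apply(context, item);
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Supplier<Map<String, String>> countryNames = () -> {
            Map<String, String> names = new HashMap<>();
            names.put("hr", "Croatia");
            names.put("cz", "Czechia");
            return names;
        };
        // "xx" has no entry, so Map.get returns null and the item is dropped.
        List<String> out = mapUsingContext(countryNames, Arrays.asList("hr", "xx", "cz"),
                (names, code) -> names.get(code));
        System.out.println(out); // prints [Croatia, Czechia]
    }
}
```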
Specified by:
mapUsingContext in interface GeneralStage<T>
Type Parameters:
C - type of context object
R - the result type of the mapping function
Parameters:
contextFactory - the context factory
mapFn - a stateless mapping function

@Nonnull <C> StreamStage<T> filterUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiPredicate<? super C,? super T> filterFn)
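Description copied from interface: GeneralStage
Attaches a filtering stage which applies the provided predicate function
to each input item to decide whether to pass the item to the output or
to discard it. The predicate function receives another parameter, the
context object, which Jet will create using the supplied contextFactory.
Specified by:
filterUsingContext in interface GeneralStage<T>
Type Parameters:
C - type of context object
Parameters:
contextFactory - the context factory
filterFn - a stateless filter predicate function

@Nonnull <C,R> StreamStage<R> flatMapUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiFunction<? super C,? super T,? extends Traverser<? extends R>> flatMapFn)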
Description copied from interface: GeneralStage
Attaches a flat-mapping stage which applies the supplied function to
each input item independently and emits all items from the Traverser it
returns as the output items. The traverser must be null-terminated. The
mapping function receives another parameter, the context object, which
Jet will create using the supplied contextFactory.
Specified by:
flatMapUsingContext in interface GeneralStage<T>
Type Parameters:
C - type of context object
R - the type of items in the result's traversers
Parameters:
contextFactory - the context factory
flatMapFn - a stateless flat-mapping function, whose result type is Jet's Traverser
@Nonnull default <K,V,R> StreamStage<R> mapUsingReplicatedMap(@Nonnull String mapName, @Nonnull DistributedBiFunction<? super ReplicatedMap<K,V>,? super T,? extends R> mapFn)
Description copied from interface: GeneralStage
Attaches a GeneralStage.mapUsingContext(ContextFactory, DistributedBiFunction)
stage where the context is a Hazelcast ReplicatedMap with the supplied
name. The mapping function will receive it as the first argument.
Specified by:
mapUsingReplicatedMap in interface GeneralStage<T>
Type Parameters:
K - type of the key in the ReplicatedMap
V - type of the value in the ReplicatedMap
R - type of the output item
Parameters:
mapName - name of the ReplicatedMap
mapFn - the mapping function

@Nonnull default <K,V,R> StreamStage<R> mapUsingReplicatedMap(@Nonnull ReplicatedMap<K,V> replicatedMap, @Nonnull DistributedBiFunction<? super ReplicatedMap<K,V>,? super T,? extends R> mapFn)
Description copied from interface: GeneralStage
Attaches a GeneralStage.mapUsingContext(ContextFactory, DistributedBiFunction)
stage where the context is a Hazelcast ReplicatedMap. It is not
necessarily the map you provide here, but a replicated map with the same
name in the Jet cluster that executes the pipeline. The mapping function
will receive the replicated map as the first argument.
Specified by:
mapUsingReplicatedMap in interface GeneralStage<T>
Type Parameters:
K - type of the key in the ReplicatedMap
V - type of the value in the ReplicatedMap
R - type of the output item
Parameters:
replicatedMap - the ReplicatedMap to use as the context
mapFn - the mapping function

@Nonnull default <K,V,R> StreamStage<R> mapUsingIMap(@Nonnull String mapName, @Nonnull DistributedBiFunction<? super IMap<K,V>,? super T,? extends R> mapFn)
Description copied from interface: GeneralStage
Attaches a GeneralStage.mapUsingContext(ContextFactory, DistributedBiFunction)
stage where the context is a Hazelcast IMap with the supplied name. The
mapping function will receive it as the first argument.
Specified by:
mapUsingIMap in interface GeneralStage<T>
Type Parameters:
K - type of the key in the IMap
V - type of the value in the IMap
R - type of the output item
Parameters:
mapName - name of the IMap
mapFn - the mapping function

@Nonnull default <K,V,R> StreamStage<R> mapUsingIMap(@Nonnull IMap<K,V> iMap, @Nonnull DistributedBiFunction<? super IMap<K,V>,? super T,? extends R> mapFn)
Description copied from interface: GeneralStage
Attaches a GeneralStage.mapUsingContext(ContextFactory, DistributedBiFunction)
stage where the context is a Hazelcast IMap. It is not necessarily the
map you provide here, but a map with the same name in the Jet cluster
that executes the pipeline. The mapping function will receive the map as
the first argument.
Specified by:
mapUsingIMap in interface GeneralStage<T>
Type Parameters:
K - type of the key in the IMap
V - type of the value in the IMap
R - type of the output item
Parameters:
iMap - the IMap to use as the context
mapFn - the mapping function

@Nonnull <R> StreamStage<R> rollingAggregate(@Nonnull AggregateOperation1<? super T,?,? extends R> aggrOp)
Description copied from interface: GeneralStage
Attaches a rolling aggregation stage. For example, with a summing
aggregate operation and the input {2, 7, 8, -5}, the output will be
{2, 9, 17, 12}. The number of input and output items is equal.

This stage is fault-tolerant and saves its state to the snapshot.

NOTE 1: since the output for each item depends on all the previous
items, this operation cannot be parallelized. Jet will perform it on a
single member, single-threaded. Jet also supports keyed rolling
aggregation, which it can parallelize by partitioning.

NOTE 2: if you plan to use an aggregate operation whose result size
grows with input size (such as toList) and your data source is
unbounded, you must carefully consider the memory demands this implies.
The result will keep growing forever.
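
The {2, 7, 8, -5} to {2, 9, 17, 12} example corresponds to a running sum. A minimal plain-Java sketch of that behavior follows; it does not use Jet or AggregateOperation1, and the class and method names are made up for illustration. It only shows why the output has one item per input item and why each output depends on all earlier inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RollingSumSketch {

    // Emits one running total per input item, so input and output sizes match.
    public static List<Long> rollingSum(List<Long> input) {
        List<Long> output = new ArrayList<>(input.size());
        long accumulator = 0;        // the only state carried between items
        for (long item : input) {
            accumulator += item;     // accumulate the new item
            output.add(accumulator); // emit the current result
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(rollingSum(Arrays.asList(2L, 7L, 8L, -5L))); // prints [2, 9, 17, 12]
    }
}
```

The single accumulator is what makes the non-keyed form inherently sequential: every output needs the state left behind by the previous item.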
Specified by:
rollingAggregate in interface GeneralStage<T>
Type Parameters:
R - result type of the aggregate operation
Parameters:
aggrOp - the aggregate operation to do the aggregation

@Nonnull <K,T1_IN,T1,R> StreamStage<R> hashJoin(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull DistributedBiFunction<T,T1,R> mapToOutputFn)
Description copied from interface: GeneralStage
Attaches to both this and the supplied stage a hash-joining stage and
returns it. See the package javadoc for a detailed description of the
hash-join transform.
Specified by:
hashJoin in interface GeneralStage<T>
Type Parameters:
K - the type of the join key
T1_IN - the type of stage1 items
T1 - the result type of projection on stage1 items
R - the resulting output type
Parameters:
stage1 - the stage to hash-join with this one
joinClause1 - specifies how to join the two streams
mapToOutputFn - function to map the joined items to the output value

@Nonnull <K1,K2,T1_IN,T2_IN,T1,T2,R> StreamStage<R> hashJoin2(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull BatchStage<T2_IN> stage2, @Nonnull JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, @Nonnull DistributedTriFunction<T,T1,T2,R> mapToOutputFn)
Description copied from interface: GeneralStage
Attaches to this and the two supplied stages a hash-joining stage and
returns it. See the package javadoc for a detailed description of the
hash-join transform.
Specified by:
hashJoin2 in interface GeneralStage<T>
Type Parameters:
K1 - the type of key for stage1
K2 - the type of key for stage2
T1_IN - the type of stage1 items
T2_IN - the type of stage2 items
T1 - the result type of projection of stage1 items
T2 - the result type of projection of stage2 items
R - the resulting output type
Parameters:
stage1 - the first stage to join
joinClause1 - specifies how to join with stage1
stage2 - the second stage to join
joinClause2 - specifies how to join with stage2
mapToOutputFn - function to map the joined items to the output value

@Nonnull default StreamHashJoinBuilder<T> hashJoinBuilder()
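Description copied from interface: GeneralStage
Returns a fluent API builder object to construct a hash join operation
with any number of contributing stages. When joining with one or two
stages, prefer the direct stage.hashJoinN(...) calls because they offer
more static type safety.
Specified by:
hashJoinBuilder in interface GeneralStage<T>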
@Nonnull default StreamStage<T> peek()
Description copied from interface: GeneralStage
Adds a peeking layer to this compute stage which logs its output. For
each item, it logs the result of the item's toString() method at the
INFO level to the log category
com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>.

The stage logs each item on whichever cluster member it happens to
receive it. Its primary purpose is for development use, when running Jet
on a local machine.
Specified by:
peek in interface GeneralStage<T>
See Also:
GeneralStage.peek(DistributedPredicate, DistributedFunction), GeneralStage.peek(DistributedFunction)
@Nonnull StreamStage<T> peek(@Nonnull DistributedPredicate<? super T> shouldLogFn, @Nonnull DistributedFunction<? super T,? extends CharSequence> toStringFn)
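Description copied from interface: GeneralStage
Attaches a peeking stage which logs this stage's output and passes it
through without transformation. For each item, the stage:
1. uses the shouldLogFn predicate to see whether to log the item
2. if so, uses toStringFn to get the item's string representation
3. logs the string at the INFO level to the log category
   com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>
Specified by:
peek in interface GeneralStage<T>
Parameters:
shouldLogFn - a function to filter the logged items. You can use alwaysTrue() as a pass-through filter when you don't need any filtering.
toStringFn - a function that returns a string representation of the item
See Also:
GeneralStage.peek(DistributedFunction), GeneralStage.peek()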
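
The steps of the two-argument peek can be modeled in plain Java without Jet. In the sketch below the peek helper is hypothetical, and the in-memory log list is an assumed stand-in for the logger and its log category. It shows that filtering applies only to what gets logged, while every item passes through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class PeekSketch {

    // Logs the items accepted by shouldLogFn and passes every item through unchanged.
    // The log list is a hypothetical stand-in for an INFO-level logger.
    public static <T> List<T> peek(
            List<T> input,
            List<String> log,
            Predicate<? super T> shouldLogFn,
            Function<? super T, ? extends CharSequence> toStringFn) {
        List<T> output = new ArrayList<>(input.size());
        for (T item : input) {
            if (shouldLogFn.test(item)) {
                log.add(toStringFn.apply(item).toString());
            }
            output.add(item); // emitted whether or not it was logged
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Integer> out = peek(Arrays.asList(1, 2, 3, 4), log,
                n -> n % 2 == 0, n -> "even: " + n);
        System.out.println(out); // prints [1, 2, 3, 4]
        System.out.println(log); // prints [even: 2, even: 4]
    }
}
```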
@Nonnull default StreamStage<T> peek(@Nonnull DistributedFunction<? super T,? extends CharSequence> toStringFn)
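Description copied from interface: GeneralStage
Adds a peeking layer to this compute stage which logs its output. For
each item, the stage:
1. uses toStringFn to get a string representation of the item
2. logs the string at the INFO level to the log category
   com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>
Specified by:
peek in interface GeneralStage<T>
Parameters:
toStringFn - a function that returns a string representation of the item
See Also:
GeneralStage.peek(DistributedPredicate, DistributedFunction), GeneralStage.peek()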
@Nonnull <R> StreamStage<R> customTransform(@Nonnull String stageName, @Nonnull DistributedSupplier<Processor> procSupplier)
Description copied from interface: GeneralStage
Attaches a stage with a custom transform based on the provided supplier
of Core API Processors.

Note that the returned stage's type parameter is inferred from the call
site and not propagated from the processor that will produce the result,
so there is no actual type safety provided.
Specified by:
customTransform in interface GeneralStage<T>
Type Parameters:
R - the type of the output items
Parameters:
stageName - a human-readable name for the custom stage
procSupplier - the supplier of processors

@Nonnull StreamStage<T> setLocalParallelism(int localParallelism)
Description copied from interface: Stage
Sets the preferred local parallelism (number of processors per Jet
cluster member) this stage will configure its DAG vertices with.

While most stages are backed by a single vertex, there are exceptions.
If a stage uses two vertices, each of them will have the given local
parallelism, so in total there will be twice as many processors per
member.

The default value is -1, which tells Jet to choose a value itself. Jet
will determine the vertex's local parallelism during job initialization
from the global default and the processor meta-supplier's preferred
value.
Specified by:
setLocalParallelism in interface Stage

Copyright © 2018 Hazelcast, Inc. All rights reserved.