T
- the type of items coming out of this stagepublic interface GeneralStage<T> extends Stage
Modifier and Type | Method and Description |
---|---|
default StreamStage<T> |
addTimestamps()
Adds a timestamp to each item in the stream using the current system
time.
|
StreamStage<T> |
addTimestamps(DistributedToLongFunction<? super T> timestampFn,
long allowedLag)
Adds a timestamp to each item in the stream using the supplied function
and specifies the allowed amount of disorder between them.
|
<R> GeneralStage<R> |
customTransform(String stageName,
DistributedSupplier<Processor> procSupplier)
Attaches a stage with a custom transform based on the provided supplier
of Core API
Processor s. |
SinkStage |
drainTo(Sink<? super T> sink)
Attaches a sink stage, one that accepts data but doesn't
emit any.
|
GeneralStage<T> |
filter(DistributedPredicate<T> filterFn)
Attaches a filtering stage which applies the provided predicate function
to each input item to decide whether to pass the item to the output or
to discard it.
|
<C> GeneralStage<T> |
filterUsingContext(ContextFactory<C> contextFactory,
DistributedBiPredicate<? super C,? super T> filterFn)
Attaches a filtering stage which applies the provided predicate function
to each input item to decide whether to pass the item to the output or
to discard it.
|
<R> GeneralStage<R> |
flatMap(DistributedFunction<? super T,? extends Traverser<? extends R>> flatMapFn)
Attaches a flat-mapping stage which applies the supplied function to
each input item independently and emits all the items from the
Traverser it returns. |
<C,R> GeneralStage<R> |
flatMapUsingContext(ContextFactory<C> contextFactory,
DistributedBiFunction<? super C,? super T,? extends Traverser<? extends R>> flatMapFn)
Attaches a flat-mapping stage which applies the supplied function to
each input item independently and emits all items from the
Traverser it returns as the output items. |
<K> GeneralStageWithKey<T,K> |
groupingKey(DistributedFunction<? super T,? extends K> keyFn)
Specifies the function that will extract a key from the items in the
associated pipeline stage.
|
<K,T1_IN,T1,R> |
hashJoin(BatchStage<T1_IN> stage1,
JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1,
DistributedBiFunction<T,T1,R> mapToOutputFn)
Attaches to both this and the supplied stage a hash-joining stage and
returns it.
|
<K1,K2,T1_IN,T2_IN,T1,T2,R> |
hashJoin2(BatchStage<T1_IN> stage1,
JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1,
BatchStage<T2_IN> stage2,
JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2,
DistributedTriFunction<T,T1,T2,R> mapToOutputFn)
Attaches to this and the two supplied stages a hash-joining stage and
returns it.
|
GeneralHashJoinBuilder<T> |
hashJoinBuilder()
Returns a fluent API builder object to construct a hash join operation
with any number of contributing stages.
|
<R> GeneralStage<R> |
map(DistributedFunction<? super T,? extends R> mapFn)
Attaches a mapping stage which applies the supplied function to each
input item independently and emits the function's result as the output
item.
|
<C,R> GeneralStage<R> |
mapUsingContext(ContextFactory<C> contextFactory,
DistributedBiFunction<? super C,? super T,? extends R> mapFn)
Attaches a mapping stage which applies the supplied function to each
input item independently and emits the function's result as the output
item.
|
default <K,V,R> GeneralStage<R> |
mapUsingIMap(IMap<K,V> iMap,
DistributedBiFunction<? super IMap<K,V>,? super T,? extends R> mapFn)
Attaches a
mapUsingContext(com.hazelcast.jet.pipeline.ContextFactory<C>, com.hazelcast.jet.function.DistributedBiFunction<? super C, ? super T, ? extends R>) stage where the context is a
Hazelcast IMap . |
default <K,V,R> GeneralStage<R> |
mapUsingIMap(String mapName,
DistributedBiFunction<? super IMap<K,V>,? super T,? extends R> mapFn)
Attaches a
mapUsingContext(com.hazelcast.jet.pipeline.ContextFactory<C>, com.hazelcast.jet.function.DistributedBiFunction<? super C, ? super T, ? extends R>) stage where the context is a
Hazelcast IMap with the supplied name. |
default <K,V,R> GeneralStage<R> |
mapUsingReplicatedMap(ReplicatedMap<K,V> replicatedMap,
DistributedBiFunction<? super ReplicatedMap<K,V>,? super T,? extends R> mapFn)
Attaches a
mapUsingContext(com.hazelcast.jet.pipeline.ContextFactory<C>, com.hazelcast.jet.function.DistributedBiFunction<? super C, ? super T, ? extends R>) stage where the context is a
Hazelcast ReplicatedMap . |
default <K,V,R> GeneralStage<R> |
mapUsingReplicatedMap(String mapName,
DistributedBiFunction<? super ReplicatedMap<K,V>,? super T,? extends R> mapFn)
Attaches a
mapUsingContext(com.hazelcast.jet.pipeline.ContextFactory<C>, com.hazelcast.jet.function.DistributedBiFunction<? super C, ? super T, ? extends R>) stage where the context is a
Hazelcast ReplicatedMap with the supplied name. |
default GeneralStage<T> |
peek()
Adds a peeking layer to this compute stage which logs its output.
|
default GeneralStage<T> |
peek(DistributedFunction<? super T,? extends CharSequence> toStringFn)
Adds a peeking layer to this compute stage which logs its output.
|
GeneralStage<T> |
peek(DistributedPredicate<? super T> shouldLogFn,
DistributedFunction<? super T,? extends CharSequence> toStringFn)
Attaches a peeking stage which logs this stage's output and passes it
through without transformation.
|
<R> GeneralStage<R> |
rollingAggregate(AggregateOperation1<? super T,?,? extends R> aggrOp)
Attaches a rolling aggregation stage.
|
getPipeline, name, setLocalParallelism, setName
@Nonnull <R> GeneralStage<R> map(@Nonnull DistributedFunction<? super T,? extends R> mapFn)
null
, it emits nothing. Therefore this
stage can be used to implement filtering semantics as well.R
- the result type of the mapping functionmapFn
- a stateless mapping function@Nonnull GeneralStage<T> filter(@Nonnull DistributedPredicate<T> filterFn)
filterFn
- a stateless filter predicate function@Nonnull <R> GeneralStage<R> flatMap(@Nonnull DistributedFunction<? super T,? extends Traverser<? extends R>> flatMapFn)
Traverser
it returns. The traverser must be null-terminated.R
- the type of items in the result's traversersflatMapFn
- a stateless flatmapping function, whose result type is
Jet's Traverser
@Nonnull <C,R> GeneralStage<R> mapUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiFunction<? super C,? super T,? extends R> mapFn)
contextFactory
.
If the mapping result is null
, it emits nothing. Therefore this
stage can be used to implement filtering semantics as well.
C
- type of context objectR
- the result type of the mapping functioncontextFactory
- the context factorymapFn
- a stateless mapping function@Nonnull <C> GeneralStage<T> filterUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiPredicate<? super C,? super T> filterFn)
contextFactory
.
C
- type of context objectcontextFactory
- the context factoryfilterFn
- a stateless filter predicate function@Nonnull <C,R> GeneralStage<R> flatMapUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiFunction<? super C,? super T,? extends Traverser<? extends R>> flatMapFn)
Traverser
it returns as the output items. The traverser must be
null-terminated. The mapping function receives another
parameter, the context object, which Jet will create using the supplied
contextFactory
.
C
- type of context objectR
- the type of items in the result's traverserscontextFactory
- the context factoryflatMapFn
- a stateless flatmapping function, whose result type is
Jet's Traverser
@Nonnull default <K,V,R> GeneralStage<R> mapUsingReplicatedMap(@Nonnull String mapName, @Nonnull DistributedBiFunction<? super ReplicatedMap<K,V>,? super T,? extends R> mapFn)
mapUsingContext(com.hazelcast.jet.pipeline.ContextFactory<C>, com.hazelcast.jet.function.DistributedBiFunction<? super C, ? super T, ? extends R>)
stage where the context is a
Hazelcast ReplicatedMap
with the supplied name. The mapping
function will receive it as the first argument.K
- type of the key in the ReplicatedMap
V
- type of the value in the ReplicatedMap
R
- type of the output itemmapName
- name of the ReplicatedMap
mapFn
- the mapping function@Nonnull default <K,V,R> GeneralStage<R> mapUsingReplicatedMap(@Nonnull ReplicatedMap<K,V> replicatedMap, @Nonnull DistributedBiFunction<? super ReplicatedMap<K,V>,? super T,? extends R> mapFn)
mapUsingContext(com.hazelcast.jet.pipeline.ContextFactory<C>, com.hazelcast.jet.function.DistributedBiFunction<? super C, ? super T, ? extends R>)
stage where the context is a
Hazelcast ReplicatedMap
. It is not necessarily the
map you provide here, but a replicated map with the same name
in the Jet cluster that executes the pipeline. The mapping function
will receive the replicated map as the first argument.K
- type of the key in the ReplicatedMap
V
- type of the value in the ReplicatedMap
R
- type of the output itemreplicatedMap
- the ReplicatedMap
to use as contextmapFn
- the mapping function@Nonnull default <K,V,R> GeneralStage<R> mapUsingIMap(@Nonnull String mapName, @Nonnull DistributedBiFunction<? super IMap<K,V>,? super T,? extends R> mapFn)
mapUsingContext(com.hazelcast.jet.pipeline.ContextFactory<C>, com.hazelcast.jet.function.DistributedBiFunction<? super C, ? super T, ? extends R>)
stage where the context is a
Hazelcast IMap
with the supplied name. The mapping function
will receive it as the first argument.K
- type of the key in the IMap
V
- type of the value in the IMap
R
- type of the output itemmapName
- name of the IMap
mapFn
- the mapping function@Nonnull default <K,V,R> GeneralStage<R> mapUsingIMap(@Nonnull IMap<K,V> iMap, @Nonnull DistributedBiFunction<? super IMap<K,V>,? super T,? extends R> mapFn)
mapUsingContext(com.hazelcast.jet.pipeline.ContextFactory<C>, com.hazelcast.jet.function.DistributedBiFunction<? super C, ? super T, ? extends R>)
stage where the context is a
Hazelcast IMap
. It is not necessarily the map you
provide here, but a map with the same name in the Jet cluster
that executes the pipeline. The mapping function will receive the
replicated map as the first argument.K
- type of the key in the IMap
V
- type of the value in the IMap
R
- type of the output itemiMap
- the IMap
to use as the contextmapFn
- the mapping function@Nonnull <R> GeneralStage<R> rollingAggregate(@Nonnull AggregateOperation1<? super T,?,? extends R> aggrOp)
{2, 7, 8, -5}
, the output will be {2, 9, 17, 12}
. The
number of input and output items is equal.
This stage is fault-tolerant and saves its state to the snapshot.
NOTE 1: since the output for each item depends on all
the previous items, this operation cannot be parallelized. Jet will
perform it on a single member, single-threaded. Jet also supports
keyed rolling aggregation
which it can parallelize by partitioning.
NOTE 2: if you plan to use an aggregate operation whose
result size grows with input size (such as toList
and your data
source is unbounded, you must carefully consider the memory demands this
implies. The result will keep growing forever.
R
- result type of the aggregate operationaggrOp
- the aggregate operation to do the aggregation@Nonnull <K,T1_IN,T1,R> GeneralStage<R> hashJoin(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull DistributedBiFunction<T,T1,R> mapToOutputFn)
package javadoc
for a detailed description of the hash-join transform.K
- the type of the join keyT1_IN
- the type of stage1
itemsT1
- the result type of projection on stage1
itemsR
- the resulting output typestage1
- the stage to hash-join with this onejoinClause1
- specifies how to join the two streamsmapToOutputFn
- function to map the joined items to the output value@Nonnull <K1,K2,T1_IN,T2_IN,T1,T2,R> GeneralStage<R> hashJoin2(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull BatchStage<T2_IN> stage2, @Nonnull JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, @Nonnull DistributedTriFunction<T,T1,T2,R> mapToOutputFn)
package javadoc
for a detailed description of the hash-join transform.K1
- the type of key for stage1
T1_IN
- the type of stage1
itemsT1
- the result type of projection of stage1
itemsK2
- the type of key for stage2
T2_IN
- the type of stage2
itemsT2
- the result type of projection of stage2
itemsR
- the resulting output typestage1
- the first stage to joinjoinClause1
- specifies how to join with stage1
stage2
- the second stage to joinjoinClause2
- specifies how to join with stage2
mapToOutputFn
- function to map the joined items to the output value@Nonnull GeneralHashJoinBuilder<T> hashJoinBuilder()
stage.hashJoinN(...)
calls because they offer
more static type safety.@Nonnull <K> GeneralStageWithKey<T,K> groupingKey(@Nonnull DistributedFunction<? super T,? extends K> keyFn)
equals()
and hashCode()
.K
- type of the keykeyFn
- function that extracts the grouping key@Nonnull default StreamStage<T> addTimestamps()
addTimestamps(timestampFn, allowedLag
to extract them.IllegalArgumentException
- if this stage already has timestamps@Nonnull StreamStage<T> addTimestamps(@Nonnull DistributedToLongFunction<? super T> timestampFn, long allowedLag)
allowedLag
parameter controls by how much
the timestamp can be lower than the highest one observed so far. If
it is even lower, Jet will drop the item as being "too late".
For example, if the sequence of the timestamps is [1,4,3,2]
and
you configured the allowed lag as 1
, Jet will let through the
event with timestamp 3
, but it will drop the last one (timestamp
2
).
The amount of lag you configure strongly influences the latency of Jet's output. Jet cannot finalize the window until it knows it has observed all the events belonging to it, and the more lag it must tolerate, the longer will it have to wait for possible latecomers. On the other hand, if you don't allow enough lag, you face the risk of failing to account for the data that came in after the results were already emitted.
You should strongly prefer adding this stage right after the source. In that case Jet can compute the watermark inside the source connector, taking into account its partitioning. It can maintain a separate watermark value for each partition and coalesce them into the overall watermark without causing dropped events. If you add the timestamps later on, events from different partitions may be mixed, increasing the perceived event lag and causing more dropped events.
Warning: make sure the property you access in timestampFn
isn't null, it would fail the job. Also that there are no nonsensical
values such as -1, MIN_VALUE, 2100-01-01 etc - we'll treat those as real
timestamps and they can cause unspecified behaviour.
timestampFn
- a function that returns the timestamp for each item,
typically in millisecondsallowedLag
- the allowed lag behind the top observed timestamp.
Time unit is the same as the unit used by timestampFn
IllegalArgumentException
- if this stage already has timestamps@Nonnull SinkStage drainTo(@Nonnull Sink<? super T> sink)
@Nonnull GeneralStage<T> peek(@Nonnull DistributedPredicate<? super T> shouldLogFn, @Nonnull DistributedFunction<? super T,? extends CharSequence> toStringFn)
shouldLogFn
predicate to see whether to log the item
toStringFn
to get the item's string
representation
com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>
shouldLogFn
- a function to filter the logged items. You can use alwaysTrue()
as a pass-through filter when you don't need any
filtering.toStringFn
- a function that returns a string representation of the itempeek(DistributedFunction)
,
peek()
@Nonnull default GeneralStage<T> peek(@Nonnull DistributedFunction<? super T,? extends CharSequence> toStringFn)
toStringFn
to get a string representation of the item
com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>
toStringFn
- a function that returns a string representation of the itempeek(DistributedPredicate, DistributedFunction)
,
peek()
@Nonnull default GeneralStage<T> peek()
toString()
method at the INFO level to the log category com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>
.
The stage logs each item on whichever cluster member it happens to
receive it. Its primary purpose is for development use, when running Jet
on a local machine.@Nonnull <R> GeneralStage<R> customTransform(@Nonnull String stageName, @Nonnull DistributedSupplier<Processor> procSupplier)
Processor
s.
Note that the returned stage's type parameter is inferred from the call site and not propagated from the processor that will produce the result, so there is no actual type safety provided.
R
- the type of the output itemsstageName
- a human-readable name for the custom stageprocSupplier
- the supplier of processorsCopyright © 2018 Hazelcast, Inc.. All rights reserved.