public interface GeneralStage<T> extends Stage

Type parameters: T - the type of items coming out of this stage
| Modifier and Type | Method and Description |
|---|---|
| `default StreamStage<T>` | `addTimestamps()` Adds a timestamp to each item in the stream using the current system time. |
| `StreamStage<T>` | `addTimestamps(DistributedToLongFunction<? super T> timestampFn, long allowedLag)` Adds a timestamp to each item in the stream using the supplied function and specifies the allowed amount of disorder between them. |
| `<R> GeneralStage<R>` | `customTransform(String stageName, DistributedSupplier<Processor> procSupplier)` Attaches to this stage a stage with a custom transform based on the provided supplier of Core API Processors. |
| `SinkStage` | `drainTo(Sink<? super T> sink)` Attaches to this stage a sink stage, one that accepts data but doesn't emit any. |
| `GeneralStage<T>` | `filter(DistributedPredicate<T> filterFn)` Attaches to this stage a filtering stage, one which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. |
| `<C> GeneralStage<T>` | `filterUsingContext(ContextFactory<C> contextFactory, DistributedBiPredicate<? super C,? super T> filterFn)` Attaches to this stage a filtering stage, one which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. |
| `<R> GeneralStage<R>` | `flatMap(DistributedFunction<? super T,? extends Traverser<? extends R>> flatMapFn)` Attaches to this stage a flat-mapping stage, one which applies the supplied function to each input item independently and emits all the items from the Traverser it returns. |
| `<C,R> GeneralStage<R>` | `flatMapUsingContext(ContextFactory<C> contextFactory, DistributedBiFunction<? super C,? super T,? extends Traverser<? extends R>> flatMapFn)` Attaches to this stage a flat-mapping stage, one which applies the supplied function to each input item independently and emits all the items from the Traverser it returns as the output items. |
| `<K> GeneralStageWithGrouping<T,K>` | `groupingKey(DistributedFunction<? super T,? extends K> keyFn)` Specifies the function that will extract the grouping key from the items in the associated pipeline stage, as the first step in the construction of a group-and-aggregate stage. |
| `<K,T1_IN,T1,R> GeneralStage<R>` | `hashJoin(BatchStage<T1_IN> stage1, JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, DistributedBiFunction<T,T1,R> mapToOutputFn)` Attaches to both this and the supplied stage a hash-joining stage and returns it. |
| `<K1,T1_IN,T1,K2,T2_IN,T2,R> GeneralStage<R>` | `hashJoin2(BatchStage<T1_IN> stage1, JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, BatchStage<T2_IN> stage2, JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, DistributedTriFunction<T,T1,T2,R> mapToOutputFn)` Attaches to this and the two supplied stages a hash-joining stage and returns it. |
| `GeneralHashJoinBuilder<T>` | `hashJoinBuilder()` Returns a fluent API builder object to construct a hash join operation with any number of contributing stages. |
| `<R> GeneralStage<R>` | `map(DistributedFunction<? super T,? extends R> mapFn)` Attaches to this stage a mapping stage, one which applies the supplied function to each input item independently and emits the function's result as the output item. |
| `<C,R> GeneralStage<R>` | `mapUsingContext(ContextFactory<C> contextFactory, DistributedBiFunction<? super C,? super T,? extends R> mapFn)` Attaches to this stage a mapping stage, one which applies the supplied function to each input item independently and emits the function's result as the output item. |
| `default GeneralStage<T>` | `peek()` Adds a peeking layer to this compute stage which logs its output. |
| `default GeneralStage<T>` | `peek(DistributedFunction<? super T,? extends CharSequence> toStringFn)` Adds a peeking layer to this compute stage which logs its output. |
| `GeneralStage<T>` | `peek(DistributedPredicate<? super T> shouldLogFn, DistributedFunction<? super T,? extends CharSequence> toStringFn)` Attaches a peeking stage which logs this stage's output and passes it through without transformation. |
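A minimal sketch of how these transforms chain together on a Pipeline. The list names ("lines", "words") are illustrative, not part of the API; the code assumes the Jet pipeline API documented on this page is on the classpath:

```java
import com.hazelcast.jet.Traversers;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class WordCountSketch {
    public static Pipeline build() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("lines"))                        // hypothetical source list
         .flatMap(line -> Traversers.traverseArray(line.split("\\W+"))) // one item per word
         .filter(word -> !word.isEmpty())                               // drop empty tokens
         .map(String::toLowerCase)                                      // normalize
         .drainTo(Sinks.list("words"));                                 // hypothetical sink list
        return p;
    }
}
```

Each call returns a new stage, so transforms can be chained fluently or assigned to intermediate variables for branching.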
Methods inherited from interface Stage: getPipeline, name, setLocalParallelism, setName

map

@Nonnull <R> GeneralStage<R> map(@Nonnull DistributedFunction<? super T,? extends R> mapFn)

Attaches a mapping stage which applies the supplied function to each input item independently and emits the function's result as the output item. If the result is null, it emits nothing. Therefore this stage can be used to implement filtering
semantics as well.

Type parameters: R - the result type of the mapping function
Parameters: mapFn - a stateless mapping function

filter

@Nonnull GeneralStage<T> filter(@Nonnull DistributedPredicate<T> filterFn)

Attaches a filtering stage which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it.
Parameters: filterFn - a stateless filter predicate function

flatMap

@Nonnull <R> GeneralStage<R> flatMap(@Nonnull DistributedFunction<? super T,? extends Traverser<? extends R>> flatMapFn)

Attaches a flat-mapping stage which applies the supplied function to each input item independently and emits all the items from the
Traverser it returns. The traverser must be
null-terminated.

Type parameters: R - the type of items in the result's traversers
Parameters: flatMapFn - a stateless flat-mapping function whose result type is Jet's Traverser

mapUsingContext

@Nonnull <C,R> GeneralStage<R> mapUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiFunction<? super C,? super T,? extends R> mapFn)

Attaches a mapping stage which applies the supplied function to each input item independently and emits the function's result as the output item. The mapping function receives another parameter, the context object which Jet will create using the supplied
contextFactory.
If the mapping result is null, it emits nothing. Therefore this
stage can be used to implement filtering semantics as well.
NOTE: any state you maintain in the context object does not automatically become a part of a fault-tolerant snapshot. If Jet must restore from a snapshot, your state will either be lost (if it was just local state) or not rewound to the checkpoint (if it was stored in some durable storage).
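A sketch of mapping with a context object, using an in-memory lookup table as the context. All names and data here are hypothetical; the point is that the table is created once per processor (by the create function), not once per item:

```java
import com.hazelcast.jet.pipeline.ContextFactory;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import java.util.HashMap;
import java.util.Map;

public class MapUsingContextSketch {
    public static Pipeline build() {
        Pipeline p = Pipeline.create();
        // Context: a lookup table built once when the processor starts.
        ContextFactory<Map<String, String>> cityByZip = ContextFactory.withCreateFn(jet -> {
            Map<String, String> m = new HashMap<>();
            m.put("10001", "New York");   // hypothetical data
            return m;
        });
        p.drawFrom(Sources.<String>list("zipCodes"))       // hypothetical list name
         .mapUsingContext(cityByZip, (table, zip) -> table.get(zip))
         .drainTo(Sinks.list("cities"));
        return p;
    }
}
```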
Type parameters: C - the type of the context object; R - the result type of the mapping function
Parameters: contextFactory - the context factory; mapFn - a stateless mapping function

filterUsingContext

@Nonnull <C> GeneralStage<T> filterUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiPredicate<? super C,? super T> filterFn)

Attaches a filtering stage which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. The predicate function receives another parameter, the context object which Jet will create using the supplied
contextFactory.
NOTE: any state you maintain in the context object does not automatically become a part of a fault-tolerant snapshot. If Jet must restore from a snapshot, your state will either be lost (if it was just local state) or not rewound to the checkpoint (if it was stored in some durable storage).
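A sketch of filtering with a context object, assuming a blocklist set as the context (the list names and the blocked ID are hypothetical):

```java
import com.hazelcast.jet.pipeline.ContextFactory;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import java.util.HashSet;
import java.util.Set;

public class FilterUsingContextSketch {
    public static Pipeline build() {
        Pipeline p = Pipeline.create();
        // Context: a set of blocked user IDs, created once per processor.
        ContextFactory<Set<String>> blocklist = ContextFactory.withCreateFn(jet -> {
            Set<String> s = new HashSet<>();
            s.add("user-42");                              // hypothetical blocked ID
            return s;
        });
        p.drawFrom(Sources.<String>list("userIds"))        // hypothetical list name
         .filterUsingContext(blocklist, (blocked, id) -> !blocked.contains(id))
         .drainTo(Sinks.list("allowedIds"));
        return p;
    }
}
```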
Type parameters: C - the type of the context object
Parameters: contextFactory - the context factory; filterFn - a stateless filter predicate function

flatMapUsingContext

@Nonnull <C,R> GeneralStage<R> flatMapUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiFunction<? super C,? super T,? extends Traverser<? extends R>> flatMapFn)

Attaches a flat-mapping stage which applies the supplied function to each input item independently and emits all the items from the
Traverser it returns as the output items. The traverser
must be null-terminated. The mapping function receives another
parameter, the context object which Jet will create using the supplied
contextFactory.
NOTE: any state you maintain in the context object does not automatically become a part of a fault-tolerant snapshot. If Jet must restore from a snapshot, your state will either be lost (if it was just local state) or not rewound to the checkpoint (if it was stored in some durable storage).
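A sketch of flat-mapping with a context object: the context is a catalog mapping a product to its parts, and each input item expands into zero or more output items via a Traverser. All names and data are hypothetical:

```java
import com.hazelcast.jet.Traversers;
import com.hazelcast.jet.pipeline.ContextFactory;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import java.util.HashMap;
import java.util.Map;

public class FlatMapUsingContextSketch {
    public static Pipeline build() {
        Pipeline p = Pipeline.create();
        // Context: a parts catalog built once per processor (hypothetical data).
        ContextFactory<Map<String, String[]>> partsCatalog = ContextFactory.withCreateFn(jet -> {
            Map<String, String[]> m = new HashMap<>();
            m.put("bike", new String[] {"frame", "wheel", "wheel"});
            return m;
        });
        p.drawFrom(Sources.<String>list("orders"))         // hypothetical list name
         .flatMapUsingContext(partsCatalog, (catalog, product) ->
                 Traversers.traverseArray(catalog.getOrDefault(product, new String[0])))
         .drainTo(Sinks.list("parts"));
        return p;
    }
}
```

Returning an empty traverser (here, for an unknown product) emits nothing for that item, which is how flat-mapping subsumes filtering.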
Type parameters: C - the type of the context object; R - the type of items in the result's traversers
Parameters: contextFactory - the context factory; flatMapFn - a stateless flat-mapping function whose result type is Jet's Traverser

hashJoin

@Nonnull <K,T1_IN,T1,R> GeneralStage<R> hashJoin(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull DistributedBiFunction<T,T1,R> mapToOutputFn)

Attaches to both this and the supplied stage a hash-joining stage and returns it. See the
package Javadoc for a detailed description of the hash-join transform.

Type parameters: K - the type of the join key; T1_IN - the type of stage1 items; T1 - the result type of the projection on stage1 items; R - the resulting output type
Parameters: stage1 - the stage to hash-join with this one; joinClause1 - specifies how to join the two streams; mapToOutputFn - function to map the joined items to the output value

hashJoin2

@Nonnull <K1,T1_IN,T1,K2,T2_IN,T2,R> GeneralStage<R> hashJoin2(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull BatchStage<T2_IN> stage2, @Nonnull JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, @Nonnull DistributedTriFunction<T,T1,T2,R> mapToOutputFn)

Attaches to this and the two supplied stages a hash-joining stage and returns it. See the
package Javadoc for a detailed description of the hash-join transform.

Type parameters: K1 - the type of the key for stage1; T1_IN - the type of stage1 items; T1 - the result type of the projection of stage1 items; K2 - the type of the key for stage2; T2_IN - the type of stage2 items; T2 - the result type of the projection of stage2 items; R - the resulting output type
Parameters: stage1 - the first stage to join; joinClause1 - specifies how to join with stage1; stage2 - the second stage to join; joinClause2 - specifies how to join with stage2; mapToOutputFn - function to map the joined items to the output value

hashJoinBuilder

@Nonnull GeneralHashJoinBuilder<T> hashJoinBuilder()

Returns a fluent API builder object to construct a hash join operation with any number of contributing stages. It is mainly intended for joins with three or more contributing stages; for one or two stages prefer the direct
stage.hashJoinN(...) calls because they offer
more static type safety.

groupingKey

@Nonnull <K> GeneralStageWithGrouping<T,K> groupingKey(@Nonnull DistributedFunction<? super T,? extends K> keyFn)

Specifies the function that will extract the grouping key from the items in the associated pipeline stage, as the first step in the construction of a group-and-aggregate stage.

Type parameters: K - the type of the key
Parameters: keyFn - a function that extracts the grouping key

addTimestamps

@Nonnull default StreamStage<T> addTimestamps()

Adds a timestamp to each item in the stream using the current system time. If the items carry their own timestamp, use
addTimestamps(timestampFn, allowedLag) to extract them.

Throws: IllegalArgumentException - if this stage already has timestamps

addTimestamps

@Nonnull StreamStage<T> addTimestamps(DistributedToLongFunction<? super T> timestampFn, long allowedLag)

Adds a timestamp to each item in the stream using the supplied function and specifies the allowed amount of disorder between them. The
allowedLag parameter controls by how much
the timestamp can be lower than the highest one observed so far. If
it is even lower, Jet will drop the item as being "too late".
For example, if the sequence of the timestamps is [1,4,3,2] and
you configured the allowed lag as 1, Jet will let through the
event with timestamp 3, but it will drop the last one (timestamp
2).
The amount of lag you configure strongly influences the latency of Jet's output. Jet cannot finalize the window until it knows it has observed all the events belonging to it, and the more lag it must tolerate, the longer will it have to wait for possible latecomers. On the other hand, if you don't allow enough lag, you face the risk of failing to account for the data that came in after the results were already emitted.
You should strongly prefer adding this stage right after the source. In that case Jet can compute the watermark inside the source connector, taking into account its partitioning. It can maintain a separate watermark value for each partition and coalesce them into the overall watermark without causing dropped events. If you add the timestamps later on, events from different partitions may be mixed, increasing the perceived event lag and causing more dropped events.
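For instance, assuming events arrive through a map journal whose entry value is the event timestamp (the map name "events" is hypothetical), an allowed lag of 1 reproduces the [1,4,3,2] scenario described above: the event with timestamp 3 passes, the one with timestamp 2 is dropped.

```java
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import static com.hazelcast.jet.pipeline.JournalInitialPosition.START_FROM_OLDEST;

public class AddTimestampsSketch {
    public static Pipeline build() {
        Pipeline p = Pipeline.create();
        // Hypothetical journal of (eventId, eventTimestamp) entries.
        p.drawFrom(Sources.<String, Long>mapJournal("events", START_FROM_OLDEST))
         // Extract the timestamp from the entry value; tolerate disorder of 1 unit.
         .addTimestamps(e -> e.getValue(), 1)
         .drainTo(Sinks.list("timestamped"));
        return p;
    }
}
```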
Parameters: timestampFn - a function that returns the timestamp for each item; allowedLag - the allowed lag behind the top observed timestamp
Throws: IllegalArgumentException - if this stage already has timestamps

drainTo

@Nonnull SinkStage drainTo(@Nonnull Sink<? super T> sink)

Attaches a sink stage, one that accepts data but doesn't emit any.
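A sketch of draining into an IMap sink. IMap sinks consume Map.Entry items, so the stage first maps each item to an entry; the list and map names are hypothetical:

```java
import com.hazelcast.jet.Util;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class DrainToSketch {
    public static Pipeline build() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("words"))     // hypothetical list name
         .map(w -> Util.entry(w, w.length()))         // map sinks expect Map.Entry items
         .drainTo(Sinks.map("wordLengths"));          // hypothetical IMap name
        return p;
    }
}
```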
peek

@Nonnull GeneralStage<T> peek(@Nonnull DistributedPredicate<? super T> shouldLogFn, @Nonnull DistributedFunction<? super T,? extends CharSequence> toStringFn)

Attaches a peeking stage which logs this stage's output and passes it through without transformation. For each item it uses the shouldLogFn predicate to see whether to log the item and, if so, uses toStringFn to get the item's string representation, which it logs at the INFO level to the log category com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>.
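A sketch of the filtered-logging variant of peek: only odd numbers are logged, rendered by the supplied toStringFn, and every item (logged or not) passes through unchanged. The list names are hypothetical:

```java
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class PeekSketch {
    public static Pipeline build() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<Integer>list("numbers"))       // hypothetical list name
         // Log only odd items, as "odd item: N"; pass everything through.
         .peek(n -> n % 2 != 0, n -> "odd item: " + n)
         .drainTo(Sinks.list("out"));
        return p;
    }
}
```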
Parameters: shouldLogFn - a function to filter the logged items (you can use alwaysTrue() as a pass-through filter when you don't need any filtering); toStringFn - a function that returns a string representation of the item
See also: peek(DistributedFunction), peek()

peek

@Nonnull default GeneralStage<T> peek(@Nonnull DistributedFunction<? super T,? extends CharSequence> toStringFn)

Adds a peeking layer to this compute stage which logs its output. For each item it uses
toStringFn to get a string representation of the item, which it logs at the INFO level to the log category com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>.
Parameters: toStringFn - a function that returns a string representation of the item
See also: peek(DistributedPredicate, DistributedFunction), peek()

peek

@Nonnull default GeneralStage<T> peek()

Adds a peeking layer to this compute stage which logs its output. For each item it logs the result of the item's toString() method at the INFO level to the log category com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>.
The stage logs each item on whichever cluster member it happens to
receive it. Its primary purpose is for development use, when running Jet
on a local machine.

customTransform

@Nonnull <R> GeneralStage<R> customTransform(@Nonnull String stageName, @Nonnull DistributedSupplier<Processor> procSupplier)

Attaches a stage with a custom transform based on the provided supplier of Core API
Processors. To be compatible with
the rest of the pipeline, the processor must expect a single inbound
edge and arbitrarily many outbound edges, and it must push the same data
to all outbound edges.
Note that the returned stage's type parameter is inferred from the call site and not propagated from the processor that will produce the result, so there is no actual type safety provided.
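A sketch of plugging a Core API processor into a pipeline. Here the custom stage is simply the equivalent of map(String::toUpperCase), built from the Processors.mapP factory; the stage and list names are hypothetical, and the explicit <String> type witness reflects the note above that the output type is inferred from the call site:

```java
import com.hazelcast.jet.core.processor.Processors;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class CustomTransformSketch {
    public static Pipeline build() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("input"))          // hypothetical list name
         // Equivalent of map(String::toUpperCase), expressed as a Core API processor.
         .<String>customTransform("to-upper", Processors.mapP((String s) -> s.toUpperCase()))
         .drainTo(Sinks.list("output"));
        return p;
    }
}
```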
Type parameters: R - the type of the output items
Parameters: stageName - a human-readable name for the custom stage; procSupplier - the supplier of processors

Copyright © 2018 Hazelcast, Inc. All rights reserved.