BatchStage

Type Parameters:
T - the type of items coming out of this stage

public interface BatchStage<T> extends GeneralStage<T>

A stage in a distributed computation pipeline that will observe a finite amount of data (a batch). It accepts input from its upstream stages (if any) and passes its output to its downstream stages.

Method Summary

Modifier and Type | Method and Description
---|---
<A,R> BatchStage<R> | aggregate(AggregateOperation1<? super T,A,? extends R> aggrOp) - Attaches to this stage a stage that performs the given aggregate operation over all the items it receives.
<T1,A,R> BatchStage<R> | aggregate2(BatchStage<T1> stage1, AggregateOperation2<? super T,? super T1,A,? extends R> aggrOp) - Attaches to this stage a stage that performs the given aggregate operation over all the items it receives from both this stage and stage1 you supply.
<T1,T2,A,R> BatchStage<R> | aggregate3(BatchStage<T1> stage1, BatchStage<T2> stage2, AggregateOperation3<? super T,? super T1,? super T2,A,? extends R> aggrOp) - Attaches to this stage a stage that performs the given aggregate operation over all the items it receives from this stage as well as stage1 and stage2 you supply.
default AggregateBuilder<T> | aggregateBuilder() - Returns a fluent API builder object to construct an aggregating stage with any number of contributing stages.
<R> BatchStage<R> | customTransform(String stageName, DistributedSupplier<Processor> procSupplier) - Attaches to this stage a stage with a custom transform based on the provided supplier of Core API Processors.
BatchStage<T> | filter(DistributedPredicate<T> filterFn) - Attaches to this stage a filtering stage, one which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it.
<C> BatchStage<T> | filterUsingContext(ContextFactory<C> contextFactory, DistributedBiPredicate<? super C,? super T> filterFn) - Attaches to this stage a filtering stage, one which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it.
<R> BatchStage<R> | flatMap(DistributedFunction<? super T,? extends Traverser<? extends R>> flatMapFn) - Attaches to this stage a flat-mapping stage, one which applies the supplied function to each input item independently and emits all the items from the Traverser it returns.
<C,R> BatchStage<R> | flatMapUsingContext(ContextFactory<C> contextFactory, DistributedBiFunction<? super C,? super T,? extends Traverser<? extends R>> flatMapFn) - Attaches to this stage a flat-mapping stage, one which applies the supplied function to each input item independently and emits all items from the Traverser it returns as the output items.
<K> StageWithGrouping<T,K> | groupingKey(DistributedFunction<? super T,? extends K> keyFn) - Specifies the function that will extract the grouping key from the items in the associated pipeline stage, as the first step in the construction of a group-and-aggregate stage.
<K,T1_IN,T1,R> BatchStage<R> | hashJoin(BatchStage<T1_IN> stage1, JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, DistributedBiFunction<T,T1,R> mapToOutputFn) - Attaches to both this and the supplied stage a hash-joining stage and returns it.
<K1,T1_IN,T1,K2,T2_IN,T2,R> BatchStage<R> | hashJoin2(BatchStage<T1_IN> stage1, JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, BatchStage<T2_IN> stage2, JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, DistributedTriFunction<T,T1,T2,R> mapToOutputFn) - Attaches to this and the two supplied stages a hash-joining stage and returns it.
default HashJoinBuilder<T> | hashJoinBuilder() - Returns a fluent API builder object to construct a hash join operation with any number of contributing stages.
<R> BatchStage<R> | map(DistributedFunction<? super T,? extends R> mapFn) - Attaches to this stage a mapping stage, one which applies the supplied function to each input item independently and emits the function's result as the output item.
<C,R> BatchStage<R> | mapUsingContext(ContextFactory<C> contextFactory, DistributedBiFunction<? super C,? super T,? extends R> mapFn) - Attaches to this stage a mapping stage, one which applies the supplied function to each input item independently and emits the function's result as the output item.
default BatchStage<T> | peek() - Adds a peeking layer to this compute stage which logs its output.
default BatchStage<T> | peek(DistributedFunction<? super T,? extends CharSequence> toStringFn) - Adds a peeking layer to this compute stage which logs its output.
BatchStage<T> | peek(DistributedPredicate<? super T> shouldLogFn, DistributedFunction<? super T,? extends CharSequence> toStringFn) - Attaches a peeking stage which logs this stage's output and passes it through without transformation.
BatchStage<T> | setLocalParallelism(int localParallelism) - Sets the preferred local parallelism (number of workers per Jet cluster member) this stage will configure its DAG vertices with.
BatchStage<T> | setName(String name) - Overrides the default name of the stage with the name you choose.
Methods inherited from interface GeneralStage: addTimestamps, addTimestamps, drainTo

Methods inherited from interface Stage: getPipeline, name
Method Detail

groupingKey

@Nonnull <K> StageWithGrouping<T,K> groupingKey(@Nonnull DistributedFunction<? super T,? extends K> keyFn)

Description copied from interface: GeneralStage
Specifies the function that will extract the grouping key from the items in the associated pipeline stage, as the first step in the construction of a group-and-aggregate stage.

Specified by: groupingKey in interface GeneralStage<T>
Type Parameters:
K - type of the key
Parameters:
keyFn - function that extracts the grouping key
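A group-and-aggregate built on groupingKey might look like the following minimal sketch (not part of the original Javadoc; it assumes Jet 0.7-era pipeline APIs, and the IList/IMap names are hypothetical):

```java
import com.hazelcast.jet.aggregate.AggregateOperations;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class GroupingKeyExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("words"))      // hypothetical IList source
         .groupingKey(word -> word.substring(0, 1))    // group by first letter
         .aggregate(AggregateOperations.counting())    // one Entry<String, Long> per group
         .drainTo(Sinks.map("countsByFirstLetter"));   // hypothetical IMap sink
        // Submit with jetInstance.newJob(p).join() on a running cluster.
    }
}
```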
map

@Nonnull <R> BatchStage<R> map(@Nonnull DistributedFunction<? super T,? extends R> mapFn)
Description copied from interface: GeneralStage
Attaches to this stage a mapping stage, one which applies the supplied function to each input item independently and emits the function's result as the output item. If the result is null, it emits nothing. Therefore this stage can be used to implement filtering semantics as well.

Specified by: map in interface GeneralStage<T>
Type Parameters:
R - the result type of the mapping function
Parameters:
mapFn - a stateless mapping function
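A minimal map sketch (not from the original Javadoc; Jet 0.7-era APIs assumed, list names hypothetical):

```java
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class MapExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("lines"))   // hypothetical IList source
         .map(String::toUpperCase)                  // a null result would emit nothing
         .drainTo(Sinks.list("upperCased"));        // hypothetical IList sink
    }
}
```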
filter

@Nonnull BatchStage<T> filter(@Nonnull DistributedPredicate<T> filterFn)
Description copied from interface: GeneralStage
Attaches to this stage a filtering stage, one which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it.

Specified by: filter in interface GeneralStage<T>
Parameters:
filterFn - a stateless filter predicate function
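A minimal filter sketch (not from the original Javadoc; Jet 0.7-era APIs assumed, list names hypothetical):

```java
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class FilterExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<Integer>list("numbers"))  // hypothetical IList source
         .filter(n -> n % 2 == 0)                     // keep only even numbers
         .drainTo(Sinks.list("evenNumbers"));         // hypothetical IList sink
    }
}
```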
flatMap

@Nonnull <R> BatchStage<R> flatMap(@Nonnull DistributedFunction<? super T,? extends Traverser<? extends R>> flatMapFn)
Description copied from interface: GeneralStage
Attaches to this stage a flat-mapping stage, one which applies the supplied function to each input item independently and emits all the items from the Traverser it returns. The traverser must be null-terminated.

Specified by: flatMap in interface GeneralStage<T>
Type Parameters:
R - the type of items in the result's traversers
Parameters:
flatMapFn - a stateless flatmapping function, whose result type is Jet's Traverser
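A minimal flatMap sketch splitting lines into words (not from the original Javadoc; Jet 0.7-era APIs assumed, list names hypothetical):

```java
import com.hazelcast.jet.Traversers;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class FlatMapExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("lines"))                        // hypothetical IList source
         // One output item per word; traverseArray yields a null-terminated Traverser.
         .flatMap(line -> Traversers.traverseArray(line.split("\\W+")))
         .drainTo(Sinks.list("words"));                                  // hypothetical IList sink
    }
}
```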
mapUsingContext

@Nonnull <C,R> BatchStage<R> mapUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiFunction<? super C,? super T,? extends R> mapFn)
Description copied from interface: GeneralStage
Attaches to this stage a mapping stage, one which applies the supplied function to each input item independently and emits the function's result as the output item. The mapping function receives another parameter, the context object, which Jet will create using the supplied contextFactory. If the mapping result is null, it emits nothing. Therefore this stage can be used to implement filtering semantics as well.

NOTE: Any state you maintain in the context object does not automatically become part of a fault-tolerant snapshot. If Jet must restore from a snapshot, your state will either be lost (if it was just local state) or not rewound to the checkpoint (if it was stored in some durable storage).

Specified by: mapUsingContext in interface GeneralStage<T>
Type Parameters:
C - type of context object
R - the result type of the mapping function
Parameters:
contextFactory - the context factory
mapFn - a stateless mapping function
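A minimal mapUsingContext sketch, where the context object is a read-only dictionary created once per processor (not from the original Javadoc; Jet 0.7-era APIs assumed, and the list names and dictionary contents are hypothetical):

```java
import java.util.Collections;
import java.util.Map;
import com.hazelcast.jet.pipeline.ContextFactory;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class MapUsingContextExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        ContextFactory<Map<String, String>> dictionary = ContextFactory.withCreateFn(
                jet -> Collections.singletonMap("hello", "bonjour"));
        p.drawFrom(Sources.<String>list("words"))                         // hypothetical IList source
         .mapUsingContext(dictionary, (dict, word) -> dict.get(word))     // null lookups emit nothing
         .drainTo(Sinks.list("translated"));                              // hypothetical IList sink
    }
}
```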
filterUsingContext

@Nonnull <C> BatchStage<T> filterUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiPredicate<? super C,? super T> filterFn)
Description copied from interface: GeneralStage
Attaches to this stage a filtering stage, one which applies the provided predicate function to each input item to decide whether to pass the item to the output or to discard it. The predicate function receives another parameter, the context object, which Jet will create using the supplied contextFactory.

NOTE: Any state you maintain in the context object does not automatically become part of a fault-tolerant snapshot. If Jet must restore from a snapshot, your state will either be lost (if it was just local state) or not rewound to the checkpoint (if it was stored in some durable storage).

Specified by: filterUsingContext in interface GeneralStage<T>
Type Parameters:
C - type of context object
Parameters:
contextFactory - the context factory
filterFn - a stateless filter predicate function
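A minimal filterUsingContext sketch, using a shared stop-word set as the context object (not from the original Javadoc; Jet 0.7-era APIs assumed, list names and stop words hypothetical):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import com.hazelcast.jet.pipeline.ContextFactory;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class FilterUsingContextExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        ContextFactory<Set<String>> stopWords = ContextFactory.withCreateFn(
                jet -> new HashSet<>(Arrays.asList("a", "an", "the")));
        p.drawFrom(Sources.<String>list("words"))                          // hypothetical IList source
         .filterUsingContext(stopWords, (stop, word) -> !stop.contains(word))
         .drainTo(Sinks.list("significantWords"));                         // hypothetical IList sink
    }
}
```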
flatMapUsingContext

@Nonnull <C,R> BatchStage<R> flatMapUsingContext(@Nonnull ContextFactory<C> contextFactory, @Nonnull DistributedBiFunction<? super C,? super T,? extends Traverser<? extends R>> flatMapFn)
Description copied from interface: GeneralStage
Attaches to this stage a flat-mapping stage, one which applies the supplied function to each input item independently and emits all items from the Traverser it returns as the output items. The traverser must be null-terminated. The mapping function receives another parameter, the context object, which Jet will create using the supplied contextFactory.

NOTE: Any state you maintain in the context object does not automatically become part of a fault-tolerant snapshot. If Jet must restore from a snapshot, your state will either be lost (if it was just local state) or not rewound to the checkpoint (if it was stored in some durable storage).

Specified by: flatMapUsingContext in interface GeneralStage<T>
Type Parameters:
C - type of context object
R - the type of items in the result's traversers
Parameters:
contextFactory - the context factory
flatMapFn - a stateless flatmapping function, whose result type is Jet's Traverser
hashJoin

@Nonnull <K,T1_IN,T1,R> BatchStage<R> hashJoin(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull DistributedBiFunction<T,T1,R> mapToOutputFn)
Description copied from interface: GeneralStage
Attaches to both this and the supplied stage a hash-joining stage and returns it. Please refer to the package Javadoc for a detailed description of the hash-join transform.

Specified by: hashJoin in interface GeneralStage<T>
Type Parameters:
K - the type of the join key
T1_IN - the type of stage1 items
T1 - the result type of projection on stage1 items
R - the resulting output type
Parameters:
stage1 - the stage to hash-join with this one
joinClause1 - specifies how to join the two streams
mapToOutputFn - function to map the joined items to the output value
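A minimal hashJoin sketch, enriching a primary stage with entries from an IMap (not from the original Javadoc; Jet 0.7-era APIs assumed, and the Trade class and source/sink names are hypothetical):

```java
import java.io.Serializable;
import java.util.Map.Entry;
import com.hazelcast.jet.pipeline.BatchStage;
import com.hazelcast.jet.pipeline.JoinClause;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class HashJoinExample {
    // Hypothetical item type on the primary stage.
    static class Trade implements Serializable {
        final int brokerId;
        final String ticker;
        Trade(int brokerId, String ticker) { this.brokerId = brokerId; this.ticker = ticker; }
        int brokerId() { return brokerId; }
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        BatchStage<Trade> trades = p.drawFrom(Sources.<Trade>list("trades"));
        // Enriching stage: an IMap of brokerId -> brokerName, drawn as Map entries.
        BatchStage<Entry<Integer, String>> brokers =
                p.drawFrom(Sources.<Integer, String>map("brokers"));
        trades.hashJoin(brokers,
                JoinClause.joinMapEntries(Trade::brokerId),   // join key: Trade.brokerId == entry key
                (trade, brokerName) -> trade.ticker + " via " + brokerName)
              .drainTo(Sinks.list("enrichedTrades"));
    }
}
```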
hashJoin2

@Nonnull <K1,T1_IN,T1,K2,T2_IN,T2,R> BatchStage<R> hashJoin2(@Nonnull BatchStage<T1_IN> stage1, @Nonnull JoinClause<K1,? super T,? super T1_IN,? extends T1> joinClause1, @Nonnull BatchStage<T2_IN> stage2, @Nonnull JoinClause<K2,? super T,? super T2_IN,? extends T2> joinClause2, @Nonnull DistributedTriFunction<T,T1,T2,R> mapToOutputFn)
Description copied from interface: GeneralStage
Attaches to this and the two supplied stages a hash-joining stage and returns it. Please refer to the package Javadoc for a detailed description of the hash-join transform.

Specified by: hashJoin2 in interface GeneralStage<T>
Type Parameters:
K1 - the type of key for stage1
T1_IN - the type of stage1 items
T1 - the result type of projection of stage1 items
K2 - the type of key for stage2
T2_IN - the type of stage2 items
T2 - the result type of projection of stage2 items
R - the resulting output type
Parameters:
stage1 - the first stage to join
joinClause1 - specifies how to join with stage1
stage2 - the second stage to join
joinClause2 - specifies how to join with stage2
mapToOutputFn - function to map the joined items to the output value
hashJoinBuilder

@Nonnull default HashJoinBuilder<T> hashJoinBuilder()
Description copied from interface: GeneralStage
Returns a fluent API builder object to construct a hash join operation with any number of contributing stages. For one or two stages prefer the direct stage.hashJoinN(...) calls because they offer more static type safety.

Specified by: hashJoinBuilder in interface GeneralStage<T>
peek

@Nonnull default BatchStage<T> peek()
Description copied from interface: GeneralStage
Adds a peeking layer to this compute stage which logs its output. For each item the stage emits, it logs the result of its toString() method at the INFO level to the log category com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>. The stage logs each item on whichever cluster member it happens to receive it. Its primary purpose is for development use, when running Jet on a local machine.

Specified by: peek in interface GeneralStage<T>
See Also: GeneralStage.peek(DistributedPredicate, DistributedFunction), GeneralStage.peek(DistributedFunction)
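During development, peek() can be dropped between any two stages to trace what flows through, as in this minimal sketch (not from the original Javadoc; Jet 0.7-era APIs assumed, list names hypothetical):

```java
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class PeekExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("lines"))  // hypothetical IList source
         .map(String::trim)
         .peek()                                   // logs each trimmed line at INFO level
         .drainTo(Sinks.list("trimmed"));          // hypothetical IList sink
    }
}
```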
peek

@Nonnull BatchStage<T> peek(@Nonnull DistributedPredicate<? super T> shouldLogFn, @Nonnull DistributedFunction<? super T,? extends CharSequence> toStringFn)
Description copied from interface: GeneralStage
Attaches a peeking stage which logs this stage's output and passes it through without transformation. For each item the stage emits, it:
1. uses the shouldLogFn predicate to see whether to log the item
2. if yes, uses toStringFn to get the item's string representation
3. logs the string at the INFO level to the log category com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>

Specified by: peek in interface GeneralStage<T>
Parameters:
shouldLogFn - a function to filter the logged items. You can use alwaysTrue() as a pass-through filter when you don't need any filtering.
toStringFn - a function that returns a string representation of the item
See Also: GeneralStage.peek(DistributedFunction), GeneralStage.peek()
peek

@Nonnull default BatchStage<T> peek(@Nonnull DistributedFunction<? super T,? extends CharSequence> toStringFn)
Description copied from interface: GeneralStage
Adds a peeking layer to this compute stage which logs its output. For each item the stage emits, it uses toStringFn to get a string representation of the item, then logs it at the INFO level to the log category com.hazelcast.jet.impl.processor.PeekWrappedP.<vertexName>#<processorIndex>.

Specified by: peek in interface GeneralStage<T>
Parameters:
toStringFn - a function that returns a string representation of the item
See Also: GeneralStage.peek(DistributedPredicate, DistributedFunction), GeneralStage.peek()
customTransform

@Nonnull <R> BatchStage<R> customTransform(@Nonnull String stageName, @Nonnull DistributedSupplier<Processor> procSupplier)
Description copied from interface: GeneralStage
Attaches to this stage a stage with a custom transform based on the provided supplier of Core API Processors. To be compatible with the rest of the pipeline, the processor must expect a single inbound edge and arbitrarily many outbound edges, and it must push the same data to all outbound edges.

Note that the returned stage's type parameter is inferred from the call site and not propagated from the processor that will produce the result, so there is no actual type safety provided.

Specified by: customTransform in interface GeneralStage<T>
Type Parameters:
R - the type of the output items
Parameters:
stageName - a human-readable name for the custom stage
procSupplier - the supplier of processors
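A minimal customTransform sketch that wraps a Core API mapping processor (not from the original Javadoc; Jet 0.7-era APIs assumed, list names hypothetical):

```java
import com.hazelcast.jet.core.processor.Processors;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class CustomTransformExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("lines"))   // hypothetical IList source
         // The <String> witness is only a call-site assertion; nothing checks it
         // against what the processor actually emits.
         .<String>customTransform("to-upper", Processors.mapP((String s) -> s.toUpperCase()))
         .drainTo(Sinks.list("upperCased"));        // hypothetical IList sink
    }
}
```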
aggregate

@Nonnull <A,R> BatchStage<R> aggregate(@Nonnull AggregateOperation1<? super T,A,? extends R> aggrOp)
Attaches to this stage a stage that performs the given aggregate operation over all the items it receives. The aggregating stage emits a single item.

Type Parameters:
A - the type of the accumulator used by the aggregate operation
R - the type of the result
Parameters:
aggrOp - the aggregate operation to perform
See Also: AggregateOperations
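A minimal aggregate sketch (not from the original Javadoc; Jet 0.7-era APIs assumed, list names hypothetical):

```java
import com.hazelcast.jet.aggregate.AggregateOperations;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class AggregateExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<Integer>list("numbers"))          // hypothetical IList source
         .aggregate(AggregateOperations.summingLong(n -> n))  // the stage emits a single Long
         .drainTo(Sinks.list("sum"));                         // hypothetical IList sink
    }
}
```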
aggregate2

@Nonnull <T1,A,R> BatchStage<R> aggregate2(@Nonnull BatchStage<T1> stage1, @Nonnull AggregateOperation2<? super T,? super T1,A,? extends R> aggrOp)
Attaches to this stage a stage that performs the given aggregate operation over all the items it receives from both this stage and stage1 you supply. The aggregate operation must specify a separate accumulator function for each of the two streams (refer to its Javadoc for a simple example). The aggregating stage emits a single item.

Type Parameters:
T1 - type of items in stage1
A - type of the accumulator used by the aggregate operation
R - type of the result
Parameters:
aggrOp - the aggregate operation to perform
See Also: AggregateOperations
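A two-input aggregation might be sketched as follows, with a shared accumulator and one accumulate function per input stage (not from the original Javadoc; the AggregateOperation builder chain follows the Jet 0.7-era API as best understood here, and the list names are hypothetical):

```java
import com.hazelcast.jet.accumulator.LongAccumulator;
import com.hazelcast.jet.aggregate.AggregateOperation;
import com.hazelcast.jet.pipeline.BatchStage;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class Aggregate2Example {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        BatchStage<Integer> numbers = p.drawFrom(Sources.<Integer>list("numbers"));
        BatchStage<String> words = p.drawFrom(Sources.<String>list("words"));
        // Sum the numbers from stage 0 and count the words from stage 1 into one total.
        numbers.aggregate2(words, AggregateOperation
                .withCreate(LongAccumulator::new)
                .<Integer>andAccumulate0((acc, n) -> acc.add(n))
                .<String>andAccumulate1((acc, w) -> acc.add(1))
                .andFinish(LongAccumulator::get))
               .drainTo(Sinks.list("combinedTotal"));  // a single Long item
    }
}
```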
aggregate3

@Nonnull <T1,T2,A,R> BatchStage<R> aggregate3(@Nonnull BatchStage<T1> stage1, @Nonnull BatchStage<T2> stage2, @Nonnull AggregateOperation3<? super T,? super T1,? super T2,A,? extends R> aggrOp)
Attaches to this stage a stage that performs the given aggregate operation over all the items it receives from this stage as well as stage1 and stage2 you supply. The aggregate operation must specify a separate accumulator function for each of the three streams (refer to its Javadoc for a simple example). The aggregating stage emits a single item.

Type Parameters:
T1 - type of items in stage1
T2 - type of items in stage2
A - type of the accumulator used by the aggregate operation
R - type of the result
Parameters:
aggrOp - the aggregate operation to perform
See Also: AggregateOperations
aggregateBuilder

@Nonnull default AggregateBuilder<T> aggregateBuilder()
Returns a fluent API builder object to construct an aggregating stage with any number of contributing stages. For one or two stages prefer the direct stage.aggregateN(...) calls because they offer more static type safety.

setLocalParallelism

@Nonnull BatchStage<T> setLocalParallelism(int localParallelism)
Description copied from interface: Stage
Sets the preferred local parallelism (number of workers per Jet cluster member) this stage will configure its DAG vertices with.

Note that, while most stages are backed by one vertex, there are exceptions. If a stage uses two vertices, each of them will have the given local parallelism, so in total there will be twice as many processing units per member.

The default value is -1; it signals Jet to figure out a default value. Jet will determine the vertex's local parallelism during job initialization from the global default and the processor meta-supplier's preferred value.

Specified by: setLocalParallelism in interface Stage
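A minimal setLocalParallelism sketch, pinning the mapping stage's parallelism (not from the original Javadoc; Jet 0.7-era APIs assumed, list names hypothetical):

```java
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class SetLocalParallelismExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("lines"))  // hypothetical IList source
         .map(String::toUpperCase)
         .setLocalParallelism(2)                   // two parallel processors per member for this stage
         .drainTo(Sinks.list("out"));              // hypothetical IList sink
    }
}
```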
Copyright © 2018 Hazelcast, Inc. All rights reserved.