One of the main concerns when writing custom sources is that the source is typically distributed across multiple machines and partitions, and the work needs to be distributed across multiple members and processors.
Jet provides a flexible ProcessorMetaSupplier
and ProcessorSupplier
API which can be used to control how a source is distributed across the
network.
The procedure for generating Processor
instances is as follows:
- The
ProcessorMetaSupplier
for theVertex
is serialized and sent to the coordinating member. - The coordinator calls
ProcessorMetaSupplier.get()
once for each member in the cluster and aProcessorSupplier
is created for each member. - The
ProcessorSupplier
for each member is serialized and sent to that member. - Each member will call their own
ProcessorSupplier
with the correct count parameter, which corresponds to thelocalParallelism
setting of that vertex.