Interface Pipeline

All Superinterfaces:
Serializable

public interface Pipeline extends Serializable
Models a distributed computation job using an analogy with a system of interconnected water pipes. The basic element is a stage which can be attached to one or more other stages. A stage accepts the data coming from its upstream stages, transforms it, and directs the resulting data to its downstream stages.

The Pipeline object is a container of all the stages defined on a pipeline: the source stages obtained directly from it by calling readFrom(BatchSource) as well as all the stages attached (directly or indirectly) to them.

Note that there is no simple one-to-one correspondence between pipeline stages and Core API's DAG vertices. Some stages map to several vertices (e.g., grouping and co-grouping are implemented as a cascade of two vertices) and some stages may be merged with others into a single vertex (e.g., a cascade of map/filter/flatMap stages can be fused into one vertex).

Since:
Jet 3.0
  • Method Summary

    Modifier and Type
    Method
    Description
    static Pipeline
    Creates a new, empty pipeline.
    boolean
    Returns true if there are no stages in the pipeline.
    boolean
    Returns the preserve order property of this pipeline
    <T> BatchStage<T>
    readFrom(BatchSource<? extends T> source)
    Returns a pipeline stage that represents a bounded (batch) data source.
    readFrom(StreamSource<? extends T> source)
    Returns a pipeline stage that represents an unbounded data source (i.e., an event stream).
    setPreserveOrder(boolean value)
    Tells Jet whether or not it is allowed to reorder the events for better performance.
    Deprecated.
    since Jet 4.3, Jet performs this transformation on the server-side.
    Returns a DOT format (graphviz) representation of the Pipeline.
    writeTo(Sink<? super T> sink, GeneralStage<? extends T> stage0, GeneralStage<? extends T> stage1, GeneralStage<? extends T>... moreStages)
    Attaches the supplied sink to two or more pipeline stages.
  • Method Details

    • create

      @Nonnull static Pipeline create()
      Creates a new, empty pipeline.
      Since:
      Jet 3.0
    • isPreserveOrder

      boolean isPreserveOrder()
      Returns the preserve order property of this pipeline
      Since:
      Jet 4.4
    • setPreserveOrder

      @Nonnull Pipeline setPreserveOrder(boolean value)
      Tells Jet whether or not it is allowed to reorder the events for better performance. Enabling this property adds restrictions to the internal data flows that ensure each pipeline stage observes the events in the same order as they were received from the source.

      Keep in mind that there is often no total event order to begin with, for example in partitioned sources. In this case there is still partial order per each partition, and that is the order Jet preserves.

      If you enable this property, you cannot use the rebalance operator without a grouping key, because it explicitly orders Jet to break the event order.

      The default value is false.

      Returns:
      this, for fluent API
      Since:
      Jet 4.4
    • readFrom

      @Nonnull <T> BatchStage<T> readFrom(@Nonnull BatchSource<? extends T> source)
      Returns a pipeline stage that represents a bounded (batch) data source. It has no upstream stages and emits the data (typically coming from an outside source) to its downstream stages.
      Type Parameters:
      T - the type of source data items
      Parameters:
      source - the definition of the source from which the stage reads data
    • readFrom

      @Nonnull <T> StreamSourceStage<T> readFrom(@Nonnull StreamSource<? extends T> source)
      Returns a pipeline stage that represents an unbounded data source (i.e., an event stream). It has no upstream stages and emits the data (typically coming from an outside source) to its downstream stages.
      Type Parameters:
      T - the type of source data items
      Parameters:
      source - the definition of the source from which the stage reads data
    • writeTo

      @Nonnull <T> SinkStage writeTo(@Nonnull Sink<? super T> sink, @Nonnull GeneralStage<? extends T> stage0, @Nonnull GeneralStage<? extends T> stage1, @Nonnull GeneralStage<? extends T>... moreStages)
      Attaches the supplied sink to two or more pipeline stages. Returns the SinkStage representing the sink. You need this method when you want to drain more than one stage to the same sink. In the typical case you'll use GeneralStage.writeTo(Sink) instead.
      Type Parameters:
      T - the type of data being drained to the sink
    • toDag

      @Nonnull @Deprecated DAG toDag()
      Deprecated.
      since Jet 4.3, Jet performs this transformation on the server-side.
      Transforms the pipeline into a Jet DAG, which can be submitted for execution to a Jet instance.
    • toDotString

      @Nonnull String toDotString()
      Returns a DOT format (graphviz) representation of the Pipeline.
    • isEmpty

      boolean isEmpty()
      Returns true if there are no stages in the pipeline.
      Since:
      Jet 4.4