Class AvroSourceBuilder<D>

Type Parameters:
D - the type of the datum read by datumReaderSupplier

public final class AvroSourceBuilder<D> extends Object
Builder for an Avro file source that reads records from Avro files in a directory (but not its subdirectories) and emits objects returned by mapOutputFn.
Since:
Jet 3.0
  • Method Details

    • glob

      public AvroSourceBuilder<D> glob(@Nonnull String glob)
      Sets the globbing mask; see FileSystem.getPathMatcher(String) for the syntax. The default value is "*", which matches all files.
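
As a rough illustration of the glob syntax, the sketch below applies a mask to bare file names using the JDK's own PathMatcher. The class name GlobMaskSketch and the helper matches are hypothetical, and the assumption that the mask is applied to the file name with the "glob:" syntax is inferred from the getPathMatcher reference above, not from the source's implementation.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper, not part of AvroSourceBuilder.
final class GlobMaskSketch {

    // Returns true if the bare file name matches the given glob mask.
    // Assumption: the mask is interpreted with the JDK "glob:" syntax.
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        Path name = Paths.get(fileName).getFileName();
        return matcher.matches(name);
    }

    public static void main(String[] args) {
        // The default mask "*" accepts every file name.
        System.out.println(matches("*", "users.avro"));
        // A narrower mask accepts only names with the given extension.
        System.out.println(matches("*.avro", "users.avro"));
        System.out.println(matches("*.avro", "users.json"));
    }
}
```

A mask such as "*.avro" is a reasonable choice when the directory may also hold non-Avro files.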
    • sharedFileSystem

      public AvroSourceBuilder<D> sharedFileSystem(boolean sharedFileSystem)
      Sets whether files are in a shared storage visible to all members. The default value is false.

      If sharedFileSystem is true, Jet will assume all members see the same files. They will split the work so that each member will read a part of the files. If sharedFileSystem is false, each member will read all files in the directory, assuming they are local.
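
The effect of the flag can be sketched with a toy assignment function. Everything here is hypothetical: the class SharedFileSystemSketch, the helper filesForMember, and in particular the hash-modulo rule, which only stands in for whatever splitting strategy Jet actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the flag's effect, not Jet's implementation.
final class SharedFileSystemSketch {

    // Returns the files that the member with the given index would read.
    static List<String> filesForMember(List<String> files, boolean sharedFileSystem,
                                       int memberIndex, int memberCount) {
        if (!sharedFileSystem) {
            // Files are assumed local: the member reads everything it sees.
            return new ArrayList<>(files);
        }
        // Shared storage: each file goes to exactly one member.
        // Assumption: a hash-modulo rule, chosen only for illustration.
        List<String> assigned = new ArrayList<>();
        for (String file : files) {
            if (Math.floorMod(file.hashCode(), memberCount) == memberIndex) {
                assigned.add(file);
            }
        }
        return assigned;
    }
}
```

In this model, leaving the flag at false while all members actually see the same directory would make every member read every file, so each record would be emitted once per member.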

    • build

      public <T> BatchSource<T> build(@Nonnull BiFunctionEx<String,? super D,T> mapOutputFn)
      Builds a custom Avro file BatchSource with the supplied components and the output function mapOutputFn.

      The source does not save any state to the snapshot. If the job is restarted, it will re-emit all entries.

      Any IOException will cause the job to fail. The files must not change while being read; if they do, the behavior is unspecified.

      The default local parallelism for this processor is 4 (or available CPU count if it is less than 4).

      Type Parameters:
      T - the type of the items the source emits
      Parameters:
      mapOutputFn - the function that creates an output object from each record; it receives the filename and the record read by datumReader as parameters
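
The default local parallelism rule stated for this method amounts to taking the smaller of 4 and the CPU count. A minimal sketch, with a hypothetical class and method name:

```java
// Hypothetical helper, not part of the Jet API.
final class LocalParallelismSketch {

    // 4, or the available CPU count if that is less than 4.
    static int defaultLocalParallelism(int availableCpus) {
        return Math.min(4, availableCpus);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(defaultLocalParallelism(cpus));
    }
}
```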
    • build

      public BatchSource<D> build()
      Convenience for build(BiFunctionEx). Builds a source that emits the records as read by datumReader, without any transformation.
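
The relationship between the two build variants can be sketched without the Jet dependency. In the sketch below, java.util.function.BiFunction stands in for BiFunctionEx, and the class MapOutputFnSketch with its emit helpers is hypothetical; it models only how mapOutputFn is applied to each (filename, record) pair, not how the source reads files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-in for the source's output mapping.
final class MapOutputFnSketch {

    // Mirrors build(mapOutputFn): each record is passed to the function
    // together with the name of the file it came from.
    static <D, T> List<T> emit(String fileName, List<D> records,
                               BiFunction<String, ? super D, T> mapOutputFn) {
        List<T> out = new ArrayList<>();
        for (D record : records) {
            out.add(mapOutputFn.apply(fileName, record));
        }
        return out;
    }

    // Mirrors the no-argument build(): records are emitted unchanged.
    static <D> List<D> emit(String fileName, List<D> records) {
        return emit(fileName, records, (name, record) -> record);
    }
}
```

Passing a function such as (name, record) -> name + ":" + record would tag each emitted item with its originating file, which can help when tracing a record back to its file.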