Class AvroSinks


public final class AvroSinks extends Object
Contains factory methods for Apache Avro sinks.
Jet 3.0
  • Method Details

    • files

      @Nonnull public static <R> Sink<R> files(@Nonnull String directoryName, @Nonnull org.apache.avro.Schema schema, @Nonnull SupplierEx<<R>> datumWriterSupplier)
      Returns a sink that that writes the items it receives to Apache Avro files. Each processor will write to its own file whose name is equal to the processor's global index (an integer unique to each processor of the vertex), but a single pathname is used to resolve the containing directory of all files, on all cluster members. The sink always overwrites the files.

      The sink creates a DataFileWriter for each processor using the supplied datumWriterSupplier with the given Schema.

      No state is saved to snapshot for this sink. After the job is restarted, the items will be missing since files will be overwritten.

      The default local parallelism for this sink is 1.

      Type Parameters:
      R - the type of the record
      directoryName - directory to create the files in. Will be created if it doesn't exist. Must be the same on all members.
      schema - the record schema
      datumWriterSupplier - the record writer supplier
    • files

      @Nonnull public static <R> Sink<R> files(@Nonnull String directoryName, @Nonnull Class<R> recordClass, @Nonnull org.apache.avro.Schema schema)
      Convenience for files(String, Schema, SupplierEx) which uses either SpecificDatumWriter or ReflectDatumWriter depending on the supplied recordClass.
    • files

      @Nonnull public static Sink<org.apache.avro.generic.IndexedRecord> files(@Nonnull String directoryName, @Nonnull org.apache.avro.Schema schema)
      Convenience for files(String, Schema, SupplierEx) which uses GenericDatumWriter.