Class AvroSinks


  • public final class AvroSinks
    extends java.lang.Object
    Contains factory methods for Apache Avro sinks.
    Since:
    Jet 3.0
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static <R> Sink<R> files​(java.lang.String directoryName, java.lang.Class<R> recordClass, org.apache.avro.Schema schema)
      Convenience for files(String, Schema, SupplierEx) which uses either SpecificDatumWriter or ReflectDatumWriter depending on the supplied recordClass.
      static Sink<org.apache.avro.generic.IndexedRecord> files​(java.lang.String directoryName, org.apache.avro.Schema schema)
      Convenience for files(String, Schema, SupplierEx) which uses GenericDatumWriter.
      static <R> Sink<R> files​(java.lang.String directoryName, org.apache.avro.Schema schema, SupplierEx<org.apache.avro.io.DatumWriter<R>> datumWriterSupplier)
      Returns a sink that that writes the items it receives to Apache Avro files.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • files

        @Nonnull
        public static <R> Sink<R> files​(@Nonnull
                                        java.lang.String directoryName,
                                        @Nonnull
                                        org.apache.avro.Schema schema,
                                        @Nonnull
                                        SupplierEx<org.apache.avro.io.DatumWriter<R>> datumWriterSupplier)
        Returns a sink that that writes the items it receives to Apache Avro files. Each processor will write to its own file whose name is equal to the processor's global index (an integer unique to each processor of the vertex), but a single pathname is used to resolve the containing directory of all files, on all cluster members. The sink always overwrites the files.

        The sink creates a DataFileWriter for each processor using the supplied datumWriterSupplier with the given Schema.

        No state is saved to snapshot for this sink. After the job is restarted, the items will be missing since files will be overwritten.

        The default local parallelism for this sink is 1.

        Type Parameters:
        R - the type of the record
        Parameters:
        directoryName - directory to create the files in. Will be created if it doesn't exist. Must be the same on all members.
        schema - the record schema
        datumWriterSupplier - the record writer supplier
      • files

        @Nonnull
        public static <R> Sink<R> files​(@Nonnull
                                        java.lang.String directoryName,
                                        @Nonnull
                                        java.lang.Class<R> recordClass,
                                        @Nonnull
                                        org.apache.avro.Schema schema)
        Convenience for files(String, Schema, SupplierEx) which uses either SpecificDatumWriter or ReflectDatumWriter depending on the supplied recordClass.
      • files

        @Nonnull
        public static Sink<org.apache.avro.generic.IndexedRecord> files​(@Nonnull
                                                                        java.lang.String directoryName,
                                                                        @Nonnull
                                                                        org.apache.avro.Schema schema)
        Convenience for files(String, Schema, SupplierEx) which uses GenericDatumWriter.