Class KafkaSinks


  • public final class KafkaSinks
    extends java.lang.Object
    Contains factory methods for Apache Kafka sinks.
    Since:
    Jet 3.0
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  KafkaSinks.Builder<E>
      A builder for the Kafka sink.
    • Method Summary

      static <E> KafkaSinks.Builder<E> kafka(DataConnectionRef dataConnectionRef)
      Returns a builder object that you can use to create an Apache Kafka pipeline sink.

      static <E,K,V> Sink<E> kafka(DataConnectionRef dataConnectionRef, FunctionEx<? super E,org.apache.kafka.clients.producer.ProducerRecord<K,V>> toRecordFn)
      Returns a sink that publishes messages to Apache Kafka topic(s).

      static <K,V> Sink<java.util.Map.Entry<K,V>> kafka(DataConnectionRef dataConnectionRef, java.lang.String topic)
      Convenience for kafka(DataConnectionRef, String, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be published to Kafka.

      static <E,K,V> Sink<E> kafka(DataConnectionRef dataConnectionRef, java.lang.String topic, FunctionEx<? super E,K> extractKeyFn, FunctionEx<? super E,V> extractValueFn)
      Convenience for kafka(DataConnectionRef, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions.

      static <E,K,V> Sink<E> kafka(DataConnectionRef dataConnectionRef, java.util.Properties properties, java.lang.String topic, FunctionEx<? super E,K> extractKeyFn, FunctionEx<? super E,V> extractValueFn)
      Convenience for kafka(DataConnectionRef, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions, and allows supplying additional producer properties.

      static <E> KafkaSinks.Builder<E> kafka(java.util.Properties properties)
      Returns a builder object that you can use to create an Apache Kafka pipeline sink.

      static <E,K,V> Sink<E> kafka(java.util.Properties properties, FunctionEx<? super E,org.apache.kafka.clients.producer.ProducerRecord<K,V>> toRecordFn)
      Returns a sink that publishes messages to Apache Kafka topic(s).

      static <K,V> Sink<java.util.Map.Entry<K,V>> kafka(java.util.Properties properties, java.lang.String topic)
      Convenience for kafka(Properties, String, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be published to Kafka.

      static <E,K,V> Sink<E> kafka(java.util.Properties properties, java.lang.String topic, FunctionEx<? super E,K> extractKeyFn, FunctionEx<? super E,V> extractValueFn)
      Convenience for kafka(Properties, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • kafka

        @Nonnull
        public static <E,K,V> Sink<E> kafka(@Nonnull java.util.Properties properties,
                                            @Nonnull FunctionEx<? super E,org.apache.kafka.clients.producer.ProducerRecord<K,V>> toRecordFn)
        Returns a sink that publishes messages to Apache Kafka topic(s). It transforms each received item to a ProducerRecord using the supplied mapping function.

        The sink creates a single KafkaProducer per processor using the supplied properties.

        The behavior depends on the job's processing guarantee:

        • EXACTLY_ONCE: the sink will use Kafka transactions to commit the messages. Transactions are committed after a snapshot is completed. This increases the latency of the messages because they are only visible to consumers after they are committed and slightly reduces the throughput because no messages are sent between the snapshot phases.

          When using transactions, pay attention to your transaction.timeout.ms config property. It limits the entire duration of the transaction from the moment it is begun, not just an inactivity timeout. It must not be smaller than your snapshot interval, otherwise the Kafka broker will roll the transaction back before Jet is done with it. It should also be large enough for Jet to restart after a failure: a member can crash just before it is about to commit, and Jet will attempt to commit the transaction after the restart, but the transaction must still be pending in the broker. The default in Kafka 2.4 is 1 minute.

          Also keep in mind that consumers need to use isolation.level=read_committed, which is not the default; otherwise they will see duplicate messages.

        • AT_LEAST_ONCE: messages are committed immediately; the sink ensures that all async operations are done in the first snapshot phase. This ensures that each message is written even if the job fails, but it might be written again after the job restarts.

        If you want to avoid the overhead of transactions, you can reduce the guarantee just for the sink. To do so, use the builder version and call exactlyOnce(false) on the builder.

        IO failures are generally handled by the Kafka producer and do not cause the processor to fail. Refer to the Kafka documentation for details.

        The default local parallelism for this processor is 1.

        Type Parameters:
        E - type of stream item
        K - type of the key published to Kafka
        V - type of the value published to Kafka
        Parameters:
        properties - producer properties which should contain broker address and key/value serializers
        toRecordFn - function that transforms the stream item into a ProducerRecord
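
        For example, a minimal sketch of using this overload in a pipeline. The broker address, serializer classes, topic name and test items are assumptions for a local setup, not part of this API:

          import java.util.Properties;

          import org.apache.kafka.clients.producer.ProducerRecord;

          import com.hazelcast.jet.kafka.KafkaSinks;
          import com.hazelcast.jet.pipeline.Pipeline;
          import com.hazelcast.jet.pipeline.Sink;
          import com.hazelcast.jet.pipeline.test.TestSources;

          public class KafkaSinkExample {
              public static void main(String[] args) {
                  // producer properties: broker address and key/value serializers
                  Properties props = new Properties();
                  props.setProperty("bootstrap.servers", "localhost:9092");
                  props.setProperty("key.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");
                  props.setProperty("value.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");

                  // map each stream item to a ProducerRecord; "my-topic" is hypothetical
                  Sink<String> sink = KafkaSinks.kafka(props,
                          (String item) -> new ProducerRecord<>("my-topic", item, item.toUpperCase()));

                  Pipeline p = Pipeline.create();
                  p.readFrom(TestSources.items("a", "b", "c"))
                   .writeTo(sink);
                  // submit with hazelcastInstance.getJet().newJob(p)
              }
          }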
      • kafka

        @Beta
        @Nonnull
        public static <E,K,V> Sink<E> kafka(@Nonnull DataConnectionRef dataConnectionRef,
                                            @Nonnull FunctionEx<? super E,org.apache.kafka.clients.producer.ProducerRecord<K,V>> toRecordFn)
        Returns a sink that publishes messages to Apache Kafka topic(s). It transforms each received item to a ProducerRecord using the supplied mapping function.

        The sink uses the supplied DataConnection to obtain a KafkaProducer instance for each Processor. Depending on the DataConnection configuration it may be either a new instance for each processor, or a shared instance. Note that a shared instance can't be used with the exactly-once guarantee.

        The behavior depends on the job's processing guarantee:

        • EXACTLY_ONCE: the sink will use Kafka transactions to commit the messages. Transactions are committed after a snapshot is completed. This increases the latency of the messages because they are only visible to consumers after they are committed and slightly reduces the throughput because no messages are sent between the snapshot phases.

          When using transactions, pay attention to your transaction.timeout.ms config property. It limits the entire duration of the transaction from the moment it is begun, not just an inactivity timeout. It must not be smaller than your snapshot interval, otherwise the Kafka broker will roll the transaction back before Jet is done with it. It should also be large enough for Jet to restart after a failure: a member can crash just before it is about to commit, and Jet will attempt to commit the transaction after the restart, but the transaction must still be pending in the broker. The default in Kafka 2.4 is 1 minute.

          Also keep in mind that consumers need to use isolation.level=read_committed, which is not the default; otherwise they will see duplicate messages.

        • AT_LEAST_ONCE: messages are committed immediately; the sink ensures that all async operations are done in the first snapshot phase. This ensures that each message is written even if the job fails, but it might be written again after the job restarts.

        If you want to avoid the overhead of transactions, you can reduce the guarantee just for the sink. To do so, use the builder version and call exactlyOnce(false) on the builder.

        IO failures are generally handled by the Kafka producer and do not cause the processor to fail. Refer to the Kafka documentation for details.

        The default local parallelism for this processor is 1.

        Type Parameters:
        E - type of stream item
        K - type of the key published to Kafka
        V - type of the value published to Kafka
        Parameters:
        dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
        toRecordFn - function that transforms the stream item into a ProducerRecord
        Since:
        5.3
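
        For example, a sketch assuming a Kafka data connection named "my-kafka" is already configured on the cluster (the connection name, topic and items are hypothetical; key/value serializers come from the data connection configuration):

          import org.apache.kafka.clients.producer.ProducerRecord;

          import com.hazelcast.jet.kafka.KafkaSinks;
          import com.hazelcast.jet.pipeline.Pipeline;
          import com.hazelcast.jet.pipeline.Sink;
          import com.hazelcast.jet.pipeline.test.TestSources;

          import static com.hazelcast.jet.pipeline.DataConnectionRef.dataConnectionRef;

          public class KafkaDataConnectionSinkExample {
              public static void main(String[] args) {
                  // "my-kafka" must match a data connection registered on the members
                  Sink<Integer> sink = KafkaSinks.kafka(
                          dataConnectionRef("my-kafka"),
                          (Integer n) -> new ProducerRecord<>("numbers", Integer.toString(n), n));

                  Pipeline p = Pipeline.create();
                  p.readFrom(TestSources.items(1, 2, 3))
                   .writeTo(sink);
              }
          }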
      • kafka

        @Nonnull
        public static <E,K,V> Sink<E> kafka(@Nonnull java.util.Properties properties,
                                            @Nonnull java.lang.String topic,
                                            @Nonnull FunctionEx<? super E,K> extractKeyFn,
                                            @Nonnull FunctionEx<? super E,V> extractValueFn)
        Convenience for kafka(Properties, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions.
        Type Parameters:
        E - type of stream item
        K - type of the key published to Kafka
        V - type of the value published to Kafka
        Parameters:
        properties - producer properties which should contain broker address and key/value serializers
        topic - name of the Kafka topic to publish to
        extractKeyFn - function that extracts the key from the stream item
        extractValueFn - function that extracts the value from the stream item
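
        For example, a sketch under the same local-broker assumption; the Trade record and the "trades" topic are hypothetical:

          import java.io.Serializable;
          import java.util.Properties;

          import com.hazelcast.jet.kafka.KafkaSinks;
          import com.hazelcast.jet.pipeline.Pipeline;
          import com.hazelcast.jet.pipeline.Sink;
          import com.hazelcast.jet.pipeline.test.TestSources;

          public class KafkaKeyValueSinkExample {
              // hypothetical stream item type
              record Trade(String symbol, double price) implements Serializable { }

              public static void main(String[] args) {
                  Properties props = new Properties();
                  props.setProperty("bootstrap.servers", "localhost:9092"); // assumption
                  props.setProperty("key.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");
                  props.setProperty("value.serializer",
                          "org.apache.kafka.common.serialization.DoubleSerializer");

                  // the symbol becomes the Kafka key, the price the value
                  Sink<Trade> sink = KafkaSinks.kafka(props, "trades",
                          Trade::symbol, Trade::price);

                  Pipeline p = Pipeline.create();
                  p.readFrom(TestSources.items(
                          new Trade("AAPL", 187.0), new Trade("MSFT", 412.5)))
                   .writeTo(sink);
              }
          }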
      • kafka

        @Beta
        @Nonnull
        public static <E,K,V> Sink<E> kafka(@Nonnull DataConnectionRef dataConnectionRef,
                                            @Nonnull java.lang.String topic,
                                            @Nonnull FunctionEx<? super E,K> extractKeyFn,
                                            @Nonnull FunctionEx<? super E,V> extractValueFn)
        Convenience for kafka(DataConnectionRef, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions.
        Type Parameters:
        E - type of stream item
        K - type of the key published to Kafka
        V - type of the value published to Kafka
        Parameters:
        dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
        topic - name of the Kafka topic to publish to
        extractKeyFn - function that extracts the key from the stream item
        extractValueFn - function that extracts the value from the stream item
        Since:
        5.3
      • kafka

        @Beta
        @Nonnull
        public static <E,K,V> Sink<E> kafka(@Nonnull DataConnectionRef dataConnectionRef,
                                            @Nonnull java.util.Properties properties,
                                            @Nonnull java.lang.String topic,
                                            @Nonnull FunctionEx<? super E,K> extractKeyFn,
                                            @Nonnull FunctionEx<? super E,V> extractValueFn)
        Convenience for kafka(DataConnectionRef, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions, and allows supplying additional producer properties.
        Type Parameters:
        E - type of stream item
        K - type of the key published to Kafka
        V - type of the value published to Kafka
        Parameters:
        dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
        properties - additional producer properties
        topic - name of the Kafka topic to publish to
        extractKeyFn - function that extracts the key from the stream item
        extractValueFn - function that extracts the value from the stream item
        Since:
        5.3
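
        For example, a sketch assuming an existing Kafka data connection named "my-kafka"; the extra compression.type setting illustrates a per-sink producer override:

          import java.util.Properties;

          import com.hazelcast.jet.kafka.KafkaSinks;
          import com.hazelcast.jet.pipeline.Pipeline;
          import com.hazelcast.jet.pipeline.Sink;
          import com.hazelcast.jet.pipeline.test.TestSources;

          import static com.hazelcast.jet.pipeline.DataConnectionRef.dataConnectionRef;

          public class KafkaExtraPropertiesExample {
              public static void main(String[] args) {
                  // additional producer properties supplied on top of the
                  // data connection configuration
                  Properties extra = new Properties();
                  extra.setProperty("compression.type", "lz4");

                  Sink<String> sink = KafkaSinks.kafka(
                          dataConnectionRef("my-kafka"), // hypothetical connection name
                          extra,
                          "letters",                     // hypothetical topic
                          (String s) -> s,               // key: the item itself
                          String::toUpperCase);          // value: the upper-cased item

                  Pipeline p = Pipeline.create();
                  p.readFrom(TestSources.items("a", "b", "c"))
                   .writeTo(sink);
              }
          }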
      • kafka

        @Nonnull
        public static <K,V> Sink<java.util.Map.Entry<K,V>> kafka(@Nonnull java.util.Properties properties,
                                                                 @Nonnull java.lang.String topic)
        Convenience for kafka(Properties, String, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be published to Kafka.
        Type Parameters:
        K - type of the key published to Kafka
        V - type of the value published to Kafka
        Parameters:
        properties - producer properties which should contain broker address and key/value serializers
        topic - Kafka topic name to publish to
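
        For example, a sketch using Util.entry to produce Map.Entry items (broker address, serializers and topic are assumptions):

          import java.util.Map;
          import java.util.Properties;

          import com.hazelcast.jet.kafka.KafkaSinks;
          import com.hazelcast.jet.pipeline.Pipeline;
          import com.hazelcast.jet.pipeline.Sink;
          import com.hazelcast.jet.pipeline.test.TestSources;

          import static com.hazelcast.jet.Util.entry;

          public class KafkaEntrySinkExample {
              public static void main(String[] args) {
                  Properties props = new Properties();
                  props.setProperty("bootstrap.servers", "localhost:9092"); // assumption
                  props.setProperty("key.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");
                  props.setProperty("value.serializer",
                          "org.apache.kafka.common.serialization.IntegerSerializer");

                  // entry keys become Kafka keys, entry values Kafka values
                  Sink<Map.Entry<String, Integer>> sink = KafkaSinks.kafka(props, "counts");

                  Pipeline p = Pipeline.create();
                  p.readFrom(TestSources.items(entry("k1", 1), entry("k2", 2)))
                   .writeTo(sink);
              }
          }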
      • kafka

        @Beta
        @Nonnull
        public static <K,V> Sink<java.util.Map.Entry<K,V>> kafka(@Nonnull DataConnectionRef dataConnectionRef,
                                                                 @Nonnull java.lang.String topic)
        Convenience for kafka(DataConnectionRef, String, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be published to Kafka.
        Type Parameters:
        K - type of the key published to Kafka
        V - type of the value published to Kafka
        Parameters:
        dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
        topic - Kafka topic name to publish to
        Since:
        5.3
      • kafka

        @Nonnull
        public static <E> KafkaSinks.Builder<E> kafka(@Nonnull java.util.Properties properties)
        Returns a builder object that you can use to create an Apache Kafka pipeline sink.

        The sink creates a single KafkaProducer per processor using the supplied properties.

        The behavior depends on the job's processing guarantee:

        • EXACTLY_ONCE: the sink will use Kafka transactions to commit the messages. This brings some overhead on the broker side, a slight throughput reduction (no messages are sent between the snapshot phases) and, most importantly, increased latency, because the messages are only visible to consumers after they are committed.

          When using transactions, pay attention to your transaction.timeout.ms config property. It limits the entire duration of the transaction from the moment it is begun, not just an inactivity timeout. It must not be smaller than your snapshot interval, otherwise the Kafka broker will roll the transaction back before Jet is done with it. It should also be large enough for Jet to restart after a failure: a member can crash just before it is about to commit, and Jet will attempt to commit the transaction after the restart, but the transaction must still be pending in the broker. The default in Kafka 2.4 is 1 minute.

        • AT_LEAST_ONCE: messages are committed immediately; the sink ensures that all async operations are done in the first snapshot phase. This ensures that each message is written even if the job fails, but it might be written again after the job restarts.

        If you want to avoid the overhead of transactions, you can reduce the guarantee just for the sink by calling exactlyOnce(false) on the returned builder.

        IO failures are generally handled by the Kafka producer and do not cause the processor to fail. Refer to the Kafka documentation for details.

        The default local parallelism for this processor is 1.

        Type Parameters:
        E - type of stream item
        Parameters:
        properties - producer properties which should contain broker address and key/value serializers
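
        For example, a sketch of the builder flow. exactlyOnce(false) is named above; topic(), extractKeyFn(), extractValueFn() and build() are assumed here from KafkaSinks.Builder, and the property values are placeholders:

          import java.util.Properties;

          import com.hazelcast.jet.kafka.KafkaSinks;
          import com.hazelcast.jet.pipeline.Sink;

          public class KafkaBuilderExample {
              public static void main(String[] args) {
                  Properties props = new Properties();
                  props.setProperty("bootstrap.servers", "localhost:9092"); // assumption
                  props.setProperty("key.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");
                  props.setProperty("value.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");

                  Sink<String> sink = KafkaSinks.<String>kafka(props)
                          .topic("events")                     // assumed builder method
                          .extractKeyFn((String s) -> s)       // assumed builder method
                          .extractValueFn(String::toUpperCase) // assumed builder method
                          // downgrade just this sink to at-least-once, avoiding
                          // the transaction overhead described above
                          .exactlyOnce(false)
                          .build();
              }
          }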
      • kafka

        @Beta
        @Nonnull
        public static <E> KafkaSinks.Builder<E> kafka(@Nonnull DataConnectionRef dataConnectionRef)
        Returns a builder object that you can use to create an Apache Kafka pipeline sink.

        The sink uses the supplied DataConnection to obtain a KafkaProducer instance for each processor. Depending on the DataConnection configuration it may be either a new instance for each processor, or a shared instance. Note that a shared instance can't be used with the exactly-once guarantee.

        The behavior depends on the job's processing guarantee:

        • EXACTLY_ONCE: the sink will use Kafka transactions to commit the messages. This brings some overhead on the broker side, a slight throughput reduction (no messages are sent between the snapshot phases) and, most importantly, increased latency, because the messages are only visible to consumers after they are committed.

          When using transactions, pay attention to your transaction.timeout.ms config property. It limits the entire duration of the transaction from the moment it is begun, not just an inactivity timeout. It must not be smaller than your snapshot interval, otherwise the Kafka broker will roll the transaction back before Jet is done with it. It should also be large enough for Jet to restart after a failure: a member can crash just before it is about to commit, and Jet will attempt to commit the transaction after the restart, but the transaction must still be pending in the broker. The default in Kafka 2.4 is 1 minute.

        • AT_LEAST_ONCE: messages are committed immediately; the sink ensures that all async operations are done in the first snapshot phase. This ensures that each message is written even if the job fails, but it might be written again after the job restarts.

        If you want to avoid the overhead of transactions, you can reduce the guarantee just for the sink by calling exactlyOnce(false) on the returned builder.

        IO failures are generally handled by the Kafka producer and do not cause the processor to fail. Refer to the Kafka documentation for details.

        The default local parallelism for this processor is 1.

        Type Parameters:
        E - type of stream item
        Parameters:
        dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
        Since:
        5.3