Class KafkaSinks
- Since:
- Jet 3.0
-
Nested Class Summary
- static final class KafkaSinks.Builder<E>: A builder for the Kafka sink.
-
Method Summary
- static <E> KafkaSinks.Builder<E> kafka(DataConnectionRef dataConnectionRef)
  Returns a builder object that you can use to create an Apache Kafka pipeline sink.
- static <E,K,V> Sink<E> kafka(DataConnectionRef dataConnectionRef, FunctionEx<? super E, org.apache.kafka.clients.producer.ProducerRecord<K,V>> toRecordFn)
  Returns a sink that publishes messages to Apache Kafka topic(s).
- static <K,V> Sink<Map.Entry<K,V>> kafka(DataConnectionRef dataConnectionRef, String topic)
  Convenience for kafka(DataConnectionRef, String, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be published to Kafka.
- static <E,K,V> Sink<E> kafka(DataConnectionRef dataConnectionRef, String topic, FunctionEx<? super E, K> extractKeyFn, FunctionEx<? super E, V> extractValueFn)
  Convenience for kafka(DataConnectionRef, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions.
- static <E,K,V> Sink<E> kafka(DataConnectionRef dataConnectionRef, Properties properties, String topic, FunctionEx<? super E, K> extractKeyFn, FunctionEx<? super E, V> extractValueFn)
  Convenience for kafka(DataConnectionRef, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions, with additional properties available.
- static <E> KafkaSinks.Builder<E> kafka(Properties properties)
  Returns a builder object that you can use to create an Apache Kafka pipeline sink.
- static <E,K,V> Sink<E> kafka(Properties properties, FunctionEx<? super E, org.apache.kafka.clients.producer.ProducerRecord<K,V>> toRecordFn)
  Returns a sink that publishes messages to Apache Kafka topic(s).
- static <K,V> Sink<Map.Entry<K,V>> kafka(Properties properties, String topic)
  Convenience for kafka(Properties, String, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be published to Kafka.
- static <E,K,V> Sink<E> kafka(Properties properties, String topic, FunctionEx<? super E, K> extractKeyFn, FunctionEx<? super E, V> extractValueFn)
  Convenience for kafka(Properties, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions.
-
Method Details
-
kafka
@Nonnull public static <E,K,V> Sink<E> kafka(@Nonnull Properties properties, @Nonnull FunctionEx<? super E, org.apache.kafka.clients.producer.ProducerRecord<K,V>> toRecordFn)

Returns a sink that publishes messages to Apache Kafka topic(s). It transforms each received item to a ProducerRecord using the supplied mapping function.

The sink creates a single KafkaProducer per processor using the supplied properties.

The behavior depends on the job's processing guarantee:
- EXACTLY_ONCE: the sink will use Kafka transactions to
commit the messages. Transactions are committed after a snapshot is
completed. This increases the latency of the messages because they
are only visible to consumers after they are committed and slightly
reduces the throughput because no messages are sent between the
snapshot phases.
When using transactions, pay attention to your transaction.timeout.ms config property. It limits the entire duration of the transaction from the moment it is begun, not just an inactivity timeout. It must not be smaller than your snapshot interval, otherwise the Kafka broker will roll the transaction back before Jet is done with it. It should also be large enough for Jet to restart after a failure: a member can crash just before it is about to commit, and Jet will attempt to commit the transaction after the restart, but the transaction must still be waiting in the broker. The default in Kafka 2.4 is 1 minute.

Also keep in mind that consumers need to use isolation.level=read_committed, which is not the default; otherwise they will see duplicate messages.

- AT_LEAST_ONCE: messages are committed immediately; the sink ensures that all async operations are completed in the first snapshot phase. This ensures that each message is written if the job fails, but it might be written again after the job restarts.

If you want to avoid the overhead of transactions, you can reduce the guarantee just for the sink. To do so, use the builder version and call exactlyOnce(false) on the builder.

IO failures are generally handled by the Kafka producer and do not cause the processor to fail. Refer to the Kafka documentation for details.
The default local parallelism for this processor is 1.
- Type Parameters:
  - E - type of stream item
  - K - type of the key published to Kafka
  - V - type of the value published to Kafka
- Parameters:
  - properties - producer properties which should contain the broker address and key/value serializers
  - toRecordFn - function that transforms the stream item into a ProducerRecord
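For illustration, a minimal sketch of wiring this overload into a pipeline. The broker address localhost:9092, the topic my-topic, and the TestSources batch source are placeholder assumptions, not part of this method's contract:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaRecordFnSink {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address and key/value serializers, as required by this overload.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer", StringSerializer.class.getName());
        props.setProperty("value.serializer", StringSerializer.class.getName());

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("a", "b", "c"))
         // toRecordFn: transform each stream item into a ProducerRecord
         .writeTo(KafkaSinks.kafka(props, item ->
                 new ProducerRecord<>("my-topic", item, item.toUpperCase())));
    }
}
```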
-
kafka
@Beta @Nonnull public static <E,K,V> Sink<E> kafka(@Nonnull DataConnectionRef dataConnectionRef, @Nonnull FunctionEx<? super E, org.apache.kafka.clients.producer.ProducerRecord<K,V>> toRecordFn)

Returns a sink that publishes messages to Apache Kafka topic(s). It transforms each received item to a ProducerRecord using the supplied mapping function.

The sink uses the supplied DataConnection to obtain a KafkaProducer instance for each Processor. Depending on the DataConnection configuration it may be either a new instance for each processor or a shared instance. NOTE: a shared instance can't be used with exactly-once.

The behavior depends on the job's processing guarantee:
- EXACTLY_ONCE: the sink will use Kafka transactions to
commit the messages. Transactions are committed after a snapshot is
completed. This increases the latency of the messages because they
are only visible to consumers after they are committed and slightly
reduces the throughput because no messages are sent between the
snapshot phases.
When using transactions, pay attention to your transaction.timeout.ms config property. It limits the entire duration of the transaction from the moment it is begun, not just an inactivity timeout. It must not be smaller than your snapshot interval, otherwise the Kafka broker will roll the transaction back before Jet is done with it. It should also be large enough for Jet to restart after a failure: a member can crash just before it is about to commit, and Jet will attempt to commit the transaction after the restart, but the transaction must still be waiting in the broker. The default in Kafka 2.4 is 1 minute.

Also keep in mind that consumers need to use isolation.level=read_committed, which is not the default; otherwise they will see duplicate messages.

- AT_LEAST_ONCE: messages are committed immediately; the sink ensures that all async operations are completed in the first snapshot phase. This ensures that each message is written if the job fails, but it might be written again after the job restarts.

If you want to avoid the overhead of transactions, you can reduce the guarantee just for the sink. To do so, use the builder version and call exactlyOnce(false) on the builder.

IO failures are generally handled by the Kafka producer and do not cause the processor to fail. Refer to the Kafka documentation for details.
The default local parallelism for this processor is 1.
- Type Parameters:
  - E - type of stream item
  - K - type of the key published to Kafka
  - V - type of the value published to Kafka
- Parameters:
  - dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
  - toRecordFn - function that transforms the stream item into a ProducerRecord
- Since:
- 5.3
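A sketch of the data-connection variant. The connection name kafka-dc, broker address, and topic are assumptions; the DataConnectionConfig setup follows the Hazelcast data connection API, with shared set to false because a shared producer can't be used with exactly-once:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerRecord;

import com.hazelcast.config.Config;
import com.hazelcast.config.DataConnectionConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.DataConnectionRef;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaDataConnectionSink {
    public static void main(String[] args) {
        Properties connectionProps = new Properties();
        connectionProps.setProperty("bootstrap.servers", "localhost:9092");
        connectionProps.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        connectionProps.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Register the data connection on the member.
        Config config = new Config();
        config.addDataConnectionConfig(new DataConnectionConfig("kafka-dc")
                .setType("Kafka")
                .setShared(false) // a shared instance can't be used with exactly-once
                .setProperties(connectionProps));
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("a", "b"))
         .writeTo(KafkaSinks.kafka(
                 DataConnectionRef.dataConnectionRef("kafka-dc"),
                 item -> new ProducerRecord<>("my-topic", item, item)));
        hz.getJet().newJob(p).join();
    }
}
```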
-
kafka
@Nonnull public static <E,K,V> Sink<E> kafka(@Nonnull Properties properties, @Nonnull String topic, @Nonnull FunctionEx<? super E, K> extractKeyFn, @Nonnull FunctionEx<? super E, V> extractValueFn)

Convenience for kafka(Properties, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions.

- Type Parameters:
  - E - type of stream item
  - K - type of the key published to Kafka
  - V - type of the value published to Kafka
- Parameters:
  - properties - producer properties which should contain the broker address and key/value serializers
  - topic - name of the Kafka topic to publish to
  - extractKeyFn - function that extracts the key from the stream item
  - extractValueFn - function that extracts the value from the stream item
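A sketch using separate key and value extraction functions. Trade is a hypothetical item type, and the broker address and trades topic are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaKeyValueSink {
    // Hypothetical stream item type for the example.
    record Trade(String ticker, int price) implements java.io.Serializable { }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer", StringSerializer.class.getName());
        props.setProperty("value.serializer", IntegerSerializer.class.getName());

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items(new Trade("AAPL", 190), new Trade("MSFT", 410)))
         .writeTo(KafkaSinks.kafka(props, "trades",
                 Trade::ticker,   // extractKeyFn: record key
                 Trade::price));  // extractValueFn: record value
    }
}
```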
-
kafka
@Beta @Nonnull public static <E,K,V> Sink<E> kafka(@Nonnull DataConnectionRef dataConnectionRef, @Nonnull String topic, @Nonnull FunctionEx<? super E, K> extractKeyFn, @Nonnull FunctionEx<? super E, V> extractValueFn)

Convenience for kafka(DataConnectionRef, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions.

- Type Parameters:
  - E - type of stream item
  - K - type of the key published to Kafka
  - V - type of the value published to Kafka
- Parameters:
  - dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
  - topic - name of the Kafka topic to publish to
  - extractKeyFn - function that extracts the key from the stream item
  - extractValueFn - function that extracts the value from the stream item
- Since:
- 5.3
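A sketch of the same convenience over a data connection; the connection name kafka-dc, the events topic, and the colon-delimited item format are assumptions:

```java
import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.DataConnectionRef;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaDcKeyValueSink {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("user1:login", "user2:logout"))
         .writeTo(KafkaSinks.kafka(
                 DataConnectionRef.dataConnectionRef("kafka-dc"),
                 "events",
                 line -> line.split(":")[0],    // extractKeyFn: user id
                 line -> line.split(":")[1]));  // extractValueFn: event type
    }
}
```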
-
kafka
@Beta @Nonnull public static <E,K,V> Sink<E> kafka(@Nonnull DataConnectionRef dataConnectionRef, @Nonnull Properties properties, @Nonnull String topic, @Nonnull FunctionEx<? super E, K> extractKeyFn, @Nonnull FunctionEx<? super E, V> extractValueFn)

Convenience for kafka(DataConnectionRef, FunctionEx) which creates a ProducerRecord using the given topic and the given key and value mapping functions, with additional properties available.

- Type Parameters:
  - E - type of stream item
  - K - type of the key published to Kafka
  - V - type of the value published to Kafka
- Parameters:
  - dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
  - properties - additional properties
  - topic - name of the Kafka topic to publish to
  - extractKeyFn - function that extracts the key from the stream item
  - extractValueFn - function that extracts the value from the stream item
- Since:
- 5.3
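A sketch layering an additional producer property (here the standard Kafka compression.type setting) on top of an assumed kafka-dc connection:

```java
import java.util.Properties;

import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.DataConnectionRef;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaDcExtraPropsSink {
    public static void main(String[] args) {
        // Additional properties applied on top of the data connection's config.
        Properties extra = new Properties();
        extra.setProperty("compression.type", "lz4");

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("a", "b"))
         .writeTo(KafkaSinks.kafka(
                 DataConnectionRef.dataConnectionRef("kafka-dc"),
                 extra,
                 "my-topic",
                 item -> item,            // extractKeyFn
                 String::toUpperCase));   // extractValueFn
    }
}
```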
-
kafka
@Nonnull public static <K,V> Sink<Map.Entry<K,V>> kafka(@Nonnull Properties properties, @Nonnull String topic)

Convenience for kafka(Properties, String, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be published to Kafka.

- Type Parameters:
  - K - type of the key published to Kafka
  - V - type of the value published to Kafka
- Parameters:
  - properties - producer properties which should contain the broker address and key/value serializers
  - topic - Kafka topic name to publish to
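A sketch of the Map.Entry convenience: the entries carry the key and value, so no extraction functions are needed. Util.entry creates the entries; the broker address and topic are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.StringSerializer;

import com.hazelcast.jet.Util;
import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaEntrySink {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer", StringSerializer.class.getName());
        props.setProperty("value.serializer", StringSerializer.class.getName());

        Pipeline p = Pipeline.create();
        // Each Map.Entry is published as-is: key part and value part.
        p.readFrom(TestSources.items(Util.entry("k1", "v1"), Util.entry("k2", "v2")))
         .writeTo(KafkaSinks.kafka(props, "my-topic"));
    }
}
```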
-
kafka
@Beta @Nonnull public static <K,V> Sink<Map.Entry<K,V>> kafka(@Nonnull DataConnectionRef dataConnectionRef, @Nonnull String topic)

Convenience for kafka(DataConnectionRef, String, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be published to Kafka.

- Type Parameters:
  - K - type of the key published to Kafka
  - V - type of the value published to Kafka
- Parameters:
  - dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
  - topic - Kafka topic name to publish to
- Since:
- 5.3
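The same Map.Entry convenience against an assumed pre-configured data connection named kafka-dc:

```java
import com.hazelcast.jet.Util;
import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.DataConnectionRef;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaDcEntrySink {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items(Util.entry("k", "v")))
         .writeTo(KafkaSinks.kafka(
                 DataConnectionRef.dataConnectionRef("kafka-dc"), "my-topic"));
    }
}
```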
-
kafka
@Nonnull public static <E> KafkaSinks.Builder<E> kafka(@Nonnull Properties properties)

Returns a builder object that you can use to create an Apache Kafka pipeline sink.

The sink creates a single KafkaProducer per processor using the supplied properties.

The behavior depends on the job's processing guarantee:
- EXACTLY_ONCE: the sink will use Kafka transactions to
commit the messages. This brings some overhead on the broker side,
slight throughput reduction (we don't send messages between snapshot
phases) and, most importantly, increases the latency of the messages
because they are only visible to consumers after they are committed.
When using transactions, pay attention to your transaction.timeout.ms config property. It limits the entire duration of the transaction from the moment it is begun, not just an inactivity timeout. It must not be smaller than your snapshot interval, otherwise the Kafka broker will roll the transaction back before Jet is done with it. It should also be large enough for Jet to restart after a failure: a member can crash just before it is about to commit, and Jet will attempt to commit the transaction after the restart, but the transaction must still be waiting in the broker. The default in Kafka 2.4 is 1 minute.

- AT_LEAST_ONCE: messages are committed immediately; the sink ensures that all async operations are completed in the first snapshot phase. This ensures that each message is written if the job fails, but it might be written again after the job restarts.

If you want to avoid the overhead of transactions, you can reduce the guarantee just for the sink. To do so, call exactlyOnce(false) on the returned builder.

IO failures are generally handled by the Kafka producer and do not cause the processor to fail. Refer to the Kafka documentation for details.
Default local parallelism for this processor is 1.
- Type Parameters:
  - E - type of stream item
- Parameters:
  - properties - producer properties which should contain the broker address and key/value serializers
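A sketch of the builder form, calling exactlyOnce(false) to reduce the guarantee just for this sink as described above. The builder setters shown (topic, extractKeyFn, extractValueFn, exactlyOnce, build) are those of KafkaSinks.Builder; the broker address and topic are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.StringSerializer;

import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sink;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaBuilderSink {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer", StringSerializer.class.getName());
        props.setProperty("value.serializer", StringSerializer.class.getName());

        Sink<String> sink = KafkaSinks.<String>kafka(props)
                .topic("my-topic")
                .extractKeyFn(item -> item)
                .extractValueFn(String::toUpperCase)
                .exactlyOnce(false) // avoid transaction overhead for this sink only
                .build();

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("a", "b")).writeTo(sink);
    }
}
```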
-
kafka
@Beta @Nonnull public static <E> KafkaSinks.Builder<E> kafka(@Nonnull DataConnectionRef dataConnectionRef)

Returns a builder object that you can use to create an Apache Kafka pipeline sink.

The sink uses the supplied DataConnection to obtain a KafkaProducer instance for each Processor.

The behavior depends on the job's processing guarantee:
- EXACTLY_ONCE: the sink will use Kafka transactions to
commit the messages. This brings some overhead on the broker side,
slight throughput reduction (we don't send messages between snapshot
phases) and, most importantly, increases the latency of the messages
because they are only visible to consumers after they are committed.
When using transactions, pay attention to your transaction.timeout.ms config property. It limits the entire duration of the transaction from the moment it is begun, not just an inactivity timeout. It must not be smaller than your snapshot interval, otherwise the Kafka broker will roll the transaction back before Jet is done with it. It should also be large enough for Jet to restart after a failure: a member can crash just before it is about to commit, and Jet will attempt to commit the transaction after the restart, but the transaction must still be waiting in the broker. The default in Kafka 2.4 is 1 minute.

- AT_LEAST_ONCE: messages are committed immediately; the sink ensures that all async operations are completed in the first snapshot phase. This ensures that each message is written if the job fails, but it might be written again after the job restarts.

If you want to avoid the overhead of transactions, you can reduce the guarantee just for the sink. To do so, call exactlyOnce(false) on the returned builder.

IO failures are generally handled by the Kafka producer and do not cause the processor to fail. Refer to the Kafka documentation for details.
Default local parallelism for this processor is 1.
- Type Parameters:
  - E - type of stream item
- Parameters:
  - dataConnectionRef - DataConnection reference to use to obtain the KafkaProducer
- Since:
- 5.3
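A sketch of the builder form over a data connection, supplying a full record mapping instead of separate key and value functions. The builder's toRecordFn setter is assumed per KafkaSinks.Builder, and the kafka-dc connection and topic are placeholders:

```java
import org.apache.kafka.clients.producer.ProducerRecord;

import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.DataConnectionRef;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sink;
import com.hazelcast.jet.pipeline.test.TestSources;

public class KafkaDcBuilderSink {
    public static void main(String[] args) {
        Sink<String> sink = KafkaSinks.<String>kafka(
                        DataConnectionRef.dataConnectionRef("kafka-dc"))
                // Map each item to a ProducerRecord with topic, key, and value.
                .toRecordFn(item -> new ProducerRecord<>("my-topic", item, item))
                .build();

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("a", "b")).writeTo(sink);
    }
}
```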
-