Class KinesisSources


  • public final class KinesisSources
    extends java.lang.Object
    Contains factory methods for creating Amazon Kinesis Data Streams (KDS) sources.
    Since:
    Jet 4.4
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  KinesisSources.Builder<T>
      Fluent builder for constructing the Kinesis source and setting its configuration parameters.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String MILLIS_BEHIND_LATEST_METRIC
      Name of the metric exposed by the source, used to monitor whether reading a specific shard is delayed.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static KinesisSources.Builder<java.util.Map.Entry<java.lang.String,​byte[]>> kinesis​(java.lang.String stream)
      Initiates the building of a streaming source that consumes a Kinesis data stream and emits Map.Entry<String, byte[]> items.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • MILLIS_BEHIND_LATEST_METRIC

        public static final java.lang.String MILLIS_BEHIND_LATEST_METRIC
        Name of the metric exposed by the source, used to monitor whether reading a specific shard is delayed. The value is the number of milliseconds by which the last read record lags behind the tip of the stream.
        See Also:
        Constant Field Values
    • Method Detail

      • kinesis

        @Nonnull
        public static KinesisSources.Builder<java.util.Map.Entry<java.lang.String,byte[]>> kinesis(@Nonnull java.lang.String stream)
        Initiates the building of a streaming source that consumes a Kinesis data stream and emits Map.Entry<String, byte[]> items.

        Each emitted item represents a Kinesis record, the basic unit of data stored in KDS. A record is composed of a sequence number, a partition key, and a data blob. This source does not expose the sequence numbers; it uses them only internally, for replay purposes.

        The partition key is used to group related records and route them inside Kinesis. Partition keys are Unicode strings with a maximum length of 256 characters. They are specified by the data producer while adding data to KDS.

        The data blob of the record is provided in the form of a byte array, in essence, serialized data (Kinesis doesn't handle serialization internally). The maximum size of the data blob is 1 MB.
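
        Because the source emits raw bytes, the consumer has to apply whatever deserialization matches the producer. The following is a minimal sketch, assuming the producer wrote UTF-8 text; the class name, the method name and the sample record are hypothetical and not part of the Jet API.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

final class KinesisEntryDecoding {

    // Decodes the data blob of an emitted item.
    // ASSUMPTION: the producer serialized the payload as UTF-8 text.
    static String decodeUtf8(Map.Entry<String, byte[]> entry) {
        return new String(entry.getValue(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical item, shaped like the ones the source emits:
        // key = partition key, value = data blob.
        Map.Entry<String, byte[]> entry = new AbstractMap.SimpleImmutableEntry<>(
                "sensor-42", "{\"temp\":21.5}".getBytes(StandardCharsets.UTF_8));
        System.out.println(entry.getKey() + " -> " + decodeUtf8(entry));
    }
}
```

        A function of this shape could be applied in a mapping stage placed directly after the source.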

        The source is distributed. Each instance consumes data from zero, one or more Kinesis shards, the base throughput units of KDS. One shard provides a capacity of 2 MB/sec data output. Items with the same partition key always come from the same shard; one shard contains multiple partition keys. Items coming from the same shard are ordered, and the source preserves this order (except on resharding). The local parallelism of the source is not defined, so it will depend on the number of cores available.
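
        The reason items with the same partition key always land in the same shard is that Kinesis hashes the partition key with MD5 into a 128-bit hash key, and each shard owns a contiguous range of hash keys. The sketch below illustrates that mapping under simplifying assumptions: the key is hashed as UTF-8 bytes and the shards split the hash key space into equal ranges. Real streams report their actual ranges, which may be uneven after resharding. All class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PartitionKeyRouting {

    // Size of the 128-bit hash key space: 2^128.
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // Maps a partition key to an unsigned 128-bit hash key using MD5.
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: interpret bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Index of the shard owning the key.
    // ASSUMPTION: shardCount shards with equal, contiguous hash key ranges.
    static int shardIndex(String partitionKey, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        // The last range absorbs the remainder left by integer division.
        return Math.min(index, shardCount - 1);
    }

    public static void main(String[] args) {
        // The same key maps to the same shard index on every call.
        System.out.println(shardIndex("sensor-42", 4));
        System.out.println(shardIndex("sensor-42", 4));
    }
}
```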

        If a processing guarantee is specified for the job, Jet will periodically save the current shard offsets internally and then replay from the saved offsets when the job is restarted. If no processing guarantee is enabled, the source will start reading from the oldest available data, determined by the KDS retention period (defaults to 24 hours, can be as long as 365 days). This replay capability makes the source suitable for pipelines with both at-least-once and exactly-once processing guarantees.

        The source can provide native timestamps, in the sense that they are read from KDS, but be aware that they are Kinesis ingestion times, not event times in the strictest sense.

        As stated before, the source preserves the ordering inside shards. However, Kinesis supports resharding, which lets you adjust the number of shards in your stream to adapt to changes in the rate of data flowing through the stream. When resharding happens, the source cannot preserve the order among the last items of a shard being destroyed and the first items of new shards being created. Watermarks might also behave unexpectedly during resharding. If possible, reshard only while the data flow is stopped.

        NOTE. In Kinesis terms, this source is a "shared throughput consumer." This means that all the limitations on data reads imposed by Kinesis (at most 5 read transactions per second, at most 2 MiB of data per second) apply not on a per-source basis but to all sources at once. If you start only one pipeline with a source for a Kinesis stream, the source implementation will ensure that no limit is tripped. But if you run multiple pipelines with sources for the same Kinesis stream, these sources cannot coordinate the limits among themselves. It will still work, but you will see occasional warning messages in the logs about rates and limits being tripped.
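
        The arithmetic behind this note can be sketched as follows. The snippet is not how the source paces its reads internally; it only shows, under the assumption that all sources read the same shard and share the limit of 5 read transactions per second evenly, how far apart the reads of each source would have to be to stay within the limit. The class and method names are hypothetical.

```java
final class SharedThroughputBudget {

    // Read transaction limit quoted in the note above.
    static final int MAX_READS_PER_SECOND = 5;

    // Minimum spacing between reads for each of `consumers` sources so that
    // together they stay within the limit (rounded up to whole milliseconds).
    static long minPollIntervalMillis(int consumers) {
        if (consumers < 1) {
            throw new IllegalArgumentException("consumers must be positive");
        }
        return (1000L * consumers + MAX_READS_PER_SECOND - 1) / MAX_READS_PER_SECOND;
    }

    public static void main(String[] args) {
        System.out.println(minPollIntervalMillis(1)); // 200
        System.out.println(minPollIntervalMillis(2)); // 400
    }
}
```

        A single source pacing itself at one read every 200 ms stays within the limit. Two sources that each pace themselves at 200 ms, unaware of each other, issue about 10 reads per second in total, which is why warnings appear.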

        The source provides a metric called "millisBehindLatest", which specifies, for each shard the source is actively reading from, whether there is any delay in reading the data.
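
        As an illustration of how the metric value might be interpreted once it has been collected, the sketch below flags a shard as delayed when the value exceeds a tolerance. How the value is obtained from the job metrics is not shown; the class name, the method name, the tolerance and the sample values are hypothetical.

```java
import java.time.Duration;

final class ShardLag {

    // Returns true if the last read record lags behind the tip of the stream
    // by more than the given tolerance.
    static boolean isDelayed(long millisBehindLatest, Duration tolerance) {
        if (millisBehindLatest < 0) {
            throw new IllegalArgumentException("metric value cannot be negative");
        }
        return millisBehindLatest > tolerance.toMillis();
    }

    public static void main(String[] args) {
        Duration tolerance = Duration.ofSeconds(30); // hypothetical threshold
        System.out.println(isDelayed(0, tolerance));      // false: reading at the tip
        System.out.println(isDelayed(90_000, tolerance)); // true: 90 s behind
    }
}
```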

        Parameters:
        stream - name of the Kinesis stream being consumed by the source
        Returns:
        fluent builder that can be used to set properties and also to construct the source once configuration is done