T
- the type of items coming out of this stagepublic interface StreamSourceStage<T>
pipeline
that will observe an unbounded amount of data (i.e., an event stream).
Timestamp handling, a prerequisite for attaching data processing stages,
is not yet defined in this step. Call one of the methods on this
instance to declare whether and how the data source will assign
timestamps to events.Modifier and Type | Method and Description |
---|---|
StreamStage<T> |
withIngestionTimestamps()
Declares that the source will assign the time of ingestion as the event
timestamp.
|
StreamStage<T> |
withNativeTimestamps(long allowedLag)
Declares that the stream will use the source's native timestamps.
|
StreamStage<T> |
withoutTimestamps()
Declares that the source will not assign any timestamp to the events it
emits.
|
StreamStage<T> |
withTimestamps(ToLongFunctionEx<? super T> timestampFn,
long allowedLag)
Declares that the source will extract timestamps from the stream items.
|
StreamStage<T> withoutTimestamps()
addTimestamps
,
but the behavior is different — see the note there.StreamStage<T> withIngestionTimestamps()
System.currentTimeMillis()
at the moment
it observes an event from the data source and assign it as the event
timestamp. The actual time of the original event is ignored.
With this mode, unlike withTimestamps(com.hazelcast.function.ToLongFunctionEx<? super T>, long)
or withNativeTimestamps(long)
, the sparse events issue isn't present.
You can use this mode to avoid the issue, but there's a caveat:
Note: when snapshotting is enabled to achieve fault
tolerance, after a restart Jet replays all the events that were already
processed since the last snapshot. These events will then get different
timestamps. If you want your job to be fault-tolerant, the events in the
stream must have a stable timestamp associated with them. The source may
natively provide such timestamps (the withNativeTimestamps(long)
option). If that is not appropriate, the events should carry their own
timestamp as a part of their data and you can use withTimestamps(timestampFn, allowedLag
to extract it.
Note 2: if the system time goes back (such as when adjusting it), newer events will get older timestamps and might be dropped as late, because the allowed lag is 0.
StreamStage<T> withNativeTimestamps(long allowedLag)
If there's no notion of native timestamps in the source, this method
will throw a JetException
.
Issue with sparse events
Event time progresses only through the ingestion of new events. If the events are sparse, time will effectively stop until a newer event arrives. This causes high latency for time-sensitive operations (such as window aggregation). In addition, Jet tracks event time for every source partition separately, and if just one partition has sparse events, time progress in the whole job is hindered.
To overcome this you can either ensure there's a consistent influx of
events in every partition, or you can use withIngestionTimestamps()
.
allowedLag
- the allowed lag of a given event's timestamp behind the top
timestamp value observed so farStreamStage<T> withTimestamps(@Nonnull ToLongFunctionEx<? super T> timestampFn, long allowedLag)
Issue with sparse events
Event time progresses only through the ingestion of new events. If the events are sparse, time will effectively stop until a newer event arrives. This causes high latency for time-sensitive operations (such as window aggregation). In addition, Jet tracks event time for every source partition separately, and if just one partition has sparse events, time progress in the whole job is hindered.
To overcome this you can either ensure there's a consistent influx of
events in every partition, or you can use withIngestionTimestamps()
.
timestampFn
- a function that returns the timestamp for each item, typically in
milliseconds. It must be stateless and cooperative.allowedLag
- the allowed lag of a given event's timestamp behind the top
timestamp value observed so far. The time unit is
the same as the unit used by timestampFn
Copyright © 2023 Hazelcast, Inc.. All rights reserved.