logo dark

Hazelcast Jet Release Notes

Welcome to the Hazelcast Jet Release Notes. This document includes the new features, enhancements and fixed issues for Hazelcast Jet releases. Select a release number in the table of contents.

The linked numbers refer to the issue numbers in Hazelcast Jet GitHub repository. The labels in the square brackets refer to the respective Hazelcast Jet module.

3.0

New Features

Enhancements

  • [pipeline-api] User code deployment is not required anymore, when reading from remote Hazelcast IMDG journal. #1306

  • [pipeline-api] Improved the performance of global aggregations by using constant and randomly generated key for each aggregation. #1242

  • [pipeline-api] Optimized mapWithMerging so that it uses less memory. #1107

  • [pipeline-api] Improved the watermark concept to eliminate generating excessive amount of watermarks when you have a timestamped pipeline which does not do any windowing. #1091

  • [core] Improved the exception when JetBootstrap is used without Jet CLI’s submit command. #1321

  • [core] Minimum idle time in the execution service has been made configurable. #1317

  • [core] Added partition migration detection so that the jobs are automatically restarted if a partition is being migrated when reading from IMap. #1243

  • [core] The Jet client is now automatically shut down when using JetBootstrap. #1225

  • [core] Added support for JDK9-11 builds. #1223

  • [core] Added new metrics to monitor watermarks and latencies. #1173

  • [core] Added the ability to reusing the same maps when taking snapshots. #1104

Fixes

  • [core] Fixed the potential race during the initialization of multiple JetInstances. #1322

  • [core] Simplified and fixed the garbage collection for job resources. #1320

  • [core] Fixed the race condition in the cooperative worker shutdown. #1296

  • [core] Fixed an issue where the mutation of JobConfig objects was affecting the job after the job is submitted. #1233

  • [core] Fixed an issue where the non-smart clients cannot submit a job if they connect to a non-master member. #1129

  • [pipeline-api] Fixed the local parallelism value for global session windows; it is now 1. #1300

  • [hadoop] Fixed the creation of RecordWriter to receive its own copy. #1309

Breaking Changes

  • [pipeline-api] Removed TimestampedItem and TimestampedEntry. #1307

  • [pipeline-api] Renamed functional interfaces such as DistributedFunction as FunctionEx and removed the unused functional interfaces. #1293

  • [pipeline-api] Simplified mapUsingIMap and mapUsingReplicated map operations to make use of async calls. #1292

  • [pipeline-api] Renamed various methods in ContextFactory. #1285

  • [pipeline-api] Window operators now emit KeyedWindowResult or WindowResult instead of TimestampedItem or TimestampedEntry. #1244

  • [pipeline-api] Removed DistributedFunction.constantKey(). #1242

  • [pipeline-api] When reading from a source, you must now specify how the timestamps should be handled. #1166

  • [core-api] Added a constant key parameter to Edge.allToOne(). #1242

  • [core-api] Removed the watermark policies except limitingLag. #1184

  • [core-api] Removed maxRetain option from EdgeConfig. #1184

  • [core] Added an exception to be thrown if a job with the same name already exists when submitting jobs. #1203

  • [core] Introduced the jet.sh submit command which supersedes the jet-submit.sh script. #1202

  • [core] Renamed MetricsConfig.setMetricsForDataStructures to setMetricsForDataStructuresEnabled. #1201

Removed/Deprecated Features

  • [core] Group password has been deprecated and removed. #1302

0.7.2

New Features

Enhancements

  • [pipeline-api] Added TopN and BottomN aggregate operations. #1140

Fixes

  • [core] Fixed the missing config schema file in the previous release (0.7.1).

  • [pipeline-api] All of ClientConfig must be serialized when connecting to remote IMDG sources and sinks. #1147

  • [core] scaleUpDelay now can be configured declaratively or within the Spring context. #1148

  • [core] Watermarks are now not added to a stream if there are no windowing stages. #1144

  • [core] Distinct keys in SessionWindowP are now not materialized. #1121

  • [core] Fixed an issue where there was a significant lambda memory allocation pressure inside ProcessorTasklet and StoreSnapshotTaskletExtract. #1106

  • [core] Fixed an issue where there was a race condition during the snapshot creation, which could result in the snapshot to be not completed. #1101

Due to an issue related to XML schema files and the way they are validated, Hazelcast Jet 0.7.1 was released but broken. Hazelcast Jet 0.7.2 has been released on top of 0.7.1 with the related fix.

0.7

New Features

  • [pipeline-api] Added a source builder to build custom sources. #989

  • [pipeline-api] Added JDBC connector to enable Jet reading from/writing to JDBC source. #931

  • [pipeline-api] Added Avro connector to work with Avro files. #890, #925

  • [pipeline-api] Added convenient API to enrich a stream from IMap and Replicated Map. #924

  • [pipeline-api] Added map/flatMap/filterUsingContext for keyed streams. #892

  • [pipeline-api] Introduced new merge and distinct operations and a simpler co-aggregation. #885

  • [pipeline-api] Added JMS connector to enable Jet reading from/writing to JMS provider. #874

  • [core] Added full job elasticity with graceful shutdown, and suspend and resume. #946

  • [core] Added support for exposing metrics through JMX. #926

Enhancements

  • [pipeline-api] Added Tuple4 and Tuple5. #964

  • [pipeline-api] Added the method AggregateOperation.andThen. #959

  • [pipeline-api] Added exception throwing methods to distributed lambdas. #951

  • [pipeline-api] The method mapToOutputFn in aggregations now supports returning nulls. #938

  • [pipeline-api] File sources and sinks have been converted to use builders for more convenience. #913

  • [pipeline-api] Added a new String parameter name to Sinks.builder. #894

  • [pipeline-api] Added support for DOT file visualization for DAG and Pipeline API. #891

  • [core] Added JetClientConfig which uses the default Jet group name and password. #950

  • [core] Logs now provide job names. #939

  • [core] Improved Processors so that they are now closed as soon as the processor is completed. Also, jobId(), executionId() and jobConfig() are now available on Processor.Context. #934

  • [core] Improved Watermark diagnostics logging. #872

Fixes

  • [pipeline-api] Fixed the method mapWithMerging which should have not used a stateful flush function. #968

  • [pipeline-api] Fixed the issue of causing a window shift when using the results of a window aggregation in another window. #898

  • [kafka] StreamKafkaP.close() now handles InterruptException. #870

  • [java.util.stream] Removed java.util.stream implementation. It has been superseded by the Pipeline API. #983

  • [spring] Spring context is now injected to the wrapped processors as well to enable the @SpringAware annotation to be processed for sinks. #999

  • [spring] An IllegalArgumentException was thrown because the hazelcast-jet-spring module was adding a dependency to Hazelcast when used with Spring boot. This has been fixed so that hazelcast-jet-spring works out of the box. #977

Breaking Changes

  • [pipeline-api] Simplified the API for co-aggregation.

  • [pipeline-api] Replaced file sources and sinks with a builder API through Sources.filesBuilder().

  • [pipeline-api] SinkBuilder now requires a name and takes Processor.Context as input in createFn.

  • [core] Removed AbstractProcessor.setCooperative.

  • [core] Removed tempDir from config as it was unused.

  • [core] Processor.close() no longer receives the exception for the job.

0.6.1

New Features

  • [core] Add support for configuring a custom classloader per job. #770

Enhancements

  • [pipeline-api] Add a noop sink. #827

  • [core] Utility methods to load JetConfig from XML file or resource directly. #856

  • [core] Update Hazelcast IMDG references to "3.10". #833

  • [core] Optimize in-memory layout of snapshot data: reduced memory usage by up to 75%. #822

  • [core] Idle CPU usage improvements when there’s a large number of non-cooperative processors. #816

Fixes

  • [pipeline-api] SinkBuilder shouldn’t call destroyFn if createFn wasn’t called #826

  • [pipeline-api] Fix generic output type for aggregate operations with a custom mapToOutputFn. #823

  • [core] Don’t shrink receive window during lulls which can lead to reduced throughput. #798

  • [kafka] Kafka source should handle InterruptException in all cases. #824

  • [hadoop] When matching HDFS hostnames against member hostnames, use alternative ways than just reverse DNS. #858

0.6

New Features

  • [pipeline-api] Windowing support and overhaul of Pipeline API to work with streaming sources. #731

  • [pipeline-api] New updating and merging map sinks which update the value in place. #621

  • [pipeline-api] Custom sink builder support for Pipeline API. #762

  • [core] Improved job lifecycle features such as naming jobs and getting jobs by name. #666

  • [core] Ability to restart jobs manually to make use of newly joined members. #693

  • [spring] Add support for annotation and XML based spring configuration. #757

Enhancements

  • [pipeline-api] Improve the method AggregateOperations.allOf() so that is type-safe.

  • [pipeline-api] Predictable vertex names when converting Pipelines to DAG. #616

  • [pipeline-api] Add support for stateful transform operations in the pipeline API. #744

  • [pipeline-api] Add ability to set parallelism and name of each stage separately. #731

  • [pipeline-api] Add generic Predicate factory methods. #695

  • [core] Support for idle streaming sources through a configurable idle timeout where idle processors are excluded from Watermark coalescing. #660

  • [core] Add support for initializing processors with Hazelcast Managed Context. #645

  • [core] Improve watermark coalescing and processing. #649

  • [core] Make parallelism information available to Processor.Context. #754

  • [core] Include Hazelcast Javadocs in Jet distribution. #652

  • [core] Add Java 9 build support and automatic module names. #720

  • [core] Add ability to emit file name from file sources. #729

  • [core] Several Javadoc improvements.

  • [java.util.stream] New DistributedStream instances now can work with Pipeline API sources.

  • [kafka] Kafka version updated to 1.0.0.

  • [hadoop] Hadoop version updated to 2.8.3.

Fixes

  • [core] Avoid race between job cancellation and completion. #604

  • [core] Do not send a join request for a job unless Job.join() is explicitly called. #628

  • [core] When master node leaves cluster, job would restart even if auto-restart was disabled. #683

  • [core] When using map source, returning null in a projection would cause rest of the items to be dropped. #669

  • [core] When DAG fails to deserialize on job submission from client, the exception was not returned to user. #794

  • [core] Fix for ClassNotFoundException when restoring from snapshot when the key class is not on Jet classpath. #778

  • [core] Assert that always same item offered to Outbox after a rejection. #808

  • [kafka] Kafka sink should ensure that all records are sent before snapshot. #611

  • [kafka/hadoop] Package names reorganized to have better support for module systems.

Technical Changes

  • [pipeline-api] Grouping and aggregation in Pipeline API now are done through the methods .groupingKey() and .aggregate().

  • [pipeline-api] Package names for Pipeline API have been reorganized.

  • [core] Job cancellation can now only be done through Job.cancel() rather than Job.getFuture().cancel().

  • [core] IStreamMap, IStreamList and IStreamCache have been renamed to IMapJet, IListJet and ICacheJet, respectively.

  • [core] Dedicated tryProcessWatermark() method is now responsible for processing Watermarks in a processor and watermarks will be forwarded automatically once it returns true.

  • [core] start-jet.sh, submit-jet.sh and other scripts have been renamed to have prefix "jet" instead.

  • [core] CloseableProcessorSupplier has been replaced with a close() method on the Processor instead.

  • [core] complete methods on ProcessorMetaSupplier and ProcessorSupplier have been renamed to close().

  • [java.util.stream] DistributedStream creation have been moved to DistributedStream class from Map and List instances. We strongly suggest to switch to Pipeline API instead of java.util.stream API.

0.5.1

Fixes

  • [pipeline-api] Generated vertex names are now deterministic instead of random. #623

  • [pipeline-api] Vertex names for sources and sinks should now be more explicit. #623

  • [core] JetInstance.getJobs() now only sends a join job request when getFuture() is called. #633

  • [core] When configuring internal Jet maps, do not inherit default map configuration from Hazelcast. #629

  • [core] Update schema version in default Hazelcast configuration XML file. #627

  • [core] When peeking input and output of a processor, the logger name of the peeked processor should be used. #617

  • [core] Fix race condition when calling ProcessorSupplier.complete() during job cancellation. #604

  • [kafka] Kafka writer should flush when taking a snapshot. #624

  • [javadoc] Replace Javadoc for com.hazelcast.jet.function and com.hazelcast.jet.stream with links to JDK docs. #636

  • [javadoc] AggregateOperations Javadoc updates. #635

0.5

New Features

  • [core] Support for in-memory snapshots and at-least and exactly-once processing. #500

  • [core] New Pipeline API with support for group by, hash join and co group. #497

  • [core] Improved Jet job lifecycle management including auto-restart. #492

  • [core] Event journal streaming source for Map and Cache. #487

Enhancements

  • [core] Add ProcessorMetaSupplier.complete() method to support global cleanup. #571

  • [core] Socket source now uses non-blocking IO. #554

  • [core] Add preferred local parallelism to all the sources and sinks. #552

  • [core] InsertWatermarksProcessor will now log late events. #551

  • [core] Add projection and predicate capability to map reader. #490

  • [core] Hazelcast version is updated to 3.9. #474

  • [java.util.stream] Support for creating a stream from any source. #360

  • [kafka] Add ability to create own ProducerRecord instances for Kafka writer. #588

  • [kafka] When a job, which includes Kafka source, is cancelled, StreamKafkaP throws InterruptedException. This exception should be prevented or it should give a descriptive log. #570

  • [kafka] Add optional projection function to Kafka reader. #534

  • [kafka] Add snapshotting support for Kafka reader. #500

Code Samples Improvements
  • Overall module reorganization roughly divided to code samples using Pipeline API and Core API.

  • Several examples migrated to use the new Pipeline API.

  • New examples using hash join enrichment and co group.

  • New code samples illustrating the new event journal and fault tolerance features.

  • New code samples showing how to use HDFS with java.util.stream.

Fixes

  • [core] The method Job.getJobStatus() does not reflect actual status during cancellation. #580

  • [core] Add missing serializer for Session type. #562

  • [core] The method Traversers.traverseStream() should close the stream when exhausted. #560

  • [core] Kafka source should throw when partition count is less than global parallelism. #559

  • [core] Suppress exception for flow control packet when the member has already left the cluster. #556

  • [core] Fix the ignored charset in StreamSocketP. #553

  • [core] JetClassLoader should not throw an exception on the method findResource(). #549

  • [core] Add cache manager in order not to require a dependency on javax.cache. #485

  • [core] Fix NPE when releasing uninitialized ResourceStore. #478

  • [core] Replace ReadFileStreamP with batch file processor. #373

  • [core] ReadFileStreamP should respect Outbox limit. #337

  • [kafka] When init phase for the job fails, StreamKafkaP throws NPE. There should be a null check. #568

  • [hdfs] Making sure a proper global cleanup is performed on job cancellation. #572

0.4

New Features

  • [core] Infinite stream processing with event-time semantics using watermarks.

  • [core] Out of the box processors for tumbling, sliding and session window aggregations.

  • [core] New AggregateOperation abstraction with several out of the box aggregators.

  • [core] Bootstrapper script for submitting a JAR and running a DAG from it. #421

  • [core] Sources and Sinks reorganized into Sources and Sinks static classes.

  • [core] Added Sources.streamSocket() and Sinks.writeSocket() for reading from and writing to sockets. #358, #363

  • [core] Added Sources.readFile(), Sources.streamFile() and Sources.writeFile() for reading and writing files. #373, #376

  • [core] Added Sources.readCache() and Sinks.writeCache() for reading from and writing to Hazelcast ICache. #357

  • [core] Added diagnostic processors for debugging. #426

  • [java.util.stream] java.util.stream support for ICache. #360

  • [pcf] Hazelcast Jet tile for Pivotal Cloud Foundry was released.

Enhancements

  • [core] Hazelcast IMDG updated to 3.8.2 and is now shaded in the Jet JAR. #367

  • [core] Blocking outbox support for non-cooperative processors. #372, #375

  • [core] JobFuture added to Processor.Context. #381

  • [core] Processor.Context.index() renamed to globalProcessorIndex() and now returns a global index. #405

  • [core] Fairer distribution of tasklets when number of cores is larger than number of tasklets in a job. #443

  • [java.util.stream] Added configuration option to DistributedStream. #359

Code Samples Improvements
  • Sliding window Stock exchange streaming sample. #19, #21

  • Hazelcast IMap/IList/ICache sample. #16

  • PCF-integration with Spring Boot sample. #17

  • Socket reader and writer examples with Netty. #15

  • Read and write files example. #24

  • Top N streaming example. #27

  • HashJoin and CoGroup samples added. #28, #29

Fixes

  • [core] Missing return value for JobConfig.addJar(). #393

  • [core] Improve reliability of ExecutionService on shutdown. #409

  • [core] Metaspace memory was not recovered fully when custom class loader is used. #466

0.3.1

New Features

Enhancements

  • [core] Improved DAG.toString. #310

  • [core] Convenience to make any processor non-cooperative. #321

  • [core] Updated Hazelcast version to 3.8. #323

  • [core] Added missing functional interfaces to the Distributed class. #324

  • [java.util.stream] Refactoring of Jet collectors into the new type Reducer. #313

  • [java.util.stream] Allow branching of the j.u.s pipeline. #318

  • [hadoop] Added support for reading and writing non-text data from or to HDFS. See the HDFS section for details.

  • [hadoop] Added key/value mappers for ReadHdfsP and WriteHdfsP. #328

  • [kafka] Kafka connector now makes use of consumer groups. See the Kafka section.

Code Samples Improvements
  • [code-samples] Added Jet vs. java.util.stream benchmark to the code samples.

  • [code-samples] Added new Hadoop word count example.

  • [code-samples] Added Kafka consumer example.

Fixes

  • [core] Remove dead and potentially racy code from async operations. #320

  • [core] ReadSocketTextStreamP should emit items immediately after reading. #335

  • [core] Wrong configuration file is used for IMDG when Jet config file present in classpath. #345

  • [java.util.stream] Do not require forEach action to be serializable. #340, #341