Partitioning Strategy - Hazelcast Jet Reference Manual

The concern of coming up with a partition ID for an item has two aspects:

extract the partitioning key from the item;
calculate the partition ID from the key.

The first point is covered by the key extractor function and the second by the Partitioner type. In most cases the choice of partitioner boils down to two types provided out of the box:

Default Hazelcast partitioner: safe but slower;
Object.hashCode()-based partitioner: typically faster, but not safe in general.

The trouble with Object.hashCode() is that its contract is only concerned with instances that live within the same JVM. It says nothing about the correspondence of hash codes on two separate JVM processes, but for distributed edges it is essential that the hashcode of the deserialized object stays the same as the original. Some clasess, like String or Integer, specify exactly how they calculate the hashcode; these types are safe to be partitioned by hashcode. When these guarantees don't exist, the default partitioner can be used. It will serialize the object and use Hazelcast's standard MurmurHash3 algorithm to get the partition ID.

Both aspects of partitioning are specified as arguments to the edge's partitioned() method. This example specifies default Hazelcast partitioner:

edge.partitioned(wholeItem());

and this one specifies the Object.hashCode() strategy:

edge.partitioned(wholeItem(), HASH_CODE));

Tutorial - Building an Inverted TF-IDF Index with Core API Translating to Jet DAG Partitioning Strategy