Interface Partitioner<T>

  • Type Parameters:
    T - type of item the partitioner accepts
    All Superinterfaces:
    java.io.Serializable
    All Known Implementing Classes:
    Partitioner.Default
    Functional Interface:
    This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.

    @FunctionalInterface
    public interface Partitioner<T>
    extends java.io.Serializable
    Encapsulates the logic associated with a DAG edge that decides on the partition ID of an item traveling over it. The partition ID determines which cluster member and which instance of Processor on that member an item will be forwarded to.

    Jet's partitioning piggybacks on Hazelcast partitioning. Standard Hazelcast protocols are used to distribute partition ownership over the members of the cluster. However, if a DAG edge is configured as non-distributed, then on each member there will be some destination processor responsible for any given partition.

    Since:
    Jet 3.0
    • Field Detail

      • HASH_CODE

        static final Partitioner<java.lang.Object> HASH_CODE
        Partitioner which calls Object.hashCode() and coerces it with the modulo operation into the allowed range of partition IDs. The primary reason to prefer this over the default is performance and it's a safe choice on local edges.

        WARNING: this is a dangerous strategy to use on distributed edges. Care must be taken to ensure that the produced hashcode remains stable across serialization-deserialization cycles as well as across all JVM processes. Consider a hashCode() method that is correct with respect to its contract, but not with respect to the stricter contract given above. Take the following scenario:

        1. there are two Jet cluster members;
        2. there is a DAG vertex;
        3. on each member there is a processor for this vertex;
        4. each processor emits an item;
        5. these two items have equal partitioning keys;
        6. nevertheless, on each member they get a different hashcode;
        7. they are routed to different processors, thus failing on the promise that all items with the same partition key go to the same processor.
    • Method Detail

      • init

        default void init​(@Nonnull
                          DefaultPartitionStrategy strat)
        Callback that injects the Hazelcast's default partitioning strategy into this partitioner so it can be consulted by the getPartition(Object, int) method.

        The creation of instances of the Partitioner type is done in user's code, but the Hazelcast partitioning strategy only becomes available after the partitioner is deserialized on each target member. This method solves the lifecycle mismatch.

      • getPartition

        int getPartition​(@Nonnull
                         T item,
                         int partitionCount)
        Returns the partition ID of the given item.
        Parameters:
        partitionCount - the total number of partitions in use by the underlying Hazelcast instance
      • getConstantPartitioningKey

        default T getConstantPartitioningKey()
        Returns:
        constant partition key that is always used by getPartition(Object, int) or null if different partitions may be returned
        Since:
        5.4
      • defaultPartitioner

        static Partitioner<java.lang.Object> defaultPartitioner()
        Returns a partitioner which applies the default Hazelcast partitioning. It will serialize the item using Hazelcast Serialization, then apply Hazelcast's MurmurHash-based algorithm to retrieve the partition ID. This is quite a bit of work, but has stable results across all JVM processes, making it a safe default.