Implementing Aggregations - Hazelcast Reference Manual

This section explains how to implement your own aggregations in your own application. It is an advanced section, so if you do not intend to implement your own aggregation, you might want to stop reading here and come back later when you need to know how to implement your own aggregation.

An Aggregation implementation is defining a MapReduce task, but with a small difference: the Mapper is always expected to work on a Supplier that filters and/or transforms the mapped input value to some output value.

Aggregation Methods

The main interface for making your own aggregation is com.hazelcast.mapreduce.aggregation.Aggregation. It consists of four methods.

interface Aggregation<Key, Supplied, Result> {
  Mapper getMapper(Supplier<Key, ?, Supplied> supplier);
  CombinerFactory getCombinerFactory();
  ReducerFactory getReducerFactory();
  Collator<Map.Entry, Result> getCollator();
}

The getMapper and getReducerFactory methods should return non-null values. getCombinerFactory and getCollator are optional operations and you do not need to implement them. You can decide to implement them depending on the use case you want to achieve.

Aggregation Tips

For more information on how you implement mappers, combiners, reducers, and collators, refer to the MapReduce section.

For best speed and traffic usage, as mentioned in the MapReduce section, you should add a Combiner to your aggregation whenever it is possible to do some kind of pre-reduction step.

Your implementation also should use DataSerializable or IdentifiedDataSerializable for best compatibility and speed/stream-size reasons.

Distributed Query Aggregators Implementing Aggregations

Aggregation Methods

Aggregation Tips