Package com.hazelcast.mapreduce

This package contains the MapReduce API definition for Hazelcast.
All MapReduce operations run in a distributed manner inside the active Hazelcast cluster.


Interface Summary
Collator<ValueIn,ValueOut> This interface can be implemented to define a Collator which is executed after calculation of the MapReduce algorithm on remote cluster nodes but before returning the final result.
A Collator can, for example, be used to sum up a final value.
CombinerFactory<KeyIn,ValueIn,ValueOut> A CombinerFactory implementation is used to build Combiner instances per key.
An implementation needs to be serializable by Hazelcast since it is distributed together with the Mapper implementation to run alongside it.
Context<K,V> The Context interface is used for emitting keys and values to the intermediate working space of the MapReduce algorithm.
Job<KeyIn,ValueIn> This interface describes a mapreduce Job that is built by JobTracker.newJob(KeyValueSource).
It is used to execute mappings and calculations on the different cluster nodes and reduce or collate these mapped values to results.
JobCompletableFuture<V> This is a special version of ICompletableFuture to return the assigned job id of the submit operation.
JobPartitionState An implementation of this interface contains current information about the status of a process piece while the operation is executing.
JobProcessInformation This interface holds basic information about a running map reduce job, such as the state of the different partitions and the number of currently processed records.
The number of processed records is not a real-time value but is updated on a regular basis (after 1000 processed elements per node).
JobTracker The JobTracker interface is used to create instances of Jobs depending on the given data structure / data source.
KeyPredicate<Key> This interface is used to pre-evaluate keys before spreading the MapReduce task to the cluster.
LifecycleMapper<KeyIn,ValueIn,KeyOut,ValueOut> The LifecycleMapper interface is a more sophisticated version of Mapper, normally used for more complex algorithms that need initialization and finalization.
Mapper<KeyIn,ValueIn,KeyOut,ValueOut> The interface Mapper is used to build mappers for the Job.
MappingJob<EntryKey,KeyIn,ValueIn> This interface describes a mapping mapreduce Job.
For further information see Job.
PartitionIdAware This interface can be used to mark implementations as being aware of the data partition they are currently working on.
ReducerFactory<KeyIn,ValueIn,ValueOut> A ReducerFactory implementation is used to build Reducer instances per key.
An implementation needs to be serializable by Hazelcast since it might be distributed inside the cluster to do parallel calculations of the reducing step.
ReducingJob<EntryKey,KeyIn,ValueIn> This interface describes a reducing mapreduce Job.
For further information see Job.
ReducingSubmittableJob<EntryKey,KeyIn,ValueIn> This interface describes a submittable mapreduce Job.
For further information see Job.
TrackableJob<V> This interface describes a trackable job.

Class Summary
Combiner<ValueIn,ValueOut> The abstract Combiner class is used to build combiners for the Job.
Those Combiners are distributed inside the cluster and run alongside the Mapper implementations on the same node.
Combiners are called in a thread-safe way, so internal locking is not required.
KeyValueSource<K,V> The abstract KeyValueSource class is used to implement custom data sources for mapreduce algorithms.
The implementations shipped by default include KeyValueSources for Hazelcast data structures like IMap and MultiMap.
LifecycleMapperAdapter<KeyIn,ValueIn,KeyOut,ValueOut> The abstract LifecycleMapperAdapter superclass is used to ease building mappers for the Job.
Reducer<ValueIn,ValueOut> The abstract Reducer class is used to build reducers for the Job.
Reducers may be distributed inside of the cluster but there is always only one Reducer per key.
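
The division of labour between Mapper, Combiner, Reducer and Collator described in the summaries above can be pictured with a small word-count sketch in plain Java. This is only an illustration of the data flow: the types Emitter and SimpleMapper and the methods mapAndCombine, reduce and collate are hypothetical stand-ins invented for this example, not the com.hazelcast.mapreduce types, and the two "nodes" are simulated with ordinary in-memory maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> combine -> reduce -> collate flow.
// Every type here is a hypothetical stand-in, NOT part of com.hazelcast.mapreduce.
public class MapReduceFlowSketch {

    /** Stand-in for a context that receives emitted key/value pairs. */
    public interface Emitter<K, V> {
        void emit(K key, V value);
    }

    /** Stand-in for a mapper: turns one input entry into zero or more pairs. */
    public interface SimpleMapper<KI, VI, KO, VO> {
        void map(KI key, VI value, Emitter<KO, VO> out);
    }

    /** Word-count mapper: emits (word, 1) for every word of a line. */
    static final SimpleMapper<String, String, String, Long> TOKENIZER = (key, line, out) -> {
        for (String word : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                out.emit(word, 1L);
            }
        }
    };

    /** Simulates one node: maps its local entries and pre-aggregates per key (combine step). */
    public static Map<String, Long> mapAndCombine(Map<String, String> localEntries) {
        Map<String, Long> chunk = new TreeMap<>();
        for (Map.Entry<String, String> e : localEntries.entrySet()) {
            TOKENIZER.map(e.getKey(), e.getValue(),
                    (word, one) -> chunk.merge(word, one, Long::sum));
        }
        return chunk;
    }

    /** Reduce step: a single accumulator per key merges the chunks of all nodes. */
    public static Map<String, Long> reduce(List<Map<String, Long>> chunks) {
        Map<String, Long> result = new TreeMap<>();
        for (Map<String, Long> chunk : chunks) {
            chunk.forEach((word, count) -> result.merge(word, count, Long::sum));
        }
        return result;
    }

    /** Collate step: sums the reduced values up into one final value. */
    public static long collate(Map<String, Long> reduced) {
        long total = 0;
        for (long count : reduced.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> nodeA = new HashMap<>();
        nodeA.put("doc1", "to be or not");
        Map<String, String> nodeB = new HashMap<>();
        nodeB.put("doc2", "to be");

        List<Map<String, Long>> chunks = new ArrayList<>();
        chunks.add(mapAndCombine(nodeA));
        chunks.add(mapAndCombine(nodeB));

        Map<String, Long> reduced = reduce(chunks);
        System.out.println(reduced);          // per-word counts
        System.out.println(collate(reduced)); // total number of words
    }
}
```

In the sketch, combining happens next to the mapping step on each simulated node, while reducing keeps exactly one accumulator per key, which mirrors the roles stated above.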

Enum Summary
JobPartitionState.State Definition of the processing states
TopologyChangedStrategy This enum class is used to define how a map reduce job behaves if the job owner recognizes a topology changed event.
When members leave the cluster, processed data chunks that were already sent to the reducers on the leaving node might be lost.
In addition, any topology change triggers a redistribution of the member-assigned partitions, which means that a map job might have a problem finishing its currently processed partition.
The default behavior is to immediately cancel the running task and throw a TopologyChangedException, but it is possible to submit the same job configuration again if JobTracker.getTrackableJob(String) returns null for the requested job id.

Exception Summary
RemoteMapReduceException This exception class is used to show stacktraces of multiple failed remote operations at once.
TopologyChangedException This exception is thrown when a topology change happens during the execution of a map reduce job and the TopologyChangedStrategy is set to TopologyChangedStrategy.CANCEL_RUNNING_OPERATION.

Package com.hazelcast.mapreduce Description

This package contains the MapReduce API definition for Hazelcast.
All MapReduce operations run in a distributed manner inside the active Hazelcast cluster. Therefore Mapper, Combiner and Reducer implementations need to be fully serializable by Hazelcast. Any of the existing serialization patterns are available for those classes, too.
If a custom KeyValueSource is provided, the above statement also applies to that implementation.
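
To make the serializability requirement concrete, the following sketch round-trips a mapper-like object through standard Java serialization, one of the patterns Hazelcast accepts. It does not use Hazelcast's own serialization service; the class PrefixMapper and the helper roundTrip are hypothetical names chosen for this illustration, and the byte-array round trip merely stands in for shipping the object to another node.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustration only: PrefixMapper and roundTrip are hypothetical, not Hazelcast API.
public class SerializableMapperSketch {

    /** Mapper-like class whose only state is a serializable String. */
    public static class PrefixMapper implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String prefix;

        public PrefixMapper(String prefix) {
            this.prefix = prefix;
        }

        /** Decides whether an entry with the given key would be processed. */
        public boolean accepts(String key) {
            return key.startsWith(prefix);
        }
    }

    /** Serializes and deserializes an object, as a stand-in for sending it to another node. */
    public static <T extends Serializable> T roundTrip(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            // A field that is not serializable would surface here.
            throw new IllegalStateException("Not serializable: " + original.getClass(), e);
        }
    }

    public static void main(String[] args) {
        PrefixMapper copy = roundTrip(new PrefixMapper("user:"));
        // The copy carries the same configuration as the original.
        System.out.println(copy.accepts("user:42"));
        System.out.println(copy.accepts("order:7"));
    }
}
```

A check like this can catch a field that is not serializable before the job is submitted, since such a field would make roundTrip fail in this sketch.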

For a basic idea of how to use this framework, see Job, Mapper, Combiner or Reducer.


Copyright © 2015 Hazelcast, Inc. All Rights Reserved.