public final class HadoopSinks extends Object
@Nonnull
public static <E,K,V> Sink<E> outputFormat(
        @Nonnull org.apache.hadoop.conf.Configuration configuration,
        @Nonnull FunctionEx<? super E,K> extractKeyF,
        @Nonnull FunctionEx<? super E,V> extractValueF)

Returns a sink that writes to Apache Hadoop HDFS.
The sink creates a number of files in the output path, identified by the
cluster member UUID and the
Processor index. Unlike MapReduce,
the data in the files is not sorted by key.
The supplied Configuration must specify an OutputFormat class with a path.
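As an illustration, the sketch below builds such a Configuration with the standard Hadoop Job and TextOutputFormat classes; the output path and the key/value types are hypothetical and depend on your deployment:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;

public class SinkConfig {
    static Configuration createConf() throws IOException {
        Job job = Job.getInstance();                       // wraps a fresh Configuration
        job.setOutputFormatClass(TextOutputFormat.class);  // stored under the new-API key
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // hypothetical output path
        FileOutputFormat.setOutputPath(job, new Path("hdfs://namenode:8020/out"));
        return job.getConfiguration();                     // now carries both class and path
    }
}
```

Because Job.setOutputFormatClass stores the class under the new MapReduce API's configuration key, the sink will use the new API with a configuration produced this way.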
The processor will use either the new or the old MapReduce API, depending on the key under which the OutputFormat configuration is stored. If it's stored under the new-API key (mapreduce.job.outputformat.class), the new API will be used; otherwise, the old API will be used. If you obtain the configuration from JobContextImpl.getConfiguration(), the new API will be used.
No state is saved to snapshot for this sink. After the job is restarted, the files will be overwritten. If the cluster members change, some files will be overwritten and some will not; the directory is not cleaned before the execution starts.
The default local parallelism for this processor is 2 (or fewer if fewer CPUs are available).
Type Parameters:
E - stream item type
K - type of key to write to HDFS
V - type of value to write to HDFS
Parameters:
configuration - the Configuration used for output format configuration
extractKeyF - function to extract the HDFS key from a stream item
extractValueF - function to extract the HDFS value from a stream item
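Putting the parameters together, here is a hedged usage sketch. The IMap name "counts" and its String/Integer contents are hypothetical; HadoopSinks, Pipeline, and Sources are Hazelcast Jet classes, and the extractor lambdas play the roles of extractKeyF and extractValueF:

```java
import com.hazelcast.jet.hadoop.HadoopSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sources;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WriteCounts {
    static Pipeline build(Configuration conf) {
        Pipeline p = Pipeline.create();
        p.readFrom(Sources.<String, Integer>map("counts"))   // hypothetical IMap source
         .writeTo(HadoopSinks.outputFormat(
                 conf,
                 e -> new Text(e.getKey()),           // extractKeyF: item -> HDFS key
                 e -> new IntWritable(e.getValue())   // extractValueF: item -> HDFS value
         ));
        return p;
    }
}
```

Here the stream item type E is inferred as Map.Entry<String, Integer>, and each item is split into a Text key and an IntWritable value before being handed to the configured OutputFormat.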
@Nonnull
public static <K,V> Sink<Map.Entry<K,V>> outputFormat(@Nonnull org.apache.hadoop.conf.Configuration configuration)

Convenience for outputFormat(Configuration, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be written to HDFS.
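For streams that already consist of Map.Entry items, this overload can be used directly. In the sketch below, the IMap name "word-counts" is hypothetical; the entry's key and value are written to HDFS as-is:

```java
import com.hazelcast.jet.hadoop.HadoopSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sources;
import org.apache.hadoop.conf.Configuration;

public class WriteEntries {
    static Pipeline build(Configuration conf) {
        Pipeline p = Pipeline.create();
        p.readFrom(Sources.<String, Long>map("word-counts")) // emits Map.Entry<String, Long>
         .writeTo(HadoopSinks.outputFormat(conf));           // key = entry key, value = entry value
        return p;
    }
}
```

Since no extractor functions are given, the key and value types of the entries must be ones the configured OutputFormat can serialize.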
Copyright © 2021 Hazelcast, Inc. All rights reserved.