Class HadoopSinks
- Since:
- Jet 3.0
-
Method Summary
- static <K,V> Sink<Map.Entry<K,V>> outputFormat(org.apache.hadoop.conf.Configuration configuration)
  Convenience for outputFormat(Configuration, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be written to HDFS.
- static <E,K,V> Sink<E> outputFormat(org.apache.hadoop.conf.Configuration configuration, FunctionEx<? super E, K> extractKeyF, FunctionEx<? super E, V> extractValueF)
  Returns a sink that writes to Apache Hadoop HDFS.
-
Method Details
-
outputFormat
@Nonnull
public static <E,K,V> Sink<E> outputFormat(@Nonnull org.apache.hadoop.conf.Configuration configuration, @Nonnull FunctionEx<? super E, K> extractKeyF, @Nonnull FunctionEx<? super E, V> extractValueF)

Returns a sink that writes to Apache Hadoop HDFS. It transforms each received item to a key-value pair using the two supplied mapping functions. The type of the key and the value must conform to the expectations of the output format specified in the configuration.

The sink creates a number of files in the output path, identified by the cluster member UUID and the Processor index. Unlike MapReduce, the data in the files is not sorted by key.

The supplied Configuration must specify an OutputFormat class with a path.

The processor will use either the new or the old MapReduce API based on the key which stores the OutputFormat configuration. If it's stored under "mapreduce.job.outputformat.class", the new API will be used. Otherwise, the old API will be used. If you get the configuration from JobContextImpl.getConfiguration(), the new API will be used.

No state is saved to snapshot for this sink. After the job is restarted, the files will be overwritten. If the cluster members change, some files will be overwritten and some will not - the directory is not cleaned before the execution starts.

The default local parallelism for this processor is 2 (or less if fewer CPUs are available).
- Type Parameters:
E - stream item type
K - type of key to write to HDFS
V - type of value to write to HDFS
- Parameters:
configuration - Configuration used for output format configuration
extractKeyF - mapper to map a key to another key
extractValueF - mapper to map a value to another value
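Example (not part of the original Javadoc): a minimal sketch of using this sink in a pipeline, assuming a readFrom/writeTo-style Pipeline API, Hadoop's TextOutputFormat, and TestSources for illustrative input; the HDFS path, class name, and the String-to-Text/IntWritable mapping are placeholders.

// Illustrative sketch - the class name, HDFS path, sample items and the job
// submission call are placeholders; adapt them to your Jet/Hazelcast version.
import com.hazelcast.jet.hadoop.HadoopSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HadoopSinkExample {
    public static void main(String[] args) throws Exception {
        // Configure the output format through the new MapReduce API; because
        // Job.setOutputFormatClass stores the class under
        // "mapreduce.job.outputformat.class", the sink will use the new API.
        Job job = Job.getInstance();
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("hdfs://namenode:8020/output")); // placeholder path

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("apple", "banana", "cherry"))
         // each String item becomes a (Text key, IntWritable value) pair on write
         .writeTo(HadoopSinks.<String, Text, IntWritable>outputFormat(
                 job.getConfiguration(),
                 word -> new Text(word),
                 word -> new IntWritable(word.length())));

        // Submit the pipeline with your Jet/Hazelcast instance, e.g.
        // jet.newJob(p).join();
    }
}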
-
outputFormat
@Nonnull
public static <K,V> Sink<Map.Entry<K,V>> outputFormat(@Nonnull org.apache.hadoop.conf.Configuration configuration)

Convenience for outputFormat(Configuration, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be written to HDFS.
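Example (not part of the original Javadoc): a short sketch for this convenience overload, assuming the stage already emits Map.Entry items (built here with com.hazelcast.jet.Util.entry); the configuration, path, and sample items mirror the example above and are placeholders.

// Illustrative sketch - reuses a Job-based configuration as above; the path
// and sample items are placeholders.
import com.hazelcast.jet.Util;
import com.hazelcast.jet.hadoop.HadoopSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.test.TestSources;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HadoopEntrySinkExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("hdfs://namenode:8020/word-lengths")); // placeholder path

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("apple", "banana", "cherry"))
         // map each word to a Map.Entry<Text, IntWritable>; the sink extracts
         // the key and value parts and writes them via the configured OutputFormat
         .map(word -> Util.entry(new Text(word), new IntWritable(word.length())))
         .writeTo(HadoopSinks.<Text, IntWritable>outputFormat(job.getConfiguration()));
    }
}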
-