HadoopSinks (Hazelcast Root 5.2.0 API)

java.lang.Object
- com.hazelcast.jet.hadoop.HadoopSinks

public final class HadoopSinks
extends Object

Factories of Apache Hadoop sinks.

Since:: Jet 3.0

Method Summary

All Methods Static Methods Concrete Methods
Modifier and Type	Method and Description
`static <K,V> Sink<Map.Entry<K,V>>`	`outputFormat(org.apache.hadoop.conf.Configuration configuration)` Convenience for `outputFormat(Configuration, FunctionEx, FunctionEx)` which expects `Map.Entry<K, V>` as input and extracts its key and value parts to be written to HDFS.
`static <E,K,V> Sink<E>`	`outputFormat(org.apache.hadoop.conf.Configuration configuration, FunctionEx<? super E,K> extractKeyF, FunctionEx<? super E,V> extractValueF)` Returns a sink that writes to Apache Hadoop HDFS.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - outputFormat
```
@Nonnull
public static <E,K,V> Sink<E> outputFormat(@Nonnull
                                                    org.apache.hadoop.conf.Configuration configuration,
                                                    @Nonnull
                                                    FunctionEx<? super E,K> extractKeyF,
                                                    @Nonnull
                                                    FunctionEx<? super E,V> extractValueF)
```
    Returns a sink that writes to Apache Hadoop HDFS. It transforms each received item to a key-value pair using the two supplied mapping functions. The type of the key and the value must conform to the expectations of the output format specified in the configuration.
    The sink creates a number of files in the output path, identified by the cluster member UUID and the Processor index. Unlike MapReduce, the data in the files is not sorted by key.
    The supplied Configuration must specify an OutputFormat class with a path.
    The processor will use either the new or the old MapReduce API based on the key which stores the OutputFormat configuration. If it's stored under }, the new API will be used. Otherwise, the old API will be used. If you get the configuration from JobContextImpl.getConfiguration(), the new API will be used.
    No state is saved to snapshot for this sink. After the job is restarted, the files will be overwritten. If the cluster members change, some files will be overwritten and some not - we don't clean the directory before the execution starts.
    The default local parallelism for this processor is 2 (or less if less CPUs are available).
    
    Type Parameters:
    
    E - stream item type
    
    K - type of key to write to HDFS
    
    V - type of value to write to HDFS
    
    Parameters:
    
    configuration - Configuration used for output format configuration
    
    extractKeyF - mapper to map a key to another key
    
    extractValueF - mapper to map a value to another value
  - outputFormat
```
@Nonnull
public static <K,V> Sink<Map.Entry<K,V>> outputFormat(@Nonnull
                                                               org.apache.hadoop.conf.Configuration configuration)
```
    Convenience for outputFormat(Configuration, FunctionEx, FunctionEx) which expects Map.Entry<K, V> as input and extracts its key and value parts to be written to HDFS.

Class HadoopSinks

Method Summary

Methods inherited from class java.lang.Object

Method Detail

outputFormat

outputFormat