HadoopSources (Hazelcast Root 5.1.4 API)

java.lang.Object
- com.hazelcast.jet.hadoop.HadoopSources

```
public final class HadoopSources
extends Object
```
Contains factory methods for Apache Hadoop sources.

Since:

Jet 3.0

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`COPY_ON_READ` With the new HDFS API, some of the `RecordReader`s return the same key/value instances for each record, for example `LineRecordReader`.
`static String`	`IGNORE_FILE_NOT_FOUND`
`static String`	`SHARED_LOCAL_FS` When reading files from local file system using Hadoop, each processor reads files from its own local file system.

Method Summary

All Methods Static Methods Concrete Methods
Modifier and Type	Method and Description
`static <K,V> BatchSource<Map.Entry<K,V>>`	`inputFormat(org.apache.hadoop.conf.Configuration jobConf)` Convenience for `inputFormat(Configuration, BiFunctionEx)` with `Map.Entry` as its output type.
`static <K,V,E> BatchSource<E>`	`inputFormat(org.apache.hadoop.conf.Configuration configuration, BiFunctionEx<K,V,E> projectionFn)` Returns a source that reads records from Apache Hadoop HDFS and emits the results of transforming each record (a key-value pair) with the supplied projection function.
`static <K,V,E> BatchSource<E>`	`inputFormat(ConsumerEx<org.apache.hadoop.conf.Configuration> configureFn, BiFunctionEx<K,V,E> projectionFn)` Returns a source that reads records from Apache Hadoop HDFS and emits the results of transforming each record (a key-value pair) with the supplied projection function.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - COPY_ON_READ
```
public static final String COPY_ON_READ
```
    With the new HDFS API, some of the RecordReaders return the same key/value instances for each record, for example LineRecordReader. If this property is set to true, the source makes a copy of each object after applying the projectionFn. For readers which create a new instance for each record, the source can be configured to not copy the objects for performance.
    Also if you are using a projection function which doesn't refer to any mutable state from the key or value, then it makes sense to set this property to false to avoid unnecessary copying.
    The source copies the objects by serializing and de-serializing them. The objects should be either Writable or serializable in a way which Jet can serialize/de-serialize.
    Here is how you can configure the source. Default and always safe value is true:
```
     Configuration conf = new Configuration();
     conf.setBoolean(HadoopSources.COPY_ON_READ, false);
     BatchSource<Entry<K, V>> source = HadoopSources.inputFormat(conf);
 
```
    See Also:
    
    Constant Field Values
  - SHARED_LOCAL_FS
```
public static final String SHARED_LOCAL_FS
```
    When reading files from local file system using Hadoop, each processor reads files from its own local file system. If the local file system is shared between members, e.g NFS mounted filesystem, you should configure this property as true.
    Here is how you can configure the source. Default value is false:
```
     Configuration conf = new Configuration();
     conf.setBoolean(HadoopSources.SHARED_LOCAL_FS, true);
     BatchSource<Entry<K, V>> source = HadoopSources.inputFormat(conf);
 
```
    Since:
    
    Jet 4.4
    
    See Also:
    
    Constant Field Values
  - IGNORE_FILE_NOT_FOUND
```
public static final String IGNORE_FILE_NOT_FOUND
```
    Since:
    
    Jet 4.4
    
    See Also:
    
    Constant Field Values
- Method Detail
  - inputFormat
```
@Nonnull
public static <K,V,E> BatchSource<E> inputFormat(@Nonnull
                                                          org.apache.hadoop.conf.Configuration configuration,
                                                          @Nonnull
                                                          BiFunctionEx<K,V,E> projectionFn)
```
    Returns a source that reads records from Apache Hadoop HDFS and emits the results of transforming each record (a key-value pair) with the supplied projection function.
    This source splits and balances the input data among Jet processors, doing its best to achieve data locality. To this end the Jet cluster topology should be aligned with Hadoop's — on each Hadoop member there should be a Jet member.
    The processor will use either the new or the old MapReduce API based on the key which stores the InputFormat configuration. If it's stored under , the new API will be used. Otherwise, the old API will be used. If you get the configuration from JobContextImpl.getConfiguration(), the new API will be used. Please see COPY_ON_READ if you are using the new API.
    The default local parallelism for this processor is 2 (or less if less CPUs are available).
    This source does not save any state to snapshot. If the job is restarted, all entries will be emitted again.
    
    Type Parameters:
    
    K - key type of the records
    
    V - value type of the records
    
    E - the type of the emitted value
    
    Parameters:
    
    configuration - JobConf for reading files with the appropriate input format and path
    
    projectionFn - function to create output objects from key and value. If the projection returns a null for an item, that item will be filtered out
  - inputFormat
```
@Nonnull
public static <K,V,E> BatchSource<E> inputFormat(@Nonnull
                                                          ConsumerEx<org.apache.hadoop.conf.Configuration> configureFn,
                                                          @Nonnull
                                                          BiFunctionEx<K,V,E> projectionFn)
```
    Returns a source that reads records from Apache Hadoop HDFS and emits the results of transforming each record (a key-value pair) with the supplied projection function.
    This source splits and balances the input data among Jet processors, doing its best to achieve data locality. To this end the Jet cluster topology should be aligned with Hadoop's — on each Hadoop member there should be a Jet member.
    The configureFn is used to configure the MR Job. The function is run on the coordinator node of the Jet Job, avoiding contacting the server from the machine where the job is submitted.
    The new MapReduce API will be used.
    The default local parallelism for this processor is 2 (or less if less CPUs are available).
    This source does not save any state to snapshot. If the job is restarted, all entries will be emitted again.
    
    Type Parameters:
    
    K - key type of the records
    
    V - value type of the records
    
    E - the type of the emitted value
    
    Parameters:
    
    configureFn - function to configure the MR job
    
    projectionFn - function to create output objects from key and value. If the projection returns a null for an item, that item will be filtered out
  - inputFormat
```
@Nonnull
public static <K,V> BatchSource<Map.Entry<K,V>> inputFormat(@Nonnull
                                                                     org.apache.hadoop.conf.Configuration jobConf)
```
    Convenience for inputFormat(Configuration, BiFunctionEx) with Map.Entry as its output type.

Class HadoopSources

Field Summary

Method Summary

Methods inherited from class java.lang.Object

Field Detail

COPY_ON_READ

SHARED_LOCAL_FS

IGNORE_FILE_NOT_FOUND

Method Detail

inputFormat

inputFormat

inputFormat