public final class HdfsSources extends Object
Modifier and Type | Method and Description |
---|---|
static <K,V> BatchSource<Map.Entry<K,V>> |
hdfs(org.apache.hadoop.mapred.JobConf jobConf)
Convenience for
hdfs(JobConf, DistributedBiFunction)
with Map.Entry as its output type. |
static <K,V,E> BatchSource<E> |
hdfs(org.apache.hadoop.mapred.JobConf jobConf,
DistributedBiFunction<K,V,E> projectionFn)
Returns a source that reads records from Apache Hadoop HDFS and emits
the results of transforming each record (a key-value pair) with the
supplied mapping function.
|
@Nonnull public static <K,V,E> BatchSource<E> hdfs(@Nonnull org.apache.hadoop.mapred.JobConf jobConf, @Nonnull DistributedBiFunction<K,V,E> projectionFn)
This source splits and balances the input data among Jet processors, doing its best to achieve data locality. To this end the Jet cluster topology should be aligned with Hadoop's — on each Hadoop member there should be a Jet member.
Default local parallelism for this processor is 2 (or less if less CPUs are available).
This source does not save any state to snapshot. If the job is restarted, all entries will be emitted again.
K
- key type of the recordsV
- value type of the recordsE
- the type of the emitted valuejobConf
- JobConf for reading files with the appropriate input format and pathprojectionFn
- function to create output objects from key and value.
If the projection returns a null
for an item, that item
will be filtered out@Nonnull public static <K,V> BatchSource<Map.Entry<K,V>> hdfs(@Nonnull org.apache.hadoop.mapred.JobConf jobConf)
hdfs(JobConf, DistributedBiFunction)
with Map.Entry
as its output type.Copyright © 2018 Hazelcast, Inc.. All rights reserved.