T - the type of items a source using this file format will emit

public class FileSourceBuilder<T> extends Object

To create an instance, use FileSources.files(String).
Modifier and Type | Method and Description |
---|---|
BatchSource<T> | build(): Builds a BatchSource based on the current state of the builder. |
ProcessorMetaSupplier | buildMetaSupplier(): Builds a ProcessorMetaSupplier based on the current state of the builder. |
<T_NEW> FileSourceBuilder<T_NEW> | format(FileFormat<T_NEW> fileFormat): Sets the file format for the source. |
FileSourceBuilder<T> | glob(String glob): Sets a glob pattern to filter the files in the specified directory. |
static boolean | hasHadoopPrefix(String path): Checks if the given path starts with one of the defined Hadoop prefixes: "s3a://" (Amazon S3), "hdfs://" (HDFS), "wasbs://" (Azure Cloud Storage), "adl://" (Azure Data Lake Gen 1), "abfs://" (Azure Data Lake Gen 2), "gs://" (Google Cloud Storage). See HADOOP_PREFIXES. |
FileSourceBuilder<T> | ignoreFileNotFound(boolean ignoreFileNotFound): Set to true to ignore the case where no files match in the directory specified by path. |
FileSourceBuilder<T> | option(String key, String value): Specifies an arbitrary option for the underlying source. |
FileSourceBuilder<T> | sharedFileSystem(boolean sharedFileSystem): If sharedFileSystem is true, Jet will assume all members see the same files. |
FileSourceBuilder<T> | useHadoopForLocalFiles(boolean useHadoop): Specifies that Jet should use Apache Hadoop for files from the local filesystem. |
public FileSourceBuilder<T> glob(@Nonnull String glob)
Sets a glob pattern to filter the files in the specified directory.
glob - glob pattern

@Nonnull public <T_NEW> FileSourceBuilder<T_NEW> format(@Nonnull FileFormat<T_NEW> fileFormat)
Sets the file format for the source. See FileFormat for available formats and factory methods.
It's not possible to implement a custom format.
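As an illustration of how a pattern passed to glob(String) narrows down the files in a directory, the sketch below filters a list of file names with the JDK's own glob matcher. It assumes that the local-filesystem source follows java.nio glob syntax, which this page does not state. The class GlobSketch and the file names are hypothetical, and the code does not use the Jet API.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hazelcast Jet.
final class GlobSketch {

    // Returns the file names accepted by the glob, in input order.
    // Assumption: java.nio glob syntax ("*", "?", "{a,b}", ...).
    static List<String> filter(String glob, List<String> fileNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> accepted = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                accepted.add(name);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("orders-1.csv", "orders-2.csv", "readme.txt");
        System.out.println(filter("*.csv", names));        // only the two CSV files
        System.out.println(filter("orders-?.csv", names)); // "?" matches exactly one character
        System.out.println(filter("*", names));            // every file name
    }
}
```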
@Nonnull public FileSourceBuilder<T> useHadoopForLocalFiles(boolean useHadoop)
Specifies that Jet should use Apache Hadoop for files from the local filesystem.
Default value is false.
useHadoop - if Hadoop should be used for reading the local filesystem

@Nonnull public FileSourceBuilder<T> sharedFileSystem(boolean sharedFileSystem)
If sharedFileSystem is true, Jet will assume all members see the same files. They will split the work so that each member will read a part of the files. If sharedFileSystem is false, each member will read all files in the directory, assuming that other members see different files.
This option applies only to the local filesystem when Hadoop is not used and when the directory doesn't contain a prefix for a remote file system. Distributed filesystems are always assumed to be shared.
If you start all the members on a single machine (such as for development), set this property to true. If you have multiple machines with multiple members each and the directory is not a shared storage, it's not possible to configure the file reader correctly; use only one member per machine.
Default value is false.
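The effect of the sharedFileSystem flag can be sketched with a small model of which files a single member ends up reading. This is not Jet's implementation: the round-robin assignment is an assumption chosen for illustration (the description only says that members split the work), and SharedFileSystemSketch, filesForMember and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not part of Hazelcast Jet.
final class SharedFileSystemSketch {

    // Returns the files one member would read.
    // Assumption: a shared directory is split round-robin by file index.
    static List<String> filesForMember(List<String> files, int memberIndex,
                                       int memberCount, boolean sharedFileSystem) {
        if (!sharedFileSystem) {
            // The member assumes nobody else sees these files and reads all of them.
            return new ArrayList<>(files);
        }
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            if (i % memberCount == memberIndex) {
                assigned.add(files.get(i));
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.csv", "b.csv", "c.csv");
        // Two members that see the same directory:
        System.out.println(filesForMember(files, 0, 2, true));  // [a.csv, c.csv]
        System.out.println(filesForMember(files, 1, 2, true));  // [b.csv]
        // With the flag off, each member reads everything it sees:
        System.out.println(filesForMember(files, 0, 2, false)); // [a.csv, b.csv, c.csv]
    }
}
```

In this model, two members on one machine that run with the flag set to false would both read all three files, so every item would be emitted twice. That is the situation the single-machine advice above is meant to avoid.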
@Nonnull public FileSourceBuilder<T> ignoreFileNotFound(boolean ignoreFileNotFound)
Set to true to ignore the case where no files match in the directory specified by path.
When there is no file matching the glob specified by glob(String) (or the default glob), Jet throws an exception by default. This might be problematic in some cases, for example when the directory is empty. To override this behaviour, set this to true.
If set to true and there are no files in the directory, the source will produce 0 items.
Default value is false.
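The two outcomes described above can be modelled in a few lines. This is only a sketch of the documented behaviour, not Jet's code: the exception type is a placeholder (this page does not name the exception Jet throws), and IgnoreFileNotFoundSketch and resolve are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model, not part of Hazelcast Jet.
final class IgnoreFileNotFoundSketch {

    // Takes the files that matched the glob and returns the files to read.
    // Assumption: IllegalStateException stands in for whatever Jet actually throws.
    static List<String> resolve(List<String> matchedFiles, boolean ignoreFileNotFound) {
        if (matchedFiles.isEmpty() && !ignoreFileNotFound) {
            throw new IllegalStateException("No file matched the glob");
        }
        // With the flag set, an empty match is accepted and yields 0 items.
        return matchedFiles;
    }

    public static void main(String[] args) {
        List<String> none = Collections.<String>emptyList();
        System.out.println(resolve(Arrays.asList("a.csv"), false).size()); // 1
        System.out.println(resolve(none, true).size());                    // 0
        try {
            resolve(none, false);
        } catch (IllegalStateException e) {
            System.out.println("default behaviour: " + e.getMessage());
        }
    }
}
```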
ignoreFileNotFound - true if no files in the specified directory should be accepted

@Nonnull public FileSourceBuilder<T> option(String key, String value)
Specifies an arbitrary option for the underlying source. Check the FileFormat class you're using; it offers parsing-related options.

@Nonnull public BatchSource<T> build()
Builds a BatchSource based on the current state of the builder.

@Nonnull public ProcessorMetaSupplier buildMetaSupplier()
Builds a ProcessorMetaSupplier based on the current state of the builder. Use for integration with the Core API.
This method is a part of the Core API and has lower backward-compatibility guarantees (we can change it in a minor version).
public static boolean hasHadoopPrefix(String path)
Checks if the given path starts with one of the defined Hadoop prefixes; see HADOOP_PREFIXES.
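The check can be pictured as a plain prefix test over the six prefixes listed in the method summary. The following is a re-implementation for illustration only, not the library's code. HadoopPrefixSketch is a hypothetical class, and details such as case sensitivity and null handling are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical re-implementation, not part of Hazelcast Jet.
final class HadoopPrefixSketch {

    // The prefixes listed in the method summary.
    static final List<String> HADOOP_PREFIXES = Arrays.asList(
            "s3a://",   // Amazon S3
            "hdfs://",  // HDFS
            "wasbs://", // Azure Cloud Storage
            "adl://",   // Azure Data Lake Gen 1
            "abfs://",  // Azure Data Lake Gen 2
            "gs://"     // Google Cloud Storage
    );

    // Assumption: a case-sensitive startsWith test on a non-null path.
    static boolean hasHadoopPrefix(String path) {
        for (String prefix : HADOOP_PREFIXES) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasHadoopPrefix("s3a://bucket/data")); // true
        System.out.println(hasHadoopPrefix("/var/data"));         // false
    }
}
```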
Copyright © 2023 Hazelcast, Inc. All rights reserved.