Package com.hazelcast.jet.pipeline.file
Class FileSourceBuilder<T>
java.lang.Object
com.hazelcast.jet.pipeline.file.FileSourceBuilder<T>
- Type Parameters:
T - the type of items a source using this file format will emit
A unified builder object for various kinds of file sources.
To create an instance, use FileSources.files(String).
- Since:
- Jet 4.4
-
Method Summary
Modifier and Type, Method, Description:
- build(): Builds a BatchSource based on the current state of the builder.
- buildMetaSupplier(): Builds a ProcessorMetaSupplier based on the current state of the builder.
- <T_NEW> FileSourceBuilder<T_NEW> format(FileFormat<T_NEW> fileFormat): Sets the file format for the source.
- glob(String glob): Sets a glob pattern to filter the files in the specified directory.
- static boolean hasHadoopPrefix(String path): Checks if the given path starts with one of the defined Hadoop prefixes ("s3a://", "hdfs://", "wasbs://", "adl://", "abfs://", "gs://"); see HADOOP_PREFIXES.
- ignoreFileNotFound(boolean ignoreFileNotFound): Set to true to ignore the case where no files match in the directory specified by path.
- option(String key, String value): Specifies an arbitrary option for the underlying source.
- sharedFileSystem(boolean sharedFileSystem): If sharedFileSystem is true, Jet will assume all members see the same files.
- useHadoopForLocalFiles(boolean useHadoop): Specifies that Jet should use Apache Hadoop for files from the local filesystem.
-
Method Details
-
glob
Sets a glob pattern to filter the files in the specified directory. The default value is '*', matching all files in the directory.
- Parameters:
glob - the glob pattern
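To illustrate how a glob pattern selects file names, the following minimal sketch uses the JDK's own java.nio glob matcher. It assumes the pattern syntax Jet applies to local files behaves like the JDK's; that is an assumption, not something this page confirms. The class name GlobSketch and the file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobSketch {
    // Returns true if the bare file name matches the glob pattern,
    // using the JDK's glob syntax (assumed, not confirmed, to match Jet's).
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // '*' is the documented default and accepts every file name
        System.out.println(matches("*", "users.csv"));
        // a narrower pattern keeps only CSV files
        System.out.println(matches("*.csv", "users.csv"));
        System.out.println(matches("*.csv", "users.json"));
    }
}
```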
-
format
Sets the file format for the source. See FileFormat for available formats and factory methods.
It is not possible to implement a custom format.
-
useHadoopForLocalFiles
Specifies that Jet should use Apache Hadoop for files from the local filesystem. Otherwise, local files are read by Jet directly. One advantage of Hadoop is that it can provide better parallelization when the number of files is smaller than the total parallelism of the pipeline source.
Default value is false.
- Parameters:
useHadoop - whether Hadoop should be used for reading the local filesystem
-
ignoreFileNotFound
Set to true to ignore the case where no files match in the directory specified by path.
When no file matches the glob specified by glob(String) (or the default glob), Jet throws an exception by default. This can be a problem when the directory is legitimately empty. To override this behaviour, set this to true.
If set to true and there are no files in the directory, the source will produce 0 items.
Default value is false.
- Parameters:
ignoreFileNotFound - true if a directory with no matching files should be accepted
-
option
Specifies an arbitrary option for the underlying source. If you are looking for a missing option, check the FileFormat class you are using; it offers parsing-related options.
-
build
Builds a BatchSource based on the current state of the builder.
-
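A minimal usage sketch of the builder follows, chaining the methods documented on this page and reading the resulting source in a pipeline. It assumes the Hazelcast Jet library is on the classpath and that FileFormat.lines(), Pipeline and Sinks.logger() are available as in the Jet pipeline API. The class name FileSourceExample, the "*.log" pattern and the directory argument are hypothetical.

```java
import com.hazelcast.jet.pipeline.BatchSource;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.file.FileFormat;
import com.hazelcast.jet.pipeline.file.FileSources;

public class FileSourceExample {
    // Builds a source that emits one String per line of each matching file.
    // 'directory' is a hypothetical absolute path supplied by the caller.
    static BatchSource<String> buildLinesSource(String directory) {
        return FileSources.files(directory)
                .glob("*.log")               // only read files ending in .log
                .format(FileFormat.lines())  // emit each line as a String
                .ignoreFileNotFound(true)    // no matching files: emit 0 items
                .build();
    }

    // Wires the source into a pipeline that logs every emitted line.
    static Pipeline buildPipeline(String directory) {
        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(buildLinesSource(directory))
                .writeTo(Sinks.logger());
        return pipeline;
    }
}
```

The pipeline only describes the job; submitting it to a running Jet instance is outside the scope of this sketch.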
buildMetaSupplier
Builds a ProcessorMetaSupplier based on the current state of the builder. Use for integration with the Core API.
This method is part of the Core API and has weaker backward-compatibility guarantees (it can change in a minor version).
-
hasHadoopPrefix
Checks if the given path starts with one of the defined Hadoop prefixes:
"s3a://", // Amazon S3
"hdfs://", // HDFS
"wasbs://", // Azure Cloud Storage
"adl://", // Azure Data Lake Gen 1
"abfs://", // Azure Data Lake Gen 2
"gs://" // Google Cloud Storage
See HADOOP_PREFIXES.
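The prefix check described above can be sketched with the JDK alone. This is an illustrative re-implementation, not the library's code: the class name HadoopPrefixSketch is hypothetical, the prefix list is copied from this page, and the case-sensitive comparison is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class HadoopPrefixSketch {
    // Prefixes copied from the list documented on this page
    static final List<String> PREFIXES = Arrays.asList(
            "s3a://",   // Amazon S3
            "hdfs://",  // HDFS
            "wasbs://", // Azure Cloud Storage
            "adl://",   // Azure Data Lake Gen 1
            "abfs://",  // Azure Data Lake Gen 2
            "gs://"     // Google Cloud Storage
    );

    // Returns true if the path starts with any listed prefix.
    // The comparison is case-sensitive, which is an assumption of this sketch.
    static boolean hasHadoopPrefix(String path) {
        return PREFIXES.stream().anyMatch(path::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(hasHadoopPrefix("s3a://bucket/data")); // remote store path
        System.out.println(hasHadoopPrefix("/var/data"));         // plain local path
    }
}
```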
-