Class FileSourceBuilder
- Since: Jet 3.0
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| | build() | Deprecated. |
| <T> BatchSource<T> | build(BiFunctionEx<String, String, ? extends T> mapOutputFn) | Deprecated. |
| <T> BatchSource<T> | build(FunctionEx<? super Path, ? extends Stream<T>> readFileFn) | Deprecated. |
| | buildWatcher() | Convenience for buildWatcher(BiFunctionEx). |
| <T> StreamSource<T> | buildWatcher(BiFunctionEx<String, String, ? extends T> mapOutputFn) | Builds a source that emits a stream of lines of text coming from files in the watched directory (but not its subdirectories). |
| | charset | Sets the character set used to encode the files. |
| | glob | Sets the globbing mask, see getPathMatcher(). |
| | sharedFileSystem(boolean sharedFileSystem) | Sets if files are in a shared storage visible to all members. |
-
Method Details
-
glob
Sets the globbing mask, see getPathMatcher(). Default value is "*", which means all files.
-
charset
Sets the character set used to encode the files. Default value is StandardCharsets.UTF_8.
Setting this component has no effect if the user provides a custom readFileFn to the build() method.
-
build
Deprecated. Use FileSources.files(java.lang.String). Will be removed in Jet 5.0.
Convenience for build(BiFunctionEx). The source emits lines downstream without any transformation.
-
build
@Deprecated @Nonnull public <T> BatchSource<T> build(@Nonnull BiFunctionEx<String, String, ? extends T> mapOutputFn)
Deprecated. Use FileSources.files(java.lang.String). Will be removed in Jet 5.0.
Builds a custom file BatchSource with the supplied components and the output function mapOutputFn.
The source does not save any state to the snapshot. If the job is restarted, it will re-emit all entries.
Any IOException will cause the job to fail. The files must not change while being read; if they do, the behavior is unspecified.
The default local parallelism for this processor is 4 (or the available CPU count if it is less than 4).
- Type Parameters:
T - the type of the items the source emits
- Parameters:
mapOutputFn - the function which creates an output object from each line. It gets the filename and the line as parameters. It must be stateless and cooperative.
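As a rough illustration of the mapOutputFn contract described above, the following sketch applies a stateless two-argument function to (filename, line) pairs. It does not use Jet: java.util.function.BiFunction stands in for BiFunctionEx, and the class name MapOutputFnSketch and the helper emitAll are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class MapOutputFnSketch {

    // Stand-in for the source: calls mapOutputFn once per line of one file.
    // BiFunction replaces Jet's BiFunctionEx here (an assumption of this sketch).
    static <T> List<T> emitAll(String fileName, List<String> lines,
                               BiFunction<String, String, ? extends T> mapOutputFn) {
        List<T> out = new ArrayList<>();
        for (String line : lines) {
            out.add(mapOutputFn.apply(fileName, line));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stateless: the result depends only on the two arguments.
        BiFunction<String, String, String> mapOutputFn =
                (fileName, line) -> fileName + ": " + line;
        List<String> items = emitAll("access.log", List.of("first", "second"), mapOutputFn);
        items.forEach(System.out::println);
    }
}
```

Keeping the function free of captured mutable state is what makes it safe to call from several parallel processors.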
-
build
@Deprecated @Nonnull public <T> BatchSource<T> build(@Nonnull FunctionEx<? super Path, ? extends Stream<T>> readFileFn)
Deprecated. Use FileSources.files(java.lang.String). Will be removed in Jet 5.0.
Builds a custom file BatchSource with the supplied components. It will use the supplied readFileFn to read the files. The configured charset is ignored.
The source does not save any state to the snapshot. If the job is restarted, it will re-emit all entries.
Any IOException will cause the job to fail. The files must not change while being read; if they do, the behavior is unspecified.
The default local parallelism for this processor is 4 (or the available CPU count if it is less than 4).
- Type Parameters:
T - the type of items returned from file reading
- Parameters:
readFileFn - the function to read objects from a file. It gets the file Path as a parameter and returns a Stream of items. The function must be stateless.
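A minimal sketch of what a readFileFn could look like, using only JDK classes. java.util.function.Function stands in for FunctionEx, so the checked IOException is wrapped by hand. The class name ReadFileFnSketch, the constant READ_FILE_FN, and the helper readAll are hypothetical. The function picks its own charset, in line with the note above that the builder's charset does not apply to a custom reader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReadFileFnSketch {

    // Stateless reader: returns the trimmed, non-empty lines of one file.
    // Function replaces Jet's FunctionEx here (an assumption of this sketch).
    static final Function<Path, Stream<String>> READ_FILE_FN = path -> {
        try {
            return Files.lines(path, StandardCharsets.UTF_8)
                        .map(String::trim)
                        .filter(line -> !line.isEmpty());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    };

    // Applies the reader to one file and closes the stream afterwards.
    static List<String> readAll(Path file) {
        try (Stream<String> items = READ_FILE_FN.apply(file)) {
            return items.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-file-fn", ".txt");
        try {
            Files.write(file, List.of("  alpha ", "", "beta"), StandardCharsets.UTF_8);
            System.out.println(readAll(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```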
-
buildWatcher
Convenience for buildWatcher(BiFunctionEx).
-
buildWatcher
@Nonnull public <T> StreamSource<T> buildWatcher(@Nonnull BiFunctionEx<String, String, ? extends T> mapOutputFn)
Builds a source that emits a stream of lines of text coming from files in the watched directory (but not its subdirectories). It will emit only new contents added after startup: both new files and new content appended to existing ones.
If, during the scanning phase, the source observes a file that doesn't end with a newline, it will assume that there is a line just being written. This line won't appear in its output.
The source completes when the directory is deleted. However, in order to delete the directory, all files in it must be deleted, and if you delete a file that is currently being read from, the job may encounter an IOException. The directory must be deleted on all nodes if sharedFileSystem is false.
Any IOException will cause the job to fail.
The source does not save any state to the snapshot. If the job is restarted, lines added after the restart will be emitted, which gives at-most-once behavior.
The default local parallelism for this processor is 2 (or 1 if just 1 CPU is available).
Limitation on Windows
On Windows the WatchService is not notified of appended lines until the file is closed. If the file-writing process keeps the file open while appending, the processor may fail to observe the changes. It will be notified if any process tries to open that file, such as looking at the file in Explorer. This holds for Windows 10 with the NTFS file system and might change in the future. You are advised to do your own testing on your target Windows platform.
Use the latest JRE
The underlying JDK API (WatchService) has a history of unreliability and this source may experience infinite blocking, missed events, or duplicate events as a result. Such problems may be resolved by upgrading the JRE to the latest version.
Appending lines using a text editor
If you're testing this source, you might think of using a text editor to append the lines. However, it might not work as expected because some editors write to a temp file and then rename it, or append an extra newline character at the end which gets overwritten if more text is added in the editor. The best way to append is to use echo text >> yourFile.
- Type Parameters:
T - the type of the items the source emits
- Parameters:
mapOutputFn - the function which creates an output object from each line. It gets the filename and the line as parameters. It must be stateless and cooperative.
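The rule above about a file that does not end with a newline can be pictured with a small sketch. This is not Jet's implementation, only an illustration of the stated behavior under the assumption that a line counts as complete once its terminating newline has been written. The class name CompleteLinesSketch and the method completeLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CompleteLinesSketch {

    // Returns only the lines of `content` that are terminated by '\n'.
    // A trailing fragment without a newline is treated as still being
    // written and is left out of the result.
    static List<String> completeLines(String content) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int newline;
        while ((newline = content.indexOf('\n', start)) >= 0) {
            String line = content.substring(start, newline);
            // Drop a trailing '\r' so Windows line endings are handled too.
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1);
            }
            lines.add(line);
            start = newline + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        // "partial" has no terminating newline yet, so it is not reported.
        System.out.println(completeLines("first\nsecond\npartial"));
    }
}
```

This also suggests why appending with echo text >> yourFile works well for testing: each append ends with a newline, so every appended line is complete.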