public final class FileSourceBuilder extends Object
mapOutputFn
Modifier and Type | Method and Description |
---|---|
BatchSource<String> |
build()
Deprecated.
Use
FileSources.files(java.lang.String) . Will be removed in Jet 5.0. |
<T> BatchSource<T> |
build(BiFunctionEx<String,String,? extends T> mapOutputFn)
Deprecated.
Use
FileSources.files(java.lang.String) . Will be removed in Jet 5.0. |
<T> BatchSource<T> |
build(FunctionEx<? super Path,? extends Stream<T>> readFileFn)
Deprecated.
Use
FileSources.files(java.lang.String) . Will be removed in Jet 5.0. |
StreamSource<String> |
buildWatcher()
Convenience for
buildWatcher(BiFunctionEx) . |
<T> StreamSource<T> |
buildWatcher(BiFunctionEx<String,String,? extends T> mapOutputFn)
Builds a source that emits a stream of lines of text coming from files in
the watched directory (but not its subdirectories).
|
FileSourceBuilder |
charset(Charset charset)
Sets the character set used to encode the files.
|
FileSourceBuilder |
glob(String glob)
Sets the globbing mask, see
getPathMatcher() . |
FileSourceBuilder |
sharedFileSystem(boolean sharedFileSystem)
Sets if files are in a shared storage visible to all members.
|
@Nonnull public FileSourceBuilder glob(@Nonnull String glob)
getPathMatcher()
.
Default value is "*"
which means all files.@Nonnull public FileSourceBuilder sharedFileSystem(boolean sharedFileSystem)
false
.
If sharedFileSystem
is true
, Jet will assume all members
see the same files. They will split the work so that each member will
read a part of the files. If sharedFileSystem
is false
,
each member will read all files in the directory, assuming the are
local.
If you start all the members on a single machine (such as for development), set this property to true. If you have multiple machines with multiple members each and the directory is not a shared storage, it's not possible to configure the file reader correctly - use only one member per machine.
@Nonnull public FileSourceBuilder charset(@Nonnull Charset charset)
StandardCharsets.UTF_8
.
Setting this component has no effect if the user provides a custom
readFileFn
to the build()
method.
@Nonnull public BatchSource<String> build()
FileSources.files(java.lang.String)
. Will be removed in Jet 5.0.build(BiFunctionEx)
.
Source emits lines to downstream without any transformation.@Nonnull public <T> BatchSource<T> build(@Nonnull BiFunctionEx<String,String,? extends T> mapOutputFn)
FileSources.files(java.lang.String)
. Will be removed in Jet 5.0.BatchSource
with supplied components and the
output function mapOutputFn
.
The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
Any IOException
will cause the job to fail. The files must not
change while being read; if they do, the behavior is unspecified.
The default local parallelism for this processor is 4 (or available CPU count if it is less than 4).
T
- the type of the items the source emitsmapOutputFn
- the function which creates output object from each
line. Gets the filename and line as parameters. It must be stateless and
cooperative.@Nonnull public <T> BatchSource<T> build(@Nonnull FunctionEx<? super Path,? extends Stream<T>> readFileFn)
FileSources.files(java.lang.String)
. Will be removed in Jet 5.0.BatchSource
with supplied components. Will
use the supplied readFileFn
to read the files. The configured
is ignored.
The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
Any IOException
will cause the job to fail. The files must not
change while being read; if they do, the behavior is unspecified.
The default local parallelism for this processor is 4 (or available CPU count if it is less than 4).
T
- the type of items returned from file readingreadFileFn
- the function to read objects from a file. Gets file
Path
as parameter and returns a Stream
of items. The function must be stateless.@Nonnull public StreamSource<String> buildWatcher()
buildWatcher(BiFunctionEx)
.@Nonnull public <T> StreamSource<T> buildWatcher(@Nonnull BiFunctionEx<String,String,? extends T> mapOutputFn)
If, during the scanning phase, the source observes a file that doesn't end with a newline, it will assume that there is a line just being written. This line won't appear in its output.
The source completes when the directory is deleted. However, in order
to delete the directory, all files in it must be deleted and if you
delete a file that is currently being read from, the job may encounter
an IOException
. The directory must be deleted on all nodes if
sharedFileSystem
is false
.
Any IOException
will cause the job to fail.
The source does not save any state to snapshot. If the job is restarted, lines added after the restart will be emitted, which gives at-most-once behavior.
The default local parallelism for this processor is 2 (or 1 if just 1 CPU is available).
WatchService
is not notified of appended lines
until the file is closed. If the file-writing process keeps the file
open while appending, the processor may fail to observe the changes.
It will be notified if any process tries to open that file, such as
looking at the file in Explorer. This holds for Windows 10 with the NTFS
file system and might change in future. You are advised to do your own
testing on your target Windows platform.
WatchService
) has a
history of unreliability and this source may experience infinite
blocking, missed, or duplicate events as a result. Such problems may be
resolved by upgrading the JRE to the latest version.
echo text >> yourFile
.T
- the type of the items the source emitsmapOutputFn
- the function which creates output object from each
line. Gets the filename and line as parameters. It must be stateless and
cooperative.Copyright © 2023 Hazelcast, Inc.. All rights reserved.