public final class FileSourceBuilder extends Object
Modifier and Type | Method and Description |
---|---|
BatchSource<String> | build(): Convenience for build(DistributedBiFunction). |
<T> BatchSource<T> | build(DistributedBiFunction<String,String,? extends T> mapOutputFn): Builds a custom file BatchSource with supplied components and the output function mapOutputFn. |
StreamSource<String> | buildWatcher(): Convenience for buildWatcher(DistributedBiFunction). |
<T> StreamSource<T> | buildWatcher(DistributedBiFunction<String,String,? extends T> mapOutputFn): Builds a source that emits a stream of lines of text coming from files in the watched directory (but not its subdirectories). |
FileSourceBuilder | charset(Charset charset): Sets the character set used to encode the files. |
FileSourceBuilder | glob(String glob): Sets the globbing mask, see getPathMatcher(). |
FileSourceBuilder | sharedFileSystem(boolean sharedFileSystem): Sets whether the files are in shared storage visible to all members. |
public FileSourceBuilder glob(@Nonnull String glob)
Sets the globbing mask, see getPathMatcher(). Default value is "*", which means all files.
public FileSourceBuilder sharedFileSystem(boolean sharedFileSystem)
Sets whether the files are in shared storage visible to all members. Default value is false.
If sharedFileSystem is true, Jet will assume all members see the same files. They will split the work so that each member reads a part of the files. If sharedFileSystem is false, each member will read all files in the directory, assuming they are local.
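The difference between the two modes can be pictured with a small sketch in plain Java. Everything in it is an assumption made for illustration: the class `SharedFileSplitSketch`, the method `filesForMember`, and the hash-modulo rule are hypothetical, and this page does not say how Jet actually divides the files among members.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SharedFileSplitSketch {

    // Hypothetical helper: returns the files that one member would read.
    // The hash-modulo assignment is an illustrative assumption, not Jet's documented rule.
    static List<String> filesForMember(List<String> files, boolean sharedFileSystem,
                                       int memberIndex, int memberCount) {
        if (!sharedFileSystem) {
            // Local files: every member reads everything it sees in its own directory.
            return new ArrayList<>(files);
        }
        List<String> mine = new ArrayList<>();
        for (String file : files) {
            // Shared storage: each file is owned by exactly one member.
            int owner = Math.floorMod(file.hashCode(), memberCount);
            if (owner == memberIndex) {
                mine.add(file);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt");
        for (int member = 0; member < 2; member++) {
            System.out.println("shared, member " + member + ": "
                    + filesForMember(files, true, member, 2));
            System.out.println("local,  member " + member + ": "
                    + filesForMember(files, false, member, 2));
        }
    }
}
```

The practical consequence is the same whatever the real assignment rule is: setting sharedFileSystem to false on storage that is in fact shared would make every member emit every line.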
public FileSourceBuilder charset(@Nonnull Charset charset)
Sets the character set used to encode the files. Default value is StandardCharsets.UTF_8.
Setting this component has no effect if the builder is used by the Avro module.
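The glob and charset components correspond to standard JDK facilities, so their effect can be tried out without a Jet cluster. The sketch below uses only `java.nio.file`: `Files.newDirectoryStream(Path, String)` accepts the same glob syntax as getPathMatcher(), and `Files.readAllLines` decodes with the given charset. The class `GlobCharsetSketch` and its methods are hypothetical names, and the code is not meant to mirror how the source is implemented internally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class GlobCharsetSketch {

    // Reads all lines from the regular files in 'directory' (not its subdirectories)
    // whose names match 'glob', decoding them with 'charset'.
    static List<String> readLines(Path directory, String glob, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directory, glob)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    lines.addAll(Files.readAllLines(file, charset));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Creates a temporary directory with two small sample files.
    static Path sampleDirectory() {
        try {
            Path dir = Files.createTempDirectory("glob-sketch");
            Files.write(dir.resolve("a.txt"), "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("b.log"), "third\n".getBytes(StandardCharsets.UTF_8));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = sampleDirectory();
        System.out.println(readLines(dir, "*.txt", StandardCharsets.UTF_8));
        // The iteration order of a directory stream is unspecified, so only the count is shown.
        System.out.println(readLines(dir, "*", StandardCharsets.UTF_8).size());
    }
}
```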
public BatchSource<String> build()
Convenience for build(DistributedBiFunction). The source emits lines downstream without any transformation.
public <T> BatchSource<T> build(DistributedBiFunction<String,String,? extends T> mapOutputFn)
Builds a custom file BatchSource with supplied components and the output function mapOutputFn.
The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
Any IOException will cause the job to fail. The files must not change while being read; if they do, the behavior is unspecified.
The default local parallelism for this processor is 2 (or 1 if just 1 CPU is available).
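To make the role of mapOutputFn concrete, here is a minimal sketch of a function with the expected shape: it receives the filename and the line and returns the item to emit. It uses `java.util.function.BiFunction` as a stand-in so that it runs without Jet on the classpath; the class `MapOutputFnSketch`, the constant `TAG_WITH_FILE`, and the choice of `Map.Entry` as the item type are assumptions for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.function.BiFunction;

final class MapOutputFnSketch {

    // Stand-in for a mapOutputFn: (fileName, line) -> emitted item.
    // Here the item is an entry pairing the filename with the trimmed line.
    static final BiFunction<String, String, Map.Entry<String, String>> TAG_WITH_FILE =
            (fileName, line) -> new SimpleImmutableEntry<>(fileName, line.trim());

    public static void main(String[] args) {
        Map.Entry<String, String> item = TAG_WITH_FILE.apply("access.log", "  GET /index.html ");
        System.out.println(item.getKey() + " -> " + item.getValue());
    }
}
```

A lambda of the same shape would be what is passed to build(DistributedBiFunction); the no-argument build() corresponds to a function that returns the line unchanged and ignores the filename.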
T - the type of the items the source emits
mapOutputFn - the function which creates an output object from each line. It receives the filename and the line as parameters.
public StreamSource<String> buildWatcher()
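Convenience for buildWatcher(DistributedBiFunction).
public <T> StreamSource<T> buildWatcher(DistributedBiFunction<String,String,? extends T> mapOutputFn)
Builds a source that emits a stream of lines of text coming from files in the watched directory (but not its subdirectories).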
If, during the scanning phase, the source observes a file that doesn't end with a newline, it will assume that there is a line just being written. This line won't appear in its output.
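The rule about a missing trailing newline can be sketched as follows: only text terminated by a newline counts as a line, and whatever follows the last newline is treated as still being written. The class `CompleteLinesSketch` and the method `completeLines` are hypothetical; this is an illustration of the rule, not the source's actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class CompleteLinesSketch {

    // Returns only the lines of 'content' that are terminated by a newline.
    // Text after the last newline is assumed to be a line in progress and is skipped.
    static List<String> completeLines(String content) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int newline;
        while ((newline = content.indexOf('\n', start)) >= 0) {
            String line = content.substring(start, newline);
            if (line.endsWith("\r")) {
                // Tolerate Windows-style line endings.
                line = line.substring(0, line.length() - 1);
            }
            lines.add(line);
            start = newline + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        // "gamm" has no terminating newline yet, so it is not reported.
        System.out.println(completeLines("alpha\nbeta\ngamm"));
    }
}
```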
The source completes when the directory is deleted. However, in order to delete the directory, all files in it must be deleted, and if you delete a file that is currently being read from, the job may encounter an IOException. The directory must be deleted on all nodes if sharedFileSystem is false.
Any IOException will cause the job to fail.
The source does not save any state to snapshot. If the job is restarted, lines added after the restart will be emitted, which gives at-most-once behavior.
The default local parallelism for this processor is 2 (or 1 if just 1 CPU is available).
On Windows, the WatchService is not notified of appended lines until the file is closed. If the file-writing process keeps the file open while appending, the processor may fail to observe the changes. It will be notified if any process tries to open that file, such as looking at the file in Explorer. This holds for Windows 10 with the NTFS file system and might change in the future. You are advised to do your own testing on your target Windows platform.
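One way to run such a test is a small probe built directly on the JDK WatchService: append to a file, keep it open, and check whether a modification event arrives within a timeout. The sketch below is only a starting point under stated assumptions; the class `WatchServiceProbe`, its method names, the file name `probe.txt`, and the timeout are all hypothetical, and the result depends on the operating system, file system, and JRE in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

final class WatchServiceProbe {

    // Returns true if a modification event for 'directory' arrives within the timeout
    // while the writing side still holds the appended file open.
    static boolean sawEventWhileFileOpen(Path directory, long timeoutMillis) {
        try (WatchService watcher = directory.getFileSystem().newWatchService()) {
            Path file = directory.resolve("probe.txt");
            // Create the file before registering, so that only the append can trigger an event.
            Files.write(file, new byte[0]);
            directory.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
            try (Writer out = Files.newBufferedWriter(
                    file, StandardCharsets.UTF_8, StandardOpenOption.APPEND)) {
                out.write("appended line\n");
                out.flush(); // the data is written, but the file stays open
                WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                return key != null && !key.pollEvents().isEmpty();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Creates a scratch directory for the probe.
    static Path tempDirectory() {
        try {
            return Files.createTempDirectory("watch-probe");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean seen = sawEventWhileFileOpen(tempDirectory(), 5000);
        System.out.println("append observed while file open: " + seen);
    }
}
```

A result of false on the target platform suggests that a writer which keeps its files open will not be picked up promptly by this source there.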
The underlying JDK API (WatchService) has a history of unreliability, and this source may experience infinite blocking, missed events, or duplicate events as a result. Such problems may be resolved by upgrading the JRE to the latest version.
T - the type of the items the source emits
mapOutputFn - the function which creates an output object from each line. It receives the filename and the line as parameters.
Copyright © 2018 Hazelcast, Inc. All rights reserved.