public final class S3Sources extends Object
Modifier and Type | Method and Description |
---|---|
static <T> BatchSource<T> | s3(List<String> bucketNames, String prefix, Charset charset, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, BiFunctionEx<String,String,? extends T> mapFn) Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using given prefix, reads them line by line, transforms each line to the desired output object using given mapFn and emits them to downstream. |
static BatchSource<String> | s3(List<String> bucketNames, String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier) Convenience for s3(List, String, Charset, SupplierEx, BiFunctionEx). |
static <I,T> BatchSource<T> | s3(List<String> bucketNames, String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, FunctionEx<? super InputStream,? extends Stream<I>> readFileFn, BiFunctionEx<String,? super I,? extends T> mapFn) Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using given prefix, reads them using provided readFileFn, transforms each read item to the desired output object using given mapFn and emits them to downstream. |
static <I,T> BatchSource<T> | s3(List<String> bucketNames, String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, TriFunction<? super InputStream,String,String,? extends Stream<I>> readFileFn, BiFunctionEx<String,? super I,? extends T> mapFn) Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using given prefix, reads them using provided readFileFn, transforms each read item to the desired output object using given mapFn and emits them to downstream. |
@Nonnull public static BatchSource<String> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier)
Convenience for s3(List, String, Charset, SupplierEx, BiFunctionEx).
Emits lines to downstream without any transformation and uses StandardCharsets.UTF_8.
@Nonnull public static <T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull Charset charset, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull BiFunctionEx<String,String,? extends T> mapFn)
Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using given prefix, reads them line by line, transforms each line to the desired output object using given mapFn and emits them to downstream.
The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
The default local parallelism for this processor is 2.
Here is an example which reads the objects from two buckets, applying the given prefix:
Pipeline p = Pipeline.create();
BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
Arrays.asList("bucket1", "bucket2"),
"prefix",
StandardCharsets.UTF_8,
() -> S3Client.create(),
(filename, line) -> line
));
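The mapFn receives the object name and one line of text. As a minimal sketch of that contract, the snippet below uses a plain java.util.function.BiFunction as a stand-in for BiFunctionEx so that it runs without Hazelcast on the classpath. The class name TaggedLineMapFn and the "name: line" output format are hypothetical.

```java
import java.util.function.BiFunction;

public class TaggedLineMapFn {
    // Stand-in for BiFunctionEx<String, String, String>:
    // (object name, line) -> output item. Hypothetical output format.
    static final BiFunction<String, String, String> MAP_FN =
            (objectName, line) -> objectName + ": " + line.trim();

    public static void main(String[] args) {
        // Tags a single line with the name of the object it came from.
        System.out.println(MAP_FN.apply("logs/app.log", "  started  "));
    }
}
```

In a pipeline, an equivalent lambda would presumably be passed as the last argument of s3(...), in place of (filename, line) -> line.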
T - the type of the items the source emits
bucketNames - list of bucket names
prefix - the prefix to filter the objects. Optional; passing null will list all objects.
clientSupplier - function which returns the S3 client to use; one client is used per processor instance
mapFn - the function which creates an output object from each line. Gets the object name and the line as parameters
@Nonnull public static <I,T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull FunctionEx<? super InputStream,? extends Stream<I>> readFileFn, @Nonnull BiFunctionEx<String,? super I,? extends T> mapFn)
Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using given prefix, reads them using provided readFileFn, transforms each read item to the desired output object using given mapFn and emits them to downstream.
The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
The default local parallelism for this processor is 2.
Here is an example which reads the objects from two buckets, applying the given prefix:
Pipeline p = Pipeline.create();
BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
Arrays.asList("bucket1", "bucket2"),
"prefix",
() -> S3Client.create(),
(inputStream) -> new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8)).lines(),
(filename, line) -> line
));
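The readFileFn turns the raw InputStream of an object into a Stream of items of type I, which need not be plain lines. The following sketch, under stated assumptions, shows a reader that skips a CSV header row and splits each remaining line into fields. It uses java.util.function.Function as a stand-in for FunctionEx so that it runs without Hazelcast; the class name CsvReadFileFn and the helper readAll are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CsvReadFileFn {
    // Stand-in for FunctionEx<InputStream, Stream<List<String>>>.
    // BufferedReader.lines() is lazy: lines are read only as the stream is consumed.
    static final Function<InputStream, Stream<List<String>>> READ_FILE_FN = in ->
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
                    .lines()
                    .skip(1) // drop the header row
                    .map(line -> Arrays.asList(line.split(",")));

    // Hypothetical helper that feeds in-memory content through READ_FILE_FN.
    public static List<List<String>> readAll(String content) {
        InputStream in = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        try (Stream<List<String>> rows = READ_FILE_FN.apply(in)) {
            return rows.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll("id,name\n1,alice\n2,bob\n"));
    }
}
```

With a reader like this, the type I would be List<String>, so the mapFn would receive the object name and one list of fields per row.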
T - the type of the items the source emits
bucketNames - list of bucket names
prefix - the prefix to filter the objects. Optional; passing null will list all objects.
clientSupplier - function which returns the S3 client to use; one client is used per processor instance
readFileFn - the function which creates a Stream that reads the object lazily
mapFn - the function which creates an output object from each read item. Gets the object name and the item as parameters
@Nonnull public static <I,T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull TriFunction<? super InputStream,String,String,? extends Stream<I>> readFileFn, @Nonnull BiFunctionEx<String,? super I,? extends T> mapFn)
Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using given prefix, reads them using provided readFileFn, transforms each read item to the desired output object using given mapFn and emits them to downstream.
The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
The default local parallelism for this processor is 2.
Here is an example which reads the objects from two buckets, applying the given prefix:
Pipeline p = Pipeline.create();
BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
Arrays.asList("bucket1", "bucket2"),
"prefix",
() -> S3Client.create(),
(inputStream, key, bucketName) -> new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8)).lines(),
(filename, line) -> line
));
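The three-argument readFileFn also receives two strings, which the example above names key and bucketName. One plausible use is choosing how to decode an object based on its key. The sketch below assumes that argument order and decompresses objects whose key ends in ".gz". TriFn is a hypothetical local stand-in for the library's TriFunction type, and the helpers gzip and readAll exist only to exercise the function without S3.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class KeyAwareReadFileFn {
    // Hypothetical stand-in for the library's TriFunction type.
    @FunctionalInterface
    interface TriFn<A, B, C, R> {
        R apply(A a, B b, C c);
    }

    // (input stream, object key, bucket name) -> lazy stream of lines.
    // Assumed argument order, taken from the example lambda above.
    static final TriFn<InputStream, String, String, Stream<String>> READ_FILE_FN =
            (in, key, bucketName) -> {
                try {
                    // Decompress only objects whose key marks them as gzip files.
                    InputStream source = key.endsWith(".gz") ? new GZIPInputStream(in) : in;
                    return new BufferedReader(
                            new InputStreamReader(source, StandardCharsets.UTF_8)).lines();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            };

    // Test helper: gzip-compresses a string in memory.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Test helper: runs READ_FILE_FN over in-memory content and collects the lines.
    static List<String> readAll(byte[] content, String key) {
        try (Stream<String> lines =
                     READ_FILE_FN.apply(new ByteArrayInputStream(content), key, "bucket1")) {
            return lines.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(gzip("a\nb\n"), "data/part-0.gz"));
        System.out.println(readAll("c\nd\n".getBytes(StandardCharsets.UTF_8), "data/part-1.txt"));
    }
}
```

Since the real function type is an "Ex" style interface, the try/catch wrapper used here for the checked IOException may be unnecessary in an actual pipeline; that is an assumption to verify against the library.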
T - the type of the items the source emits
bucketNames - list of bucket names
prefix - the prefix to filter the objects. Optional; passing null will list all objects.
clientSupplier - function which returns the S3 client to use; one client is used per processor instance
readFileFn - the function which creates a Stream that reads the object lazily. Gets the input stream, the object key and the bucket name as parameters
mapFn - the function which creates an output object from each read item. Gets the object name and the item as parameters
Copyright © 2023 Hazelcast, Inc. All rights reserved.