Class S3Sources
-
Method Summary
Modifier and TypeMethodDescriptionstatic BatchSource<String>
s3
(List<String> bucketNames, String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier) Convenience fors3(List, String, Charset, SupplierEx, BiFunctionEx)
.static <I,
T> BatchSource<T> s3
(List<String> bucketNames, String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, FunctionEx<? super InputStream, ? extends Stream<I>> readFileFn, BiFunctionEx<String, ? super I, ? extends T> mapFn) Creates an AWS S3BatchSource
which lists all the objects in the bucket-list using givenprefix
, reads them using providedreadFileFn
, transforms each read item to the desired output object using givenmapFn
and emits them to downstream.static <I,
T> BatchSource<T> s3
(List<String> bucketNames, String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, TriFunction<? super InputStream, String, String, ? extends Stream<I>> readFileFn, BiFunctionEx<String, ? super I, ? extends T> mapFn) Creates an AWS S3BatchSource
which lists all the objects in the bucket-list using givenprefix
, reads them using providedreadFileFn
, transforms each read item to the desired output object using givenmapFn
and emits them to downstream.static <T> BatchSource<T>
s3
(List<String> bucketNames, String prefix, Charset charset, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, BiFunctionEx<String, String, ? extends T> mapFn) Creates an AWS S3BatchSource
which lists all the objects in the bucket-list using givenprefix
, reads them line by line, transforms each line to the desired output object using givenmapFn
and emits them to downstream.
-
Method Details
-
s3
@Nonnull public static BatchSource<String> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier) Convenience fors3(List, String, Charset, SupplierEx, BiFunctionEx)
. Emits lines to downstream without any transformation and usesStandardCharsets.UTF_8
. -
s3
@Nonnull public static <T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull Charset charset, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull BiFunctionEx<String, String, ? extends T> mapFn) Creates an AWS S3BatchSource
which lists all the objects in the bucket-list using givenprefix
, reads them line by line, transforms each line to the desired output object using givenmapFn
and emits them to downstream.The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
The default local parallelism for this processor is 2.
Here is an example which reads the objects from a single bucket with applying the given prefix.
Pipeline p = Pipeline.create(); BatchStage<String> srcStage = p.readFrom(S3Sources.s3( Arrays.asList("bucket1", "bucket2"), "prefix", StandardCharsets.UTF_8, () -> S3Client.create(), (filename, line) -> line ));
- Type Parameters:
T
- the type of the items the source emits- Parameters:
bucketNames
- list of bucket-namesprefix
- the prefix to filter the objects. Optional, passingnull
will list all objects.clientSupplier
- function which returns the s3 client to use one client per processor instance is usedmapFn
- the function which creates output object from each line. Gets the object name and line as parameters
-
s3
@Nonnull public static <I,T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull FunctionEx<? super InputStream, ? extends Stream<I>> readFileFn, @Nonnull BiFunctionEx<String, ? super I, ? extends T> mapFn) Creates an AWS S3BatchSource
which lists all the objects in the bucket-list using givenprefix
, reads them using providedreadFileFn
, transforms each read item to the desired output object using givenmapFn
and emits them to downstream.The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
The default local parallelism for this processor is 2.
Here is an example which reads the objects from a single bucket with applying the given prefix.
Pipeline p = Pipeline.create(); BatchStage<String> srcStage = p.readFrom(S3Sources.s3( Arrays.asList("bucket1", "bucket2"), "prefix", () -> S3Client.create(), (inputStream) -> new LineIterator(new InputStreamReader(inputStream)), (filename, line) -> line ));
- Type Parameters:
T
- the type of the items the source emits- Parameters:
bucketNames
- list of bucket-namesprefix
- the prefix to filter the objects. Optional, passingnull
will list all objects.clientSupplier
- function which returns the s3 client to use one client per processor instance is usedreadFileFn
- the function which creates iterator, which reads the file in lazy waymapFn
- the function which creates output object from each line. Gets the object name and line as parameters
-
s3
@Nonnull public static <I,T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull TriFunction<? super InputStream, String, String, ? extends Stream<I>> readFileFn, @Nonnull BiFunctionEx<String, ? super I, ? extends T> mapFn) Creates an AWS S3BatchSource
which lists all the objects in the bucket-list using givenprefix
, reads them using providedreadFileFn
, transforms each read item to the desired output object using givenmapFn
and emits them to downstream.The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
The default local parallelism for this processor is 2.
Here is an example which reads the objects from a single bucket with applying the given prefix.
Pipeline p = Pipeline.create(); BatchStage<String> srcStage = p.readFrom(S3Sources.s3( Arrays.asList("bucket1", "bucket2"), "prefix", () -> S3Client.create(), (inputStream, key, bucketName) -> new LineIterator(new InputStreamReader(inputStream)), (filename, line) -> line ));
- Type Parameters:
T
- the type of the items the source emits- Parameters:
bucketNames
- list of bucket-namesprefix
- the prefix to filter the objects. Optional, passingnull
will list all objects.clientSupplier
- function which returns the s3 client to use one client per processor instance is usedreadFileFn
- the function which creates iterator, which reads the file in lazy waymapFn
- the function which creates output object from each line. Gets the object name and line as parameters- Since:
- Jet 4.3
-