Class S3Sources

java.lang.Object
    com.hazelcast.jet.s3.S3Sources

public final class S3Sources
extends java.lang.Object

Contains factory methods for creating AWS S3 sources.
Method Summary

All Methods  Static Methods  Concrete Methods

static BatchSource<java.lang.String>
s3(java.util.List<java.lang.String> bucketNames, java.lang.String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier)
    Convenience for s3(List, String, Charset, SupplierEx, BiFunctionEx).

static <I,T> BatchSource<T>
s3(java.util.List<java.lang.String> bucketNames, java.lang.String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, FunctionEx<? super java.io.InputStream,? extends java.util.stream.Stream<I>> readFileFn, BiFunctionEx<java.lang.String,? super I,? extends T> mapFn)
    Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using the given prefix, reads them using the provided readFileFn, transforms each read item to the desired output object using the given mapFn and emits them to downstream.

static <I,T> BatchSource<T>
s3(java.util.List<java.lang.String> bucketNames, java.lang.String prefix, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, TriFunction<? super java.io.InputStream,java.lang.String,java.lang.String,? extends java.util.stream.Stream<I>> readFileFn, BiFunctionEx<java.lang.String,? super I,? extends T> mapFn)
    Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using the given prefix, reads them using the provided readFileFn, transforms each read item to the desired output object using the given mapFn and emits them to downstream.

static <T> BatchSource<T>
s3(java.util.List<java.lang.String> bucketNames, java.lang.String prefix, java.nio.charset.Charset charset, SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, BiFunctionEx<java.lang.String,java.lang.String,? extends T> mapFn)
    Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using the given prefix, reads them line by line, transforms each line to the desired output object using the given mapFn and emits them to downstream.
Method Detail
s3

@Nonnull
public static BatchSource<java.lang.String> s3(@Nonnull java.util.List<java.lang.String> bucketNames,
                                               @Nullable java.lang.String prefix,
                                               @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier)

Convenience for s3(List, String, Charset, SupplierEx, BiFunctionEx). Emits lines to downstream without any transformation and uses StandardCharsets.UTF_8.
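As a sketch of how this convenience overload might be used (the bucket name and prefix below are placeholders, and running the pipeline requires a Hazelcast Jet member plus AWS credentials on the classpath, so this is illustrative only):

```java
import java.util.Collections;

import com.hazelcast.jet.pipeline.BatchStage;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.s3.S3Sources;

import software.amazon.awssdk.services.s3.S3Client;

public class S3ConvenienceExample {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        // Lists every object under the "logs/" prefix in "my-bucket"
        // (both placeholder names) and emits each line as a UTF-8 String,
        // with no per-line transformation.
        BatchStage<String> lines = p.readFrom(S3Sources.s3(
                Collections.singletonList("my-bucket"),
                "logs/",
                () -> S3Client.create()
        ));
        lines.writeTo(Sinks.logger());
        // Submit `p` to a Jet instance to run the pipeline.
    }
}
```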
s3

@Nonnull
public static <T> BatchSource<T> s3(@Nonnull java.util.List<java.lang.String> bucketNames,
                                    @Nullable java.lang.String prefix,
                                    @Nonnull java.nio.charset.Charset charset,
                                    @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier,
                                    @Nonnull BiFunctionEx<java.lang.String,java.lang.String,? extends T> mapFn)

Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using the given prefix, reads them line by line, transforms each line to the desired output object using the given mapFn and emits them to downstream.

The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.

The default local parallelism for this processor is 2.

Here is an example which reads the objects from the given buckets, applying the given prefix:

    Pipeline p = Pipeline.create();
    BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
            Arrays.asList("bucket1", "bucket2"),
            "prefix",
            StandardCharsets.UTF_8,
            () -> S3Client.create(),
            (filename, line) -> line
    ));

Type Parameters:
    T - the type of the items the source emits
Parameters:
    bucketNames - list of bucket names
    prefix - the prefix used to filter the objects. Optional, passing null will list all objects.
    clientSupplier - function which returns the S3 client to use; one client per processor instance is used
    mapFn - the function which creates the output object from each line. Gets the object name and the line as parameters.
s3

@Nonnull
public static <I,T> BatchSource<T> s3(@Nonnull java.util.List<java.lang.String> bucketNames,
                                      @Nullable java.lang.String prefix,
                                      @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier,
                                      @Nonnull FunctionEx<? super java.io.InputStream,? extends java.util.stream.Stream<I>> readFileFn,
                                      @Nonnull BiFunctionEx<java.lang.String,? super I,? extends T> mapFn)

Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using the given prefix, reads them using the provided readFileFn, transforms each read item to the desired output object using the given mapFn and emits them to downstream.

The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.

The default local parallelism for this processor is 2.

Here is an example which reads the objects from the given buckets, applying the given prefix. The readFileFn must return a java.util.stream.Stream, so it maps the object's InputStream to a lazy stream of its lines:

    Pipeline p = Pipeline.create();
    BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
            Arrays.asList("bucket1", "bucket2"),
            "prefix",
            () -> S3Client.create(),
            (inputStream) -> new BufferedReader(new InputStreamReader(inputStream)).lines(),
            (filename, line) -> line
    ));

Type Parameters:
    T - the type of the items the source emits
Parameters:
    bucketNames - list of bucket names
    prefix - the prefix used to filter the objects. Optional, passing null will list all objects.
    clientSupplier - function which returns the S3 client to use; one client per processor instance is used
    readFileFn - the function which creates a stream that reads the file in a lazy way
    mapFn - the function which creates the output object from each item. Gets the object name and the item as parameters.
s3

@Nonnull
public static <I,T> BatchSource<T> s3(@Nonnull java.util.List<java.lang.String> bucketNames,
                                      @Nullable java.lang.String prefix,
                                      @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier,
                                      @Nonnull TriFunction<? super java.io.InputStream,java.lang.String,java.lang.String,? extends java.util.stream.Stream<I>> readFileFn,
                                      @Nonnull BiFunctionEx<java.lang.String,? super I,? extends T> mapFn)

Creates an AWS S3 BatchSource which lists all the objects in the bucket-list using the given prefix, reads them using the provided readFileFn, transforms each read item to the desired output object using the given mapFn and emits them to downstream.

The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.

The default local parallelism for this processor is 2.

Here is an example which reads the objects from the given buckets, applying the given prefix. The readFileFn additionally receives the object key and the bucket name, and must return a java.util.stream.Stream, so it maps the object's InputStream to a lazy stream of its lines:

    Pipeline p = Pipeline.create();
    BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
            Arrays.asList("bucket1", "bucket2"),
            "prefix",
            () -> S3Client.create(),
            (inputStream, key, bucketName) -> new BufferedReader(new InputStreamReader(inputStream)).lines(),
            (filename, line) -> line
    ));

Type Parameters:
    T - the type of the items the source emits
Parameters:
    bucketNames - list of bucket names
    prefix - the prefix used to filter the objects. Optional, passing null will list all objects.
    clientSupplier - function which returns the S3 client to use; one client per processor instance is used
    readFileFn - the function which creates a stream that reads the file in a lazy way
    mapFn - the function which creates the output object from each item. Gets the object name and the item as parameters.
Since:
    Jet 4.3