Class S3Sources

java.lang.Object
com.hazelcast.jet.s3.S3Sources

public final class S3Sources extends Object
Contains factory methods for creating AWS S3 sources.
  • Method Details

    • s3

      @Nonnull public static BatchSource<String> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier)
      Convenience for s3(List, String, Charset, SupplierEx, BiFunctionEx). Reads the objects line by line, emits each line downstream without any transformation, and decodes using StandardCharsets.UTF_8.
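      The behavior this convenience overload installs — UTF-8 decoding and an identity mapFn — can be sketched with plain java.io. This is an illustrative stand-in, not Jet code; the class and method names below are invented for the sketch:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class ConvenienceSketch {
    // Mirrors what the convenience overload does with each S3 object:
    // decode the bytes as UTF-8 and emit each line unchanged, which is
    // what the identity mapFn (filename, line) -> line amounts to.
    static List<String> readLinesUtf8(InputStream in) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines()
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        InputStream object = new ByteArrayInputStream("first\nsecond".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLinesUtf8(object)); // prints [first, second]
    }
}
```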
    • s3

      @Nonnull public static <T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull Charset charset, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull BiFunctionEx<String,String,? extends T> mapFn)
      Creates an AWS S3 BatchSource which lists all the objects in the given buckets that match the given prefix, reads them line by line, transforms each line to the desired output object using the given mapFn and emits the results downstream.

      The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.

      The default local parallelism for this processor is 2.

      Here is an example which reads the objects from two buckets, applying the given prefix.

      
       Pipeline p = Pipeline.create();
       BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
            Arrays.asList("bucket1", "bucket2"),
            "prefix",
            StandardCharsets.UTF_8,
            () -> S3Client.create(),
            (filename, line) -> line
       ));
       
      Type Parameters:
      T - the type of the items the source emits
      Parameters:
      bucketNames - list of bucket-names
      prefix - the prefix to filter the objects. Optional, passing null will list all objects.
      clientSupplier - function which returns the S3 client to use; one client is created per processor instance
      mapFn - the function which creates the output object from each line. It receives the object name and the line as parameters
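      A mapFn need not return the raw line. For illustration, a hypothetical mapFn could tag each line with the name of the object it came from; the class name and the choice of Map.Entry here are assumptions for the sketch, not prescribed by the API:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class MapFnSketch {
    // Hypothetical mapFn: instead of emitting the raw line, pair it with the
    // name of the S3 object it was read from. Passed to the s3(...) overload
    // above, this would make the source's type parameter T be
    // Map.Entry<String, String>.
    static Map.Entry<String, String> tagWithObjectName(String objectName, String line) {
        return new SimpleImmutableEntry<>(objectName, line);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = tagWithObjectName("bucket1/logs/app.log", "GET /index");
        System.out.println(e.getKey() + " -> " + e.getValue()); // prints bucket1/logs/app.log -> GET /index
    }
}
```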
    • s3

      @Nonnull public static <I, T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull FunctionEx<? super InputStream,? extends Stream<I>> readFileFn, @Nonnull BiFunctionEx<String,? super I,? extends T> mapFn)
      Creates an AWS S3 BatchSource which lists all the objects in the given buckets that match the given prefix, reads them using the provided readFileFn, transforms each read item to the desired output object using the given mapFn and emits the results downstream.

      The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.

      The default local parallelism for this processor is 2.

      Here is an example which reads the objects from two buckets, applying the given prefix.

      
       Pipeline p = Pipeline.create();
       BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
            Arrays.asList("bucket1", "bucket2"),
            "prefix",
            () -> S3Client.create(),
        (inputStream) -> new BufferedReader(new InputStreamReader(inputStream)).lines(),
            (filename, line) -> line
       ));
       
      Type Parameters:
      I - the type of the items produced by readFileFn
      T - the type of the items the source emits
      Parameters:
      bucketNames - list of bucket-names
      prefix - the prefix to filter the objects. Optional, passing null will list all objects.
      clientSupplier - function which returns the S3 client to use; one client is created per processor instance
      readFileFn - the function which transforms the object's InputStream into a lazily evaluated Stream of items
      mapFn - the function which creates the output object from each read item. It receives the object name and the item as parameters
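      The readFileFn has the shape FunctionEx&lt;InputStream, Stream&lt;I&gt;&gt;, so it should return a Stream that pulls from the InputStream on demand. A minimal line-streaming sketch (the class and method names are illustrative):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;

public class ReadFileFnSketch {
    // Hypothetical readFileFn: the returned Stream reads lines from the
    // BufferedReader lazily, and closing the Stream closes the reader.
    static Stream<String> lines(InputStream in) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines().onClose(() -> {
            try {
                reader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) {
        InputStream object = new ByteArrayInputStream("x\ny\nz".getBytes(StandardCharsets.UTF_8));
        try (Stream<String> s = lines(object)) {
            System.out.println(s.count()); // prints 3
        }
    }
}
```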
    • s3

      @Nonnull public static <I, T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull TriFunction<? super InputStream,String,String,? extends Stream<I>> readFileFn, @Nonnull BiFunctionEx<String,? super I,? extends T> mapFn)
      Creates an AWS S3 BatchSource which lists all the objects in the given buckets that match the given prefix, reads them using the provided readFileFn, transforms each read item to the desired output object using the given mapFn and emits the results downstream. In this overload readFileFn additionally receives the object key and the bucket name.

      The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.

      The default local parallelism for this processor is 2.

      Here is an example which reads the objects from two buckets, applying the given prefix.

      
       Pipeline p = Pipeline.create();
       BatchStage<String> srcStage = p.readFrom(S3Sources.s3(
            Arrays.asList("bucket1", "bucket2"),
            "prefix",
            () -> S3Client.create(),
        (inputStream, key, bucketName) -> new BufferedReader(new InputStreamReader(inputStream)).lines(),
            (filename, line) -> line
       ));
       
      Type Parameters:
      I - the type of the items produced by readFileFn
      T - the type of the items the source emits
      Parameters:
      bucketNames - list of bucket-names
      prefix - the prefix to filter the objects. Optional, passing null will list all objects.
      clientSupplier - function which returns the S3 client to use; one client is created per processor instance
      readFileFn - the function which transforms the object's InputStream into a lazily evaluated Stream of items; it also receives the object key and the bucket name
      mapFn - the function which creates the output object from each read item. It receives the object name and the item as parameters
      Since:
      Jet 4.3
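      The extra key and bucket-name parameters of this TriFunction overload let readFileFn attach provenance to each item. A hypothetical sketch (names invented for illustration):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;

public class TriReadFileFnSketch {
    // Hypothetical readFileFn with the TriFunction shape: besides the
    // InputStream it receives the object key and bucket name, used here to
    // prefix every line with its origin.
    static Stream<String> linesWithOrigin(InputStream in, String key, String bucketName) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines().map(line -> bucketName + "/" + key + ": " + line);
    }

    public static void main(String[] args) {
        InputStream object = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        linesWithOrigin(object, "data.txt", "bucket1")
                .forEach(System.out::println); // prints bucket1/data.txt: hello
    }
}
```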