Class FileSources


  • public final class FileSources
    extends java.lang.Object
    Contains factory methods for the Unified File Connector.
    Since:
    Jet 4.4
    • Method Summary

      Modifier and Type                           Method and Description
      static FileSourceBuilder<java.lang.String>  files(java.lang.String path)
                                                  The main entry point to the Unified File Connector.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • files

        public static FileSourceBuilder<java.lang.String> files(java.lang.String path)
        The main entry point to the Unified File Connector.

        Returns a FileSourceBuilder configured with default values; see the FileSourceBuilder documentation for further options.

        The path specifies the file system type (for example, s3a:// or hdfs://) and the path to the files. If the path does not specify a file system, the local file system is used; in this case the path must be absolute. "Local" here means local to each Jet cluster member, not to the client submitting the job.
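        The rule above (a scheme selects a remote file system, otherwise the path is local and must be absolute) can be illustrated with a small check. This is a minimal sketch under stated assumptions: PathRules, scheme and checkLocalPathIsAbsolute are hypothetical names, not part of the Jet API, and Jet performs its own validation, which may differ. The sketch assumes that a scheme is anything before "://".

```java
import java.nio.file.Paths;

// Hypothetical helper, not part of the Jet API. It mirrors the documented rule:
// a path without a file-system scheme is resolved on the local file system of
// each cluster member and must therefore be absolute.
final class PathRules {
    private PathRules() {}

    /** Returns the scheme before "://" (for example "s3a"), or null if there is none. */
    static String scheme(String path) {
        int idx = path.indexOf("://");
        return idx > 0 ? path.substring(0, idx) : null;
    }

    /** Throws IllegalArgumentException if a scheme-less (local) path is not absolute. */
    static void checkLocalPathIsAbsolute(String path) {
        // Paths with a scheme are left to the remote file system; only local paths are checked.
        if (scheme(path) == null && !Paths.get(path).isAbsolute()) {
            throw new IllegalArgumentException("Local path must be absolute: " + path);
        }
    }
}
```

        With this sketch, "/path/to/directory" and "hdfs://namenode/logs" pass, while "relative/dir" is rejected. Whether a path counts as absolute depends on the platform of the member that resolves it, which is one reason the documentation stresses that "local" refers to each cluster member.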

        The following file systems are supported:

        • s3a:// (Amazon S3)
        • hdfs:// (HDFS)
        • wasbs:// (Azure Blob Storage)
        • adl:// (Azure Data Lake Gen 1)
        • abfs:// (Azure Data Lake Gen 2)
        • gs:// (Google Cloud Storage)
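        The scheme list above can be expressed as a lookup table, for example to report where a given path will be read from before submitting a job. This is an illustrative sketch only: SupportedFileSystems and describe are hypothetical names, the table follows the list above, and Jet does not expose such a table. Actual support for each scheme also depends on the connector modules and Hadoop libraries available on the cluster members.

```java
import java.util.Map;

// Hypothetical lookup table, not part of the Jet API; the entries follow the
// documented list of supported file-system schemes.
final class SupportedFileSystems {
    private SupportedFileSystems() {}

    static final Map<String, String> SCHEMES = Map.of(
            "s3a", "Amazon S3",
            "hdfs", "HDFS",
            "wasbs", "Azure Blob Storage",
            "adl", "Azure Data Lake Gen 1",
            "abfs", "Azure Data Lake Gen 2",
            "gs", "Google Cloud Storage");

    /** Describes the storage a path refers to; scheme-less paths are local to each member. */
    static String describe(String path) {
        int idx = path.indexOf("://");
        if (idx <= 0) {
            return "local file system of each cluster member";
        }
        String scheme = path.substring(0, idx);
        String service = SCHEMES.get(scheme);
        if (service == null) {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        return service;
    }
}
```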

        The path must point to a directory. All files in the directory are processed. Subdirectories are not processed recursively. The path must not contain any wildcard characters.
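        The directory semantics described above (every file directly inside the directory, no recursion into subdirectories, no wildcards in the path) can be mimicked with plain java.nio for a local directory. This is a sketch of the described behavior, not Jet's implementation. DirectoryListing and filesIn are hypothetical names, and two details are assumptions: "files" is taken to mean regular files, and the characters * ? [ ] { } are treated as wildcards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration using java.nio, not Jet's implementation:
// lists regular files directly inside a directory, skips subdirectories,
// and rejects paths containing glob wildcard characters.
final class DirectoryListing {
    private DirectoryListing() {}

    static List<Path> filesIn(String dir) throws IOException {
        // Assumed wildcard set: * ? [ ] { }
        if (dir.matches(".*[*?\\[\\]{}].*")) {
            throw new IllegalArgumentException("Wildcards are not allowed: " + dir);
        }
        Path path = Paths.get(dir);
        if (!Files.isDirectory(path)) {
            throw new IllegalArgumentException("Not a directory: " + dir);
        }
        // Files.list returns only the direct children, so subdirectories are not descended into.
        try (Stream<Path> children = Files.list(path)) {
            return children.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
    }
}
```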

        Example usage:

        
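         import com.hazelcast.jet.pipeline.Pipeline;
         import com.hazelcast.jet.pipeline.Sinks;
         import com.hazelcast.jet.pipeline.file.FileSources;

         // LogParser is an application-specific class, not part of Jet
         Pipeline p = Pipeline.create();
         p.readFrom(FileSources.files("/path/to/directory").build())
          .map(line -> LogParser.parse(line))
          .filter(log -> log.level().equals("ERROR"))
          .writeTo(Sinks.logger());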
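
        The example above relies on a LogParser class whose parse method returns an object with a level() accessor; Jet does not provide it. One hypothetical version is sketched below, assuming each line has the form "LEVEL message". LogParser and LogLine are illustrative names, and LogLine implements Serializable as a cautious assumption in case items have to travel between cluster members.

```java
import java.io.Serializable;

// Hypothetical types for the pipeline example; not part of Jet.
// Assumes each input line looks like "LEVEL message", e.g. "ERROR disk full".
final class LogLine implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String level;
    private final String message;

    LogLine(String level, String message) {
        this.level = level;
        this.message = message;
    }

    String level() { return level; }

    String message() { return message; }

    @Override
    public String toString() { return level + " " + message; }
}

final class LogParser {
    private LogParser() {}

    /** Splits a line into its level (first token) and the remaining message. */
    static LogLine parse(String line) {
        String trimmed = line.trim();
        int space = trimmed.indexOf(' ');
        return space < 0
                ? new LogLine(trimmed, "")
                : new LogLine(trimmed.substring(0, space), trimmed.substring(space + 1));
    }
}
```

        With these definitions, the filter step keeps only lines whose first token is exactly "ERROR".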
         
        Parameters:
        path - the path to the directory
        Returns:
        the builder object with fluent API