Interface FileFormat<T>
- Type Parameters:
T
- the type of items a source using this file format will emit
- All Superinterfaces:
Serializable
- All Known Implementing Classes:
AvroFileFormat, CsvFileFormat, JsonFileFormat, LinesTextFileFormat, ParquetFileFormat, RawBytesFileFormat, TextFileFormat
- Since:
- Jet 4.4
Method Summary
static <T> AvroFileFormat<T> avro()
    Returns a file format for Avro files.
static <T> AvroFileFormat<T> avro(Class<T> clazz)
    Returns a file format for Avro files that uses reflection to deserialize the data into instances of the provided Java class.
static RawBytesFileFormat bytes()
    Returns a file format for binary files.
static <T> CsvFileFormat<T> csv(Class<T> clazz)
    Returns a file format for CSV files which deserializes each line into an instance of the given class.
static CsvFileFormat<String[]> csv(List<String> fieldNames)
    Returns a file format for CSV files which deserializes each line into String[].
String format()
    Returns the name of the file format.
static <T> JsonFileFormat<T> json()
    Returns a file format for JSON Lines files.
static <T> JsonFileFormat<T> json(Class<T> clazz)
    Returns a file format for JSON Lines files, where each line of text is one JSON object.
static LinesTextFileFormat lines()
    Returns a file format for text files where each line is a String data item.
static LinesTextFileFormat lines(Charset charset)
    Returns a file format for text files where each line is a String data item.
static <T> ParquetFileFormat<T> parquet()
    Returns a file format for Parquet files.
static TextFileFormat text()
    Returns a file format for text files where the whole file is a single string item.
static TextFileFormat text(Charset charset)
    Returns a file format for text files where the whole file is a single string item.
Method Details
format
Returns the name of the file format. The convention is to use the well-known filename suffix or, if there is none, a short-form name of the format.
avro
Returns a file format for Avro files.
avro
Returns a file format for Avro files that uses reflection to deserialize the data into instances of the provided Java class. Jet will use the ReflectDatumReader to read Avro data. The parameter may be null, which disables reflection-based deserialization, but in that case you may prefer the no-argument avro() call.
csv
Returns a file format for CSV files which deserializes each line into String[]. It assumes the CSV has a header line and uses it as the column names.
fieldNames specifies which column should be at which index in the resulting string array. This is useful if the files have different field order or don't have the same set of columns.
For example, if the argument is [surname, name], then the format will always return items of type String[2] where index 0 holds the surname column and index 1 holds the name column, regardless of the actual columns found in a particular file. If a file lacks one of the fields, the value at its index will always be null.
If the given list is null, the length and order of the string array will match the order found in each file, and can differ from file to file. If the list is empty, a zero-length array will be returned.
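The fieldNames rules can be illustrated with a small plain-Java sketch. It is not the library's implementation: the class and method names (CsvProjectionSketch, project) are hypothetical, the row is assumed to be already split into values, and a column missing from the file is assumed to leave a null in its slot.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of the fieldNames projection; not the library's code. */
final class CsvProjectionSketch {

    /**
     * Maps one parsed CSV row onto the requested field order.
     *
     * @param header     column names taken from the file's header line
     * @param row        values of one data line, in header order
     * @param fieldNames requested output columns, or null to keep the file's order
     */
    static String[] project(String[] header, String[] row, List<String> fieldNames) {
        if (fieldNames == null) {
            // No projection requested: length and order follow the file.
            return row.clone();
        }
        List<String> headerList = Arrays.asList(header);
        String[] result = new String[fieldNames.size()];
        for (int i = 0; i < result.length; i++) {
            int column = headerList.indexOf(fieldNames.get(i));
            // Assumption: a column absent from this file leaves the slot null.
            if (column >= 0 && column < row.length) {
                result[i] = row[column];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] header = {"name", "age", "surname"};
        String[] row = {"Joe", "42", "Doe"};
        // prints [Doe, Joe]
        System.out.println(Arrays.toString(
                project(header, row, Arrays.asList("surname", "name"))));
    }
}
```

Because the output positions are fixed by fieldNames rather than by the file, downstream code can index into the array without inspecting each file's header.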
csv
Returns a file format for CSV files which deserializes each line into an instance of the given class. It assumes the CSV has a header line and uses it as the column names that map to the object's fields.
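To make the header-to-field mapping concrete, here is a minimal plain-Java sketch of the idea. It is only an illustration under assumptions, not how the library binds data: CsvBindingSketch, bind and Person are hypothetical names, the row is assumed to be already split, only public String fields are handled, and header columns without a matching field are skipped.

```java
import java.lang.reflect.Field;

/** Plain-Java illustration of mapping header names to object fields. */
final class CsvBindingSketch {

    /** Hypothetical target type: public String fields named after CSV columns. */
    public static class Person {
        public String name;
        public String surname;
    }

    /** Creates an instance of type and copies each column into the same-named field. */
    static <T> T bind(String[] header, String[] row, Class<T> type) {
        try {
            T item = type.getDeclaredConstructor().newInstance();
            for (int i = 0; i < header.length && i < row.length; i++) {
                try {
                    Field field = type.getField(header[i]);
                    field.set(item, row[i]); // sketch handles String fields only
                } catch (NoSuchFieldException e) {
                    // Assumption: a column with no matching field is ignored.
                }
            }
            return item;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create or populate " + type, e);
        }
    }

    public static void main(String[] args) {
        Person p = bind(new String[] {"surname", "name"},
                new String[] {"Doe", "Joe"}, Person.class);
        System.out.println(p.name + " " + p.surname); // prints Joe Doe
    }
}
```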
json
Returns a file format for JSON Lines files.
json
Returns a file format for JSON Lines files, where each line of text is one JSON object. It deserializes the JSON data into instances of the provided class. It uses Jackson jr, which supports the basic data types such as strings, numbers, lists and maps, objects with JavaBeans-style getters/setters, as well as public fields. If the parameter is null, data is deserialized into Map<String, Object>, but in that case you may prefer the no-argument json() call.
lines
Returns a file format for text files where each line is a String data item. It uses the UTF-8 character encoding.
lines
Returns a file format for text files where each line is a String data item. This variant allows you to choose the character encoding. Note that the Hadoop-based file connector only accepts UTF-8.
- Parameters:
charset
- character encoding of the file
parquet
Returns a file format for Parquet files.
NOTE: this format is supported only through the Hadoop connector.
bytes
Returns a file format for binary files.
text
Returns a file format for text files where the whole file is a single string item. It uses the UTF-8 character encoding.
text
Returns a file format for text files where the whole file is a single string item. This variant allows you to choose the character encoding.
NOTE: the Hadoop connector only supports UTF-8. This option is supported for local files only.
- Parameters:
charset
- character encoding of the file
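The difference between the line-oriented and whole-file text formats, and the effect of the charset parameter, can be sketched with the JDK alone. This is an illustration of the described behavior, not the connector's code: TextDecodingSketch, asLines and asText are hypothetical names, and the file content is assumed to be already in memory as bytes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Plain-Java illustration of per-line versus whole-file decoding. */
final class TextDecodingSketch {

    /** One String item per line, decoded with the given charset. */
    static List<String> asLines(byte[] fileContent, Charset charset) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(fileContent), charset));
        return reader.lines().collect(Collectors.toList());
    }

    /** The whole file as a single String item, decoded with the given charset. */
    static String asText(byte[] fileContent, Charset charset) {
        return new String(fileContent, charset);
    }

    public static void main(String[] args) {
        // Two lines encoded in ISO-8859-1; the first contains a non-ASCII character.
        byte[] content = "caf\u00e9\nbar\n".getBytes(StandardCharsets.ISO_8859_1);

        // Line-oriented reading yields two items.
        System.out.println(asLines(content, StandardCharsets.ISO_8859_1));

        // Whole-file reading yields one item that still contains the line breaks.
        System.out.println(asText(content, StandardCharsets.ISO_8859_1).length());

        // Decoding with the wrong charset silently corrupts non-ASCII characters.
        System.out.println(asText(content, StandardCharsets.UTF_8)
                .equals(asText(content, StandardCharsets.ISO_8859_1)));
    }
}
```

The last call shows why the charset matters: the same bytes decoded as UTF-8 do not produce the original text, so files in another encoding need the charset variant.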