T - the type of items a source using this file format will emit

public interface FileFormat<T> extends Serializable

Modifier and Type | Method and Description |
---|---|
static <T> AvroFileFormat<T> | avro(): Returns a file format for Avro files. |
static <T> AvroFileFormat<T> | avro(Class<T> clazz): Returns a file format for Avro files that specifies to use reflection to deserialize the data into instances of the provided Java class. |
static RawBytesFileFormat | bytes(): Returns a file format for binary files. |
static <T> CsvFileFormat<T> | csv(Class<T> clazz): Returns a file format for CSV files which specifies to deserialize each line into an instance of the given class. |
static CsvFileFormat<String[]> | csv(List<String> fieldNames): Returns a file format for CSV files which specifies to deserialize each line into String[]. |
String | format(): Returns the name of the file format. |
static <T> JsonFileFormat<T> | json(): Returns a file format for JSON Lines files. |
static <T> JsonFileFormat<T> | json(Class<T> clazz): Returns a file format for JSON Lines files, where each line of text is one JSON object. |
static LinesTextFileFormat | lines(): Returns a file format for text files where each line is a String data item. |
static LinesTextFileFormat | lines(Charset charset): Returns a file format for text files where each line is a String data item. |
static <T> ParquetFileFormat<T> | parquet(): Returns a file format for Parquet files. |
static TextFileFormat | text(): Returns a file format for text files where the whole file is a single string item. |
static TextFileFormat | text(Charset charset): Returns a file format for text files where the whole file is a single string item. |

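To make the difference between lines() and text() in the table above concrete, here is a minimal JDK-only sketch of the two item shapes. It does not use the Hazelcast API: the class name LinesVsText and the helper methods asLines and asText are hypothetical, and they only model the documented behavior (one String per line versus one String per file) for a given Charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of Hazelcast.
class LinesVsText {

    // Models lines(charset): each line of the file becomes one String item.
    static List<String> asLines(byte[] fileContent, Charset charset) {
        return new String(fileContent, charset).lines().collect(Collectors.toList());
    }

    // Models text(charset): the whole file becomes a single String item.
    static String asText(byte[] fileContent, Charset charset) {
        return new String(fileContent, charset);
    }

    public static void main(String[] args) {
        byte[] fileContent = "alpha\nbeta\n".getBytes(StandardCharsets.UTF_8);
        // Two items: "alpha" and "beta".
        System.out.println(asLines(fileContent, StandardCharsets.UTF_8));
        // One item that still contains both line breaks.
        System.out.println(asText(fileContent, StandardCharsets.UTF_8).length());
    }
}
```

The sketch needs Java 11 or later because it relies on String.lines(). In a real pipeline the format object would be handed to a file source rather than applied by hand.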
@Nonnull String format()

Returns the name of the file format.

@Nonnull static <T> AvroFileFormat<T> avro()

Returns a file format for Avro files.

@Nonnull static <T> AvroFileFormat<T> avro(@Nullable Class<T> clazz)

Returns a file format for Avro files that specifies to use reflection to deserialize the data into instances of the provided Java class. It uses ReflectDatumReader to read Avro data. The parameter may be null, disabling the option to deserialize using reflection, but for that case you may prefer the no-argument avro() call.

@Nonnull static CsvFileFormat<String[]> csv(@Nullable List<String> fieldNames)

Returns a file format for CSV files which specifies to deserialize each line into String[]. It assumes the CSV has a header line and specifies to use it as the column names that map to the object's fields.

The fieldNames argument specifies which column should be at which index in the resulting string array. It is useful if the files have different field order or don't have the same set of columns.

For example, if the argument is [surname, name], then the format will always return items of type String[2] where index 0 holds the surname column and index 1 holds the name column, regardless of the actual columns found in a particular file. If a file lacks one of the fields, the value at its index will always be null.

If the given list is null, the length and order of the string array will match the order found in each file. It can be different for each file. If it's an empty list, a zero-length array will be returned.

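The column-mapping rule for fieldNames can be illustrated with a small JDK-only sketch. It does not call the Hazelcast API or a real CSV parser: the class CsvFieldMapping and the method project are hypothetical names, the header and row are assumed to be already split into cells, and a column missing from a file is modeled as a null entry (an assumption of this sketch).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of Hazelcast.
class CsvFieldMapping {

    // Reorders one parsed CSV row so that result[i] holds the value of the
    // column named fieldNames.get(i). A column absent from the header yields
    // null. A null fieldNames keeps the file's own column order.
    static String[] project(List<String> fieldNames, List<String> header, String[] row) {
        if (fieldNames == null) {
            return row.clone();
        }
        String[] result = new String[fieldNames.size()];
        for (int i = 0; i < result.length; i++) {
            int column = header.indexOf(fieldNames.get(i));
            result[i] = column >= 0 && column < row.length ? row[column] : null;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fieldNames = Arrays.asList("surname", "name");
        // File A lists the columns in a different order and has an extra one.
        String[] a = project(fieldNames, Arrays.asList("name", "age", "surname"),
                new String[] {"Ada", "36", "Lovelace"});
        // File B has no "surname" column at all.
        String[] b = project(fieldNames, Arrays.asList("name"), new String[] {"Alan"});
        System.out.println(Arrays.toString(a)); // [Lovelace, Ada]
        System.out.println(Arrays.toString(b)); // [null, Alan]
    }
}
```

Both rows come back as String[2] with the surname at index 0, whatever the column layout of the file they came from.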
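@Nonnull static <T> CsvFileFormat<T> csv(@Nonnull Class<T> clazz)

Returns a file format for CSV files which specifies to deserialize each line into an instance of the given class.

@Nonnull static <T> JsonFileFormat<T> json()

Returns a file format for JSON Lines files.

@Nonnull static <T> JsonFileFormat<T> json(@Nullable Class<T> clazz)

Returns a file format for JSON Lines files, where each line of text is one JSON object. If the parameter is null, data is deserialized into Map<String, Object>, but for that case you may prefer the no-argument json() call.

@Nonnull static LinesTextFileFormat lines()
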
Returns a file format for text files where each line is a String data item. It uses the UTF-8 character encoding.

@Nonnull static LinesTextFileFormat lines(@Nonnull Charset charset)

Returns a file format for text files where each line is a String data item. This variant allows you to choose the character encoding. Note that the Hadoop-based file connector only accepts UTF-8.

charset - character encoding of the file

@Nonnull static <T> ParquetFileFormat<T> parquet()

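Returns a file format for Parquet files.

NOTE: this format is supported only through the Hadoop connector.

@Nonnull static RawBytesFileFormat bytes()

Returns a file format for binary files.

@Nonnull static TextFileFormat text()

Returns a file format for text files where the whole file is a single string item.

@Nonnull static TextFileFormat text(@Nonnull Charset charset)

Returns a file format for text files where the whole file is a single string item.

NOTE: the Hadoop connector only supports UTF-8. Choosing a different character encoding is supported for local files only.
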
charset - character encoding of the file

Copyright © 2024 Hazelcast, Inc. All rights reserved.