Class PythonServiceConfig
All Implemented Interfaces: Serializable
Configuration object for the Python service used in a mapUsingPython stage.
Hazelcast Jet expects you to have a Python project in a local directory.
It must contain the definition of a transform_list() function that receives a list of strings and returns a list of strings of the same size, with a one-to-one mapping between input and output elements.
Here's a simple example of a function that transforms every input string by prepending "echo-" to it:
    def transform_list(input_list):
        return ["echo-%s" % i for i in input_list]
If you have a very simple setup with everything in a single Python file, you can use setHandlerFile(java.lang.String). Let's say you saved the above Python code to a file named echo.py. You can use it from Jet like this:
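    import com.hazelcast.jet.pipeline.StreamStage;
    import com.hazelcast.jet.python.PythonServiceConfig;
    import static com.hazelcast.jet.python.PythonTransforms.mapUsingPython;
    // createInputStage() stands for whatever code builds your source stage
    StreamStage<String> inputStage = createInputStage();
    StreamStage<String> outputStage = inputStage.apply(
            mapUsingPython(new PythonServiceConfig()
                    .setHandlerFile("path/to/echo.py")));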
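Because Jet requires a strict one-to-one mapping between input and output, it can be worth exercising the handler locally before submitting a job. The following is a minimal sketch, assuming the echo handler shown above; check_one_to_one is a hypothetical test helper, not part of Jet. It avoids f-strings so that it also runs on Python 3.5.

```python
def transform_list(input_list):
    # The echo handler from above: prepend "echo-" to every item
    return ["echo-%s" % i for i in input_list]


def check_one_to_one(handler, batch):
    """Hypothetical local test helper (not a Jet API): call the handler on
    one batch and verify it returns a list of strings of the same length."""
    result = handler(list(batch))
    if not isinstance(result, list):
        raise TypeError("handler must return a list, got %s" % type(result).__name__)
    if len(result) != len(batch):
        raise ValueError(
            "handler returned %d items for %d inputs" % (len(result), len(batch)))
    for item in result:
        if not isinstance(item, str):
            raise TypeError("handler returned a non-string item: %r" % (item,))
    return result


if __name__ == "__main__":
    print(check_one_to_one(transform_list, ["a", "b", "c"]))
    # prints ['echo-a', 'echo-b', 'echo-c']
```

A check like this catches the most common mistake, a handler that filters or merges items, before the job ever reaches the cluster.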
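In more complex setups you can tell Jet the location of your project directory (setBaseDir(java.lang.String)) and the name of the Python module containing transform_list() (setHandlerModule(java.lang.String)). You can also use a different name for the function (setHandlerFunction(java.lang.String)).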
Jet uploads the entire directory to the cluster, creates one or more Python processes on each member, and sends the pipeline data through your function. The number of processes is controlled by the local parallelism of the Python mapping stage.
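As a sketch of the multi-file setup, suppose the base directory contains a module named text_handler.py whose handler function is called normalize_batch instead of transform_list. Both names are hypothetical. On the Java side you would then set the base directory with setBaseDir, the module with setHandlerModule("text_handler") and the function with setHandlerFunction("normalize_batch"). The helper lives in the same file here only to keep the example self-contained; in a real project it could be imported from another module in the base directory.

```python
# text_handler.py: hypothetical module in the project's base directory.
# Jet would be configured with setHandlerModule("text_handler") and
# setHandlerFunction("normalize_batch") instead of the default transform_list.
import unicodedata


def _normalize(text):
    # Helper: apply Unicode NFKC normalization, trim whitespace, lowercase
    return unicodedata.normalize("NFKC", text).strip().lower()


def normalize_batch(input_list):
    # Exactly one output string per input string, in the same order
    return [_normalize(item) for item in input_list]
```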
Jet recognizes these special files in the base directory:
- requirements.txt is assumed to list the dependencies of your Python code. Jet will automatically install them to a job-local virtual environment. You can also install the modules to the Jet servers' global Python environment in order to speed up job initialization. Jet reuses the global modules and adds the missing ones.
- init.sh is assumed to be a shell script that Jet will run when initializing the job.
- cleanup.sh is assumed to be a shell script that Jet will run when completing the job.
To use this stage in a Hazelcast Jet cluster, Python must be installed on every cluster member. Jet supports Python versions 3.5-3.7. If the code depends on non-standard Python modules, either they must be pre-installed or the member machines must have access to the public internet so that Jet can download and install them. A third option is to write an init.sh script that installs the dependencies in some other way. In that case, make sure not to use the standard filename requirements.txt, which Jet picks up automatically.
The Python mapping stage produces log output at the FINE level under the com.hazelcast.jet.python log category. This includes all the output from launched subprocesses.
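Since the output of the launched Python processes goes to this log category, a handler can emit its own diagnostics. The following is a minimal sketch that writes one line per batch to standard error, which is one of the subprocess's output streams; to see these lines you would enable FINE logging for com.hazelcast.jet.python on the members. The message format is an arbitrary choice for illustration.

```python
import sys


def transform_list(input_list):
    # Diagnostic line on stderr; Jet logs subprocess output at FINE level
    # under the com.hazelcast.jet.python category
    sys.stderr.write("transform_list: got a batch of %d items\n" % len(input_list))
    sys.stderr.flush()
    # The actual transformation: same echo logic as the basic example
    return ["echo-%s" % i for i in input_list]
```

Keep such output sparse: a line per item on a busy stream can produce a large volume of log data once FINE logging is on.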
Since: Jet 4.0
Constructor Summary
-
Method Summary
- baseDir(): Returns the Python base directory.
- BiFunctionEx<String, Integer, ? extends io.grpc.ManagedChannelBuilder<?>> channelFn(): Returns the channel function, see setChannelFn(com.hazelcast.function.BiFunctionEx<java.lang.String, java.lang.Integer, ? extends io.grpc.ManagedChannelBuilder<?>>).
- handlerFile(): Returns the Python handler file.
- handlerFunction(): Returns the name of the handler function.
- handlerModule(): Returns the handler module name.
- setBaseDir(String baseDir): Sets the base directory where the Python files reside.
- PythonServiceConfig setChannelFn(BiFunctionEx<String, Integer, ? extends io.grpc.ManagedChannelBuilder<?>> channelFn): Sets the channel function.
- setHandlerFile(String handlerFile): Sets the Python handler file.
- setHandlerFunction(String handlerFunction): Overrides the default name of the Python function that transforms Jet pipeline data.
- setHandlerModule(String handlerModule): Sets the name of the Python module that has the function that transforms Jet pipeline data.
- void validate(): Validates the configuration and throws an exception if a mandatory config option is missing.
Constructor Details

PythonServiceConfig
public PythonServiceConfig()
Method Details

validate
public void validate()
Validates the configuration and throws an exception if a mandatory config option is missing. Called automatically from PythonTransforms.mapUsingPython(com.hazelcast.jet.python.PythonServiceConfig).
baseDir
Returns the Python base directory.
setBaseDir
Sets the base directory where the Python files reside. When you set this, also set the name of the handler module to identify the location of the handler function (named transform_list() by convention). If all you need to deploy to Jet is in a single file, you can call setHandlerFile(java.lang.String) instead.
handlerFile
Returns the Python handler file.
setHandlerFile
Sets the Python handler file. It must contain the handler function. If your Python work is in more than one file, call setBaseDir(java.lang.String) instead.
handlerModule
Returns the handler module name.
setHandlerModule
Sets the name of the Python module that has the function that transforms Jet pipeline data.
handlerFunction
Returns the name of the handler function. The default value is transform_list.
setHandlerFunction
Overrides the default name of the Python function that transforms Jet pipeline data. The default name is "transform_list". The function must be defined in the module you configured with setHandlerModule(java.lang.String), must take a single argument that is a list of strings, and must return another list of strings containing the results of transforming each item in the input list. There must be a strict one-to-one match between the input and output lists.
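A common way to break the one-to-one contract is to skip items that fail to process. The following is a minimal sketch of one way to avoid that: turn a failure into a placeholder output item instead of dropping it. The function name parse_prices and the "ERROR:" prefix are hypothetical choices for illustration; such a function would be registered with setHandlerFunction("parse_prices").

```python
def _format_price(raw):
    # Parse one item; float() raises ValueError on malformed input
    value = float(raw)
    return "%.2f" % value


def parse_prices(input_list):
    # Hypothetical handler registered via setHandlerFunction("parse_prices").
    # Every input item yields exactly one output item, so a failure becomes
    # a placeholder string instead of being dropped.
    output = []
    for raw in input_list:
        try:
            output.append(_format_price(raw))
        except ValueError:
            output.append("ERROR:%s" % raw)
    return output
```

Downstream Java stages can then route items starting with the placeholder prefix to an error sink, while the positional pairing between inputs and outputs stays intact.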
channelFn
Returns the channel function, see setChannelFn(com.hazelcast.function.BiFunctionEx<java.lang.String, java.lang.Integer, ? extends io.grpc.ManagedChannelBuilder<?>>).
Since: 5.2
setChannelFn
@Nonnull public PythonServiceConfig setChannelFn(@Nonnull BiFunctionEx<String, Integer, ? extends io.grpc.ManagedChannelBuilder<?>> channelFn)
Sets the channel function. The function receives a host and a port, and it should return a configured instance of ManagedChannelBuilder. You can use this to configure the channel, for example to set the maximum message size. The default value is NettyChannelBuilder::forAddress.
Since: 5.2