PyFlink
PyFlink is the Python entry point of Flink on Zeppelin. Internally, the Flink interpreter creates a Python shell, which in turn creates Flink's environment variables (including ExecutionEnvironment, StreamExecutionEnvironment and so on). Note that the Java environment behind PyFlink is created in the Scala shell, so underneath, the Scala shell and the Python shell share the same environment. The following variables are created in the Python shell:
s_env (StreamExecutionEnvironment)
b_env (ExecutionEnvironment)
st_env (StreamTableEnvironment for blink planner)
bt_env (BatchTableEnvironment for blink planner)
st_env_2 (StreamTableEnvironment for flink planner)
bt_env_2 (BatchTableEnvironment for flink planner)
z (ZeppelinContext)
There are 3 things you need to configure to make PyFlink work in Zeppelin.
1. Install pyflink, e.g. pip install apache-flink==1.11.1. If you need to use PyFlink UDFs, then you need to install pyflink on all the task manager nodes. That means if you are using YARN, pyflink must be installed on all the YARN nodes.
2. Copy flink-python-*.jar from the opt folder to the Flink lib folder.
3. Set zeppelin.pyflink.python to the Python executable path. By default, it is the python in PATH. If you have multiple versions of Python installed, set zeppelin.pyflink.python to the Python executable you want to use.
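Before setting zeppelin.pyflink.python, it can help to confirm which interpreter a given name or path actually resolves to and whether that interpreter can import pyflink. The following is a minimal sketch using only the Python standard library; the helper name describe_python is hypothetical and is not part of Zeppelin or Flink.

```python
import shutil
import subprocess
import sys


def describe_python(executable="python"):
    """Report where `executable` resolves to, its version, and whether it can import pyflink."""
    path = shutil.which(executable)  # same PATH lookup a shell would perform
    info = {"path": path, "version": None, "has_pyflink": False}
    if path is None:
        return info

    # Ask the target interpreter itself, since it may differ from the one running this script.
    probe = (
        "import sys, importlib.util\n"
        "print(sys.version.split()[0])\n"
        "print(importlib.util.find_spec('pyflink') is not None)\n"
    )
    result = subprocess.run([path, "-c", probe], capture_output=True, text=True)
    if result.returncode == 0:
        fields = result.stdout.split()
        if len(fields) >= 2:
            info["version"] = fields[0]
            info["has_pyflink"] = fields[1] == "True"
    return info


if __name__ == "__main__":
    # "python" mirrors the default lookup in PATH; pass a full path to check another interpreter.
    print(describe_python("python"))
    print(describe_python(sys.executable))
```

If has_pyflink is False for the interpreter you intend to use, install pyflink into that interpreter or point zeppelin.pyflink.python at one that already has it. This check only covers the local machine; task manager nodes would need to be checked separately.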
There are 2 ways to use PyFlink in Zeppelin:
%flink.pyflink
%flink.ipyflink
%flink.pyflink is simpler and easier to set up: you don't need to do anything beyond the above settings, but its functionality is also limited. I would suggest you use %flink.ipyflink, which provides almost the same user experience as Jupyter.
If you don't have Anaconda installed, then you need to install the following 3 libraries.
pip install jupyter
pip install grpcio
pip install protobuf
If you have Anaconda installed, then you only need to install 2 libraries.
pip install grpcio
pip install protobuf
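To see which of these libraries are still missing from a given Python environment, a small check along the following lines may be useful. It is only a sketch: the helper name missing_packages is hypothetical, and the mapping from pip package names to importable module names is an assumption (grpcio is imported as grpc, protobuf as google.protobuf, and jupyter_core is used as a proxy for a jupyter install).

```python
import importlib.util

# pip package name -> module name used to detect it (assumed mapping, see the note above)
REQUIRED = {
    "jupyter": "jupyter_core",
    "grpcio": "grpc",
    "protobuf": "google.protobuf",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found in this interpreter."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is itself not installed.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    todo = missing_packages()
    if todo:
        print("Missing, install with: pip install " + " ".join(todo))
    else:
        print("All libraries needed by %flink.ipyflink were found.")
```

Run it with the same interpreter that zeppelin.pyflink.python points to, since the result only describes the environment of the interpreter executing the script.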
Once you have made the above configuration, you can use the advanced features of %flink.ipyflink:
You can use matplotlib as you would in Jupyter.
Besides matplotlib, there are many other visualization libraries you can use in Zeppelin, such as bokeh, hvplot, pandas, seaborn and so on. Their usage is no different from Jupyter, so just check their official documentation for how to use them in a notebook.
You can press Tab to get code completion.
Join the Zeppelin community to discuss with others.