PyFlink
PyFlink is the Python entry point of Flink on Zeppelin. Internally, the Flink interpreter creates a Python shell, which creates Flink's environment variables (including ExecutionEnvironment, StreamExecutionEnvironment and so on). Note that the Java environment behind PyFlink is created in the Scala shell, which means the Scala shell and the Python shell share the same underlying environment. The following variables are created in the Python shell:
- s_env (StreamExecutionEnvironment)
- b_env (ExecutionEnvironment)
- st_env (StreamTableEnvironment for blink planner)
- bt_env (BatchTableEnvironment for blink planner)
- st_env_2 (StreamTableEnvironment for flink planner)
- bt_env_2 (BatchTableEnvironment for flink planner)
- z (ZeppelinContext)
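You may want to check which of these variables a paragraph can actually see, and what type each one is bound to. The sketch below does this with a small hypothetical helper, describe_shell_variables, which is not part of Zeppelin or PyFlink. It only inspects a namespace dictionary and calls no PyFlink API.

```python
"""Report which PyFlink shell variables are bound in a namespace (hypothetical helper)."""

# Names listed above, mapped to the type the documentation says each one holds.
PYFLINK_SHELL_VARIABLES = {
    "s_env": "StreamExecutionEnvironment",
    "b_env": "ExecutionEnvironment",
    "st_env": "StreamTableEnvironment (blink planner)",
    "bt_env": "BatchTableEnvironment (blink planner)",
    "st_env_2": "StreamTableEnvironment (flink planner)",
    "bt_env_2": "BatchTableEnvironment (flink planner)",
    "z": "ZeppelinContext",
}


def describe_shell_variables(namespace):
    """Return (name, documented type, actual type name or None) for each variable.

    `namespace` is any dict of names to objects, e.g. globals() in a paragraph.
    """
    rows = []
    for name, documented in PYFLINK_SHELL_VARIABLES.items():
        # None means the name is not bound in this namespace.
        actual = type(namespace[name]).__name__ if name in namespace else None
        rows.append((name, documented, actual))
    return rows
```

In a %flink.pyflink or %flink.ipyflink paragraph you would pass globals(), for example `for row in describe_shell_variables(globals()): print(row)`.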
Configure PyFlink
There are 3 things you need to configure to make PyFlink work in Zeppelin.
1. Install pyflink, e.g. pip install apache-flink==1.11.1. If you need to use PyFlink UDFs, you also need to install pyflink on all the task manager nodes. For example, if you are using YARN, pyflink must be installed on every YARN node.
2. Copy flink-python-*.jar from Flink's opt folder to its lib folder.
3. Set zeppelin.pyflink.python to the path of the Python executable. By default, it is the python found in PATH. If you have multiple versions of Python installed, set zeppelin.pyflink.python to the one you want to use.
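Part of this configuration can be scripted and checked from Python. The sketch below assumes a local Flink distribution whose path you pass in. The flink_home argument and both function names are hypothetical and not part of Flink or Zeppelin.

The first function performs step 2. It uses the wider pattern flink-python*.jar, because in some releases the jar name may carry a Scala suffix (for example flink-python_2.11-1.11.1.jar). The second function asks an interpreter which apache-flink version it has installed. Like zeppelin.pyflink.python, it defaults to the first python on PATH.

```python
"""Preflight helpers for the PyFlink configuration steps (a sketch, hypothetical names)."""
import glob
import os
import shutil
import subprocess


def copy_flink_python_jar(flink_home):
    """Copy the flink-python jar(s) from <flink_home>/opt into <flink_home>/lib."""
    # "flink-python*.jar" also matches names with a Scala suffix, e.g. flink-python_2.11-1.11.1.jar.
    jars = glob.glob(os.path.join(flink_home, "opt", "flink-python*.jar"))
    if not jars:
        raise FileNotFoundError(f"no flink-python jar in {os.path.join(flink_home, 'opt')}")
    lib_dir = os.path.join(flink_home, "lib")
    targets = []
    for jar in jars:
        target = os.path.join(lib_dir, os.path.basename(jar))
        if not os.path.exists(target):  # leave jars that are already in lib untouched
            shutil.copy2(jar, target)
        targets.append(target)
    return targets


def apache_flink_version(python_executable=None):
    """Return the apache-flink version installed for an interpreter, or None.

    Defaults to the first "python" on PATH, mirroring the default of
    zeppelin.pyflink.python.
    """
    python_executable = python_executable or shutil.which("python")
    if python_executable is None:
        return None
    probe = (
        "import importlib.metadata as m\n"
        "try:\n"
        "    print(m.version('apache-flink'))\n"
        "except m.PackageNotFoundError:\n"
        "    pass\n"
    )
    result = subprocess.run([python_executable, "-c", probe], capture_output=True, text=True)
    # A non-zero exit (e.g. an interpreter older than 3.8 without importlib.metadata) counts as unknown.
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

This only checks the local machine. If you use PyFlink UDFs, every task manager node still needs its own pyflink installation.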
How to use PyFlink
There are 2 ways to use PyFlink in Zeppelin:
- %flink.pyflink
- %flink.ipyflink
%flink.pyflink is simpler: you don't need to do anything beyond the configuration above, but its functionality is also limited. I would suggest using %flink.ipyflink, which provides almost the same user experience as Jupyter.
How to use IPyFlink
Configuration
If you don't have Anaconda installed, you need to install the following 3 libraries.
If you have Anaconda installed, you only need to install 2 libraries.
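The sketch below reports which of these libraries are missing for the interpreter that runs it. It assumes the libraries are jupyter, grpcio and protobuf (the prerequisites of Zeppelin's IPython support), and that an Anaconda installation already provides jupyter. Run it with the same interpreter that zeppelin.pyflink.python points to.

```python
"""Report which IPyFlink prerequisites are missing for this interpreter (a sketch)."""
from importlib import metadata

# Assumed prerequisites (PyPI distribution names); Anaconda is assumed to ship jupyter already.
IPYFLINK_PREREQUISITES = ("jupyter", "grpcio", "protobuf")


def missing_packages(required=IPYFLINK_PREREQUISITES):
    """Return the distributions in `required` that have no installed metadata."""
    missing = []
    for dist in required:
        try:
            metadata.version(dist)
        except metadata.PackageNotFoundError:
            missing.append(dist)
    return missing


if __name__ == "__main__":
    todo = missing_packages()
    # Prints a pip command for whatever is missing, or a short confirmation.
    print("pip install " + " ".join(todo) if todo else "all IPyFlink prerequisites found")
```

The check looks at installed distributions rather than trying to detect Anaconda. An Anaconda environment that already has jupyter therefore just reports the remaining libraries. A package installed without pip-visible metadata may still show up as missing.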
Features of IPyFlink
Once you have completed the above configuration, you can use the following advanced features of %flink.ipyflink.
Colorful output
IPython magic
Matplotlib Support
You can use matplotlib the same way as in Jupyter.
More Python visualization libraries support
Besides matplotlib, there are many other visualization libraries you can use in Zeppelin, such as bokeh, hvplot, pandas, seaborn and so on. Their usage is the same as in Jupyter, so check their official documentation for how to use them in a notebook.
Code Completion
You can press Tab to get code completion.
Video Tutorial
Community
Join the Zeppelin community to discuss with others.