PyFlink

PyFlink is the Python entry point for Flink on Zeppelin. Internally, the Flink interpreter creates a Python shell, which in turn creates Flink's environment variables (including ExecutionEnvironment, StreamExecutionEnvironment, and so on). Note that the Java environment behind PyFlink is created in the Scala shell, so the Scala shell and the Python shell share the same underlying environment. The following variables are created in the Python shell:

  • s_env (StreamExecutionEnvironment)

  • b_env (ExecutionEnvironment)

  • st_env (StreamTableEnvironment for blink planner)

  • bt_env (BatchTableEnvironment for blink planner)

  • st_env_2 (StreamTableEnvironment for flink planner)

  • bt_env_2 (BatchTableEnvironment for flink planner)

  • z (ZeppelinContext)
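As a minimal sketch of how these pre-created variables might be used in a %flink.pyflink paragraph, the snippet below builds a small word-count table on bt_env and displays it with z. The helper name count_words and the view name word_list are hypothetical, chosen only for this illustration. It also assumes that z.show accepts a PyFlink Table in your Zeppelin version.

```python
# Intended for a Zeppelin paragraph that starts with %flink.pyflink.
# bt_env and z are the variables the interpreter creates; nothing here
# constructs a Flink environment itself.

def count_words(table_env):
    """Build a word-count Table on the given table environment."""
    rows = [("hello",), ("world",), ("hello",)]
    words = table_env.from_elements(rows, ["word"])
    # "word_list" is a hypothetical view name chosen for this sketch.
    table_env.create_temporary_view("word_list", words)
    return table_env.sql_query(
        "SELECT word, COUNT(*) AS cnt FROM word_list GROUP BY word"
    )

# Outside Zeppelin these variables do not exist, so the guard makes this a no-op.
if "bt_env" in globals() and "z" in globals():
    z.show(count_words(bt_env))
```

Because the Scala shell and the Python shell share the same environment, a temporary view registered this way should also be visible from the other Flink paragraphs of the same interpreter session. Re-running the paragraph in the same session may fail because the view already exists.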

You need to configure three things to make PyFlink work in Zeppelin:

  • Install PyFlink.

    • For example: pip install apache-flink==1.11.1

    • If you need to use PyFlink UDFs, you must install PyFlink on all the task manager nodes. If you are using YARN, that means PyFlink must be installed on every YARN node.

  • Copy flink-python-*.jar from the opt folder to Flink's lib folder.

  • Set zeppelin.pyflink.python to the path of the Python executable.

    • By default, it is the python found in PATH. If you have multiple versions of Python installed, set zeppelin.pyflink.python to the one you want to use.
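When several Python versions are installed, it can help to confirm that a given executable can actually import pyflink before pointing zeppelin.pyflink.python at it. The following is a minimal sketch using only the standard library. The helper name can_import is hypothetical, and sys.executable merely stands in for the path you plan to configure.

```python
import subprocess
import sys

def can_import(python_executable, module):
    """Return True if `python_executable -c "import module"` exits cleanly."""
    try:
        result = subprocess.run(
            [python_executable, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The executable does not exist or cannot be started.
        return False
    return result.returncode == 0

if __name__ == "__main__":
    # sys.executable stands in for the path you plan to configure.
    candidate = sys.executable
    print(candidate, "can import pyflink:", can_import(candidate, "pyflink"))
```

Replace candidate with the full path of each installed interpreter to compare them. On a cluster, the same check would need to run on every node where UDFs execute.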

There are two ways to use PyFlink in Zeppelin:

  • %flink.pyflink

  • %flink.ipyflink

%flink.pyflink is the simpler option: it needs nothing beyond the settings above, but its functionality is limited. We recommend %flink.ipyflink, which provides almost the same user experience as Jupyter.

Configuration

If you don't have Anaconda installed, you need to install the following three libraries:

pip install jupyter
pip install grpcio
pip install protobuf

If you have Anaconda installed, you only need to install two libraries:

pip install grpcio
pip install protobuf
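To see which of these libraries are still missing, a short check like the one below can be run with the interpreter that Zeppelin will use. It is only a sketch: the helper names are hypothetical, and the mapping from pip package names to import names (jupyter_client for jupyter, grpc for grpcio, google.protobuf for protobuf) is an assumption of this example, not something Zeppelin itself is stated to check.

```python
import importlib.util

# pip package name -> module name to look for (assumed mapping for this sketch).
REQUIRED = {
    "jupyter": "jupyter_client",
    "grpcio": "grpc",
    "protobuf": "google.protobuf",
}

def is_importable(module):
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is itself missing.
        return False

def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found, sorted by name."""
    return sorted(pkg for pkg, module in required.items() if not is_importable(module))

if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only covers the interpreter that runs it, so run it with the executable configured in zeppelin.pyflink.python.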

Once this configuration is in place, you can use the advanced features of %flink.ipyflink:

Colorful output

IPython magic

Matplotlib Support

You can use matplotlib just as you would in Jupyter.

More Python visualization libraries support

Besides matplotlib, you can use many other visualization libraries in Zeppelin, such as bokeh, hvplot, pandas, and seaborn. They work the same way as in Jupyter, so refer to their official documentation for how to use them in a notebook.

Code Completion

Press Tab to get code completion.

Video Tutorial

Community

Join the Zeppelin community to discuss with others.
