Flink on Zeppelin Architecture
Last updated
Last updated
Understand Flink on Zeppelin's architecture would help you to know how to use flink on zeppelin more efficiently.
Basically, Zeppelin consists of 3 layers:
Frontend
Zeppelin Server
Interpreter
Frontend is responsible for the user interaction and communication with Zeppelin server via rest api or websocket.
Zeppelin server is a web server, which is responsible for managing notes, interpreters and so on. It won't run the user code in Zeppelin server, but just delegate them to interpreter process.
Interpreter process is responsible for executing user code (such as spark sql or flink sql). Interpreter process communicate with Zeppelin server via thrift rpc. Currently Zeppelin supports most of the popular big data engines (such as spark, flink, hive, presto, pig and so on). By default, the interpreter processes run in the same host of Zeppelin server (yarn-application of Flink interpreter would run Flink interpreter in JobManager aka. yarn AM)
Zeppelin Server is an isolated java process, its log is logs/zeppelin-{user}-{host}.log
, and each interpreter process is also an isolated java process, its log is logs/zeppelin-interpreter-{interpreter}-*.log
, so just check the log files when you hit any issues and don't have any clues based on the information in zeppelin frontend.
The above diagram is the architecture of Flink on Zeppelin. Flink interpreter on the left side is actually a Flink client which is responsible for compiling and manage Flink job lifecycle, such as submit, cancel job, monitoring job progress and so on). The Flink cluster on the right side is the place where execute Flink job. It could be a MiniCluster (local mode), Standalone cluster (remote mode), Yarn session cluster (yarn mode) , Yarn application session cluster (yarn-application mode)
There're 2 important components in Flink interpreter: Scala Shell & Python Shell
Scala shell is the entry point of Flink interpreter, it would create all the entry points of Flink program, such as ExecutionEnvironment,StreamExecutionEnvironment and TableEnvironment. Scala shell is responsible for compiling and running scala code and sql.
Python shell is the entry point of PyFlink, it is responsible for compile and run python code.
Join Zeppelin community to discuss with others