Flink on Zeppelin Architecture


Understanding Flink on Zeppelin's architecture will help you use Flink on Zeppelin more efficiently.

Zeppelin Architecture

Basically, Zeppelin consists of 3 layers:

  • Frontend

  • Zeppelin Server

  • Interpreter

  • The frontend is responsible for user interaction and communicates with the Zeppelin server via REST API or websocket (see the sketch after this list).

  • The Zeppelin server is a web server responsible for managing notes, interpreters, and so on. It does not run user code itself; it delegates that to interpreter processes.

  • The interpreter process is responsible for executing user code (such as Spark SQL or Flink SQL) and communicates with the Zeppelin server via Thrift RPC. Zeppelin currently supports most of the popular big data engines (Spark, Flink, Hive, Presto, Pig, and so on). By default, interpreter processes run on the same host as the Zeppelin server (in yarn-application mode, the Flink interpreter runs inside the JobManager, i.e. the YARN application master).
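The frontend is just one client of these APIs: anything that can speak HTTP can talk to the Zeppelin server the same way. The snippet below is a minimal sketch of that frontend-to-server path only (not the Thrift link to the interpreter), assuming a Zeppelin server running on the default port 8080 and using the note-listing endpoint of Zeppelin's Notebook REST API.

```scala
import scala.io.Source

// Minimal sketch: list the notes managed by the Zeppelin server over its REST API.
// Assumes Zeppelin is running locally on the default port 8080.
object ListZeppelinNotes {
  def main(args: Array[String]): Unit = {
    // GET /api/notebook returns the note list as JSON
    val json = Source.fromURL("http://localhost:8080/api/notebook").mkString
    println(json)
  }
}
```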

The Zeppelin server is an isolated Java process whose log is logs/zeppelin-{user}-{host}.log, and each interpreter process is also an isolated Java process whose log is logs/zeppelin-interpreter-{interpreter}-*.log. Check these log files whenever you hit an issue and the information in the Zeppelin frontend gives no clue.

Flink on Zeppelin Architecture

The diagram above shows the architecture of Flink on Zeppelin. The Flink interpreter on the left side is actually a Flink client, responsible for compiling code and managing the Flink job lifecycle (submitting and cancelling jobs, monitoring job progress, and so on). The Flink cluster on the right side is where the Flink jobs execute. It could be a MiniCluster (local mode), a standalone cluster (remote mode), a YARN session cluster (yarn mode), or a YARN application cluster (yarn-application mode).
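Which of these clusters the interpreter connects to is driven by interpreter configuration rather than code. As a minimal sketch (the details are covered in the Execution mode and Configure Flink Interpreter sections), the flink.execution.mode property selects one of the four modes and can be set in the interpreter setting or in a %flink.conf paragraph before the interpreter starts; remote mode additionally needs flink.execution.remote.host and flink.execution.remote.port to point at the standalone cluster.

```
%flink.conf

flink.execution.mode yarn
```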

There are two important components in the Flink interpreter: the Scala shell and the Python shell.

  • The Scala shell is the entry point of the Flink interpreter. It creates all the entry points of a Flink program, such as ExecutionEnvironment, StreamExecutionEnvironment and TableEnvironment, and is responsible for compiling and running Scala code and SQL (see the sketch after this list).

  • The Python shell is the entry point of PyFlink and is responsible for compiling and running Python code.
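Because the Scala shell pre-creates these entry points, a %flink paragraph can use them directly instead of building its own environment. Below is a minimal sketch, assuming the conventional pre-created variable name senv for the StreamExecutionEnvironment and that the shell has already imported the Flink Scala API implicits.

```scala
%flink

// Minimal streaming word count, assuming the shell has already created `senv`
// (StreamExecutionEnvironment) and imported the Flink Scala API implicits.
val words = senv.fromElements("hello zeppelin", "hello flink")

words
  .flatMap(_.split("\\s+"))
  .map(w => (w, 1))
  .keyBy(_._1)
  .sum(1)
  .print()

senv.execute("wordcount sketch")
```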

Community

Join the Zeppelin community to discuss with other users and developers.
