Third party dependencies

Introduction

It is very common to need third-party dependencies when you write a Flink job, whatever the language (Scala, Python, SQL). Adding dependencies is easy in an IDE (e.g. add a dependency to pom.xml), but how can you do that in Zeppelin? There are two main settings you can use to add third-party dependencies:

  • flink.execution.packages

  • flink.execution.jars

flink.execution.packages

This is the recommended way of adding dependencies, because it works the same way as adding dependencies in pom.xml. Underneath, it downloads all the packages and their transitive dependencies from the Maven repository, then puts them on the classpath. Here's one example of how to add the Kafka connector:

flink.execution.packages  org.apache.flink:flink-connector-kafka_2.11:1.10.0,org.apache.flink:flink-connector-kafka-base_2.11:1.10.0,org.apache.flink:flink-json:1.10.0

The format is groupId:artifactId:version. If you have multiple packages, separate them with commas. flink.execution.packages requires internet access; if the Zeppelin machine cannot access the internet, use flink.execution.jars instead.
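A malformed coordinate is easy to miss in a long comma-separated value. The following is a minimal Python sketch that checks each entry against the groupId:artifactId:version format and joins the entries with commas. The helper names parse_coordinate and build_packages_value are hypothetical and not part of Zeppelin or Flink; the script only assembles the property value and does not talk to Zeppelin.

```python
# Hypothetical helpers (not part of Zeppelin or Flink) that assemble
# a value for flink.execution.packages from Maven coordinates.

def parse_coordinate(coord):
    """Split 'groupId:artifactId:version' into its three parts."""
    parts = coord.strip().split(":")
    # Exactly three non-empty parts are expected.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            "expected groupId:artifactId:version, got %r" % coord
        )
    return tuple(parts)


def build_packages_value(coords):
    """Validate every coordinate, then join them with commas."""
    cleaned = [c.strip() for c in coords]
    for c in cleaned:
        parse_coordinate(c)
    return ",".join(cleaned)


# The coordinates from the Kafka connector example above.
packages = build_packages_value([
    "org.apache.flink:flink-connector-kafka_2.11:1.10.0",
    "org.apache.flink:flink-connector-kafka-base_2.11:1.10.0",
    "org.apache.flink:flink-json:1.10.0",
])
print("flink.execution.packages  " + packages)
```

The printed line can then be copied into the interpreter setting. The check is purely syntactic: it does not verify that the artifacts actually exist in the Maven repository.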

flink.execution.jars

If your Zeppelin machine cannot access the internet, or your dependencies are not deployed to a Maven repository, you can use flink.execution.jars to specify the jar files you depend on (separate multiple jar files with commas).

Here's one example of how to add Kafka dependencies via flink.execution.jars:

flink.execution.jars /Users/jzhang/github/flink-kafka/target/flink-kafka-1.0-SNAPSHOT.jar

I built this jar in this project; you can use a similar approach to build other connector jars. You can also put your jars on HDFS and specify the path with the hdfs:// prefix, e.g.

flink.execution.jars  hdfs://localhost:9090/tmp/flink-kafka-1.0-SNAPSHOT.jar
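Local and HDFS jars can be mixed in one value. As a rough illustration, the Python sketch below joins a list of jar paths with commas and reports whether each one is a local path or carries a scheme such as hdfs. The helper names jar_scheme and build_jars_value are hypothetical, and the sketch assumes Unix-style local paths; it does not check that the files exist.

```python
from urllib.parse import urlparse

# Hypothetical helpers (not part of Zeppelin or Flink) that assemble
# a value for flink.execution.jars from a list of jar locations.

def jar_scheme(path):
    """Return the URI scheme of a jar path, or 'local' if it has none."""
    scheme = urlparse(path.strip()).scheme
    return scheme if scheme else "local"


def build_jars_value(paths):
    """Check that every entry looks like a jar, then join with commas."""
    cleaned = [p.strip() for p in paths]
    for p in cleaned:
        if not p.endswith(".jar"):
            raise ValueError("not a jar file: %r" % p)
    return ",".join(cleaned)


# The two jar locations used in the examples above.
jars = [
    "/Users/jzhang/github/flink-kafka/target/flink-kafka-1.0-SNAPSHOT.jar",
    "hdfs://localhost:9090/tmp/flink-kafka-1.0-SNAPSHOT.jar",
]
for jar in jars:
    print(jar_scheme(jar), jar)
print("flink.execution.jars  " + build_jars_value(jars))
```

In practice you would list each jar only once; the two locations appear together here only to show both path styles.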

Community

Join the Zeppelin community to discuss with others.