SQL

In Zeppelin, there're 2 kinds of flink sql interpreter you can use

  • %flink.ssql

    • Streaming sql interpreter which launch flink streaming job via StreamTableEnvironment

  • %flink.bsql

    • Batch sql interpreter which launch flink batch job via BatchTableEnvironment

You can write all the supported flink sql statements in Zeppelin. Type help can display all the supported sql syntax.

Flink sql interpreter in Zeppelin is equal to sql-client + many other enhanced and useful features.

1. Support batch SQL and streaming sql simultaneously.

In sql-client, either you run streaming sql or run batch sql in one session. You can not run them together. But in Zeppelin, you can do that. %flink.ssql is used for running streaming sql, while %flink.bsql is used for running batch sql. And the batch/streaming flink jobs run in the same flink session cluster.

2. Multiple statement support

You can write multiple sql statements in one paragraph, each sql statement is separated by semicolon.

3. Comment support

2 kinds of sql comments are supported in Zeppelin:

  • Single line comment start with --

  • Multiple line comment around with /* */

4. Job parallelism setting

You can set the sql parallelism via paragraph local property: parallelism

5. Multiple Insert Support

Sometimes you have multiple insert statements which read the same source, but write to different sinks. By default each insert statement would launch a separated flink job, but you can set paragraph local property: runAsOne to be true to run them in a single flink job.

6. Set JobName

You can set flink job name for insert statement via setting property: jobName. To be noticed, you can only set job name for insert statement, select statement is not supported yet. And this kind of setting only works for single insert statement. It doesn't work for multiple insert we talked above.

Streaming SQL Visualization

Zeppelin can visualize the select sql result of flink streaming job. Overall it supports 3 modes:

  • Single mode

  • Update mode

  • Append mode

Single mode

Single mode is for the case when the result of sql statement is always one row, such as the following example. The output format is HTML, and you can specify paragraph local property template for the final output content template. And you can use {i} as placeholder for the ith column of result.

%flink.ssql(type=single, parallelism=1, refreshInterval=3000, template=<h1>{1}</h1> until <h2>{0}</h2>)

select max(event_ts), count(1) from sink_kafka

Update mode

Update mode is suitable for the case when the output is more than one rows, and always will be updated continuously. Here’s one example where we use group by.

%flink.ssql(type=update, refreshInterval=2000, parallelism=1)

select status, count(1) as pv from sink_kafka group by status

Append Mode

Append mode is suitable for the scenario where output data is always appended. E.g. the following example which use tumble window.

%flink.ssql(type=append, parallelism=1, refreshInterval=2000, threshold=60000)

select TUMBLE_START(event_ts, INTERVAL '5' SECOND) as start_time, status, count(1) from sink_kafka
group by TUMBLE(event_ts, INTERVAL '5' SECOND), status

Video Tutorial

Single mode

Update mode

Append mode

Community

Join Zeppelin community to discuss with others

Last updated