PySpark

Steps:

  • bin/pyspark calls org.apache.spark.launcher.Main, which builds the command that launches Python. It also sets shell.py as the Python startup script, so that starting the interactive shell also starts Spark.

  • context.py calls java_gateway.py, which launches Spark by invoking bin/spark-submit.

  • spark-submit launches the gateway server on the JVM side (PythonGatewayServer, which wraps a Py4J GatewayServer).

  • The Python side creates SparkConf and SparkContext through the gateway.
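The handshake in the steps above can be sketched with the standard library alone. This is a toy stand-in, not Spark's code: a child Python process plays the role of the JVM-side gateway server, reports its port through a temporary file, and the parent then connects to it over a local socket. The names CHILD_SOURCE, launch_toy_gateway and call_gateway are made up for illustration, and the way the port is handed over is an assumption about the general pattern rather than a description of java_gateway.py.

```python
import os
import socket
import subprocess
import sys
import tempfile
import time

# Hypothetical stand-in for the JVM-side gateway server: it listens on an
# ephemeral port, publishes that port in a file, and answers one request.
CHILD_SOURCE = r"""
import os, socket, sys
info_path = sys.argv[1]
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
tmp_path = info_path + ".tmp"
with open(tmp_path, "w") as f:
    f.write(str(server.getsockname()[1]))
os.replace(tmp_path, info_path)  # publish the port atomically
conn, _ = server.accept()
request = conn.recv(1024)
conn.sendall(b"gateway got: " + request)
conn.close()
server.close()
"""


def launch_toy_gateway(timeout=10.0):
    """Start the child process and return (process, port) once the port is known."""
    with tempfile.TemporaryDirectory() as info_dir:
        info_path = os.path.join(info_dir, "connection_info")
        proc = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE, info_path])
        deadline = time.monotonic() + timeout
        # Poll until the child has written its port, or give up.
        while not os.path.exists(info_path):
            if proc.poll() is not None:
                raise RuntimeError("gateway process exited before reporting its port")
            if time.monotonic() > deadline:
                proc.kill()
                raise RuntimeError("timed out waiting for the gateway port")
            time.sleep(0.05)
        with open(info_path) as f:
            port = int(f.read())
    return proc, port


def call_gateway(port, message):
    """Send one request to the gateway and read the reply until the peer closes."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


proc, port = launch_toy_gateway()
reply = call_gateway(port, b"create SparkContext")
proc.wait(timeout=5)
print(reply.decode())
```

In real PySpark the child is the JVM started by bin/spark-submit, and the Python side talks to it through Py4J rather than raw sockets; the sketch only shows the launch-then-connect ordering.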

bin/pyspark

export PYTHONSTARTUP="${SPARK_HOME}/python/pyspark/shell.py"
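An interactive Python interpreter runs the file named by PYTHONSTARTUP before it shows the first prompt, which is how shell.py gets to run when the shell opens. The following minimal sketch mirrors the export line in Python; the helper name build_pyspark_shell_env and the /opt/spark location are hypothetical and used only for illustration.

```python
import os


def build_pyspark_shell_env(spark_home, base_env=None):
    """Return a copy of the environment with PYTHONSTARTUP pointing at shell.py."""
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_HOME"] = spark_home
    # Same path as the export line: ${SPARK_HOME}/python/pyspark/shell.py
    env["PYTHONSTARTUP"] = f"{spark_home}/python/pyspark/shell.py"
    return env


# "/opt/spark" is an assumed install location, not taken from the notes above.
env = build_pyspark_shell_env("/opt/spark", base_env={})
print(env["PYTHONSTARTUP"])
```

A dictionary like this could be passed as the env argument of subprocess.Popen when starting an interactive interpreter by hand.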
