# Spark Usage Instructions

## Spark Basic Usage
```shell
cd $SPARK_HOME
./bin/spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster \
    --master yarn lib/spark-examples-1.4.1-hadoop2.4.0.jar 10
```
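With `--deploy-mode cluster` the driver runs inside YARN, so SparkPi's output appears in the application logs rather than on the submitting terminal. A sketch of retrieving it with the standard YARN CLI; the application ID below is a made-up placeholder (the real one is printed by `spark-submit` when the job is accepted):

```shell
# Placeholder application ID for illustration only; use the ID that
# spark-submit prints when it accepts your job.
app_id="application_1400000000000_0001"

# On a host with the YARN client configured, the driver log can be fetched with:
#   yarn logs -applicationId "$app_id" | grep "Pi is roughly"
# Here the command is only printed so the snippet runs anywhere.
echo "yarn logs -applicationId $app_id"
```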
## pyspark
When submitting tasks with pyspark, the workers may be unable to find the Python libraries bundled with Spark. The following settings may solve the problem:
```python
conf.set('spark.yarn.dist.files',
         'file://$SPARK_HOME/python/lib/pyspark.zip,file://$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip')
conf.setExecutorEnv('PYTHONPATH', 'pyspark.zip:py4j-0.8.2.1-src.zip')
```
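Note that `$SPARK_HOME` inside a quoted Python string is not expanded automatically; it usually needs to be resolved on the driver before the value is passed to `SparkConf`. A minimal sketch of building the two values (the archive names are examples and must match your distribution):

```python
import os

# Resolve SPARK_HOME on the driver. The fallback path is used here only so the
# snippet runs standalone; in practice SPARK_HOME should already be set.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")

# Comma-separated list for conf.set('spark.yarn.dist.files', ...)
dist_files = ",".join(
    "file://{}/python/lib/{}".format(spark_home, name)
    for name in ("pyspark.zip", "py4j-0.8.2.1-src.zip")
)

# Colon-separated value for conf.setExecutorEnv('PYTHONPATH', ...)
executor_pythonpath = ":".join(("pyspark.zip", "py4j-0.8.2.1-src.zip"))

print(dist_files)
print(executor_pythonpath)
```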
> Note: the actual package names may differ from those in the example above; replace them with the package names in your Spark distribution.
## View History
You can view job history through the Spark History Server UI, which makes it easy to locate issues. Typically, once you have set up a SOCKS5 SSH tunnel, clicking a job link redirects you to the Spark UI, but in some cases this fails. You can then access `http://${history_ip}:${history_port}` directly and open the appId to find the corresponding history UI.
Description | Port |
---|---|
history UI | 18900 |
NOTE: To obtain `${history_ip}` above, refer to Step 4 of setting up the SOCKS5 agent.
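If clicking through still fails, the History Server in Spark 1.4+ also exposes a REST API that can list applications and recover the appId. A sketch assuming the port from the table above; the URL is only printed here so the snippet runs without network access:

```shell
# Placeholders: substitute the real history-server host and port obtained
# from the SOCKS5 tunnel setup. 18900 is the port from the table above.
history_ip=${history_ip:-localhost}
history_port=${history_port:-18900}

url="http://${history_ip}:${history_port}/api/v1/applications"
echo "$url"
# With network access in place, the application list can be fetched with:
#   curl -s "$url"
```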