1. Download and Install JVM
sudo apt install openjdk-17-jdk
java -version
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/ext:$JAVA_HOME/lib/tools.jar
(Note: lib/ext and tools.jar were removed in JDK 9+, so the CLASSPATH line is legacy and can be skipped on JDK 17.)
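If you keep these exports in ~/.bashrc (a common choice; adjust for your shell), reload it and confirm the variable is set:
source ~/.bashrc
echo $JAVA_HOME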
2. Download and Install Spark
Download Spark:
https://spark.apache.org/downloads.html
1. Choose a Spark release
2. Choose a package type
3. Click the download link
-> e.g., https://www.apache.org/dyn/closer.lua/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
4. You can see the HTTP link for downloading Spark. The page reads:
"We suggest the following location for your download: https://dlcdn.apache.org/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz. Alternate download locations are suggested below. It is essential that you verify the integrity of the downloaded file using the PGP signature." (A verification sketch follows these steps.)
5. Copy the link
6. Get the file using wget
-> e.g., wget https://dlcdn.apache.org/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
7. Extract the archive: tar xvf spark-3.x.x-bin-hadoop3.tgz
8. Move the files into /opt/spark/ (writing to /opt usually requires root):
sudo mv spark-3.x.x-bin-hadoop3/ /opt/spark
9. Add the paths to .bashrc:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
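As noted on the download page, verify the archive before using it. A minimal sketch of that check, assuming the signature and KEYS files sit in the usual Apache locations for this release (adjust the URLs for other versions):
wget https://downloads.apache.org/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz.asc
wget https://downloads.apache.org/spark/KEYS
gpg --import KEYS
gpg --verify spark-3.5.5-bin-hadoop3.tgz.asc spark-3.5.5-bin-hadoop3.tgz
Then reload .bashrc and confirm Spark is on the PATH:
source ~/.bashrc
spark-submit --version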
3. Execute Spark
Execute Spark with a Script
- pyspark: for Python
- sparkR: for R
- spark-shell: for Scala
- spark-sql: for SparkSQL
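For a quick smoke test, the Python shell binds a ready-made SparkSession to the name spark, so a one-liner is enough to confirm the installation works:
pyspark
>>> spark.range(100).count()
100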
Submit a Spark Application
With the spark-submit command, you can send a JAR, Python script, or R script to Spark and run it.
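As an illustration, a minimal PySpark application and its submission could look like the sketch below (app.py and its contents are a hypothetical example, not part of the original instructions):

# app.py - a tiny PySpark job (hypothetical example)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CountExample").getOrCreate()
# Use a small in-memory range as a stand-in for real input data
print(spark.range(1000).count())
spark.stop()

spark-submit app.py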
4. Run a Job on Spark
Web UI
Spark provides a Web UI by default. When running standalone, you can check it by opening
localhost:4040
in a browser.
When you start Spark via one of the scripts above, the console prints the Welcome to Spark logo followed by a line such as
Spark context Web UI available at http://xxxxx:4040
When multiple applications are running, the port number reportedly increments starting from 4040 (4041, 4042, and so on).
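If a fixed port is preferred, the UI port can be set explicitly through the standard spark.ui.port configuration, for example:
pyspark --conf spark.ui.port=4050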