spark-app-example

Main Goal

Create a local environment that mirrors production (iso-prod), so you can be as autonomous as possible while working on Spark projects.

How?

This project contains all the configuration files needed to create:

  • A Dockerized environment
  • A local but genuinely distributed environment:
    • 1 NameNode
    • 1 DataNode (scale up as you wish; see the sketch after this list)
    • 1 YARN ResourceManager
    • 3 YARN NodeManagers
    • 1 YARN history server
    • 1 Spark history server
    • 1 Spark shell
  • Hadoop component versions pinned to exactly match production
  • Deployment to the Dockerized cluster via the sbt command line
  • Data mounted into HDFS via Docker volumes from within the project folder
  • Access to the Spark history web UI for inspection :)
  • Access to YARN logs for debugging :)
  • Access to a Spark shell for fiddling :)
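
To run more than one DataNode, docker-compose can scale the service. A minimal sketch, assuming the datanode service in docker-compose.yml does not pin a container_name or host ports (either would prevent scaling):

# Scale the datanode service from 1 to 3 containers
docker-compose up -d --scale datanode=3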

Prerequisite

Add these localhost aliases to /etc/hosts:

echo "127.0.0.1       namenode datanode resourcemanager nodemanager nodemanager-1 nodemanager-2 nodemanager-3 historyserver spark-master spark-worker spark-history" >> /etc/hosts
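
To confirm the aliases resolve, a quick sanity check (getent is available on most Linux systems; use ping on macOS):

# Each alias should resolve to 127.0.0.1
getent hosts namenode resourcemanager spark-history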

How to run

# Start the cluster (assuming the images have already been built)
docker-compose up -d
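
To check that every service came up, list the project's containers:

# Show the cluster's containers and their state
docker-compose ps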

Load data into HDFS

# Load the dev data placed in the data directory into HDFS
docker exec -it namenode bash /scripts/hdfs-loader.sh
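
To verify the load, list what the loader wrote. The /data path below is an assumption; check scripts/hdfs-loader.sh for the actual destination:

# Recursively list the loaded files (adjust the path to match the loader script)
docker exec -it namenode hdfs dfs -ls -R /data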

Run the Spark job on the cluster via sbt

sbt
# then, at the sbt prompt, run the chained tasks:
;clean;reload;compile;docker;dockerComposeUp
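
The same chain can run non-interactively in sbt batch mode (the docker and dockerComposeUp tasks come from the project's sbt plugins):

# Run the whole build-and-deploy chain in one go from the host shell
sbt ";clean;reload;compile;docker;dockerComposeUp"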

Run a Spark shell connected to the YARN cluster

docker exec -it spark-shell /spark/bin/spark-shell
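
If the image's spark-defaults.conf does not already point at YARN, the master can be passed explicitly. A sketch, with a throwaway smoke test to type at the prompt:

# Start the shell against YARN (redundant if the image already defaults to it)
docker exec -it spark-shell /spark/bin/spark-shell --master yarn --deploy-mode client
# At the scala> prompt, try a quick smoke test:
#   spark.range(1000).count()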

Check YARN history

Open http://localhost:8188 (the YARN history server web UI) in your browser.
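
Application logs can also be pulled from the command line. Execing into the resourcemanager container is an assumption; any container with the yarn CLI and the cluster config will do:

# Fetch the aggregated logs of a finished application (substitute your own id)
docker exec -it resourcemanager yarn logs -applicationId <application_id>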

Check Spark history

Open http://localhost:18080 (the Spark history server web UI) in your browser.

Check the Hadoop HDFS NameNode

Open http://localhost:9870 (the NameNode web UI) in your browser.
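
The same health information is available from the CLI, assuming the namenode image ships the hdfs client (the loader script above already relies on it):

# Print cluster capacity and the list of live DataNodes
docker exec -it namenode hdfs dfsadmin -report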

Stop and remove the whole cluster, and clean up its volumes and networks

# Warning: this stops and removes ALL Docker containers on the host, not just the cluster's
docker stop $(docker ps -a -q) && docker rm $(docker ps -a -q) && docker volume prune -f && docker network prune -f
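
A gentler alternative scoped to this project only (run it from the project directory, next to docker-compose.yml):

# Remove only this compose project's containers, networks and named volumes
docker-compose down -v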
