Objectives: Using the pyspark, MLlib, and graphframes libraries, perform 1) classification and clustering tasks using Random Forest and KMeans and 2) graph analysis tasks. This material is from UIUC MCS coursework.

steve303/spark_MLlib_graphf

Spark - MLlib, graphframes

ML vs MLlib

  • Parts B and D (the MLlib exercises) can be solved with either the DataFrame-based API (pyspark.ml) or the RDD-based API (pyspark.mllib). The corresponding templates carry the suffixes _ml and _mllib. Before submitting, rename the Python files for parts B and D to part_b.py and part_d.py, respectively.
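
The difference between the two APIs can be sketched with k-means on a tiny in-memory dataset. This is a minimal illustration and not the course template: the function names, the sample points, and the choice of k=2 are assumptions made here. The pyspark imports sit inside the functions so that the module can be loaded even where Spark is not installed.

```python
# Hypothetical sample data: two well-separated groups of 2-D points.
POINTS = [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]


def parse_point(line):
    """Turn a whitespace-separated line such as "1.0 2.0" into a list of floats."""
    return [float(value) for value in line.split()]


def cluster_with_ml(spark, points, k=2, seed=1):
    """DataFrame-based API (pyspark.ml): an estimator is fit on a DataFrame."""
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.linalg import Vectors

    # pyspark.ml expects a DataFrame with a vector column (default name: "features").
    rows = [(Vectors.dense(p),) for p in points]
    df = spark.createDataFrame(rows, ["features"])
    model = KMeans(k=k, seed=seed).fit(df)
    # In pyspark.ml, clusterCenters() is a method.
    return [center.tolist() for center in model.clusterCenters()]


def cluster_with_mllib(spark, points, k=2, seed=1):
    """RDD-based API (pyspark.mllib): a model is trained on an RDD of vectors."""
    from pyspark.mllib.clustering import KMeans

    rdd = spark.sparkContext.parallelize(points)
    model = KMeans.train(rdd, k, maxIterations=20, seed=seed)
    # In pyspark.mllib, clusterCenters is a property.
    return [center.tolist() for center in model.clusterCenters]
```

Inside a script launched with spark-submit, both functions could be called with a session obtained from SparkSession.builder.getOrCreate(). The parse_point helper is one possible way to build the input points from a text file.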

Execution instructions

  • Run each file with: spark-submit --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 part_xxx.py
  • To suppress the Spark logs, redirect stderr instead: spark-submit --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 part_xxx.py 2> /dev/null
  • Make sure the given dataset is in the directory from which you run the code. The directory structure of this repository is recommended.
  • The extra --packages argument for graphframes is not required for parts B and D, but you do not need to remove it for those parts.
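
The commands above can also be assembled from Python, for example to run several parts in a loop. The following is a minimal sketch under stated assumptions: the helper names build_submit_command and run_part are hypothetical, and spark-submit is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path

# Package coordinate taken from the execution instructions.
GRAPHFRAMES_PACKAGE = "graphframes:graphframes:0.7.0-spark2.4-s_2.11"


def build_submit_command(script, needs_graphframes=True):
    """Return the spark-submit argument list for one script (hypothetical helper)."""
    command = ["spark-submit"]
    if needs_graphframes:
        command += ["--packages", GRAPHFRAMES_PACKAGE]
    command.append(str(script))
    return command


def run_part(script, quiet=True, needs_graphframes=True):
    """Run one script and return its exit code.

    quiet=True discards stderr, which mirrors the `2> /dev/null` redirect.
    """
    script = Path(script)
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found in {Path.cwd()}")
    stderr = subprocess.DEVNULL if quiet else None
    result = subprocess.run(
        build_submit_command(script, needs_graphframes),
        stderr=stderr,
        check=False,
    )
    return result.returncode
```

For instance, build_submit_command("part_b.py", needs_graphframes=False) yields the shorter command that suffices for parts B and D, while the default keeps the --packages argument for every part.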
