
Tiago-B-C-Reis/Apache_Spark



Apache Spark

Spark with Python, including Spark Streaming, machine learning, Spark DataFrames and more.

Platforms used:

  • Spark with PySpark (IDE: PyCharm | OS: Ubuntu)
  • Databricks
  • Jupyter Notebook
  • PySpark on AWS EC2
  • AWS EMR cluster

Apache Spark & PySpark

Course topics and projects

  • Spark DataFrame Basics
  • Spark DataFrame Operations
  • GroupBy and Aggregate Functions
  • Missing Data
  • Dates and Timestamps
  • Spark DataFrame Exercises

Acquired skills:

  • Using Spark 2.0 DataFrame syntax to analyze big data.
  • Working on consulting projects that simulate real-world scenarios.
  • Applying logistic regression to classify customer churn.
  • Using Random Forests and Gradient Boosted Trees in Spark for classification.
  • Building machine learning models with Spark's MLlib.
  • Getting familiar with the Databricks platform.
  • Setting up Amazon Web Services (AWS) EC2 instances for big data analysis.
  • Using AWS Elastic MapReduce (EMR).
  • Using Linux to set up and work in a Spark environment.
  • Building a spam filter with Spark and natural language processing.
  • Analyzing tweets in real time with Spark Streaming.
