Spark Streaming, Machine Learning, Spark DataFrames and more.
- Spark with PySpark (IDE: PyCharm | OS: Ubuntu)
- Databricks
- Jupyter Notebook
- PySpark on AWS EC2
- AWS EMR Cluster
- Spark DataFrame Basics
- Spark DataFrame Operations
- GroupBy and Aggregate Functions
- Missing Data
- Dates and Timestamps
- Spark DataFrame Exercises
- Linear Regression
- Logistic Regression
- Tree Methods
- Clustering
- Recommender System
- Natural Language Processing
- Projects:
  - Using Spark 2.0 DataFrame syntax to analyze big data
  - Working on consulting projects that simulate real-world scenarios
  - Applying logistic regression to classify customer churn
  - Using Random Forests and Gradient Boosted Trees in Spark for classification
  - Building machine learning models with Spark's MLlib
  - Getting familiar with the Databricks platform
  - Setting up Amazon Web Services EC2 for big data analysis
  - Using AWS Elastic MapReduce (EMR)
  - Working with Linux in a Spark environment
  - Building a spam filter with Spark and natural language processing
  - Analyzing tweets in real time with Spark Streaming