A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.

jacobceles/intro-to-colab-pyspark-emr

Colab and PySpark

Everything PySpark.

Once you complete this notebook, you should be able to write PySpark programs efficiently. The ideal way to use it is to go through the examples and then try them on Colab. At the end there are a few hands-on questions that you can use to evaluate yourself. The objectives of the notebook are to:

  • Give a solid understanding of the different PySpark functions available.
  • Give a short introduction to Google Colab, the platform on which this notebook is written.

I have also made an HTML version of this notebook, which you can access here.
