gavalle94/Flights-dataset

Flights-dataset

Analysis of a dataset of flights, using the SparkSQL framework and extra web scraping techniques

Project overview

What is the project for

The goal of this demo is to make extensive use of the SparkSQL framework to analyse a dataset. In this example, the dataset contains the US domestic flights for the year 1994.

In the final section of the project, web scraping techniques are applied to collect weather data, so that we can investigate whether flight delays were related to weather conditions.

How to access the project files

The Python notebook is provided as an exported HTML file, which is more portable and easier to read.

Technical details

Apache Spark

As mentioned above, we used SparkSQL to query the dataset and analyse its content.
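A query of this kind might look like the sketch below. It is not taken from the notebook: the view name `flights`, the column names `month` and `arrival_delay`, and the CSV path passed by the caller are all assumptions made for illustration, and the real schema may differ.

```python
# Hypothetical column names (month, arrival_delay); adjust to the real schema.
AVG_DELAY_BY_MONTH = """
    SELECT month,
           COUNT(*)           AS num_flights,
           AVG(arrival_delay) AS avg_arrival_delay
    FROM flights
    WHERE arrival_delay IS NOT NULL
    GROUP BY month
    ORDER BY month
"""


def make_session(app_name="flights-demo"):
    """Create (or reuse) a SparkSession."""
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()


def average_delay_by_month(spark, csv_path):
    """Register the CSV file as a temporary view and run the SQL query on it."""
    flights = spark.read.csv(csv_path, header=True, inferSchema=True)
    flights.createOrReplaceTempView("flights")
    return spark.sql(AVG_DELAY_BY_MONTH)
```

With PySpark installed, calling `average_delay_by_month(make_session(), path).show()` on a suitable CSV file would print one row per month. Registering the data as a temporary view is what allows plain SQL to be used instead of the DataFrame API.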

Data Science Python libraries

We used some of the most popular Python libraries:

  • Pandas and NumPy: data processing and analysis
  • Matplotlib and Seaborn: data visualization
  • urllib and BeautifulSoup (bs4): web scraping of the weather conditions
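The scraping step in the last item can be sketched roughly as follows. To keep the example free of third-party dependencies, it uses the standard library's `html.parser` in place of BS4, so it is not the notebook's actual code. The weather site and its page layout are not described here, so the `SAMPLE` markup and its placeholder values are invented purely for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def fetch_html(url):
    """Download a page with urllib; the URL is supplied by the caller."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    return parser.rows


# Hypothetical markup with placeholder values, invented for illustration only.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Condition</th></tr>
  <tr><td>day-1</td><td>condition-1</td></tr>
</table>
"""
print(parse_table(SAMPLE))
```

In practice `parse_table(fetch_html(url))` would be called once per page, and the resulting rows could then be loaded into a Pandas DataFrame and joined with the flight data.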

Special thanks

We want to thank our teacher, Michiardi Pietro, who created the baseline for the notebook and guided us during its development, teaching us all the techniques presented here.

Credits

ANGIUS Marco and AVALLE Giorgio - Ⓒ2017
