Using deep learning to predict the popularity of pictures from ski resorts

Polhovsky/SkiNet

SkiNet

"Using deep learning to predict the popularity of pictures from ski resorts"



If you'd like to read the paper, click here!

With the enormous popularity of social networks like LinkedIn, Facebook and Instagram, the online world plays a significant role in marketing campaigns. This study focuses on the promotion of ski resorts on Instagram. The official accounts of 80 US ski resorts were analysed to predict the popularity of their pictures, with the objective of optimizing how the resorts use Instagram to reach as many people as possible.

A state-of-the-art Deep Convolutional Neural Network (DCNN) is trained to classify the pictures. Its output, together with additional descriptive features of both the resort and the pictures, is used for the final prediction. In total, over 75 thousand pictures were used for transfer learning with the VGG architecture in order to optimize the predictions.

A baseline model, without any input from the pictures themselves, achieves an accuracy of 66% on a hold-out set. Adding newly engineered features from a DCNN increases the accuracy to 74% for exact predictions and 99% for predictions plus or minus one class.
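The two accuracy figures above can be illustrated with a minimal sketch. It assumes the popularity classes are ordered integers; the function name, the class range and the sample labels are hypothetical and not taken from the SkiNet code.

```python
def exact_and_adjacent_accuracy(y_true, y_pred, tolerance=1):
    """Return (exact accuracy, accuracy within +/- `tolerance` classes).

    Assumes classes are ordinal integers, so that the distance between
    two class labels is meaningful.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    n = len(y_true)
    # A prediction counts as exact only if it hits the true class.
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # A prediction counts as adjacent if it is at most `tolerance` classes off.
    within = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred)) / n
    return exact, within


# Hypothetical popularity classes (0 = least liked, 4 = most liked).
true_classes = [0, 1, 2, 3, 4, 2]
predicted_classes = [0, 2, 2, 1, 4, 3]
exact, within_one = exact_and_adjacent_accuracy(true_classes, predicted_classes)
```

The second number is always at least as high as the first, because every exact hit is also within one class of the truth.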


Utah’s famous powder snow is a dream for many skiers, but does such a picture get more likes? (© Snowbird, Utah)


The code consists of the following steps:

  1. Collecting the links to all the pictures, including additional data such as the number of followers, the date and time of the post, etc.
  2. Scraping all the data except the pictures
  3. Scraping all the pictures
  4. Collecting latitude and longitude for every ski resort in this study based on addresses by using the Google API
  5. Creating resort-specific features
  6. Creating picture-specific features
  7. Exploring and visualizing the data
  8. Constructing the baseline model
  9. Cropping and resizing the pictures for later stages of the analysis
  10. Cluster analysis
  11. Selecting a random sample of pictures for the image classification
  12. Splitting the manually labeled images into training, validation and test sets
  13. Image classification based on Principal Components Analysis
  14. Image classification using deep learning
  15. Evaluation of the image classifier using deep learning
  16. Constructing the final prediction model
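
As an illustration of steps 9 and 12 above, the sketch below computes a centered square crop box for a picture and splits a list of labeled images into training, validation and test sets. It is not the repository's code: the function names, the split fractions, the seed and the file names are all assumptions made for the example.

```python
import random


def center_crop_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square.

    The tuple uses the box convention of common image libraries, so it
    could be passed to a crop call before resizing the picture.
    """
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle `items` reproducibly and return (train, val, test) lists."""
    if val_fraction < 0 or test_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if val_fraction + test_fraction >= 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# Hypothetical file names standing in for the manually labeled pictures.
labeled_images = ["picture_{:03d}.jpg".format(i) for i in range(100)]
train_set, val_set, test_set = split_dataset(labeled_images)

# A landscape picture of 1080 x 720 pixels is cropped to its central square.
crop_box = center_crop_box(1080, 720)
```

Fixing the seed keeps the three sets identical between runs, so that results from the image classifier and the final prediction model stay comparable.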

The code is written in Python 3.6.1.


Disclaimer

The code in this repository only shows how to scrape data from Instagram and in no way encourages people to scrape data. In the case of SkiNet, the data has been used only for non-commercial purposes. Using any code provided in this repository is entirely at your own risk!