DonaldWolfson/Reddit-Prediction-Model


Reddit Upvote Prediction Model

A linear regression model for identifying the features that best predict a Reddit post's upvotes.

Description

This repo stores a research project investigating the influence of different features in a linear regression model that aims to predict a Reddit post's upvotes. The finalized paper can be found in pdf/Research_Paper.pdf.

To start, this project web-mined the top 500 subreddits by subscriber count, and then attempted to collect each subreddit's top 500 posts from the last 365 days. The top 500 subreddits can be found here, and the 246,472 posts can be found here. NOTE: This data includes both SFW and NSFW content; you have been warned.

The finalized prediction model script can be found here. Several other scripts were used to optimize the model and to analyze it through ablation; these can be found in the scripts folder. Any images produced by the scripts are stored in the images folder.
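As a rough illustration of what a feature ablation over a linear regression model looks like, here is a minimal, standard-library-only sketch. It is not the repository's script: the feature names (`title_length`, `hour_posted`, `irrelevant`), the synthetic data, and the helper functions are all assumptions made up for this example. The idea is to fit the model with every feature, then refit with one feature removed at a time and compare the test error.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations, with an intercept term."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant column for the intercept
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for k in range(n):
            if k != c:
                f = A[k][c]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[c])]
    return [A[i][n] for i in range(n)]  # [intercept, w_1, ..., w_d]

def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    preds = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def ablation(X_train, y_train, X_test, y_test, names):
    """Return the full-model test MSE and the MSE increase when each feature is dropped."""
    full = mse(fit_linear(X_train, y_train), X_test, y_test)
    deltas = {}
    for i, name in enumerate(names):
        tr = [r[:i] + r[i + 1:] for r in X_train]  # remove column i
        te = [r[:i] + r[i + 1:] for r in X_test]
        deltas[name] = mse(fit_linear(tr, y_train), te, y_test) - full
    return full, deltas

# Synthetic stand-in data (hypothetical features, NOT the project's dataset).
rng = random.Random(0)
names = ["title_length", "hour_posted", "irrelevant"]
X = [[rng.uniform(0, 10), rng.uniform(0, 24), rng.uniform(0, 1)] for _ in range(200)]
y = [5.0 + 3.0 * a + 0.5 * b + rng.gauss(0, 0.1) for a, b, _ in X]

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

full_mse, deltas = ablation(X_train, y_train, X_test, y_test, names)
print(f"full model test MSE: {full_mse:.4f}")
for name, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"dropping {name:>12}: MSE change {delta:+.4f}")
```

A larger increase in error after dropping a feature suggests the model relied on it more. On real data, correlated features can mask each other, so these numbers are best read as a relative ranking rather than an exact measure of importance.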

This project was a class assignment for CSE 158 in Fall 2021, and the assignment description can be found here. The approach and model were inspired by a research paper that can be found here, and the dataset that paper uses can be found here.

Getting Started

Dependencies

Authors

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

  • frontpagemetrics.com
    • Data Used: 2021-11-19.csv
  • CSE 158 Datasets
    • Understanding the interplay between titles, content, and communities in social media. Himabindu Lakkaraju, Julian McAuley, Jure Leskovec. ICWSM, 2013.
    • Data Used: submissions.csv.gz

About

Research project aimed at developing a prediction model to estimate the number of upvotes of a given Reddit post.
