Final Report

https://github.com/danlee0528/Two-Sigma-Connect-Rental-Listing-Inquiries/blob/master/Project%20Final%20Report.pdf

Dataset

https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/overview

Dataset Description

Predict how popular an apartment rental listing is based on the listing content like text description, photos, number of bedrooms, price, etc. The data comes from renthop.com, an apartment listing website. These apartments are located in New York City.

The target variable, interest_level, is defined by the number of inquiries a listing has in the duration that the listing was live on the site.

train.json - the training set
test.json - the test set
sample_submission.csv - a sample submission file in the correct format
images_sample.zip - listing images organized by listing_id (a sample of 100 listings)
Kaggle-renthop.7z - (optional) listing images organized by listing_id. Total size: 78.5GB compressed. Distributed by BitTorrent (Kaggle-renthop.torrent).

Data fields

bathrooms: number of bathrooms
bedrooms: number of bedrooms
building_id
created
description
display_address
features: a list of features about this apartment
latitude
listing_id
longitude
manager_id
photos: a list of photo links. You are welcome to download the pictures yourselves from renthop's site, but they are the same as imgs.zip.
price: in USD
street_address
interest_level: this is the target variable. It has 3 categories: 'high', 'medium', 'low'
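
As a minimal sketch, the training file can be read with the standard library alone. This assumes `train.json` is stored column-wise, with each field name mapping to a dictionary keyed by row index; that layout is an assumption and should be checked against the downloaded file. The helper names `columns_to_records` and `load_listings` are hypothetical, and the sample values are made up for illustration.

```python
import json

def columns_to_records(columns):
    """Pivot {field: {row_key: value}} into a list of per-listing dicts."""
    row_keys = set()
    for values in columns.values():
        row_keys.update(values)
    records = []
    for key in sorted(row_keys):
        # Fields absent for a given row become None.
        records.append({field: values.get(key) for field, values in columns.items()})
    return records

def load_listings(path):
    """Read a column-oriented JSON file and return one dict per listing."""
    with open(path, encoding="utf-8") as fh:
        return columns_to_records(json.load(fh))

# Tiny inline sample in the assumed layout (illustrative values only).
sample = json.loads("""
{
  "bedrooms": {"10": 3, "10000": 2},
  "price": {"10": 3000, "10000": 5465},
  "interest_level": {"10": "medium", "10000": "low"}
}
""")
listings = columns_to_records(sample)
```

With the real data, `load_listings("train.json")` would return the same kind of list, one dictionary per listing.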

Project Description

Phase 1. Exploratory data analysis and data pre-processing

Perform an initial analysis and exploration of the dataset to summarize its main characteristics. This step shows what the data can tell you beyond the formal modelling or hypothesis-testing task, such as potential patterns and outliers. To this end, apply any meaningful visualization methods and statistical tests.
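
Two of the simplest exploratory summaries are the share of each `interest_level` class and the number of postings per hour of day. The sketch below uses only the standard library; the timestamp format of `created` is an assumption, the function names are hypothetical, and the sample values are made up.

```python
from collections import Counter
from datetime import datetime

def class_proportions(labels):
    """Return the fraction of listings in each target class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def postings_per_hour(created_values, fmt="%Y-%m-%d %H:%M:%S"):
    """Count listings by the hour of day they were created (format assumed)."""
    return Counter(datetime.strptime(v, fmt).hour for v in created_values)

# Illustrative values only.
labels = ["low", "low", "medium", "high", "low"]
proportions = class_proportions(labels)

created = ["2016-06-24 07:54:24", "2016-06-12 12:19:27", "2016-04-17 07:26:41"]
busiest = postings_per_hour(created).most_common(1)
```

Here `most_common(5)` on the hourly counter would give the five busiest posting hours.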
In this phase, also perform data pre-processing: detect incomplete, incorrect, inaccurate, or irrelevant records in the dataset, then replace, modify, or delete them.
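
One possible way to approach this step is to count missing values per field and to drop numeric outliers with the interquartile-range (IQR) rule. The sketch below is an illustration, not the project's actual method: treating `None`, empty strings, and empty lists as missing is an assumption, as is the conventional multiplier `k=1.5`. All names and sample values are hypothetical.

```python
import statistics

def count_missing(records, fields):
    """Count values that are None, an empty string, or an empty list."""
    missing = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            if record.get(field) in (None, "", []):
                missing[field] += 1
    return missing

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(records, field, k=1.5):
    """Keep only records whose value for `field` lies inside the fences."""
    low, high = iqr_bounds([r[field] for r in records], k)
    return [r for r in records if low <= r[field] <= high]

# Illustrative records: one empty description and one implausible price.
prices = [2000, 2500, 3000, 3500, 4000, 4500, 5000, 4490000]
records = [{"price": p, "description": "Nice place"} for p in prices]
records[0]["description"] = ""

missing = count_missing(records, ["price", "description"])
cleaned = drop_outliers(records, "price")
```

Whether a flagged record should be deleted, capped, or corrected depends on the field and is a judgment call for the analyst.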
Finally, extract features from the unstructured text and images associated with the dataset, optionally using traditional feature extraction methods. Neural network-based methods, which have recently become very popular for processing natural language and images, are not allowed in this project.
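
TF-IDF is one traditional, non-neural way to turn listing descriptions into numeric features. The sketch below is a dependency-free illustration under stated assumptions: the tokenizer is a simple lowercase alphanumeric split, the weighting is plain `tf * log(N / df)` without smoothing, and the function names and sample descriptions are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty text
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Illustrative descriptions only.
descriptions = [
    "Spacious apartment with doorman",
    "Sunny apartment near subway",
    "Renovated apartment with laundry",
]
vectors = tfidf(descriptions)
```

A word that appears in every description, such as "apartment" here, receives a weight of zero, while rarer words such as "doorman" are weighted more heavily.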

Phase 2. Training Models

Train on the data that has been preprocessed in milestone 1. Choose among the following three classifiers:

Decision Tree
Logistic Regression
SVM
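
To illustrate what a three-class classifier does with the `interest_level` target, here is a dependency-free sketch of multinomial (softmax) logistic regression trained by batch gradient descent. It is not the project's implementation; in practice a library such as scikit-learn would normally be used. The learning rate, epoch count, single toy feature, and all names are assumptions made for the example.

```python
import math

LABELS = ["low", "medium", "high"]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, x):
    """Class probabilities for one feature vector (bias is the last weight)."""
    xb = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])

def train_softmax(X, y, n_classes, lr=0.1, epochs=500):
    """Fit one weight vector per class by minimizing mean cross-entropy."""
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]
            probs = predict_proba(W, xi)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_features + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

# Toy data: one made-up standardized feature, labels as indices into LABELS.
X = [[-2.0], [-1.5], [0.0], [0.2], [2.0], [1.8]]
y = [0, 0, 1, 1, 2, 2]
W = train_softmax(X, y, n_classes=len(LABELS))

probs_low = predict_proba(W, [-2.0])
probs_high = predict_proba(W, [2.0])
```

The model outputs a probability for each of the three classes rather than a single hard label, and the predicted class is the one with the largest probability.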

Phase 3. Advanced Models

Develop more advanced classifiers of your own choice. The only restriction is that only the classifiers used in milestone 2 should be engineered.

Folders and files

obj_detection_imgs
Milestone2. Logistic Regression.ipynb
Milestone2. SVM.ipynb
Milestone3. Bernoulli Naive Bayes.ipynb
Milestone3. KNN.ipynb
Milestone3. Random Forests.ipynb
Part 1. The Proportion of Target Variable Values .ipynb
Part 1. Histograms for Price, Latitude and Longitude.ipynb
Part 1. Hourly Trending and Top Five Busiest Hours of Postings.ipynb
Part 2. Dropping and dealing with missing values.md
Part 2. Find number of missing values in each variable.ipynb
Part 2. Outliers Visualization and Analysis.ipynb
Part 3. Extract features from images.md
Part3.Text_Feature_Extract.ipynb
Project Final Report.pdf
README.md
cvObjParser.py
decision_tree.ipynb
