Scrape & Map The New Yorker Tables for Two Reviews. Tools: SQLite, Pandas, Beautiful Soup, Google API, NLP (sentiment analysis)

tejeffers/T4T-Table-Scraper

T4T-Table-Scraper

Scrape & Map The New Yorker Tables for Two reviews

[View map](http://htmlpreview.github.io/?https://github.com/tejeffers/T4T-Table-Scraper/blob/master/T4T_google-maps_102816.html)

Here, I’ve used Python’s open-source Beautiful Soup, geopy location services, and Google’s geocoding API (queried with the requests library) to scrape and map The New Yorker’s Tables for Two restaurant reviews, dating all the way back to 1936! Some have closed, some have moved.

Here’s how it works:

  1. Scrape each article from TNY’s Tables for Two history
  2. For each review, save some info in a SQLite database for later:
     - Restaurant name
     - Address
     - Telephone number
     - Article date
     - Text of the review
  3. Grab the latitude and longitude of the restaurant, using either:
     - Python geopy
     - Google’s geocoding API via requests (so far, neither is perfect…)
  4. Format [Restaurant Name, Lat, Lng] for loading into Google Maps JavaScript.
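The storage and formatting steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the table name `reviews`, the column names, the helper functions, and the JavaScript variable name `locations` are all assumptions, and the sample row is made-up placeholder data. The geocoding step is left out because it needs network access; here the coordinates are assumed to be already known.

```python
import json
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    name TEXT,
    address TEXT,
    phone TEXT,
    article_date TEXT,
    review_text TEXT,
    lat REAL,
    lng REAL
)
"""


def save_review(conn, review):
    """Insert one scraped review (a dict keyed by column name)."""
    conn.execute(
        "INSERT INTO reviews "
        "(name, address, phone, article_date, review_text, lat, lng) "
        "VALUES (:name, :address, :phone, :article_date, :review_text, :lat, :lng)",
        review,
    )
    conn.commit()


def map_rows(conn):
    """Return [name, lat, lng] for every review that has coordinates."""
    cur = conn.execute(
        "SELECT name, lat, lng FROM reviews "
        "WHERE lat IS NOT NULL AND lng IS NOT NULL ORDER BY name"
    )
    return [list(row) for row in cur]


def to_js_array(rows, var_name="locations"):
    """Serialize the rows as a JavaScript array literal for the map page."""
    return "var {} = {};".format(var_name, json.dumps(rows))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Placeholder data, not a real review.
    save_review(conn, {
        "name": "Example Cafe",
        "address": "1 Example St, New York, NY",
        "phone": "555-0100",
        "article_date": "2016-10-28",
        "review_text": "Placeholder review text.",
        "lat": 40.5,
        "lng": -73.5,
    })
    print(to_js_array(map_rows(conn)))
```

Rows whose geocoding failed (stored with `NULL` coordinates) are skipped when building the map array, which is one simple way to cope with neither geocoder being perfect.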

Relevant notebooks:

T4T_Map

To Do:

  1. Sentiment analysis. Unfavorable reviews are rare, but they do happen. Can I assign a rating based on the text of the review?
  2. From the text, assign tags (tacos, noodles, sushi, etc.)
  3. Time series analysis: how has the distribution of restaurants changed over the past 80 years?
  4. Create a distance-based map: given current location, which amazing restaurant is closest?
  5. Create a random-restaurant-generator… to resolve weeknight dinner ambivalence.
  6. Repeat with the BarTabs page! A younger column, but a very valuable resource nevertheless!
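As a starting point for the distance-based map and the random-restaurant-generator, here is a small sketch that works on the same [Restaurant Name, Lat, Lng] rows used for the map. It uses the haversine great-circle formula, which is one reasonable choice rather than anything already in this repo; the function names are hypothetical.

```python
import math
import random


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two lat/lng points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def closest(rows, lat, lng):
    """Return the [name, lat, lng] row nearest to the given location."""
    return min(rows, key=lambda row: haversine_km(lat, lng, row[1], row[2]))


def random_pick(rows, rng=random):
    """Return one row chosen uniformly at random."""
    return rng.choice(rows)
```

Calling `closest(rows, my_lat, my_lng)` with the current location returns the nearest restaurant row, and `random_pick(rows)` settles the weeknight dinner question.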
