TripAdvisor Review Analyzer

Tripadvisor Review Analyzer: a Python and Selenium app that scrapes the latest reviews of an attraction from the Tripadvisor URL the user enters on the landing page, then cleans, processes and analyzes the review text with the Natural Language Toolkit (NLTK) and runs sentiment analysis on it.

About the Project

The Tripadvisor Review Analyzer is an app for tourist attractions built with Python and Selenium. The user enters the Tripadvisor URL of an attraction on the app's landing page; the app scrapes the latest reviews from that page, then cleans, processes and analyzes the scraped review data with the Natural Language Toolkit (NLTK) and performs sentiment analysis on the contents of the reviews.

First, when the user enters the URL of an attraction on Tripadvisor, Selenium scrapes the latest 100 reviews written for that attraction (fewer than 100 reviews are analyzed if the attraction is new or little known and has fewer than 100 reviews on its Tripadvisor page). Then NLTK's built-in VADER sentiment analyzer classifies each review as positive, negative or neutral, using a lexicon of words rated for positive or negative sentiment.
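A minimal sketch of the review cap and the classification step, under assumptions the README does not state: `fetch_page` is a hypothetical callable standing in for the Selenium scraping code (it returns one page of review texts, or an empty list when no pages remain), and reviews are labeled with the ±0.05 thresholds on VADER's `compound` score that the VADER authors recommend; the app's actual thresholds may differ. The analyzer is imported lazily so a fake scorer can be passed in when NLTK or its `vader_lexicon` data is not installed.

```python
from typing import Callable, Dict, List, Optional

MAX_REVIEWS = 100  # cap on the number of latest reviews analyzed


def collect_latest_reviews(fetch_page: Callable[[int], List[str]],
                           limit: int = MAX_REVIEWS) -> List[str]:
    """Gather up to `limit` reviews, newest first, one page at a time.

    `fetch_page` is hypothetical: given a 0-based page index it returns that
    page's review texts, or [] when the attraction has no more reviews.
    """
    reviews: List[str] = []
    page = 0
    while len(reviews) < limit:
        batch = fetch_page(page)
        if not batch:  # fewer than `limit` reviews exist
            break
        reviews.extend(batch)
        page += 1
    return reviews[:limit]  # trim any overshoot from the last page


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score (-1..1) to a label (assumed thresholds)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify_reviews(
    reviews: List[str],
    scorer: Optional[Callable[[str], Dict[str, float]]] = None,
) -> List[str]:
    """Label each review using VADER's polarity_scores (or a supplied scorer)."""
    if scorer is None:
        # Needs `pip install nltk` and nltk.download("vader_lexicon").
        from nltk.sentiment.vader import SentimentIntensityAnalyzer
        scorer = SentimentIntensityAnalyzer().polarity_scores
    return [label_from_compound(scorer(text)["compound"]) for text in reviews]
```

For example, `classify_reviews(collect_latest_reviews(fetch_page))` would return one label per scraped review, with at most 100 labels.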

Once the reviews are classified, the positive and negative reviews are processed separately: tokenization breaks the review sentences into meaningful elements (tokens), the text is lowercased and punctuation is removed, and finally stopwords (words such as "the", "is" and "what" that are irrelevant to sentiment and provide no useful information) are removed from the tokenized data.
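The cleaning steps can be sketched as follows. This is an illustration rather than the app's code: a regular-expression tokenizer and a small hand-picked stopword set stand in for NLTK's `word_tokenize` and its English stopword corpus, so the example runs without downloading NLTK data.

```python
import re
import string
from typing import FrozenSet, List

# Illustrative stopword set (an assumption); NLTK's full English list is
# available as nltk.corpus.stopwords.words("english") after downloading it.
STOP_WORDS: FrozenSet[str] = frozenset({
    "the", "is", "what", "a", "an", "and", "it", "was", "to", "of",
    "in", "we", "i", "this", "for", "but",
})

# A token is a run of word characters/apostrophes, or one non-space symbol.
TOKEN_RE = re.compile(r"[\w'’]+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens (stand-in for nltk.word_tokenize)."""
    return TOKEN_RE.findall(text)


def is_punctuation(token: str) -> bool:
    """True if every character of the token is ASCII punctuation."""
    return all(ch in string.punctuation for ch in token)


def preprocess(text: str, stop_words: FrozenSet[str] = STOP_WORDS) -> List[str]:
    tokens = [tok.lower() for tok in tokenize(text)]             # lowercase
    tokens = [tok for tok in tokens if not is_punctuation(tok)]  # drop punctuation
    return [tok for tok in tokens if tok not in stop_words]      # drop stopwords
```

With this stopword set, `preprocess("The view is AMAZING, what a place!")` keeps only `view`, `amazing` and `place`.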

Next, NLTK is used again to find the most common words in the positive and negative review groups. The following results are then displayed on the results page:

Number of reviews classified as positive
Number of reviews classified as negative
A few sample reviews classified as positive
A few sample reviews classified as negative
Most frequently used words in POSITIVE reviews, with their frequencies
Most frequently used words in NEGATIVE reviews, with their frequencies
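A sketch of how the items listed above could be assembled from already classified and cleaned reviews. The input shape, `(review_text, label, cleaned_tokens)` triples, and the `summarize` function are hypothetical. Word frequencies use `collections.Counter`; NLTK's `FreqDist` is a subclass of `Counter`, so `FreqDist(tokens).most_common(n)` would produce the same pairs.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input shape: (review_text, label, cleaned_tokens), where the
# label is "positive", "negative" or "neutral".
Labeled = Tuple[str, str, List[str]]


def summarize(labeled: List[Labeled], n_samples: int = 3,
              n_top: int = 10) -> Dict[str, Dict[str, object]]:
    """Build the per-group figures shown on the results page."""
    summary: Dict[str, Dict[str, object]] = {}
    for label in ("positive", "negative"):  # neutral reviews are not reported
        group = [(text, tokens) for text, lab, tokens in labeled if lab == label]
        # Count every cleaned token across all reviews in this group.
        freq = Counter(tok for _, tokens in group for tok in tokens)
        summary[label] = {
            "count": len(group),
            "samples": [text for text, _ in group[:n_samples]],
            "top_words": freq.most_common(n_top),  # [(word, frequency), ...]
        }
    return summary
```

Keeping the positive and negative groups apart mirrors the README: each group gets its own count, samples and word-frequency list.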

This is what the results page looks like:

[Screenshot: results page]

Repository: morikaglobal/ta_review_analyzer

Folders and files

Latest commit

History

Repository files navigation

TripAdvisor Review Analyzer

About the Project

About

Topics

Resources

Stars

Watchers

Forks

Languages