Social Media Sentiment Analysis Engine

Navigate this Repository

apple-m1-sentiments
│ README.md
│ M1Presentation.pdf
└─notebooks
│   2020_02_02_CE_VectorizeData.ipynb
│   2021_01_29_CE_EDA.ipynb
│   2021_02_01_CE_TestNB_PreprocessData.ipynb
│   2021_02_01_Models_NB.ipynb
│   2021_02_01_Models_SVC.ipynb
└─data
│   2020_02_05_CombinedData.csv
│   2021-02-05_13_04_16.csv
└─src
│   app.py
│   eda_visualizations.py
│   general_functions.py
│   model_functions.py
│   process_data.py
│   reddit_api.py
│   twitter_api.py

What do users think about the Apple M1 chip?

In November 2020, Apple launched three products featuring the M1 chip: the Mac mini, MacBook Air, and MacBook Pro. This was a departure from previous iterations of these products, which used Intel chips. Apple claimed that the new chip would offer improved performance and efficiency at a better price point.

To determine whether users felt that the M1 chip was living up to Apple’s claims, I designed a sentiment analysis engine that extracts text from Tweets and Reddit posts and comments and analyzes user sentiment. I then built a model to predict whether a given piece of user text was positive, negative, or neutral, so that I could make generalizations about the user experience in each category.

Methodology

* Build an application to collect data.
* Preprocess: Clean text, remove stop words, and lemmatize.
* Extract features: Text length, POS tags, subjectivity, and compound polarity.
* Label data as positive, negative, or neutral based on compound polarity.
* Create a model that will predict if an observation is positive, negative, or neutral.
* Make generalizations regarding each category.
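
As a rough illustration of the preprocessing and labeling steps above, here is a minimal sketch in plain Python. It is not the code in src/process_data.py: the function names, the tiny stop-word list, and the 0.05 cutoff are all assumptions made for the example (the cutoff actually used is not stated here), and lemmatization is left out because it needs an NLP library.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# fuller list from an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "on", "to", "my", "this"}


def clean_text(text):
    """Lowercase, strip links/@mentions/punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove links and @mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text)        # keep letters, digits, spaces
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def label_from_compound(compound, threshold=0.05):
    """Map a compound polarity score to 1.0, 0.0, or -1.0.

    The threshold is an assumed value, not one taken from this project.
    """
    if compound >= threshold:
        return 1.0
    if compound <= -threshold:
        return -1.0
    return 0.0


tokens = clean_text("The M1 MacBook Air is AMAZING! https://example.com @apple")
print(tokens)
print(label_from_compound(0.6), label_from_compound(0.0), label_from_compound(-0.4))
```

The labels 1.0, 0.0, and -1.0 match the class values reported in the Exploratory Data Analysis section.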

Data Pipeline

To collect the data, I ran my script at various times each day and saved each data pull from the API with a time stamp. I then periodically pulled batches of the saved raw data for processing.
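
The time-stamped saving step could look something like the sketch below. It uses only the standard library. The helper names (timestamped_name, save_pull) and the example row are hypothetical; the file name pattern is inferred from the data file 2021-02-05_13_04_16.csv listed in the repository tree, which is an assumption about how that file was named.

```python
import csv
from datetime import datetime
from pathlib import Path


def timestamped_name(now):
    """Build a file name such as 2021-02-05_13_04_16.csv from a datetime."""
    return now.strftime("%Y-%m-%d_%H_%M_%S") + ".csv"


def save_pull(rows, fieldnames, out_dir, now=None):
    """Write one API pull (a list of dicts) to its own time-stamped CSV file."""
    now = now or datetime.now()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / timestamped_name(now)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Hypothetical row; the real pulls come from the Twitter and Reddit APIs.
    example_rows = [{"source": "twitter", "text": "Loving the new M1 Air"}]
    print(save_pull(example_rows, ["source", "text"], "data"))
```

Writing each pull to its own file keeps the raw data immutable, so batches can be re-read and combined later without re-querying the APIs.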

Exploratory Data Analysis

Positive Reception

* Positive class (1.0): 6322
* Neutral class (0.0): 5696
* Negative class (-1.0): 1628

It is a little suspicious that the neutral class is so large. However, of the three classes, the neutral class had the most instances of foreign words, so the foreign words may be causing mislabeling. If so, the true class imbalance may be more or less pronounced than these counts suggest, depending on the sentiment of the foreign-language text.
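
To put the imbalance in perspective, the class counts reported above can be turned into shares of the whole data set. This short snippet only does arithmetic on those three numbers; the dictionary keys are labels chosen for the example.

```python
# Class counts as reported in this section.
counts = {"positive (1.0)": 6322, "neutral (0.0)": 5696, "negative (-1.0)": 1628}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

The negative class makes up roughly 12% of the observations, compared with about 46% positive and 42% neutral.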

Negative Opinions

Among the negative comments, we observe frequent instances of the words "air", "pro", "iphone", and "ipad". This suggests that there may be some link between these products and user dissatisfaction. One of Apple's main claims was that the M1 would foster compatibility with other products in the Apple universe, including allowing iPhone and iPad apps to run natively on the machine, so it would be worth investigating further whether users think the M1 is living up to these claims.
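
A term-frequency count of this kind can be sketched with the standard library's collections.Counter. The helper name top_terms and the token lists below are made up for illustration and are not taken from the project's data.

```python
from collections import Counter


def top_terms(token_lists, n=5):
    """Return the n most frequent tokens across a collection of token lists."""
    counter = Counter(tok for tokens in token_lists for tok in tokens)
    return counter.most_common(n)


# Hypothetical, already-preprocessed negative comments (illustrative only).
negative_tokens = [
    ["air", "battery", "drain"],
    ["ipad", "app", "crash", "air"],
    ["pro", "bluetooth", "issue"],
]

print(top_terms(negative_tokens, n=3))
```

Raw frequency only shows which words co-occur with negative labels; comparing the same counts against the positive and neutral classes would show whether a term is actually over-represented.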

Models

1. Naive Bayes

I used GaussianNB from sklearn to make predictions about polarity. Since there was a class imbalance, I ran the model with and without SMOTE, used cross-validation, and chose the configuration with the best scores.
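
The cross-validated Naive Bayes run can be sketched as follows. This is not the notebook's code: the data is synthetic (generated with make_classification to mimic a three-class imbalance), and the fold count, the random seeds, and the macro-F1 metric are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real feature matrix, with an imbalanced
# three-class target (assumed proportions, for illustration only).
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    weights=[0.46, 0.42, 0.12],
    random_state=0,
)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(GaussianNB(), X, y, cv=cv, scoring="f1_macro")

print("macro-F1 per fold:", scores)
print("mean macro-F1:", scores.mean())
```

Macro-averaged F1 is used here because plain accuracy can look good on imbalanced data even when the minority class is predicted poorly. If oversampling such as SMOTE is added, it should be fit on the training portion of each fold only, so that synthetic samples never leak into the validation data.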

2. Support Vector Classifier

I also used SVC from sklearn.svm to see if I could improve my results. I tuned the model's hyperparameters using GridSearchCV. The performance of the model was impressive; however, I would like to investigate further whether overfitting is occurring.
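
A grid search of this kind, together with a simple check for overfitting, might look like the sketch below. The data is synthetic, and the parameter grid, the scaling step, the split sizes, and the macro-F1 metric are assumptions for the example rather than the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real features and labels (illustration only).
X, y = make_classification(
    n_samples=400,
    n_features=8,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    weights=[0.46, 0.42, 0.12],
    random_state=0,
)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale inside the pipeline so each CV fold is scaled using its own training data.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}  # assumed grid

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_macro")
search.fit(X_train, y_train)

train_score = search.score(X_train, y_train)
test_score = search.score(X_test, y_test)

print("best parameters:", search.best_params_)
print("train macro-F1:", train_score)
print("held-out macro-F1:", test_score)
```

A training score that is much higher than the held-out score is one sign that the tuned model is overfitting.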

Recommendations

* Collect more data regarding customers' perception of compatibility with other products in the Apple universe. Work towards increased compatibility in later updates and releases.
* Develop methods to deal with foreign words when collecting social media data.
* Capitalize on opportunities to highlight performance by reaching out to users with positive experiences and asking them about specific performance benchmarks. Expand efforts to learn more about user perception of performance.

Future Work

* Expand the application to access data from additional platforms.
* Take advantage of time data to provide insights into how user opinions change over time.
* Stream data to a dashboard to analyze and update changing opinions in real time.

christineegan42/apple-m1-sentiments
