
Sentiment analysis of user opinions regarding the Apple M1 chip using Python and NLTK.


christineegan42/apple-m1-sentiments


Social Media Sentiment Analysis Engine

Navigate this Repository

apple-m1-sentiments
│ README.md
│ M1Presentation.pdf
└─notebooks
│   2020_02_02_CE_VectorizeData.ipynb
│   2021_01_29_CE_EDA.ipynb
│   2021_02_01_CE_TestNB_PreprocessData.ipynb
│   2021_02_01_Models_NB.ipynb
│   2021_02_01_Models_SVC.ipynb
└─data
│   2020_02_05_CombinedData.csv
│   2021-02-05_13_04_16.csv
└─src
│   app.py
│   eda_visualizations.py
│   general_functions.py
│   model_functions.py
│   process_data.py
│   reddit_api.py
│   twitter_api.py
    

What do users think about the Apple M1 chip?

In November 2020, Apple launched three products featuring the M1 chip: the Mac mini, MacBook Air, and MacBook Pro. This was a departure from previous iterations of these products, which used Intel chips. Apple claimed that the new chip would offer improved performance and efficiency at a better price point.

To determine whether users felt the M1 chip was living up to Apple's claims, I designed a sentiment analysis engine that extracts data from Tweets and Reddit posts and comments and analyzes user sentiment. I then built a model to predict whether a given text blurb from a user was positive, negative, or neutral, so that I could make generalizations about the user experience in each category.

Methodology

* Build an application to collect data.
* Preprocess: clean the text, remove stop words, and lemmatize.
* Extract features: Text length, POS tags, subjectivity, and compound polarity.
* Label data as positive, negative, or neutral based on compound polarity.
* Create a model that will predict if an observation is positive, negative, or neutral.
* Make generalizations regarding each category.
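The cleaning and labeling steps can be sketched in plain Python. This is an illustration, not the project's code: the regular expressions and the tiny stop-word set are assumptions (NLTK's `stopwords.words("english")` list would normally replace the set), lemmatization with NLTK's `WordNetLemmatizer` is left out because it needs the WordNet corpus downloaded, and the ±0.05 cut-offs are the thresholds commonly recommended for VADER compound scores, which may differ from the ones used in this project.

```python
import re

# Tiny illustrative stop-word set (assumption); NLTK's English list is far larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "my", "to", "of",
              "i", "this", "for", "with", "on"}


def clean_text(text, stop_words=STOP_WORDS):
    """Lowercase, strip URLs, @mentions and symbols, then drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)          # @mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # punctuation, '#', emoji; keeps "m1"
    return [tok for tok in text.split() if tok not in stop_words]


def label_from_compound(compound, threshold=0.05):
    """Map a compound polarity score in [-1, 1] to 1.0 / 0.0 / -1.0."""
    if compound >= threshold:
        return 1.0   # positive
    if compound <= -threshold:
        return -1.0  # negative
    return 0.0       # neutral


print(clean_text("Loving my new M1 MacBook Air! https://t.co/xyz @apple"))
print(label_from_compound(0.6), label_from_compound(-0.3), label_from_compound(0.0))
```

Keeping digits in the cleaning regex matters for this data set, since stripping them would reduce "M1" to a bare "m".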

Data Pipeline

To collect the data, I ran my script at various times each day, saved each pull from the APIs with a timestamp, and periodically combined batches of the raw data.

[Figure: data pipeline]
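A minimal sketch of this collect-and-batch pattern, using only the standard library. The column names (`id`, `source`, `created_at`, `text`) and the helper names `save_pull` and `combine_pulls` are hypothetical; the timestamped filename format mirrors the `2021-02-05_13_04_16.csv` file in `data/`, and deduplicating on (source, id) is an assumption about how overlapping pulls might be merged.

```python
import csv
import glob
import os
import tempfile
from datetime import datetime

# Hypothetical column layout; the real collection scripts may store more fields.
FIELDS = ["id", "source", "created_at", "text"]


def save_pull(records, out_dir, now=None):
    """Write one API pull to a CSV file named after the moment it was saved."""
    now = now or datetime.now()
    # Produces names like 2021-02-05_13_04_16.csv
    name = now.strftime("%Y-%m-%d_%H_%M_%S") + ".csv"
    path = os.path.join(out_dir, name)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    return path


def combine_pulls(out_dir):
    """Merge all timestamped pulls, keeping the first copy of each post."""
    combined, seen = [], set()
    # The timestamp format sorts lexically in chronological order.
    for path in sorted(glob.glob(os.path.join(out_dir, "????-??-??_*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                key = (row["source"], row["id"])
                if key not in seen:  # the same post can appear in several pulls
                    seen.add(key)
                    combined.append(row)
    return combined


# Usage: two pulls a few hours apart that overlap on one post.
with tempfile.TemporaryDirectory() as data_dir:
    save_pull(
        [{"id": "1", "source": "twitter", "created_at": "2021-02-05T08:55:00", "text": "M1 is fast"}],
        data_dir,
        now=datetime(2021, 2, 5, 9, 0, 0),
    )
    save_pull(
        [
            {"id": "1", "source": "twitter", "created_at": "2021-02-05T08:55:00", "text": "M1 is fast"},
            {"id": "2", "source": "reddit", "created_at": "2021-02-05T12:30:00", "text": "Fan never spins"},
        ],
        data_dir,
        now=datetime(2021, 2, 5, 13, 4, 16),
    )
    rows = combine_pulls(data_dir)

print(len(rows))  # 2
```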

Exploratory Data Analysis

Positive Reception

[Figure: class distribution]

* Positive class (1.0): 6322
* Neutral class (0.0): 5696
* Negative class (-1.0): 1628
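The size of the imbalance follows directly from these counts. A quick check in Python shows that the negative class makes up only about 12% of the labeled data, with roughly 3.9 positive examples for every negative one:

```python
# Label counts reported above (1.0 = positive, 0.0 = neutral, -1.0 = negative).
counts = {1.0: 6322, 0.0: 5696, -1.0: 1628}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total)                                        # 13646
print({k: round(v, 3) for k, v in shares.items()})  # {1.0: 0.463, 0.0: 0.417, -1.0: 0.119}
print(round(imbalance_ratio, 2))                    # 3.88
```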

The large size of the neutral class is somewhat suspicious. Of the three classes, the neutral class contained the most foreign (non-English) words, which may be causing mislabeling. If so, the true class imbalance could be more or less pronounced than it appears, depending on the actual sentiment of those foreign-language posts.
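One plausible mechanism, sketched under stated assumptions: lexicon-based scorers such as VADER only score words that appear in their English lexicon, so a post written entirely in another language can receive a compound score of exactly 0 and land in the neutral class. The toy lexicon and its valences below are made up; the squashing step follows the x / sqrt(x^2 + alpha) normalization with alpha = 15 that VADER applies to its summed valence, while VADER's rules for negation, intensifiers, punctuation, and capitalization are omitted.

```python
import math

# Toy lexicon with made-up valences (assumption); VADER's real lexicon is much larger.
LEXICON = {"love": 3.2, "fast": 1.5, "great": 3.1, "slow": -1.6, "broken": -2.2}


def compound_score(tokens, alpha=15):
    """Sum known valences, then squash into (-1, 1) as x / sqrt(x**2 + alpha)."""
    raw = sum(LEXICON.get(tok, 0.0) for tok in tokens)  # unknown words add nothing
    return raw / math.sqrt(raw * raw + alpha)


english = "i love how fast the new m1 is".split()
spanish = "me encanta lo rápido que es el nuevo m1".split()

print(round(compound_score(english), 3))  # clearly positive
print(compound_score(spanish))            # 0.0, so a ±0.05 threshold labels it neutral
```

The Spanish sentence is just as enthusiastic as the English one, but none of its words are in the lexicon, so it is scored as neutral.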

Negative Opinions

Among the negative comments, the words "air", "pro", "iphone", and "ipad" appear frequently. This suggests a possible link between these products and user dissatisfaction. Since one of Apple's main claims was that the M1 would improve compatibility with other products in the Apple universe, including allowing iPhone and iPad apps to run natively on the Mac, it would be worth investigating further whether users think the M1 is living up to these claims.
[Figure: frequent words in negative comments]
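Counting token frequencies within one class takes only a few lines with `collections.Counter`. The posts below are hypothetical, and `top_words` is an illustrative helper name. Comparing a word's frequency in the negative class against its frequency in the other classes would help separate genuine complaints from product names that are common everywhere.

```python
from collections import Counter

# Hypothetical cleaned, labeled posts: (tokens, label) pairs.
posts = [
    (["battery", "great", "air"], 1.0),
    (["pro", "fan", "noise"], -1.0),
    (["iphone", "apps", "crash", "air"], -1.0),
    (["ipad", "app", "scaling", "pro"], -1.0),
    (["new", "mac", "mini"], 0.0),
]


def top_words(posts, label, n=3):
    """Most frequent tokens among the posts that carry the given label."""
    counts = Counter(tok for tokens, lab in posts if lab == label for tok in tokens)
    return counts.most_common(n)


print(top_words(posts, -1.0, n=1))  # [('pro', 2)]
```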

Models

1. Naive Bayes

I used GaussianNB from scikit-learn to predict polarity. Because the classes were imbalanced, I ran the model with and without SMOTE, evaluated each version with cross-validation, and chose the configuration with the best scores.
[Figures: Naive Bayes results]

2. Support Vector Classifier

I also used scikit-learn's SVC (sklearn.svm.SVC) to see whether I could improve on these results, tuning its hyperparameters with GridSearchCV. The model's performance was impressive; however, I would like to investigate further whether it is overfitting.
[Figure: SVC results]
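One way to look for overfitting, sketched with synthetic data: keep a held-out test set outside the grid search and compare the best configuration's mean training score, cross-validated score, and test score. The parameter grid, the scaling step, and the macro-F1 metric are assumptions rather than the author's settings. Because the labels were derived from compound polarity, and compound polarity is also one of the features, it may also be worth checking how much of the score comes from the model relearning the labeling threshold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the engineered features (length, POS counts, ...).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so scaling is part of the pipeline.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1, 1]}

search = GridSearchCV(
    pipe, param_grid, cv=5, scoring="f1_macro", return_train_score=True
)
search.fit(X_train, y_train)

i = search.best_index_
train_score = search.cv_results_["mean_train_score"][i]
cv_score = search.best_score_
test_score = search.score(X_test, y_test)  # scored with the refit best model

print("best params:", search.best_params_)
print(f"train {train_score:.3f} | cv {cv_score:.3f} | test {test_score:.3f}")
# A training score far above the cv and test scores points to overfitting.
```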

Recommendations

  • Collect more data on customers' perception of the M1's compatibility with other products in the Apple universe. Work toward increased compatibility in later updates and releases.
  • Develop methods to handle foreign words when collecting social media data.
  • Capitalize on opportunities to highlight performance by reaching out to users with positive experiences and asking them about specific performance benchmarks. Expand efforts to learn more about user perception of performance.
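As a starting point for handling foreign-language posts, a simple heuristic flags texts in which few tokens belong to an English vocabulary. The vocabulary below is a tiny hypothetical stand-in (a full word list, such as NLTK's `words` corpus, would be needed in practice), the helper names are illustrative, and the 0.5 threshold is an arbitrary assumption; a dedicated language-identification tool would be more robust.

```python
import re

# Tiny hypothetical English vocabulary; a real word list would be far larger.
ENGLISH_VOCAB = {"i", "love", "how", "fast", "the", "new", "is",
                 "battery", "life", "great"}


def english_coverage(text):
    """Fraction of word tokens that appear in the English vocabulary."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in ENGLISH_VOCAB for tok in tokens) / len(tokens)


def is_probably_foreign(text, min_coverage=0.5):
    """Flag a post for separate handling instead of letting it score neutral."""
    return english_coverage(text) < min_coverage


print(english_coverage("I love how fast the new M1 is"))                  # 0.875
print(is_probably_foreign("Me encanta lo rápido que es el nuevo M1"))     # True
```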

Future Work

  • Expand application to access data from additional platforms.
  • Use the timestamps in the data to show how user opinions change over time.
  • Stream data to a dashboard to track changing opinions in real time.
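Tracking opinions over time could start from a per-day aggregate of the labels. A minimal sketch, assuming each labeled post keeps an ISO-8601 timestamp (the records and the `daily_sentiment` helper are hypothetical); averaging the 1.0 / 0.0 / -1.0 labels is one simple choice of daily score, and per-class counts per day would work as well.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical labeled posts: (timestamp, label) with 1.0 / 0.0 / -1.0 labels.
posts = [
    ("2021-02-01T09:15:00", 1.0),
    ("2021-02-01T18:40:00", -1.0),
    ("2021-02-01T21:05:00", 1.0),
    ("2021-02-02T08:30:00", 0.0),
    ("2021-02-02T12:10:00", 1.0),
]


def daily_sentiment(posts):
    """Average label per calendar day, from -1.0 (all negative) to 1.0 (all positive)."""
    by_day = defaultdict(list)
    for stamp, label in posts:
        by_day[datetime.fromisoformat(stamp).date()].append(label)
    # Sort by day so the result reads as a time series.
    return {day: sum(labels) / len(labels) for day, labels in sorted(by_day.items())}


trend = daily_sentiment(posts)
for day, score in trend.items():
    print(day, round(score, 2))
```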