This project studies sentiment analysis on Twitter, with the goal of classifying comments as positive, negative, or neutral. Using word embeddings, a widely used technique in natural language processing, our model achieved an accuracy of 96.61%. The model was trained on Twitter data and tested on a comment dataset from Binance.

Using Word Embeddings for Twitter Sentiment Analysis

Project implementation

1. Learn the basic concepts of Natural Language Processing

1.1. Learn about word embeddings (e.g., Bag of Words (BoW), Word2Vec, ...)
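To make the Bag of Words idea concrete, here is a minimal pure-Python sketch (for illustration only, not the project's actual code): each sentence becomes a vector of word counts over a shared vocabulary.

```python
from collections import Counter

def bag_of_words(sentences):
    """Build a vocabulary and represent each sentence as a count vector."""
    vocab = sorted({word for s in sentences for word in s.lower().split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.lower().split())
        vectors.append([counts.get(word, 0) for word in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["good coin", "bad coin", "good good project"])
print(vocab)       # ['bad', 'coin', 'good', 'project']
print(vectors[2])  # [0, 0, 2, 1]
```

Unlike BoW, Word2Vec learns dense vectors in which semantically similar words end up close together, which is the property exploited later in this project.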

1.2. Refer to related research articles

1.3. Refer to related code examples

2. Build and deploy the project

2.1. Collect data (here I use Kaggle)

- The link to the dataset is DATASET

- Data chart


2.2. Data preprocessing

- Select the target and feature columns for the model

- Remove emoticons and special characters, and limit the number of words per sentence

- Standardize the training and validation data

- Convert the labels to one-hot encoded values
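The preprocessing steps above can be sketched as follows. This is an illustrative version only: the word cap and label set are assumptions, not values confirmed by the project.

```python
import re

MAX_WORDS = 40  # assumed cap on words per tweet

def clean_text(text, max_words=MAX_WORDS):
    """Strip emoticons/special characters and truncate long tweets."""
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text.lower())  # keep letters, digits, spaces
    words = text.split()[:max_words]                     # limit sentence length
    return " ".join(words)

LABELS = ["negative", "neutral", "positive"]  # assumed label set

def one_hot(label):
    """Encode a class label as a one-hot vector."""
    vec = [0] * len(LABELS)
    vec[LABELS.index(label)] = 1
    return vec

print(clean_text("BTC to the moon!!! :-)"))  # 'btc to the moon'
print(one_hot("positive"))                   # [0, 0, 1]
```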

2.3. Build the model

- Model architecture

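The architecture is shown above only as a diagram, so here is one plausible Keras sketch of an embedding-based sentiment classifier. All layer sizes and choices (Bidirectional LSTM, dropout rate, vocabulary size) are assumptions; only the 64-dimensional embedding matches the vectors visualized in section 2.6.

```python
import tensorflow as tf

VOCAB_SIZE = 10000   # assumed vocabulary size
EMBED_DIM = 64       # matches the 64-dimensional vectors in section 2.6
MAX_LEN = 40         # assumed max words per tweet
NUM_CLASSES = 3      # negative / neutral / positive

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),       # learn word vectors
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # pairs with one-hot labels
              metrics=["accuracy"])
model.summary()
```

The categorical cross-entropy loss is what the one-hot labels from section 2.2 feed into.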

2.4. Visualize parameters and results

- Plot training and validation loss and accuracy

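Curves like the ones above can be produced from a Keras-style history dictionary; the sketch below uses hypothetical numbers purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot train/validation loss and accuracy side by side."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history["loss"], label="train loss")
    ax_loss.plot(history["val_loss"], label="val loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.legend()
    ax_acc.plot(history["accuracy"], label="train acc")
    ax_acc.plot(history["val_accuracy"], label="val acc")
    ax_acc.set_xlabel("epoch")
    ax_acc.legend()
    fig.savefig("training_curves.png")
    return fig

# hypothetical numbers for illustration only
demo = {"loss": [0.9, 0.5, 0.3], "val_loss": [0.95, 0.6, 0.4],
        "accuracy": [0.6, 0.8, 0.9], "val_accuracy": [0.55, 0.75, 0.85]}
fig = plot_history(demo)
```

With a real model, `history` would come from `model.fit(...).history`.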

2.5. Test on external datasets

- Basic results and examples


- Results on the Binance test set

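At test time, the model's softmax outputs have to be mapped back to sentiment labels. A minimal sketch of that decoding step, with hypothetical probabilities and an assumed label order:

```python
LABELS = ["negative", "neutral", "positive"]  # assumed label order

def decode_predictions(probs):
    """Map rows of class probabilities to their most likely sentiment label."""
    return [LABELS[max(range(len(row)), key=row.__getitem__)] for row in probs]

# hypothetical model outputs for three Binance comments
probs = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.2, 0.6, 0.2]]
print(decode_predictions(probs))  # ['negative', 'positive', 'neutral']
```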

2.6. Visualize the embedding vectors and their text

- Embedding vector visualization

- Points: 9999

- Dimension: 64

- You can try this yourself with the TensorFlow Embedding Projector
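The Embedding Projector at projector.tensorflow.org loads embeddings from two TSV files: one with the vectors, one with the word metadata. A small sketch of that export (toy 4-dimensional vectors here; the project's are 64-dimensional with 9999 points):

```python
import csv

def export_for_projector(words, vectors,
                         vec_path="vectors.tsv", meta_path="metadata.tsv"):
    """Write embeddings as TSV files loadable in the TensorFlow Embedding Projector."""
    with open(vec_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerows(vectors)          # one row of floats per word
    with open(meta_path, "w") as f:
        f.write("\n".join(words))          # one word per line, same order

# toy 4-dimensional embeddings for two words
export_for_projector(["good", "bad"],
                     [[0.1, 0.2, 0.3, 0.4], [-0.1, -0.2, -0.3, -0.4]])
```

Uploading the two files via "Load" in the projector then shows each word as a point, which can be reduced to 2D/3D with PCA or t-SNE.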

3. Conclusion

3.1. Advantages and disadvantages of the method used

  • Word embeddings represent words in a continuous vector space, capturing semantic relationships and improving performance on NLP tasks. They offer advantages such as semantic representation, dimensionality reduction, and transfer learning. However, they also have limitations, such as a fixed vocabulary, contextual ambiguity, and data bias.

3.2. Find ways to improve the model in the future

  • Extend the range of applications
  • Train the model on larger datasets
  • Improve the model (combine other models, tune parameters, ...)

I hope this article helps you.

If you have any questions, please contact me!

Thanks, everyone!
