Disaster Tweets Analysis and Classification

Project Description

Twitter has become an important communication channel in times of emergency.
The ubiquity of smartphones enables people to announce an emergency they’re observing in real time.
Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

The full approach for this project is available on Kaggle:
- Machine Learning approach : https://www.kaggle.com/mohitnirgulkar/disaster-tweets-classification-using-ml
- Deep Learning approach : https://www.kaggle.com/mohitnirgulkar/disaster-tweets-classification-using-deep-learning

Project Contents

Exploratory Data Analysis
EDA after Data Cleaning
Data Preprocessing using NLP
Machine Learning models for classifying Tweets data
Deep Learning approach for classifying Tweets data
Model Deployment

Resources Used

Packages : Pandas, NumPy, Matplotlib, Plotly, WordCloud, TensorFlow, Scikit-Learn, Keras, Keras-Tuner, NLTK, etc.
Dataset : https://www.kaggle.com/c/nlp-getting-started
Word Embeddings : https://www.kaggle.com/danielwillgeorge/glove6b100dtxt

1. Exploratory Data Analysis

Visualising Target Variable of the Dataset
Visualising Length of Tweets
Visualising Average word lengths of Tweets
Visualising most common stop words in the text data
Visualising most common punctuations in the text data
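
The quantities behind these plots can be computed with a few lines of Python. The following is a minimal sketch using only the standard library; the three example tweets, the small `STOP_WORDS` set, and the helper name `tweet_stats` are invented for illustration and are not taken from the notebooks, which work on the full Kaggle dataset.

```python
import string
from collections import Counter

# Toy stand-in for the training data: (text, target) pairs, where
# target 1 marks a real disaster tweet. These rows are invented.
tweets = [
    ("Forest fire near the town, residents asked to evacuate!", 1),
    ("I love the new album, it is fire!!", 0),
    ("Flooding on the main road, avoid the area.", 1),
]

# Tiny illustrative stop-word set (an assumption, not the project's list).
STOP_WORDS = {"the", "is", "it", "to", "on", "i", "a", "of", "in"}


def tweet_stats(text):
    """Return (character length, average word length) for one tweet."""
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return len(text), avg_word_len


# Class balance of the target variable.
target_counts = Counter(target for _, target in tweets)

# Frequency of each stop word across all tweets.
stop_word_counts = Counter(
    w for text, _ in tweets for w in text.lower().split() if w in STOP_WORDS
)

# Frequency of each punctuation character across all tweets.
punct_counts = Counter(
    ch for text, _ in tweets for ch in text if ch in string.punctuation
)
```

Each `Counter` can be passed to a plotting library through `most_common()`, and `tweet_stats` can be applied per tweet to build the length histograms.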

2. EDA after Data Cleaning

We use Python's regular-expression library and NLTK's lemmatizer for Data Cleaning
Visualising words inside Real Disaster Tweets
Visualising words inside Fake Disaster Tweets
Visualising the top 10 N-grams for N = 1, 2, 3
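
As a rough illustration of the cleaning and N-gram steps, here is a sketch that uses only `re` and `collections.Counter`. The exact cleaning rules in the notebooks may differ; the patterns below and the function names `clean_tweet` and `top_ngrams` are assumptions, and lemmatization is left out so the example runs without downloading NLTK data.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, HTML tags, mentions and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"@\w+", " ", text)                   # user mentions
    text = re.sub(r"[^a-z\s]", " ", text)               # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace


def top_ngrams(texts, n, k=10):
    """Return the k most frequent word n-grams across the cleaned texts."""
    counts = Counter()
    for text in texts:
        words = clean_tweet(text).split()
        # Zipping n shifted copies of the word list yields consecutive n-grams.
        counts.update(zip(*(words[i:] for i in range(n))))
    return counts.most_common(k)
```

Calling `top_ngrams(texts, n)` with `n` set to 1, 2 and 3 gives the unigram, bigram and trigram rankings respectively.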

3. Data Preprocessing using NLP

Data Preprocessing for ML models is done using two approaches:
- Bag of Words using CountVectorizer
- Term Frequency and Inverse Document Frequency using TfidfVectorizer
Data Preprocessing for DL models is done using Tokenization
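
The two ML feature representations can be sketched with scikit-learn as follows. The three-sentence corpus and the `ngram_range=(1, 2)` setting are illustrative assumptions; the notebooks compare several n-gram ranges on the real data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Invented, already-cleaned example texts.
corpus = [
    "forest fire near town",
    "love this new song",
    "fire crews battle forest fire",
]

# Bag of Words: each column holds the raw count of a unigram or bigram.
bow = CountVectorizer(ngram_range=(1, 2))
X_bow = bow.fit_transform(corpus)

# TF-IDF: same vocabulary, but counts are reweighted so that terms
# appearing in many documents contribute less.
tfidf = TfidfVectorizer(ngram_range=(1, 2))
X_tfidf = tfidf.fit_transform(corpus)
```

Both calls return sparse matrices with one row per tweet, so either can be fed directly to a scikit-learn classifier.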

4. Machine Learning models for classifying Tweets data

Machine Learning models are trained on different n-gram ranges with both BoW and TF-IDF features, with a visualisation comparing their accuracy
The best ML model trained, as we can see above, is the Voting Classifier, whose classification report and confusion matrix are shown below
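
For readers unfamiliar with voting ensembles, the sketch below shows one way to combine classifiers with scikit-learn's `VotingClassifier`. The choice of base estimators (logistic regression and multinomial naive Bayes), the soft-voting setting, and the toy texts are all assumptions made for illustration; the README does not state which estimators the project's ensemble uses, and the numbers this toy example produces are not the project's results.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented examples: 1 = real disaster, 0 = not a disaster.
texts = [
    "forest fire spreading near the town",
    "earthquake damages several buildings",
    "flood waters rising downtown",
    "this new song is fire",
    "my exam was a total disaster lol",
    "flooded with birthday messages today",
]
labels = [1, 1, 1, 0, 0, 0]

# Soft voting averages the predicted class probabilities of the base models.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
        ],
        voting="soft",
    ),
)
model.fit(texts, labels)

# Evaluated on the training texts only to keep the sketch short;
# a real evaluation needs a held-out split.
predictions = model.predict(texts)
cm = confusion_matrix(labels, predictions)
report = classification_report(labels, predictions)
```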

5. Deep Learning approach for classifying Tweets data

Using GloVe word embeddings with embedding dimension = 100 to build the embedding matrix for our DL models
For every DL model we create a model-building function and use Keras-Tuner to tune it
Finally, we choose the Bidirectional LSTM for deployment
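
The embedding-matrix step can be sketched with NumPy alone. In the snippet below an in-memory string stands in for the GloVe text file, the vectors have 4 dimensions instead of 100 to stay readable, and the `word_index` mapping is a hand-written stand-in for the one a tokenizer would produce; the function names are hypothetical.

```python
import io

import numpy as np

EMBEDDING_DIM = 4  # the project uses 100; shortened for this toy example

# Stand-in for the GloVe file: one token per line followed by its vector.
fake_glove_file = io.StringIO(
    "fire 0.1 0.2 0.3 0.4\n"
    "flood 0.5 0.6 0.7 0.8\n"
)


def load_glove(handle):
    """Parse GloVe-style text lines into a {token: vector} dictionary."""
    vectors = {}
    for line in handle:
        token, *values = line.split()
        vectors[token] = np.asarray(values, dtype="float32")
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i; row 0 is padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vector = vectors.get(word)
        if vector is not None:  # words missing from GloVe stay all-zero
            matrix[idx] = vector
    return matrix


# Hypothetical tokenizer output: word -> integer index starting at 1.
word_index = {"fire": 1, "flood": 2, "zzzunknown": 3}
glove_vectors = load_glove(fake_glove_file)
embedding_matrix = build_embedding_matrix(word_index, glove_vectors, EMBEDDING_DIM)
```

A matrix built this way is typically passed as the initial weights of a Keras `Embedding` layer, which is then placed in front of the recurrent layers.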

6. Model Deployment

The Bidirectional LSTM model obtained from the Deep Learning approach is used for deployment
The micro web framework Flask is used to create the web app
Heroku is used to deploy our web app at https://disastertweetsdl.herokuapp.com/
Deep Learning Web app working

Scope of Improvement

We can always use a larger dataset that covers almost every type of data, for both machine learning and deep learning
We can use the best pretrained models, but they require a lot of computational power
There are also various ways to increase model accuracy, such as k-fold cross-validation and data preprocessing techniques better than those used here
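
As an example of the k-fold idea mentioned above, the sketch below scores a simple text pipeline with stratified 4-fold cross-validation in scikit-learn. The pipeline (TF-IDF plus logistic regression), the fold count, and the eight toy tweets are assumptions for illustration, not the project's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Invented examples: 1 = real disaster, 0 = not a disaster.
texts = [
    "forest fire spreading near the town",
    "earthquake damages several buildings",
    "flood waters rising downtown",
    "tornado warning issued for the county",
    "this new song is fire",
    "my exam was a total disaster lol",
    "flooded with birthday messages today",
    "that concert blew me away",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Putting the vectorizer inside the pipeline means it is refit on each
# training fold, so no vocabulary leaks from the validation fold.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Stratification keeps the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
```

The mean and spread of `scores` give a steadier estimate of model quality than a single train/test split, which helps when choosing between candidate models.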

Conclusion

The data analysis and modelling were successfully completed, and the Deep Learning model was deployed on Heroku.

Please do ⭐ the repository if it helped you in any way.
