
Analysing a dataset of disaster-related tweets, building a classifier using deep learning, and deploying it with Heroku



raklugrin01/Disaster-Tweets-Analysis-and-Classification


Disaster Tweets Analysis and Classification


Project Description

Twitter has become an important communication channel in times of emergency.
The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real-time.
Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

Project Contents

  1. Exploratory Data Analysis
  2. EDA after Data Cleaning
  3. Data Preprocessing using NLP
  4. Machine Learning models for classifying Tweets data
  5. Deep Learning approach for classifying Tweets data
  6. Model Deployment

Resources Used

1. Exploratory Data Analysis

  • Visualising Target Variable of the Dataset

    [Figure: Target Variable]
  • Visualising Length of Tweets

    [Figure: Tweet Length]
  • Visualising Average word lengths of Tweets

    [Figure: Avg Word Lengths]
  • Visualising most common stop words in the text data

    [Figure: Stopwords]
  • Visualising the most common punctuation marks in the text data

    [Figure: Punctuations]
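The statistics behind these plots can be computed with a few lines of plain Python. The sketch below is an assumption about how they might be derived, not the project's own code: it splits tweets on whitespace, and `STOP_WORDS` is a small hypothetical stand-in for a full stop-word list such as NLTK's English list.

```python
import string
from collections import Counter

# Hypothetical mini stop-word list for illustration only; a real analysis
# would use a fuller list such as NLTK's English stop words.
STOP_WORDS = {"the", "a", "an", "in", "on", "of", "to", "is", "and", "at"}


def tweet_stats(text):
    """Return character length, word count and average word length of a tweet."""
    words = text.split()
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {"chars": len(text), "words": len(words), "avg_word_len": avg_len}


def common_stopwords(texts, k=5):
    """Return the k most frequent stop words across all texts."""
    counts = Counter(w for t in texts for w in t.lower().split() if w in STOP_WORDS)
    return counts.most_common(k)


def common_punctuation(texts, k=5):
    """Return the k most frequent punctuation characters across all texts."""
    counts = Counter(ch for t in texts for ch in t if ch in string.punctuation)
    return counts.most_common(k)


if __name__ == "__main__":
    # Invented example tweets, not taken from the dataset.
    tweets = [
        "Forest fire spotted near the highway",
        "The sky is on fire at sunset!!! #beautiful",
    ]
    for t in tweets:
        print(tweet_stats(t))
    print(common_stopwords(tweets))
    print(common_punctuation(tweets))
```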

2. EDA after Data Cleaning

  • We use Python's re (regular expression) module and NLTK's lemmatization methods for data cleaning

  • Visualising words inside Real Disaster Tweets

    [Figure: Real WC]
  • Visualising words inside Fake Disaster Tweets

    [Figure: Fake WC]
  • Visualising the top 10 N-grams for N = 1, 2 and 3

    [Figure: Top N-grams]
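A minimal sketch of the regex cleaning and N-gram counting steps, assuming the cleaning rules shown here (lower-casing, then removing URLs, @mentions, HTML entities and non-letter characters); the project's exact rules may differ. The NLTK lemmatization step is left out because `WordNetLemmatizer` needs the WordNet corpus to be downloaded first. `clean_tweet` and `top_ngrams` are hypothetical helper names.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, mentions, HTML entities and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                    # @mentions
    text = re.sub(r"&\w+;", " ", text)                   # HTML entities such as &amp;
    text = re.sub(r"[^a-z\s]", " ", text)                # digits, punctuation, '#', emoji
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


def top_ngrams(texts, n, k=10):
    """Return the k most frequent word n-grams across the given (cleaned) texts."""
    counts = Counter()
    for t in texts:
        tokens = t.split()
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(k)


if __name__ == "__main__":
    # Invented example tweets, not taken from the dataset.
    raw = ["Forest fire near town http://t.co/x", "@news Forest fire again!!"]
    cleaned = [clean_tweet(t) for t in raw]
    for n in (1, 2, 3):
        print(n, top_ngrams(cleaned, n))
```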

3. Data Preprocessing using NLP

  • Data preprocessing for the ML models is done using two approaches:

    • Bag of Words using CountVectorizer
    • Term Frequency and Inverse Document Frequency using TfidfVectorizer
  • Data Preprocessing for DL models using Tokenization

4. Machine Learning models for classifying Tweets data

  • Machine learning models trained with different n-grams on both BoW and TF-IDF features, with a visualisation comparing their accuracy

    [Figure: Ml List]
  • As the comparison above shows, the best ML model is the Voting Classifier; its classification report and confusion matrix are shown below

    [Figure: Voting Classifier]
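A sketch of a scikit-learn `VotingClassifier` on TF-IDF features. The choice of base estimators (multinomial Naive Bayes, logistic regression, linear SVM) and of hard voting is an assumption, since the README does not list which models the project's ensemble combines. The tweets are invented, and the model is scored on its own training data only to show the API; a real evaluation would use a held-out split.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy tweets; 1 = real disaster, 0 = not a disaster.
texts = [
    "forest fire spreading near the highway",
    "earthquake damages buildings downtown",
    "flood waters rising evacuate now",
    "wildfire smoke covers the city",
    "this new song is fire",
    "my exam was a total disaster lol",
    "loving the sunset at the beach",
    "that movie was the bomb",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("svc", LinearSVC()),
        ],
        voting="hard",  # majority vote over the three predicted labels
    ),
)
model.fit(texts, labels)
pred = model.predict(texts)

print(confusion_matrix(labels, pred))
print(classification_report(labels, pred))
```

Hard voting only needs `predict` from each estimator, which is why `LinearSVC` (no `predict_proba`) can be used; soft voting would require probabilistic estimators.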

5. Deep Learning approach for classifying Tweets data

  • GloVe word embeddings with an embedding dimension of 100 are used to build the embedding matrix for our DL models
  • For every DL model we create a model-building function and use KerasTuner to tune its hyperparameters
  • Finally we choose Bidirectional LSTM for the Deployment
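Building the embedding matrix requires a word-to-index mapping from tokenization plus the GloVe vectors. The sketch below uses plain Python and NumPy helpers that roughly mirror what Keras's `Tokenizer` and `pad_sequences` do (ids start at 1 in order of frequency, 0 is reserved for padding, sequences are padded and truncated at the front). All helper names are hypothetical, and toy 3-dimensional vectors stand in for the 100-dimensional GloVe file (for example `glove.6B.100d.txt`; the exact file the project used is an assumption).

```python
from collections import Counter

import numpy as np


def build_word_index(texts):
    """Map each word to an integer id (1 = most frequent); 0 is reserved for padding."""
    counts = Counter(w for t in texts for w in t.split())
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}


def texts_to_padded(texts, word_index, maxlen):
    """Convert texts to id sequences, keeping the last maxlen ids and left-padding with 0."""
    out = np.zeros((len(texts), maxlen), dtype="int32")
    for row, t in enumerate(texts):
        ids = [word_index[w] for w in t.split() if w in word_index][-maxlen:]
        if ids:
            out[row, -len(ids):] = ids
    return out


def load_glove(lines):
    """Parse GloVe text-format lines ("word v1 v2 ... vd") into a dict of vectors."""
    vectors = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        vectors[word] = np.asarray(values, dtype="float32")
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with id i; words missing from GloVe stay zero."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, i in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix


if __name__ == "__main__":
    texts = ["forest fire near town", "fire in the town"]
    word_index = build_word_index(texts)
    X = texts_to_padded(texts, word_index, maxlen=6)
    glove = load_glove(["fire 0.1 0.2 0.3", "town 0.4 0.5 0.6"])  # toy vectors
    E = build_embedding_matrix(word_index, glove, dim=3)
    print(X)
    print(E.shape)
```

The resulting matrix would typically initialise the weights of a Keras `Embedding` layer placed in front of the Bidirectional LSTM.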

6. Model Deployment

[Figure: Deployment Demo]

Scope of Improvement

  • A larger dataset covering a wider variety of tweets would benefit both the machine learning and deep learning models
  • State-of-the-art pretrained models could be used, though they require considerably more computational power
  • There are also various ways to improve the models, such as k-fold cross-validation for more reliable model selection and better data preprocessing techniques than those used here
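k-fold cross-validation gives a more reliable accuracy estimate for comparing models and tuning hyperparameters than a single train/test split. A minimal scikit-learn sketch, assuming a TF-IDF plus logistic regression pipeline and using invented tweets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Invented toy tweets; 1 = real disaster, 0 = not a disaster.
texts = [
    "forest fire spreading near the highway",
    "earthquake damages buildings downtown",
    "flood waters rising evacuate now",
    "wildfire smoke covers the city",
    "storm knocks out power across the county",
    "this new song is fire",
    "my exam was a total disaster lol",
    "loving the sunset at the beach",
    "that movie was the bomb",
    "great game tonight with friends",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class balance roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")

print(scores, scores.mean(), scores.std())
```

Keeping the vectorizer inside the pipeline means it is refit on each training fold, so the vocabulary and IDF weights never leak information from the held-out fold.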

Conclusion

The data analysis and modelling were completed successfully, and the deep learning model was deployed on Heroku.

Please do ⭐ the repository if it helped you in any way.
