Skip to content

This Python code, uses the Top2Vec library to generate topic models and create word clouds for a given input CSV file containing review data. The code cleans the reviews using the NLTK library and then generates topic models using the Top2Vec library.

Notifications You must be signed in to change notification settings

ozi-dev/Top2Vec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Introduction

This Github repository contains a Python code created by Oğuzhan Öztürk. The code uses the Top2Vec library to generate topic models and create word clouds for a given input CSV file containing review data.

Installation

To use this code, first install the required libraries:

!pip install top2vec
!pip install top2vec[sentence_encoders]
!pip install top2vec[sentence_transformers]
!pip install top2vec[indexing]
!pip install numpy==1.23.5

In addition, you will need to import pandas, nltk, and NLTK resources. To download the necessary resources from NLTK, use the following commands:

import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

Usage

After installing the necessary libraries, you can run the code by importing it and specifying the path of your input CSV file:

import pandas as pd 

reviews = pd.read_csv('CSV FILE PATH', on_bad_lines='skip')

The next step is to clean the review text by running the clean_text function:

def clean_text(review):
  le=WordNetLemmatizer()
  word_tokens=word_tokenize(review)
  tokens=[le.lemmatize(w) for w in word_tokens if w not in stop_words and len(w)>3]
  cleaned_text=" ".join(tokens)
  return cleaned_text
  
reviews['review']=reviews['review'].apply(clean_text)

Next, generate the topic models by running the following lines of code:

from top2vec import Top2Vec

model = Top2Vec(list(reviews['review'].to_numpy()), embedding_model='universal-sentence-encoder-large',use_embedding_model_tokenizer=True,split_documents=True)
model.get_num_topics()
topic_sizes, topic_nums = model.get_topic_sizes()

Finally, create the topic word clouds by running the following loop:

for topic in topic_nums:
  model.generate_topic_wordcloud(topic)

Author

This code was created by Oğuzhan Öztürk. For any inquiries or suggestions, please contact the author at oguzhanozturk0@outlook.com.

About

This Python code, uses the Top2Vec library to generate topic models and create word clouds for a given input CSV file containing review data. The code cleans the reviews using the NLTK library and then generates topic models using the Top2Vec library.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published