Practical demonstration of scikit learn library for building various classification and regression models

ankit013/Projects-Python

Projects-Python

Practical demonstration of scikit learn library for building various classification and regression models

Description

The goal of topic modeling is to find the topics that are present in a corpus. Each document in the corpus is made up of at least one topic, and often several. This notebook covers the steps for Latent Dirichlet Allocation (LDA), one of many topic modeling techniques, which was specifically designed for text data. To use a topic modeling technique, you need to provide (1) a document-term matrix and (2) the number of topics you would like the algorithm to pick up. Once the technique has been applied, your job as a human is to interpret the results and judge whether the mix of words in each topic makes sense. If it doesn't, you can try changing the number of topics, the terms in the document-term matrix, or the model parameters, or even try a different model.
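The two inputs named above can be illustrated with a minimal sketch that uses only the standard library. The function name `build_document_term_matrix`, the whitespace tokenizer, and the three sample sentences are assumptions made for illustration; they are not taken from the notebook, which would normally apply real preprocessing (stop-word removal, lemmatization, and so on).

```python
from collections import Counter


def build_document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    # Naive tokenization (an assumption for brevity): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is the sorted set of all distinct tokens in the corpus.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical toy corpus standing in for newsgroup posts.
docs = [
    "the car engine is loud",
    "the hockey game is tonight",
    "car engine repair",
]
vocabulary, dtm = build_document_term_matrix(docs)

# The second input: chosen by hand, then revised after inspecting the topics.
num_topics = 2
```

Each row of `dtm` is one document and each column is one term from `vocabulary`; changing the tokenizer or filtering the vocabulary is what "changing the terms in the document-term matrix" amounts to in practice.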

The data set is the 20 Newsgroups collection; LDA is used to extract the topics that are naturally discussed in it.

The notebook uses Latent Dirichlet Allocation (LDA) from the Gensim package, along with Mallet's implementation (via Gensim). Mallet has an efficient implementation of LDA, which is reported to run faster and give better topic segregation.

Data set

The data can be obtained from: https://raw.githubusercontent.com/selva86/datasets/master/newsgroups.json
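One possible way to fetch the file is sketched below with the standard library only. The helper name `load_json` and the timeout value are assumptions, and the README does not describe the layout of the JSON, so the result should be inspected before use.

```python
import json
from urllib.request import urlopen

DATA_URL = "https://raw.githubusercontent.com/selva86/datasets/master/newsgroups.json"


def load_json(source=DATA_URL, timeout=30):
    """Fetch and parse a JSON document from a URL (http(s):// or file://)."""
    with urlopen(source, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (downloads the whole file, so it is left commented out here):
# data = load_json()
# print(type(data))  # inspect the structure before building a corpus from it
```

The notebook itself may load the data differently, for example into a pandas DataFrame; this helper only shows that the URL can be read as a single JSON document.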
