KUANCHENGFU/News-Article-Clustering

News Article Clustering

Overview

The goal of this project is to develop a framework that clusters news articles based on their text content. It applies several techniques, including TF-IDF, cosine similarity, truncated SVD, and k-means clustering. The project consists of four parts:

  1. Process and tokenize the news articles
  2. Build a sparse TF-IDF matrix from all terms of the news articles
  3. Perform dimensionality reduction using truncated SVD
  4. Cluster the news articles using k-means clustering
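
The steps above can be illustrated with a minimal, standard-library-only sketch. It is not the notebook's code: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `kmeans`) and the four sample sentences are hypothetical. The truncated SVD step is left out for brevity, and the k-means variant shown assigns each article to the centroid with the highest cosine similarity rather than the smallest Euclidean distance. A real pipeline would more likely rely on a library such as scikit-learn.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Weight = relative term frequency * log(N / df); terms found in
        # every document get weight 0.
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, n_iter=10):
    """Cluster sparse vectors; assignment uses cosine similarity (assumption)."""
    # Naive initialization (assumption): the first k documents are the centroids.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: pick the most similar centroid for each document.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)  # adds the weights term by term
                centroids[j] = {t: w / len(members) for t, w in total.items()}
    return labels


# Hypothetical sample articles: two about finance, two about football.
docs = [
    "The central bank raised interest rates to fight inflation",
    "The football team won the championship match",
    "Inflation and interest rates worry the central bank",
    "The team scored late to win the football match",
]
vecs = tfidf_vectors(docs)
labels = kmeans(vecs, k=2)
print(labels)
```

In this toy example the two finance sentences should share one label and the two football sentences the other. On a real corpus, the number of clusters and the initialization strategy would need more care than the first-k choice used here.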

Link to the Project

  1. https://nbviewer.jupyter.org/github/KUANCHENGFU/News-Article-Clustering/blob/main/News_article_clustering.ipynb