AshishKempwad/Clustering

Clustering

Knn from scratch

Task:

  1. Given a dataset of documents with content from 5 different fields (namely business, entertainment, politics, sport, and tech), cluster them using any clustering algorithm of your choice.
  2. Do not use any libraries for this part. You are expected to code your clustering algorithm from scratch.
  3. For feature extraction, you can use the vectorizers provided by sklearn or pre-trained embeddings. (A code snippet for the usage of these embeddings has been provided in the previous question.)
  4. You might have to perform some pre-processing on the raw documents before you apply your algorithm.
  5. We have provided ground truth document tags for the documents. Report accuracy score on these documents.
  6. We will test your score on the documents for which the tags have not been provided.
  7. In the dataset, the number after the ’ ’ symbol in the file name denotes the cluster label.
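
The task above leaves the choice of algorithm open, so the following is only a minimal sketch of one possible approach, not necessarily the one used in this repository: plain k-means written with the Python standard library, plus a majority-vote accuracy score. The function names (`kmeans`, `majority_vote_accuracy`), the random initialisation, and the majority-vote scoring convention are all assumptions made for illustration. The input `points` is assumed to be a list of dense numeric feature vectors, for example produced by a vectorizer beforehand.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, max_iters=100, seed=0):
    """Hypothetical helper: cluster `points` into `k` groups.

    Returns (assignments, centroids), where assignments[i] is the
    cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumption: initialise centroids from k randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)

    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)

    return assignments, centroids


def majority_vote_accuracy(assignments, true_labels):
    """Score clusters by mapping each one to its most common true label.

    Assumption: this is one common convention for turning cluster
    indices into an accuracy figure; the assignment may expect another.
    """
    correct = 0
    for c in set(assignments):
        labels = [t for a, t in zip(assignments, true_labels) if a == c]
        correct += Counter(labels).most_common(1)[0][1]
    return correct / len(true_labels)
```

For the dataset described above, `k` would be 5, and `true_labels` would come from the labels encoded in the file names. Because the result depends on the random initialisation, running with several values of `seed` and keeping the best clustering is a reasonable precaution.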