Java parser for the "Reuters-21578, Distribution 1.0" Text Categorization data set.

AltA-Advisory/ReutersParser

ReutersParser

Download the data set from the UCI Machine Learning Repository and extract the files to any directory:
https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection

You can then iterate over the articles in the data set as follows:

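import java.io.File;

// Directory that holds the extracted data set files
File file = new File("/tmp/Reuters");
ReutersParser p = new ReutersParser(file);
for (ReutersArticle a : p) {
    // Read the article's TITLE and BODY tags
    String title = a.getTag("TITLE");
    String body = a.getTag("BODY");
}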
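
For readers unfamiliar with the data set's layout, the following is a minimal, self-contained sketch of what a tag name such as "TITLE" or "BODY" refers to. In the distributed .sgm files each article is wrapped in a <REUTERS ...> element, with <TITLE> and <BODY> elements nested inside it. The class ReutersSgmlSketch, its two methods, and the sample text are hypothetical and written only for illustration. This is not how ReutersParser is implemented, and returning null for a missing tag is an assumption of the sketch, not documented behavior of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; NOT the ReutersParser implementation.
final class ReutersSgmlSketch {

    // Each article in the .sgm files is wrapped in <REUTERS ...> ... </REUTERS>.
    private static final Pattern ARTICLE =
            Pattern.compile("<REUTERS\\b[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);

    // Split raw SGML text into the inner content of each article.
    static List<String> splitArticles(String sgml) {
        List<String> articles = new ArrayList<>();
        Matcher m = ARTICLE.matcher(sgml);
        while (m.find()) {
            articles.add(m.group(1));
        }
        return articles;
    }

    // Return the trimmed text of the first <tag>...</tag> pair,
    // or null if the tag is absent (an assumption of this sketch).
    // Only attribute-free opening tags are matched.
    static String getTag(String article, String tag) {
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tag) + ">(.*?)</" + Pattern.quote(tag) + ">",
                Pattern.DOTALL);
        Matcher m = p.matcher(article);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hand-written sample in the shape of the data set, not real data.
        String sample =
                "<REUTERS TOPICS=\"YES\" NEWID=\"1\">\n"
              + "<TEXT><TITLE>FIRST TITLE</TITLE>\n"
              + "<BODY>First body.</BODY></TEXT>\n"
              + "</REUTERS>\n"
              + "<REUTERS TOPICS=\"NO\" NEWID=\"2\">\n"
              + "<TEXT><TITLE>SECOND TITLE</TITLE></TEXT>\n"
              + "</REUTERS>\n";
        for (String article : splitArticles(sample)) {
            System.out.println(getTag(article, "TITLE") + " -> " + getTag(article, "BODY"));
        }
    }
}
```

The second sample article has no BODY element, so the sketch yields null for it. When iterating with ReutersParser, it may be worth checking how the library reports a missing tag before relying on the result of getTag.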
