Skip to content

Python implementation of TextRank algorithm (https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) for automatic keyword extraction and summarization using Levenshtein distance as relation between text units.

Notifications You must be signed in to change notification settings

cmanallen/TextRank

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This is a python implementation of TextRank for automatic keyword and sentence extraction (summarization) as done in https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf. However, this implementation uses Levenshtein Distance as the relation between text units.

This implementation carries out automatic keyword and sentence extraction on 10 articles gotten from http://theonion.com

  • 100 word summary
  • Number of keywords extracted is relative to the size of the text (a third of the number of nodes in the graph)
  • Adjacent keywords in the text are concatenated into keyphrases

Dependencies

Networkx - http://networkx.github.io/download.html NLTK 3.0 - http://nltk.org/install.html Numpy - http://sourceforge.net/projects/numpy/files/

About

Python implementation of TextRank algorithm (https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) for automatic keyword extraction and summarization using Levenshtein distance as relation between text units.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%