
Search Engine for Books (Java, Apache Lucene, crawler4j, Apache Spark)


chanddu/Book-Search-Engine


Book-Search-Engine


  • Crawled about 100,000 web pages using crawler4j and performed link analysis by implementing PageRank over the resulting web graph with Apache Spark’s GraphX.
  • Indexed the crawled documents with Apache Lucene and ranked the results for each query by a combination of PageRank and TF-IDF scores.
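The link-analysis step can be illustrated without Spark. Below is a minimal plain-Java sketch of PageRank by power iteration over an adjacency list. It is not the project's GraphX code (GraphX provides its own PageRank operator). The class name `PageRankSketch`, the damping factor of 0.85, the fixed iteration count, and the choice to spread the rank of pages without out-links evenly over all pages are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java PageRank by power iteration (an illustrative sketch, not the GraphX implementation). */
public class PageRankSketch {

    /**
     * @param outLinks   outLinks.get(i) lists the page ids that page i links to
     * @param damping    probability of following a link (0.85 is an assumed, commonly used value)
     * @param iterations number of power-iteration steps to run
     * @return the rank of each page; the ranks sum to 1
     */
    public static double[] pageRank(List<int[]> outLinks, double damping, int iterations) {
        int n = outLinks.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int i = 0; i < n; i++) {
                int[] targets = outLinks.get(i);
                if (targets.length == 0) {
                    // Assumption: pages with no out-links spread their rank evenly over all pages.
                    danglingMass += rank[i];
                } else {
                    // Each out-link receives an equal share of the page's current rank.
                    double share = rank[i] / targets.length;
                    for (int t : targets) {
                        next[t] += share;
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                // Teleport term plus damped link contributions (including the dangling share).
                next[i] = (1 - damping) / n + damping * (next[i] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Toy web graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2
        List<int[]> graph = List.of(
            new int[] {1, 2},
            new int[] {2},
            new int[] {0},
            new int[] {2});
        double[] ranks = pageRank(graph, 0.85, 50);
        System.out.println(Arrays.toString(ranks));
    }
}
```

On this toy graph, page 2 (linked from three pages) ends up with the highest rank, while page 3, which nothing links to, keeps only the teleport share (1 - 0.85) / 4 = 0.0375.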
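The README does not say how the PageRank and TF-IDF scores were combined. One hypothetical scheme is a weighted sum of the two after scaling each to [0, 1], sketched below in plain Java so it runs without Lucene. In a Lucene-based setup the relevance score would come from the index instead (for example, from Lucene's `ClassicSimilarity` TF-IDF model). The class `CombinedRanker`, the weight `alpha`, the max-normalization, and the smoothed IDF `1 + ln((N + 1) / (df + 1))` are assumptions for illustration, not the author's method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Ranks documents by a weighted mix of TF-IDF relevance and PageRank (a hypothetical scheme). */
public class CombinedRanker {

    /** Sum over query terms of (term count in doc) * smoothed idf, where idf = 1 + ln((N + 1) / (df + 1)). */
    public static double[] tfIdfScores(List<String[]> docs, String[] query) {
        int n = docs.size();
        double[] scores = new double[n];
        for (String term : query) {
            int df = 0;
            int[] tf = new int[n];
            for (int d = 0; d < n; d++) {
                for (String token : docs.get(d)) {
                    if (token.equals(term)) tf[d]++;
                }
                if (tf[d] > 0) df++;
            }
            if (df == 0) continue; // a term absent from every document adds nothing
            double idf = 1.0 + Math.log((n + 1.0) / (df + 1.0)); // always >= 1, so matches score > 0
            for (int d = 0; d < n; d++) {
                scores[d] += tf[d] * idf;
            }
        }
        return scores;
    }

    /** Divides by the maximum so scores lie in [0, 1]; an all-zero input stays all zero. */
    public static double[] normalize(double[] xs) {
        double max = Arrays.stream(xs).max().orElse(0.0);
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = max > 0 ? xs[i] / max : 0.0;
        }
        return out;
    }

    /** Returns ids of documents matching the query, best first, by alpha * tfidf + (1 - alpha) * pagerank. */
    public static List<Integer> rank(List<String[]> docs, double[] pageRank, String[] query, double alpha) {
        double[] tfidf = normalize(tfIdfScores(docs, query));
        double[] pr = normalize(pageRank);
        Map<Integer, Double> combined = new HashMap<>();
        for (int d = 0; d < docs.size(); d++) {
            if (tfidf[d] > 0) { // only documents containing at least one query term are returned
                combined.put(d, alpha * tfidf[d] + (1 - alpha) * pr[d]);
            }
        }
        List<Integer> ids = new ArrayList<>(combined.keySet());
        ids.sort(Comparator.comparing((Integer d) -> combined.get(d)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        List<String[]> docs = List.of(
            "the hobbit by tolkien".split(" "),
            "tolkien biography and letters".split(" "),
            "a cookbook".split(" "));
        double[] pageRank = {0.2, 0.7, 0.1}; // made-up link-analysis scores for the toy example
        System.out.println(rank(docs, pageRank, new String[] {"tolkien"}, 0.7)); // prints [1, 0]
    }
}
```

For the query `tolkien`, both Tolkien documents tie on TF-IDF, so PageRank breaks the tie and document 1 comes first; the cookbook contains no query term and is not returned. Raising `alpha` toward 1 lets text relevance dominate, while lowering it favors well-linked pages.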