realXiaochen/Crawler

Web Crawler

A small program that

  • crawls a domain.
  • stores every page it finds within the domain in a database.
  • ranks all the pages.
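
The storage step above could look roughly like the following. This is a minimal sketch, not the repository's actual schema: the table names `pages` and `links`, the column names, and the helpers `open_store`, `page_id`, and `save_page` are all assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with hypothetical crawl tables."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS pages (
            id        INTEGER PRIMARY KEY,
            url       TEXT UNIQUE NOT NULL,
            html      TEXT,
            page_rank REAL DEFAULT 1.0
        );
        CREATE TABLE IF NOT EXISTS links (
            from_id INTEGER NOT NULL,
            to_id   INTEGER NOT NULL,
            UNIQUE (from_id, to_id)
        );
        """
    )
    return conn


def page_id(conn, url):
    """Return the row id for a URL, inserting the URL first if it is new."""
    conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    row = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
    return row[0]


def save_page(conn, url, html, outgoing):
    """Store a page's HTML and one row per distinct outgoing link."""
    src = page_id(conn, url)
    conn.execute("UPDATE pages SET html = ? WHERE id = ?", (html, src))
    for target in outgoing:
        conn.execute(
            "INSERT OR IGNORE INTO links (from_id, to_id) VALUES (?, ?)",
            (src, page_id(conn, target)),
        )
    conn.commit()
```

Keeping links in their own table means the ranking step can read the link graph without parsing the stored HTML again.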

Tools used

  • Python (urllib, BeautifulSoup)
  • SQLite

Algorithms used

  • crawling (spider.py): depth-first search
  • ranking (pagerank.py): PageRank
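
To illustrate the two algorithms, here is a minimal sketch that runs on an in-memory link graph instead of live pages. It is not the code in spider.py or pagerank.py: the function names `dfs_order` and `pagerank`, the damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from pages without outgoing links are all assumptions.

```python
def dfs_order(graph, start):
    """Return pages in the order an iterative depth-first crawl visits them."""
    visited, order, stack = set(), [], [start]
    while stack:
        url = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        # Push in reverse so the first listed link is followed first.
        stack.extend(reversed(graph.get(url, [])))
    return order


def pagerank(graph, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict mapping page -> list of links."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(ranks[node] for node in nodes if not graph.get(node))
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * ranks[src] / len(targets)
                for target in targets:
                    new[target] += share
        ranks = new
    return ranks


# Hypothetical four-page site used only for illustration.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/"], "/c": []}
print(dfs_order(site, "/"))
print(pagerank(site))
```

The depth-first order follows each link as deep as it goes before backtracking, and the ranks sum to 1 because every iteration redistributes the same total.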
