A web crawler that uses an efficient linked-list ADT to store deep page links in an N-ary tree
Run it like this:
python dance.py <start URL> <tree depth> <max links>
python dance.py http://www.facebook.com 2 40
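To illustrate the data structure, here is a minimal sketch of an N-ary tree node whose children are kept in a singly linked list. The class and attribute names (`NTreeNode`, `_ChildCell`, `add_child`) are hypothetical, not the project's actual code; only `scanSize` mirrors a name used below.

```python
class _ChildCell(object):
    """Singly-linked-list cell holding one child node."""
    def __init__(self, node, nxt=None):
        self.node = node
        self.next = nxt

class NTreeNode(object):
    """Tree node for one crawled URL; children are the deeper links."""
    def __init__(self, url):
        self.url = url
        self._head = None  # head of the child linked list

    def add_child(self, url):
        """Prepend a new child in O(1) and return it."""
        child = NTreeNode(url)
        self._head = _ChildCell(child, self._head)
        return child

    def scanSize(self):
        """Count every node in this subtree (this node included)."""
        total = 1
        cell = self._head
        while cell is not None:
            total += cell.node.scanSize()
            cell = cell.next
        return total
```

Prepending to the linked list keeps insertion constant-time regardless of how many links a page yields, which is the point of using a linked list over a fixed-size child array.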
Timing len(result) against nTree.scanSize() shows that scanning the tree is fast:
start = time.time()
print len(result)
print "len time:{}".format(time.time()-start)
start = time.time()
print root.scanSize()
print "scanSize time:{}".format(time.time()-start)
All of this is without optimization. Cythonizing hot variables with C-style typing commonly yields large speedups (often quoted around 10x), and using NumPy for contiguous arrays can reduce retrieval time and improve overall speed.
TODO:
- Clean up the code
- Add inline documentation
- Replace result with a NumPy array
- Cythonize the helper libraries
- Use something like joblib to parallelize parsing in parseurl
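A hedged sketch of the parallelization idea: per-URL parsing is mostly I/O-bound, so a thread pool is a reasonable first cut. The stdlib concurrent.futures version below has the same shape as joblib's Parallel/delayed; `parse_url` is a hypothetical stand-in for the project's parseurl, not its real implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_url(url):
    # Hypothetical stand-in for the crawler's parseurl():
    # pretend each page yields two deeper links derived from its URL.
    return [url + "/a", url + "/b"]

def parse_all(urls, workers=4):
    # pool.map preserves input order, so results line up with urls;
    # with joblib this would read roughly:
    #   Parallel(n_jobs=workers)(delayed(parse_url)(u) for u in urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_url, urls))
```

If the parsing itself (rather than the network fetch) becomes the bottleneck, a process pool or joblib's default loky backend would sidestep the GIL.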