devidw/google-untitled-spam-spider

Google 'Untitled' Spam Spider

A tiny web spider that starts at a given page and keeps crawling for as long as it finds links on those pages that point to similar spam pages.
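The crawl strategy described above can be sketched as a breadth-first search that only expands pages judged to be spam. This is a minimal sketch, not the repository's implementation: `crawl`, `extract_links`, and the `fetch`/`is_spam` callables are hypothetical names, and the spam test itself is left to the caller. A `max_pages` cap is added as an assumption so a runaway crawl still terminates.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterator, List
from urllib.parse import urldefrag, urljoin

# Hypothetical sketch; not the GoogleSpamSpider implementation.


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url: str, html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative links and drop #fragments so duplicates collapse.
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(start_url: str,
          fetch: Callable[[str], str],
          is_spam: Callable[[str, str], bool],
          max_pages: int = 1000) -> Iterator[str]:
    """Breadth-first crawl that only follows links found on spam pages.

    Yields each URL judged to be spam; stops when no links are left to
    follow or after max_pages pages have been fetched.
    """
    seen = {start_url}
    frontier = deque([start_url])
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # unreachable page: skip it and keep crawling
        fetched += 1
        if not is_spam(url, html):
            continue  # non-spam pages are not expanded
        yield url
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
```

With a real network, `fetch` could be something like `lambda u: urllib.request.urlopen(u, timeout=10).read().decode("utf-8", "replace")`. Injecting it keeps the traversal testable offline, and the `seen` set ensures each URL is queued at most once even when spam pages link back to one another.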

This spider targets the 'Untitled' spam pages that appear in Google search results.
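How the spider recognizes these pages is not described here. One plausible heuristic, assuming Google falls back to showing 'Untitled' when a page has no usable `<title>`, is to flag pages whose title is missing or blank. `TitleParser` and `looks_untitled` are hypothetical helpers built only on the standard library.

```python
from html.parser import HTMLParser

# Hypothetical heuristic: treat a page as "Untitled" if its <title> is
# missing or contains only whitespace.


class TitleParser(HTMLParser):
    """Records the text inside the first <title> element, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = None  # None means no <title> tag was seen

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def looks_untitled(html: str) -> bool:
    parser = TitleParser()
    parser.feed(html)
    return parser.title is None or not parser.title.strip()
```

A check like this could serve as the `is_spam` predicate of a crawler, though real spam detection would likely need more signals than the title alone.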

I wrote several articles about these spam pages, in which I discuss the background of this spam network.

I crawled 105,009 Google 'Untitled' Spam Pages in 7 days and 700,504 other linked Spam Pages
— David Wolf
david.wolf.gdn

Usage

```python
from google_spam_spider import GoogleSpamSpider

spider = GoogleSpamSpider(
    url='http://zone-casino.fr/2hephe/torch-functional-unfold.html',  # The URL to start crawling from
    direct_spam_logs='direct_spam.log',  # The file to log direct spam to
    external_spam_logs='external_spam.log',  # The file to log external spam to
)
```
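The two log parameters suggest a split between the 'Untitled' pages themselves and the other spam pages they link to. Assuming that reading, and assuming one URL per line (neither is confirmed by this README), the logging step might look like the hypothetical `log_spam` helper below.

```python
from pathlib import Path

# Hypothetical: "direct" = the 'Untitled' pages found via Google,
# "external" = other spam pages linked from them. One URL per line.


def log_spam(url: str, is_direct: bool,
             direct_log: str = "direct_spam.log",
             external_log: str = "external_spam.log") -> None:
    """Append url to the direct or external log file."""
    target = Path(direct_log if is_direct else external_log)
    with target.open("a", encoding="utf-8") as fh:
        fh.write(url + "\n")
```

Appending rather than rewriting means a long crawl keeps its results even if it is interrupted partway through.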