dossma/garagecrawler

garagecrawler

Bot which searches real estate websites for mentions of "garage".

It starts from a real estate agency network and from there searches the websites of approximately 200 real estate agencies in the Berlin-Brandenburg area of Germany.

Motivation

The goal is to find real estate agencies that list garages in their portfolio, as well as current offers for houses or apartments that include a garage. This is useful if you want to buy, rent or sell a garage, or if you are looking for housing that comes with one. It can also serve as a lead aggregator if such information is relevant to your business.

Setup

The bot walks through those subpages of each agency's website where offers or descriptions of the agency's portfolio can be expected. It then looks for the keyword garage on these pages.
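The keyword check described above can be sketched with the standard library alone. This is a minimal illustration, not the bot's actual code: the class _TextExtractor and the function page_mentions are hypothetical names, and the choice to ignore script and style contents is an assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only while we are outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def page_mentions(html, keyword="garage"):
    """Return True if keyword occurs in the page's visible text (case-insensitive)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return keyword.casefold() in text.casefold()
```

Because this is a plain substring test, compound words such as "Tiefgarage" also count as a match, which is probably desirable for German listings.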

Result

The essential output is the target_url entries, since these are the pages where the garage keyword was found.

The following data is written to a CSV file:

  • crawling depth
  • referer url
  • referred url (which is target_url)
  • domain of referred url
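As a rough illustration of the four fields listed above, the following sketch builds one result row and writes it as CSV. The column names (apart from target_url) and the helper names make_row and write_rows are assumptions, and the example URLs are placeholders. The setup lists tldextract, which is presumably what the bot uses for the domain column; this sketch substitutes urllib.parse from the standard library, which yields the full hostname rather than the registered domain.

```python
import csv
import io
from urllib.parse import urlsplit

# Hypothetical column names; only target_url is named in this README.
FIELDNAMES = ["depth", "referer_url", "target_url", "domain"]


def make_row(depth, referer_url, target_url):
    """Build one result row; domain here is the hostname of target_url."""
    host = urlsplit(target_url).hostname or ""
    return {
        "depth": depth,
        "referer_url": referer_url,
        "target_url": target_url,
        "domain": host,
    }


def write_rows(rows, stream):
    """Write a header line and the given rows as CSV to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_rows(
    [make_row(2, "https://agency.example/", "https://agency.example/angebote/haus-1")],
    buffer,
)
print(buffer.getvalue())
```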

Adaptability

The bot can easily be adapted to other searches: replace the search keyword or phrase and the starting hub (the aggregator site) to suit your purpose.
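One way to keep these two settings swappable is to hold them in a single object. The sketch below is an assumption about how this could be organised, not how garagecrawler.py is written: SearchConfig, text_matches and the start URL are hypothetical. It also shows how a multi-word phrase could be matched regardless of case and spacing.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchConfig:
    """Hypothetical container for the two things the README says to swap."""
    phrase: str
    start_url: str

    def pattern(self):
        """Compile the phrase into a case-insensitive regex tolerant of whitespace runs."""
        words = [re.escape(w) for w in self.phrase.split()]
        return re.compile(r"\s+".join(words), re.IGNORECASE)


# Placeholder values: the start URL is not the hub used by the project.
config = SearchConfig(phrase="parking space", start_url="https://hub.example/members")


def text_matches(text, cfg=config):
    """Return True if the configured phrase occurs anywhere in text."""
    return cfg.pattern().search(text) is not None
```

Switching to another search then means constructing a different SearchConfig, for example SearchConfig("carport", "https://hub.example/").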

Get started

Once the development setup is in place (see below), go to the spiders directory and run:

scrapy runspider garagecrawler.py

The result is saved as garagecrawl-result.csv.

Development setup

The following packages are required:

pip install scrapy
pip install tldextract

Meta

Author: Jonas Dossmann

Distributed under the AGPL-3.0 license.

https://github.com/dossma/
