UniversityCeNotification/Crawling-Rss-Xpath-Bot

Crawling-Rss-Bot

Crawls an RSS feed or a site, then inserts the results into MongoDB, writes them to a file, or pushes them to Redis.

If your site does not have an RSS feed, you need to write XPath expressions for it instead.
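As a rough illustration of the RSS path, here is a minimal sketch that parses RSS 2.0 text with the Python standard library and writes only unseen entries to a file-like object as JSON lines. The names `parse_rss_items`, `append_new_items`, and `SAMPLE_RSS` are hypothetical and are not taken from this repo; the feed content is made up for the example, and the MongoDB and Redis outputs are not shown.

```python
import io
import json
import xml.etree.ElementTree as ET

# Made-up feed used only for this example (assumption, not real site data).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First announcement</title><link>https://example.com/1</link></item>
    <item><title>Second announcement</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def parse_rss_items(rss_text):
    """Return a list of {"title", "url"} dicts from RSS 2.0 text."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
        })
    return items


def append_new_items(items, seen_urls, out_file):
    """Write unseen items to out_file as JSON lines; return how many were written."""
    written = 0
    for entry in items:
        if entry["url"] in seen_urls:
            continue  # already handled in an earlier crawl
        seen_urls.add(entry["url"])
        out_file.write(json.dumps(entry) + "\n")
        written += 1
    return written


# Crawl the same feed twice: the second pass finds nothing new.
buffer = io.StringIO()
seen = set()
first_pass = append_new_items(parse_rss_items(SAMPLE_RSS), seen, buffer)
second_pass = append_new_items(parse_rss_items(SAMPLE_RSS), seen, buffer)
print(first_pass, second_pass)  # prints: 2 0
```

Tracking seen URLs in a set is only one way to avoid duplicate notifications; a database-backed crawler would more likely rely on a unique key in its store.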

How to try this repo

  $ # Run the init.sh script
  $ bash init.sh          # Default: the crawler runs every 60 seconds
  $ bash init.sh 10       # The crawler runs every 10 seconds
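The optional argument is the crawl interval in seconds. How `init.sh` implements this is not shown here, so the following is only a sketch of the idea in Python: read an optional interval, fall back to 60, and call a crawl function repeatedly. `parse_interval`, `run_periodically`, and the `max_runs` and `sleep` parameters are hypothetical names introduced for the example.

```python
import time

DEFAULT_INTERVAL_SECONDS = 60  # matches the documented default


def parse_interval(args):
    """Return the crawl interval from CLI-style args, defaulting to 60 seconds."""
    if not args:
        return DEFAULT_INTERVAL_SECONDS
    interval = int(args[0])
    if interval <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return interval


def run_periodically(crawl_once, interval, max_runs=None, sleep=time.sleep):
    """Call crawl_once, then wait `interval` seconds, repeatedly.

    max_runs=None loops forever; a number stops after that many crawls.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        crawl_once()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs
```

A real entry point could call `run_periodically(crawl, parse_interval(sys.argv[1:]))`, where `crawl` stands for whatever function performs one crawl.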

Example JSON

{
  "Site": "Yildiz Teknik University",
  "SiteLink": "https://ytuce.maliayas.com/",
  "SiteRssLink": "https://ytuce.maliayas.com/?type=rss",
  "ListXpath": "//div[@class='text_title']",
  "UrlXpath": "a/@href",
  "TitleXpath": "a/text()"
}
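To show how the three XPath fields might fit together, here is a hedged sketch: `ListXpath` selects one node per announcement, and `UrlXpath` and `TitleXpath` are evaluated relative to each node. It uses only the standard library, whose `xml.etree.ElementTree` supports a limited XPath subset, so the trailing `@href` and `text()` steps are handled by hand in a hypothetical helper, `evaluate_leaf`. The `SAMPLE_HTML` markup is invented for the example and is not the real site's HTML. This is not necessarily how the repo evaluates its XPath; real pages are often not well-formed XML, so a parser with full XPath support (for example lxml) would likely be needed in practice.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Config copied from the example JSON above.
CONFIG = {
    "Site": "Yildiz Teknik University",
    "SiteLink": "https://ytuce.maliayas.com/",
    "SiteRssLink": "https://ytuce.maliayas.com/?type=rss",
    "ListXpath": "//div[@class='text_title']",
    "UrlXpath": "a/@href",
    "TitleXpath": "a/text()",
}

# Invented, well-formed markup (assumption, not the real page).
SAMPLE_HTML = """<html><body>
<div class="text_title"><a href="/post/1">Exam schedule</a></div>
<div class="text_title"><a href="/post/2">Lab hours</a></div>
<div class="other"><a href="/about">About</a></div>
</body></html>"""


def evaluate_leaf(node, expr):
    """Evaluate a relative path ending in text() or @attr; return the first match."""
    path, _, last = expr.rpartition("/")
    targets = node.findall(path) if path else [node]
    if last == "text()":
        values = [(target.text or "").strip() for target in targets]
    elif last.startswith("@"):
        values = [target.get(last[1:], "") for target in targets]
    else:
        raise ValueError("unsupported expression: " + expr)
    return values[0] if values else ""


def extract_entries(html_text, config):
    """Return a list of {"title", "url"} dicts selected by the config's XPath fields."""
    root = ET.fromstring(html_text)
    # ElementTree rejects absolute paths on an element, so "//x" becomes ".//x".
    nodes = root.findall("." + config["ListXpath"])
    entries = []
    for node in nodes:
        href = evaluate_leaf(node, config["UrlXpath"])
        entries.append({
            "title": evaluate_leaf(node, config["TitleXpath"]),
            "url": urljoin(config["SiteLink"], href),  # resolve relative links
        })
    return entries


entries = extract_entries(SAMPLE_HTML, CONFIG)
```

With the sample markup, the `div` whose class is `other` is skipped because it does not match `ListXpath`, and relative links are resolved against `SiteLink`.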

University or Site List

| University | Crawling Site | Status |
| --- | --- | --- |
| Yildiz Technical | https://ytuce.maliayas.com/?type=rss | Ok |
| Istanbul | http://ce.istanbul.edu.tr/ | Nope |
| Pamukkale | http://www.pamukkale.edu.tr/bilgisayar | WIP |
| Istanbul Technical | http://www.bb.itu.edu.tr/ | Nope |
| Anadolu | https://anadolu.edu.tr/duyurular | Nope |
| Reddit Python | https://www.reddit.com/r/Python/.rss | Ok |