Using Munin's graphical interface, we can check how many requests our platform received from crawlers.
Custom Values
- limitloglines: number of lines to read from the end of the log, like tail -n
- botstrings: array of strings used to detect bots
- listuseragents: array of bots to classify
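As a rough illustration of how these three values could work together, here is a minimal sketch. It is not the plugin's actual code: the values, the `tail` and `classify` helpers, and the `other_bots` bucket are all assumptions made for the example.

```python
from collections import Counter, deque

# Hypothetical values, for illustration only; the plugin's real defaults may differ.
limitloglines = 3
botstrings = ["bot", "crawler", "spider"]
listuseragents = ["Googlebot", "bingbot"]


def tail(lines, n):
    """Keep only the last n items, like `tail -n`."""
    return list(deque(lines, maxlen=n))


def classify(user_agent):
    """Return a listed bot name, 'other_bots', or None for non-bot traffic."""
    lowered = user_agent.lower()
    for name in listuseragents:
        if name.lower() in lowered:
            return name
    if any(s in lowered for s in botstrings):
        return "other_bots"
    return None


agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "ExampleCrawler/1.0",
]
# Only the last `limitloglines` entries are considered; non-bots are dropped.
counts = Counter(c for c in map(classify, tail(agents, limitloglines)) if c)
print(dict(counts))
```

With limitloglines set to 3, the first (Googlebot) entry falls outside the window, so only the bingbot and the generic crawler are counted.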
For Apache Logs
LogFormat "{ \"time\":\"%t\", \"remoteIP\":\"%{X-Forwarded-For}i\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\", \"req_time\":\"%D\" }" jsonlog
SetEnvIf X-Forwarded-For "^.*\..*\..*\..*" forwarded
CustomLog "logs/access.log" jsonlog env=forwarded
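Each line written with this format should be a JSON object whose values are all strings. The sketch below shows one way such a line could be read in Python. The sample line and the `parse_line` helper are made up for illustration. Apache escapes some special characters in header values as \xhh sequences, which are not valid JSON, so the sketch skips lines that fail to parse rather than aborting.

```python
import json

# A made-up sample line in the shape produced by the jsonlog format above.
sample = (
    '{ "time":"[10/Oct/2023:13:55:36 +0000]", "remoteIP":"203.0.113.7", '
    '"host":"example.com", "request":"/index.html", "query":"", '
    '"method":"GET", "status":"200", '
    '"userAgent":"Mozilla/5.0 (compatible; Googlebot/2.1)", '
    '"referer":"-", "req_time":"1234" }'
)


def parse_line(line):
    """Return the entry as a dict, or None if the line is not valid JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # For example, an \xhh escape written by Apache inside a header value.
        return None


entry = parse_line(sample)
print(entry["userAgent"], entry["status"])

# A line containing an \xhh escape is rejected instead of raising.
print(parse_line(r'{ "userAgent":"bad\x00agent" }'))
```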
For munin-node config
[munin-crawler]
group root
env.file_name /var/log/httpd/access.log
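munin-node passes each env.NAME setting to the plugin as an environment variable called NAME, so env.file_name arrives as file_name. A minimal sketch of how a Python plugin could pick it up and answer Munin's config and fetch calls follows. The graph title, category, field names, and counts are placeholders, not taken from munin-crawler.py.

```python
import os
import sys

# Fallback used when munin-node does not set file_name (assumed default).
DEFAULT_LOG = "/var/log/httpd/access.log"


def log_path(environ=os.environ):
    """Read the log location from the `file_name` environment variable."""
    return environ.get("file_name", DEFAULT_LOG)


def render(argv, counts):
    """Build the lines a Munin plugin prints for `config` or for a normal fetch."""
    if len(argv) > 1 and argv[1] == "config":
        # Hypothetical graph metadata.
        lines = ["graph_title Crawler requests", "graph_category webserver"]
        lines += [f"{name}.label {name}" for name in counts]
    else:
        lines = [f"{name}.value {value}" for name, value in counts.items()]
    return lines


if __name__ == "__main__":
    # Placeholder counts; a real plugin would compute them from log_path().
    print("\n".join(render(sys.argv, {"googlebot": 12, "bingbot": 3})))
```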
Install
cd /etc/munin/plugins
ln -s /usr/share/munin/plugins/munin-crawler.py munin-crawler
To find a suitable value for limitloglines
cat /var/log/varnish/varnishncsa.log | jsonlog.php -t | cut -d" " -f1 | cut -d"/" -f3 | cut -d":" -f2,3 | sort | uniq -c | sort -n -k1
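The pipeline above appears to count requests per minute, assuming that jsonlog.php -t prints the time field in Apache's %t form. The same counting can be sketched in Python as below. The sample timestamps are invented, and `minute_of` is a hypothetical helper that mirrors the three cut steps. The busiest minute, printed last, gives a rough idea of how many lines limitloglines needs to cover.

```python
from collections import Counter


def minute_of(time_field):
    """'[10/Oct/2023:13:55:36 +0000]' -> '13:55', mirroring the cut pipeline."""
    first = time_field.split(" ")[0]                  # cut -d" " -f1
    year_and_clock = first.split("/")[2]              # cut -d"/" -f3
    return ":".join(year_and_clock.split(":")[1:3])   # cut -d":" -f2,3


# Invented sample timestamps.
times = [
    "[10/Oct/2023:13:55:36 +0000]",
    "[10/Oct/2023:13:55:59 +0000]",
    "[10/Oct/2023:13:56:01 +0000]",
]
per_minute = Counter(minute_of(t) for t in times)

# Sort ascending by count, like `sort -n -k1`; the busiest minute comes last.
for minute, count in sorted(per_minute.items(), key=lambda kv: kv[1]):
    print(count, minute)
```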
Screenshot