Blocking Spiders and Scrapers by User-Agent in Apache/Nginx
Published in: Internet Marketing
Updated: 2016-07-17
The snippets below are examples; treat them as templates and adjust them to fit your own setup.
Apache
#------------------------------------------------------------
# Apache: deny crawlers by User-Agent and Referer
# [G] returns 410 Gone, [F] returns 403 Forbidden
# Source: seonoco.com
#------------------------------------------------------------
<IfModule mod_rewrite.c>
RewriteEngine On
# Blocked User-Agent substrings (matching is case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (wget|curl|AhrefsBot|DotBot|MJ12bot|HTTrack|Findxbot|BLEXBot|WinHttpRequest|Go\s1\.1\spackage\shttp|MegaIndex|BIDUBrowser|FunWebProducts|MSIE\s5|Add\sCatalog|SeznamBot|KomodiaBot|aiHitBot|MojeekBot|PhantomJS|SiteSucker|LinkpadBot|SEOkicks|OpenLinkProfiler|Xenu|007ac9|sistrix|spbot|SiteExplorer|wotbox|ZumBot|ltx71|memoryBot|WBSearchBot|DomainAppender|Python|Aboundex|-crawler|NerdyBot|ZmEu|xovibot) [NC,OR]
# Empty or "-" User-Agent
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$ [OR]
# Referers ending in ".ru/" or containing the sample blocked domain
RewriteCond %{HTTP_REFERER} \.ru/$ [NC,OR]
RewriteCond %{HTTP_REFERER} (example\.com) [NC]
RewriteRule .* - [G]
</IfModule>
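
Once the rules are in place, a few requests from the command line should confirm they fire. This is just a quick sanity check under an assumed setup; yoursite.example is a placeholder for your own hostname:

# Sanity check with curl (yoursite.example stands in for your host)
curl -I -A "AhrefsBot" http://yoursite.example/            # blocked UA: expect HTTP 410 Gone
curl -I -A "" http://yoursite.example/                     # empty UA: expect 410
curl -I -e "http://anything.ru/" http://yoursite.example/  # referer ending in .ru/: expect 410
curl -I -A "Mozilla/5.0" http://yoursite.example/          # normal browser UA: expect 200

Note that a plain curl -I is also rejected, because curl's default User-Agent contains "curl", which is on the list.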
Nginx
#------------------------------------------------------------
# Nginx: deny crawlers by User-Agent
# Source: seonoco.com
#------------------------------------------------------------
# Place this inside the server{} (or location{}) block it should protect.
# The trailing ^$ alternative also catches requests with an empty User-Agent.
if ($http_user_agent ~* "(wget|curl|AhrefsBot|DotBot|MJ12bot|HTTrack|Findxbot|BLEXBot|WinHttpRequest|Go\s1\.1\spackage\shttp|MegaIndex|BIDUBrowser|FunWebProducts|MSIE\s5|Add\sCatalog|SeznamBot|KomodiaBot|aiHitBot|MojeekBot|PhantomJS|SiteSucker|LinkpadBot|SEOkicks|OpenLinkProfiler|Xenu|007ac9|sistrix|spbot|SiteExplorer|wotbox|ZumBot|ltx71|memoryBot|WBSearchBot|DomainAppender|Python|Aboundex|-crawler|NerdyBot|ZmEu|xovibot|^$)") {
    return 403;
}
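
The same kind of check works here; with the Nginx rule the expected status is 403 rather than 410 (yoursite.example again is a placeholder):

curl -I -A "MJ12bot" http://yoursite.example/      # blocked UA: expect HTTP 403 Forbidden
curl -I -A "Mozilla/5.0" http://yoursite.example/  # normal browser UA: expect 200

One note on the design: nginx's documentation warns against using "if" inside location blocks, but a block whose only statement is "return" is among the uses it describes as safe, so this pattern is fine as written.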