zoukankan      html  css  js  c++  java
  • PHP判断是不是爬虫的方法

    PHP判断是不是爬虫的方法
    这个一般用于防止爬虫 和 seo优化(因为爬虫都是按照第一次打开显示的页面 有些ajax 等需要点击才能显示的就爬不到啦)
    <pre>
    <?php
    // 判断是否搜索引擎机器人访问
    function isRobot() {
    $agent= strtolower(isset($_SERVER['HTTP_USER_AGENT'])? $_SERVER['HTTP_USER_AGENT'] : '');
    if(!empty($agent)){
    $spiderSite= array(
    "TencentTraveler",
    "Baiduspider+",
    "BaiduGame",
    "Googlebot",
    "msnbot",
    "Sosospider+",
    "Sogou web spider",
    "ia_archiver",
    "Yahoo! Slurp",
    "YoudaoBot",
    "Yahoo Slurp",
    "MSNBot",
    "Java (Often spam bot)",
    "BaiDuSpider",
    "Voila",
    "Yandex bot",
    "BSpider",
    "twiceler",
    "Sogou Spider",
    "Speedy Spider",
    "Google AdSense",
    "Heritrix",
    "Python-urllib",
    "Alexa (IA Archiver)",
    "Ask",
    "Exabot",
    "Custo",
    "OutfoxBot/YodaoBot",
    "yacy",
    "SurveyBot",
    "legs",
    "lwp-trivial",
    "Nutch",
    "StackRambler",
    "The web archive (IA Archiver)",
    "Perl tool",
    "MJ12bot",
    "Netcraft",
    "MSIECrawler",
    "WGet tools",
    "larbin",
    "Fish search",
    );
    foreach($spiderSite as $val){
    $str = strtolower($val);
    if(strpos($agent, $str) !== false){
    return true;
    }
    }
    }

    return false;
    }
    if(isRobot()){
    echo'爬虫';
    }else{
    echo'不是爬虫';
    }
    ?>
    </pre>

  • 相关阅读:
    nginx+uwsgi部署Django
    Git----忽略特殊文件
    Git 分支管理
    Django admin 页面中文名称加s,去除s的设置
    hive-sql参数调优及资源分配
    常用数仓架构/计算引擎
    maven 打包可运行jar包(转)
    spark sql遇到的问题
    分析跨域
    nio案例一:个简单的客户-服务的案例
  • 原文地址:https://www.cnblogs.com/newmiracle/p/11864761.html
Copyright © 2011-2022 走看看