  • Googlebot (Google Web search)

    Conjecture: "While the domain name is being resolved, the first of the Google crawlers to show up is Googlebot (Google Web search)."

    +-----+----------------+---------------------+-------------------------+------------------+
    |  23 | 111.251.93.170 | 2017-01-24 17:48:19 | Unidentified User Agent |                  |
    |  24 | 111.251.93.170 | 2017-01-24 17:49:19 | Unidentified User Agent |                  |
    |  51 | 119.147.32.253 | 2017-01-24 17:59:32 | Unidentified User Agent |                  |
    |  53 | 183.57.53.197  | 2017-01-24 18:11:56 | Mozilla 5.0             | iOS              |
    |  54 | 123.56.233.103 | 2017-01-24 18:14:39 | Unidentified User Agent |                  |
    |  56 | 112.90.142.207 | 2017-01-24 18:18:05 | Firefox 3.0             | Windows XP       |
    |  57 | 183.232.120.37 | 2017-01-24 18:18:05 | Firefox 3.0             | Windows XP       |
    |  59 | 117.136.40.218 | 2017-01-24 18:18:47 | ZTE                     | Android          |
    |  60 | 117.136.40.218 | 2017-01-24 18:18:50 | ZTE                     | Android          |
    |  61 | 117.136.40.218 | 2017-01-24 18:18:51 | ZTE                     | Android          |
    |  62 | 117.136.40.218 | 2017-01-24 18:18:53 | ZTE                     | Android          |
    |  63 | 117.136.40.218 | 2017-01-24 18:19:00 | Safari 534.30           | Android          |
    |  64 | 117.136.40.218 | 2017-01-24 18:19:13 | Safari 534.30           | Android          |
    |  65 | 117.136.40.218 | 2017-01-24 18:29:31 | Chrome 37.0.0.0         | Android          |
    |  66 | 117.136.40.218 | 2017-01-24 18:29:41 | Chrome 37.0.0.0         | Android          |
    |  67 | 117.136.40.218 | 2017-01-24 18:30:02 | Chrome 37.0.0.0         | Android          |
    |  68 | 117.136.40.218 | 2017-01-24 18:30:15 | Chrome 37.0.0.0         | Android          |
    |  69 | 117.136.40.218 | 2017-01-24 18:40:37 | Chrome 55.0.2883.87     | Windows 7        |
    |  70 | 177.193.53.212 | 2017-01-24 18:47:00 | Googlebot               | Unknown Platform |
    |  71 | 111.251.93.170 | 2017-01-24 18:49:26 | Unidentified User Agent |                  |
    |  72 | 139.162.108.53 | 2017-01-24 19:05:15 | Chrome 50.0.2661.102    | Windows 10       |
    |  73 | 111.251.93.170 | 2017-01-24 19:08:52 | Unidentified User Agent |                  |
    |  74 | 111.251.93.170 | 2017-01-24 19:09:40 | Unidentified User Agent |                  |
    |  75 | 111.251.93.170 | 2017-01-24 19:29:51 | Unidentified User Agent |                  |
    |  76 | 61.142.176.19  | 2017-01-24 19:46:40 | Firefox 3.6.3           | Windows 7        |
    |  77 | 111.251.93.170 | 2017-01-24 19:49:40 | Unidentified User Agent |                  |
    |  78 | 111.251.93.170 | 2017-01-24 19:50:49 | Unidentified User Agent |                  |
    |  79 | 111.251.93.170 | 2017-01-24 20:09:52 | Unidentified User Agent |                  |
    |  80 | 111.251.93.170 | 2017-01-24 20:30:06 | Unidentified User Agent |                  |
    |  81 | 23.251.63.45   | 2017-01-24 20:37:14 | Unidentified User Agent |                  |
    |  82 | 111.251.93.170 | 2017-01-24 20:49:53 | Unidentified User Agent |                  |
    |  83 | 111.251.93.170 | 2017-01-24 21:10:04 | Unidentified User Agent |                  |
    |  84 | 111.251.93.170 | 2017-01-24 21:30:32 | Unidentified User Agent |                  |
    |  85 | 111.251.93.170 | 2017-01-24 21:50:46 | Unidentified User Agent |                  |
    |  86 | 111.251.93.170 | 2017-01-24 21:51:33 | Unidentified User Agent |                  |
    |  87 | 61.142.176.20  | 2017-01-24 21:58:34 | Unidentified User Agent | Unknown Platform |
    |  88 | 111.251.93.170 | 2017-01-24 22:11:24 | Unidentified User Agent |                  |
    |  89 | 111.251.93.170 | 2017-01-24 22:30:22 | Unidentified User Agent |                  |
    |  90 | 111.251.93.170 | 2017-01-24 22:31:24 | Unidentified User Agent |                  |
    |  91 | 23.251.63.45   | 2017-01-24 22:41:58 | Unidentified User Agent |                  |
    |  92 | 111.251.93.170 | 2017-01-24 22:50:40 | Unidentified User Agent |                  |
    |  93 | 111.251.93.170 | 2017-01-24 23:31:12 | Unidentified User Agent |                  |
    |  94 | 111.251.93.170 | 2017-01-24 23:32:00 | Unidentified User Agent |                  |
    |  95 | 111.251.93.170 | 2017-01-24 23:32:40 | Unidentified User Agent |                  |
    |  96 | 111.251.93.170 | 2017-01-24 23:51:21 | Unidentified User Agent |                  |
    |  97 | 111.251.93.170 | 2017-01-25 00:11:27 | Unidentified User Agent |                  |
    |  98 | 111.251.93.170 | 2017-01-25 00:12:45 | Unidentified User Agent |                  |
    |  99 | 111.251.93.170 | 2017-01-25 00:13:50 | Unidentified User Agent |                  |
    | 100 | 111.251.93.170 | 2017-01-25 00:14:47 | Unidentified User Agent |                  |
    | 101 | 111.251.93.170 | 2017-01-25 00:16:26 | Unidentified User Agent |                  |
    | 102 | 111.251.93.170 | 2017-01-25 00:31:19 | Unidentified User Agent |                  |
    | 103 | 111.251.93.170 | 2017-01-25 01:11:45 | Unidentified User Agent |                  |
    | 104 | 111.251.93.170 | 2017-01-25 01:31:54 | Unidentified User Agent |                  |
    | 105 | 23.251.63.45   | 2017-01-25 01:48:22 | Unidentified User Agent |                  |
    | 106 | 111.251.93.170 | 2017-01-25 02:12:40 | Unidentified User Agent |                  |
    | 107 | 111.251.93.170 | 2017-01-25 02:33:18 | Unidentified User Agent |                  |
    | 108 | 111.251.93.170 | 2017-01-25 02:34:48 | Unidentified User Agent |                  |
    | 109 | 111.251.93.170 | 2017-01-25 02:35:53 | Unidentified User Agent |                  |
    | 110 | 111.251.93.170 | 2017-01-25 02:37:17 | Unidentified User Agent |                  |
    | 111 | 111.251.93.170 | 2017-01-25 02:43:16 | Unidentified User Agent |                  |
    | 112 | 111.251.93.170 | 2017-01-25 02:46:22 | Unidentified User Agent |                  |
    | 113 | 111.251.93.170 | 2017-01-25 02:48:32 | Unidentified User Agent |                  |
    | 114 | 111.251.93.170 | 2017-01-25 02:51:58 | Unidentified User Agent |                  |
    | 115 | 111.251.93.170 | 2017-01-25 03:01:26 | Unidentified User Agent |                  |
    | 116 | 111.251.93.170 | 2017-01-25 03:16:49 | Unidentified User Agent |                  |
    | 117 | 111.251.93.170 | 2017-01-25 03:22:45 | Unidentified User Agent |                  |
    | 118 | 111.251.93.170 | 2017-01-25 03:26:47 | Unidentified User Agent |                  |
    | 119 | 111.251.93.170 | 2017-01-25 03:33:23 | Unidentified User Agent |                  |
    | 120 | 111.251.93.170 | 2017-01-25 03:43:50 | Unidentified User Agent |                  |
    | 121 | 111.251.93.170 | 2017-01-25 03:49:33 | Unidentified User Agent |                  |
    | 122 | 111.251.93.170 | 2017-01-25 03:53:22 | Unidentified User Agent |                  |
    | 123 | 111.251.93.170 | 2017-01-25 03:58:46 | Unidentified User Agent |                  |
    | 124 | 111.251.93.170 | 2017-01-25 04:06:35 | Unidentified User Agent |                  |
    | 125 | 111.251.93.170 | 2017-01-25 04:08:54 | Unidentified User Agent |                  |
    | 126 | 111.251.93.170 | 2017-01-25 04:17:26 | Unidentified User Agent |                  |
    | 127 | 111.251.93.170 | 2017-01-25 04:21:49 | Unidentified User Agent |                  |
    | 128 | 111.251.93.170 | 2017-01-25 04:25:36 | Unidentified User Agent |                  |
    | 129 | 111.251.93.170 | 2017-01-25 04:31:20 | Unidentified User Agent |                  |
    | 130 | 111.251.93.170 | 2017-01-25 04:39:50 | Unidentified User Agent |                  |
    | 131 | 111.251.93.170 | 2017-01-25 04:46:16 | Unidentified User Agent |                  |
    | 132 | 111.251.93.170 | 2017-01-25 05:00:27 | Unidentified User Agent |                  |
    | 133 | 111.251.93.170 | 2017-01-25 05:05:55 | Unidentified User Agent |                  |
    | 134 | 111.251.93.170 | 2017-01-25 05:20:32 | Unidentified User Agent |                  |
    | 135 | 111.251.93.170 | 2017-01-25 05:23:52 | Unidentified User Agent |                  |
    | 136 | 111.251.93.170 | 2017-01-25 05:30:00 | Unidentified User Agent |                  |
    | 137 | 111.251.93.170 | 2017-01-25 05:44:46 | Unidentified User Agent |                  |
    | 138 | 111.251.93.170 | 2017-01-25 05:50:59 | Unidentified User Agent |                  |
    | 139 | 111.251.93.170 | 2017-01-25 05:54:41 | Unidentified User Agent |                  |
    | 140 | 23.251.63.45   | 2017-01-25 05:58:54 | Unidentified User Agent |                  |
    | 141 | 111.251.93.170 | 2017-01-25 06:14:16 | Unidentified User Agent |                  |
    | 142 | 111.251.93.170 | 2017-01-25 06:26:27 | Unidentified User Agent |                  |
    | 143 | 111.251.93.170 | 2017-01-25 06:32:40 | Unidentified User Agent |                  |
    | 144 | 111.251.93.170 | 2017-01-25 06:40:17 | Unidentified User Agent |                  |
    | 145 | 111.251.93.170 | 2017-01-25 06:53:45 | Unidentified User Agent |                  |
    | 146 | 111.251.93.170 | 2017-01-25 06:58:59 | Unidentified User Agent |                  |
    | 147 | 125.39.207.33  | 2017-01-25 07:05:01 | Unidentified User Agent | Unknown Platform |
    | 148 | 111.251.93.170 | 2017-01-25 07:11:58 | Unidentified User Agent |                  |
    | 149 | 111.251.93.170 | 2017-01-25 07:19:30 | Unidentified User Agent |                  |
    | 150 | 183.60.48.110  | 2017-01-25 07:24:55 | Unidentified User Agent | Unknown Platform |
    | 151 | 111.251.93.170 | 2017-01-25 07:25:34 | Unidentified User Agent |                  |
    | 152 | 111.251.93.170 | 2017-01-25 07:28:56 | Unidentified User Agent |                  |
    | 153 | 111.251.93.170 | 2017-01-25 07:35:52 | Unidentified User Agent |                  |
    | 154 | 111.251.93.170 | 2017-01-25 07:43:21 | Unidentified User Agent |                  |
    | 155 | 111.251.93.170 | 2017-01-25 07:48:11 | Unidentified User Agent |                  |
    | 156 | 101.226.51.229 | 2017-01-25 07:57:36 | Chrome 45.0.2454.101    | Windows XP       |
    | 157 | 111.251.93.170 | 2017-01-25 08:02:04 | Unidentified User Agent |                  |
    | 158 | 111.251.93.170 | 2017-01-25 08:08:18 | Unidentified User Agent |                  |
    | 159 | 111.251.93.170 | 2017-01-25 08:16:22 | Unidentified User Agent |                  |
    | 160 | 111.251.93.170 | 2017-01-25 08:22:15 | Unidentified User Agent |                  |
    | 161 | 111.251.93.170 | 2017-01-25 08:31:19 | Unidentified User Agent |                  |
    | 162 | 111.251.93.170 | 2017-01-25 08:36:05 | Unidentified User Agent |                  |
    | 163 | 111.251.93.170 | 2017-01-25 08:43:38 | Unidentified User Agent |                  |
    | 164 | 111.251.93.170 | 2017-01-25 08:59:11 | Unidentified User Agent |                  |
    | 165 | 111.251.93.170 | 2017-01-25 09:07:05 | Unidentified User Agent |                  |
    | 166 | 111.251.93.170 | 2017-01-25 09:11:57 | Unidentified User Agent |                  |
    +-----+----------------+---------------------+-------------------------+------------------+
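To check the conjecture against a log like the one above, the rows can be parsed and scanned for the first entry whose agent column mentions Googlebot. The following is only a minimal sketch: the field names (`id`, `ip`, `time`, `agent`, `platform`) are my own labels for the five unnamed columns, and the three sample rows are copied from the table. Keep in mind that a User-Agent value can be spoofed, so a "Googlebot" label alone does not prove the request came from Google.

```python
from collections import Counter


def parse_rows(table_text):
    """Parse '| id | ip | time | agent | platform |' rows from a MySQL-style table.

    Border lines ('+-----+...') and anything before the first '|' on a line
    (for example a pasted line number) are ignored.
    """
    rows = []
    for line in table_text.splitlines():
        start = line.find("|")
        if start == -1:
            continue  # border line or blank line
        cells = [c.strip() for c in line[start:].strip().strip("|").split("|")]
        if len(cells) != 5:
            continue
        row_id, ip, seen_at, agent, platform = cells
        rows.append({"id": int(row_id), "ip": ip, "time": seen_at,
                     "agent": agent, "platform": platform})
    return rows


def first_googlebot(rows):
    """Return the first row (in table order) whose agent mentions Googlebot."""
    for row in rows:
        if "googlebot" in row["agent"].lower():
            return row
    return None


sample = """
+-----+----------------+---------------------+-------------------------+------------------+
|  69 | 117.136.40.218 | 2017-01-24 18:40:37 | Chrome 55.0.2883.87     | Windows 7        |
|  70 | 177.193.53.212 | 2017-01-24 18:47:00 | Googlebot               | Unknown Platform |
|  71 | 111.251.93.170 | 2017-01-24 18:49:26 | Unidentified User Agent |                  |
+-----+----------------+---------------------+-------------------------+------------------+
"""

rows = parse_rows(sample)
hit = first_googlebot(rows)
print(hit["id"], hit["ip"], hit["time"])
print(Counter(row["ip"] for row in rows))
```

Running the same two functions over the full table would also show how many requests each IP address made, which is useful for spotting the one address that dominates the log.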

    https://support.google.com/webmasters/answer/1061943?hl=en

    Google crawlers

    See which robots Google uses to crawl the web

    "Crawler" is a generic term for any program (such as a robot or spider) used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot. This table lists information about the common Google crawlers you may see in your referrer logs, and how they should be specified in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.

    Crawler | User agent token | Full user agent string (as seen in website log files)
    Googlebot (Google Web search) | Googlebot | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) or (rarely used): Googlebot/2.1 (+http://www.google.com/bot.html)
    Googlebot News | Googlebot-News (Googlebot) | Googlebot-News
    Googlebot Images | Googlebot-Image (Googlebot) | Googlebot-Image/1.0
    Googlebot Video | Googlebot-Video (Googlebot) | Googlebot-Video/1.0
    Google Smartphone | Googlebot | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    Google Mobile AdSense | Mediapartners-Google or Mediapartners (Googlebot) | [various mobile device types] (compatible; Mediapartners-Google/2.1+http://www.google.com/bot.html)
    Google AdSense | Mediapartners-Google, Mediapartners (Googlebot) | Mediapartners-Google
    Google AdsBot landing page quality check | AdsBot-Google | AdsBot-Google (+http://www.google.com/adsbot.html)
    Google app crawler (used to fetch resources for mobile apps; obeys AdsBot-Google robots rules) | AdsBot-Google-Mobile-Apps | AdsBot-Google-Mobile-Apps
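The tokens in the table above can be used to label raw User-Agent strings from a server log. The sketch below is an illustration only: the function name `crawler_token` and the substring-matching strategy are my own assumptions, not something Google documents. Tokens are tried from most specific to least specific so that `Googlebot-Image/1.0` is not reported as plain `Googlebot`.

```python
# User agent tokens from the table, ordered most specific first.
CRAWLER_TOKENS = [
    "AdsBot-Google-Mobile-Apps",
    "AdsBot-Google",
    "Mediapartners-Google",
    "Googlebot-News",
    "Googlebot-Image",
    "Googlebot-Video",
    "Googlebot",
]


def crawler_token(user_agent):
    """Return the first listed token found in the User-Agent string, or None."""
    ua = user_agent.lower()
    for token in CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


examples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Googlebot-Image/1.0",
    "AdsBot-Google (+http://www.google.com/adsbot.html)",
    "Mozilla/5.0 (Windows NT 6.1) Chrome/55.0.2883.87 Safari/537.36",
]
for ua in examples:
    print(crawler_token(ua), "<-", ua)
```

This only classifies what the client claims to be; it says nothing about whether the request really originated from Google.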

    robots.txt

    Where several user-agents are recognized in the robots.txt file, Google will follow the most specific. If you want all of Google to be able to crawl your pages, you don't need a robots.txt file at all. If you want to block or allow all of Google's crawlers from accessing some of your content, you can do this by specifying Googlebot as the user-agent. For example, if you want all your pages to appear in Google search, and if you want AdSense ads to appear on your pages, you don't need a robots.txt file. Similarly, if you want to block some pages from Google altogether, blocking the user-agent Googlebot will also block all Google's other user-agents.

    But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don't want images in your personal directory to be crawled. In this case, use robots.txt to disallow the user-agent Googlebot-Image from crawling the files in your /personal directory (while allowing Googlebot to crawl all files), like this:

    User-agent: Googlebot
    Disallow:
    
    User-agent: Googlebot-Image
    Disallow: /personal
    
    To take another example, say that you want ads on all your pages, but you don't want those pages to appear in Google Search. Here, you'd block Googlebot, but allow Mediapartners-Google, like this:

    User-agent: Googlebot
    Disallow: /
    
    User-agent: Mediapartners-Google
    Disallow:
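The "most specific user-agent wins" rule from the two robots.txt examples above can be sketched in a few lines. This is a simplified model under stated assumptions, not Google's actual matcher: the helper names `parse_groups` and `is_allowed` are hypothetical, "most specific" is approximated as the longest token that is a case-insensitive prefix of the crawler name, only `Disallow` prefix rules are handled (no `Allow`, wildcards, or `$`), and the fallback of secondary crawlers to the Googlebot group is not modeled.

```python
def parse_groups(robots_txt):
    """Return {lowercased user-agent token: [disallowed path prefixes]}."""
    groups = {}
    current = []            # tokens of the group currently being read
    reading_agents = False  # True while consecutive User-agent lines are seen
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current = []  # a new group starts here
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            reading_agents = False
            if value:  # an empty Disallow blocks nothing
                for token in current:
                    groups[token].append(value)
    return groups


def is_allowed(robots_txt, crawler, path):
    """Apply the group with the longest matching token; fall back to '*'."""
    groups = parse_groups(robots_txt)
    name = crawler.lower()
    matches = [t for t in groups if t != "*" and name.startswith(t)]
    if matches:
        chosen = max(matches, key=len)
    elif "*" in groups:
        chosen = "*"
    else:
        return True  # no applicable group, so nothing is blocked
    return not any(path.startswith(prefix) for prefix in groups[chosen])


images_blocked = """
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal
"""

ads_only = """
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""

print(is_allowed(images_blocked, "Googlebot", "/personal/photo.jpg"))
print(is_allowed(images_blocked, "Googlebot-Image", "/personal/photo.jpg"))
print(is_allowed(ads_only, "Googlebot", "/index.html"))
print(is_allowed(ads_only, "Mediapartners-Google", "/index.html"))
```

Python's standard `urllib.robotparser` is deliberately not used here: as far as I can tell it applies the first group that matches rather than the most specific one, so it would not reproduce the behavior described in the first example.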
    

    robots meta tag

    Some pages use multiple robots meta tags to specify directives for different crawlers, like this:

    <meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">
    

    In this case, Google will use the sum of the negative directives, and Googlebot will follow both the noindex and nofollow directives. Google's documentation has more detailed information about controlling how Google crawls and indexes your site.
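The "sum of the negative directives" can be illustrated by collecting the `content` values of every meta tag that applies to a crawler and taking their union. This is a rough sketch using the standard library's `html.parser`; the class name `RobotsMetaCollector` and the function `effective_directives` are hypothetical, and the sketch assumes that every listed directive is restrictive, so it simply merges them without ranking or conflict handling.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect directives from <meta name="robots"> and the crawler's own meta tag."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name not in self.names:
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated; normalize case and whitespace.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def effective_directives(html, crawler="googlebot"):
    """Return the union of directives that apply to the given crawler."""
    collector = RobotsMetaCollector(crawler)
    collector.feed(html)
    collector.close()
    return collector.directives


page = '<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">'
print(sorted(effective_directives(page)))
```

For the page in the example, the result contains both `nofollow` and `noindex`, matching the behavior described for Googlebot, while a crawler with no tag of its own would see only the generic `robots` directives.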

  • Original article: https://www.cnblogs.com/rsapaper/p/6349067.html