  • Ruby with HttpWatch script

    HttpWatch official page: http://www.httpwatch.com/rubywatir/

    Ruby + HttpWatch example: http://www.httpwatch.com/rubywatir/site_spider.zip (the official site may have updated this example since)

    After downloading the example I added some explanatory comments and trimmed some of the code. The main changes are as follows:

    1. Added ($*[0].nil?)?(url = url):(url = $*[0]) just above url = gets.chomp!, so the URL can now either be passed on the command line or fixed in the script. Command-line usage: ruby <script name> <site name>; see the comments in the script, and the sketch after this list, for details. Note that the URL must not be prefixed with http://.

    2. Commented out two break statements. They are fine on Ruby 1.8.6, but on newer versions such as Ruby 1.9.2 they raise an error, so they have to be commented out.

    3. Commented out plugin.Container.Quit(); so that IE is not closed; after the run finishes, testers still need to inspect the results.
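
    As a minimal sketch (not part of the original script; the default address below is only a placeholder), the ternary in change 1 does the same thing as:

    # $* is the same array as ARGV; the first command-line argument, if present,
    # overrides the URL hard-coded in the script
    url = "www.example.com"                # hypothetical hard-coded default
    url = ARGV[0] unless ARGV[0].nil?      # e.g. invoked as: ruby site_spider.rb www.example.com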

    Runtime issue: if the test machine's network connection is slow, a page load may time out and the script exits with an error like this:

    C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-cla
    ss.rb:374:in `method_missing': (in OLE method `navigate': ) (WIN32OLERuntimeErro
    r)
        OLE error code:800C000E in <Unknown>
          <No Description>
        HRESULT error code:0x80020009
          发生意外。
            from C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-c
    lassic/ie-class.rb:374:in `goto'
            from C:/Documents and Settings/Administrator/桌面/site_spider/site_spide
    r.rb:55:in `<main>'
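
    If the timeout keeps aborting runs, one option (a sketch only, not part of the original script) is to wrap the existing ie.goto(nextUrl) call in a rescue/retry block so a slow page gets a few attempts before the error propagates:

    # hypothetical retry wrapper around the existing ie.goto(nextUrl) call
    attempts = 0
    begin
      ie.goto(nextUrl)
    rescue WIN32OLERuntimeError => e     # the error class seen in the trace above
      attempts += 1
      retry if attempts < 3              # give a slow page up to three attempts
      puts "Giving up on #{nextUrl}: #{e.message}"
    end
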
    site_spider.rb
    # A Site Spider that uses HttpWatch, Ruby and Watir
    #
    # For more information about this example please refer to http://www.httpwatch.com/rubywatir/
    #
    MAX_NO_PAGES = 200    # maximum number of pages to visit in one run

    require 'win32ole'        # win32ole drives HttpWatch's automation interface; HttpWatch versions below 6.0 cannot be scripted
    require 'rubygems'
    require 'watir'
    require './url_ops.rb'    # url_ops.rb must sit in the same directory as this script
    url = "www.gaopeng.com/?ADTAG=beijing_from_beijing"        # URL to test; it can also be read from the command line. Do not prefix it with http://

    # Create HttpWatch
    control = WIN32OLE.new('HttpWatch.Controller')
    httpWatchVer = control.Version
    if httpWatchVer[0...1] == "4" or httpWatchVer[0...1] == "5"
        puts "\nERROR: You are running HttpWatch #{httpWatchVer}. This sample requires HttpWatch 6.0 or later. Press Enter to exit...";  $stdout.flush
        gets
        #break        # fine on Ruby 1.8.6, but raises an error on newer versions such as Ruby 1.9.2, so it is commented out
    end

    # Get the domain name to spider
    puts "Enter the domain name of the site to check (press enter for url):\n";  $stdout.flush
    ($*[0].nil?)?(url = url):(url = $*[0])  # a command-line argument, if given, takes priority over the hard-coded URL
    #url = gets.chomp!   # if the line above is used, this line must be commented out
    if  url.empty?
        url = url
    end
    hostName = url.HostName
    if  hostName.empty?
        puts "\nPlease enter a valid domain name. Press Enter to exit...";  $stdout.flush
        gets
        #break        # fine on Ruby 1.8.6, but raises an error on newer versions such as Ruby 1.9.2, so it is commented out
    end

    # Start IE
    ie = Watir::IE.new
    ie.logger.level = Logger::ERROR

    # Attach HttpWatch to the IE window
    plugin = control.ie.Attach(ie.ie)

    # Start recording HTTP traffic
    plugin.Clear()
    plugin.Log.EnableFilter(false)
    plugin.Record()

    url = url.CanonicalUrl
    urlsVisited = Array.new;  urlsToVisit = Array.new( 1, url )

    # Start visiting pages
    while urlsToVisit.length > 0 && urlsVisited.length < MAX_NO_PAGES

        nextUrl = urlsToVisit.pop
        puts "Loading " + nextUrl + "...";   $stdout.flush

        ie.goto(nextUrl)            # get Watir to load the URL
        urlsVisited.push( nextUrl)  # store this URL in the list of visited pages

      begin
        # Look at each link on the page and decide if it needs to be visited
        ie.links().each() do |link|

            linkUrl = link.href.CanonicalUrl
            # skip the URL if it comes from a different domain, looks like a download,
            # or has already been queued or visited
            if !url.IsSubDomain( linkUrl.HostName ) ||
               linkUrl.Path.include?( ".exe" ) || linkUrl.Path.include?(".zip") || linkUrl.Path.include?(".csv") ||
               linkUrl.Path.include?( ".pdf" ) || linkUrl.Path.include?( ".png" ) ||
               urlsToVisit.find{ |aUrl| aUrl == linkUrl}  != nil ||
               urlsVisited.find{ |aUrl| aUrl == linkUrl}  != nil
              # Don't add this URL to the list
              next
            end
            # Add this URL to the list
            urlsToVisit.push(linkUrl)
          end
      rescue
        puts "Failed to find links in " + nextUrl + " " + $!.to_s;  $stdout.flush
      end

    end

    if ( urlsVisited.length == MAX_NO_PAGES )
        puts "\nThe spider has stopped because #{MAX_NO_PAGES} pages have been visited. (Change MAX_NO_PAGES if you want to increase this limit)";   $stdout.flush
    end

    # Stop recording HTTP data in HttpWatch
    plugin.Stop()

    puts "\nAnalyzing HTTP data..";   $stdout.flush

    # Look at each HTTP request in the log to compile a list of URLs for each error
    errorUrls = Hash.new
    plugin.Log.Entries.each do |entry|
        if  !entry.Error.empty? && entry.Error != "Aborted" || entry.StatusCode >= 400
            if !errorUrls.has_key?(entry.Result )
                errorUrls[entry.Result] =  Array.new( 1, entry.Url )
            else
                if errorUrls[entry.Result].find{ |aUrl| aUrl == entry.Url } == nil
                    errorUrls[entry.Result].push( entry.Url )
                end
            end
        end
    end

    # Display summary statistics for the whole log
    summary = plugin.Log.Entries.Summary

    printf "Total time to load page (secs):      %.3f\n", summary.Time
    printf "Number of bytes received on network: %d\n", summary.BytesReceived
    printf "HTTP compression saving (bytes):     %d\n", summary.CompressionSavedBytes
    printf "Number of round trips:               %d\n", summary.RoundTrips
    printf "Number of errors:                    %d\n", summary.Errors.Count

    # Print out errors
    summary.Errors.each do |error|
        numErrors = error.Occurrences
        description = error.Description
        puts "#{numErrors} URL(s) caused a #{description} error:"
        errorUrls[error.Result].each do |aUrl|
            puts "-> #{aUrl}"
        end
    end

    # Quit IE; commented out here so that testers can inspect the results after the run
    #plugin.Container.Quit();

    puts "\r\nPress Enter to exit";  $stdout.flush
    #gets
    url_ops.rb
    # Helper functions used to parse URLs
    class String
      def HostName
          matches = scan(/^(?:https?:\/\/)?([^\/]*)/)
          if matches.length > 0 && matches[0].length > 0
              return matches[0][0].downcase
          else
              return ""
          end
      end
      def IsSubDomain( hostName)
        thisHostName = self.HostName
        if thisHostName.slice(0..3) == "www."
            thisHostName = thisHostName.slice(4..-1)
        end
        if thisHostName == hostName ||
          (hostName.length > thisHostName.length &&
           hostName.slice( -thisHostName.length ..-1) == thisHostName)
            return true
        end
        return false
      end
      def Protocol
          matches = scan(/^(https?:\/\/)/)
          if matches.length > 0 && matches[0].length > 0
              return matches[0][0].downcase
          else
              return "http://"
          end
      end
      def Path
          if scan(/^(https?:\/\/)/).length > 0
            matches = scan(/^https?:\/\/[^\/]+\/([^#]+)$/)
          else
            matches = scan(/^[^\/]+\/([^#]+)$/)
          end
          if matches != nil && matches.length == 1 && matches[0].length == 1
              return matches[0][0].downcase
          else
              return ""
          end
      end
      def CanonicalUrl
          return self.Protocol + self.HostName + "/" + self.Path
      end
    end
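
    For reference, this is roughly how the url_ops.rb helpers behave on a sample address (the URL below is only an illustration):

    require './url_ops.rb'

    url = "http://www.example.com/download/file.zip"
    puts url.HostName                          # => "www.example.com"
    puts url.Protocol                          # => "http://"
    puts url.Path                              # => "download/file.zip"
    puts url.CanonicalUrl                      # => "http://www.example.com/download/file.zip"
    puts url.IsSubDomain("news.example.com")   # => true, so such a link would be followed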

    Put the two scripts in the same directory (url_ops.rb is unchanged) and run the spider from cmd.
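
    For example, assuming the two files were saved to C:\site_spider (the path is only a placeholder):

    cd C:\site_spider
    ruby site_spider.rb www.gaopeng.com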

      