
Handling timeouts when downloading web pages

>>require "open-uri"
>>open("http://www.cnblog.org/blog/atom.xml")

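However, the drawback of this approach is that it is too simple: in the Ruby 1.8 setup used here, it offers no way to set a timeout. When the server does not respond, the call keeps waiting until the default timeout is reached, and that default is very long.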

http://www.renren.it/a/bianchengyuyan/Ruby/20101023/40036.html
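If you still want to keep the one-line open-uri call, one possible workaround is to bound the whole call with the standard library's Timeout module. The sketch below is only an illustration: with_deadline is a hypothetical helper name, not part of open-uri, and the 5-second limit in the usage comment is an arbitrary choice.

```ruby
require 'timeout'

# Hypothetical helper: run the block, but give up after `seconds`.
# Returns the block's value, or nil if the deadline passed first.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  nil
end

# Possible usage with open-uri (needs network access, so it is left commented out;
# use URI.open on Ruby 2.5+, plain open on older versions):
#   require 'open-uri'
#   page = with_deadline(5) { URI.open('http://www.cnblog.org/blog/atom.xml').read }
```

Timeout interrupts the block from the outside, so it is a blunt tool compared with the per-phase timeouts that Net::HTTP offers.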

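To be on the safe side, whenever you need timeout handling or other settings, it is better to use Net::HTTP.
Besides the timeout, it also lets you set other request parameters, such as user-agent.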

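The user-agent is a very useful parameter. Earlier, when I was experimenting with 163.com, I did not set it, and the request kept getting redirected because the site treated it as coming from a mobile client.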
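As a rough illustration of setting this header, the request object can be built first and inspected before it is sent. build_get is a hypothetical helper, and the Mozilla/5.0 string is just a placeholder browser-like value; which value a particular site expects is something to find out by experiment.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request carrying a browser-like User-Agent.
def build_get(url, user_agent = 'Mozilla/5.0')
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = user_agent
  request['Accept'] = 'text/html'
  request
end

request = build_get('http://www.163.com/')
puts request['User-Agent'] # header lookup is case-insensitive
```

The request built this way can then be sent with Net::HTTP#request on a connection that has its timeouts configured.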

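require 'net/http'
require 'uri'

class HandleGetRequest
  # Send a GET request to url. Returns the response, or nil on failure.
  def self.get_response(url)
    uri = URI.parse(url)
    site = Net::HTTP.new(uri.host, uri.port)
    site.open_timeout = 20 # seconds to wait for the connection to open
    site.read_timeout = 20 # seconds to wait for each read
    # request_uri is the path plus "?query" when a query string is present
    # (plain Ruby has no String#blank?, which comes from Rails)
    site.get2(uri.request_uri, { 'accept' => 'text/html', 'user-agent' => 'Mozilla/5.0' })
  rescue Timeout::Error, StandardError => ex
    # Timeout::Error is listed explicitly: on old Rubies it is not a StandardError
    p ex
    nil
  end
end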

Requesting a normal URL

>> HandleGetRequest.get_response("http://www.javaeye.com/topic/431217")
=> #<Net::HTTPOK 200 OK readbody=true>

Requesting a URL that times out (it timed out when I tested it on my machine) raises an exception once the configured time has elapsed

>> HandleGetRequest.get_response("http://www.cnblog.org/blog/atom.xml")
#<Timeout::Error: execution expired>
Timeout::Error: execution expired
        from /usr/local/bin/rubyee/lib/ruby/1.8/timeout.rb:60:in `open'
        from /usr/local/bin/rubyee/lib/ruby/1.8/net/http.rb:560:in `connect'
        from /usr/local/bin/rubyee/lib/ruby/1.8/net/http.rb:560:in `connect'
        from /usr/local/bin/rubyee/lib/ruby/1.8/net/http.rb:553:in `do_start'
        from /usr/local/bin/rubyee/lib/ruby/1.8/net/http.rb:542:in `start'
        from /usr/local/bin/rubyee/lib/ruby/1.8/net/http.rb:1035:in `request'
        from /usr/local/bin/rubyee/lib/ruby/1.8/net/http.rb:948:in `get2'
        from /home/chengliwen/chengliwen/deploy/pin-macro-tmp/lib/handle_get_request.rb:30:in `get_response'
        from (irb):1
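The trace above comes from Ruby 1.8, where both phases raise a plain Timeout::Error. On Ruby 2.0 and later, Net::HTTP raises Net::OpenTimeout and Net::ReadTimeout (both subclasses of Timeout::Error), so the two cases can be told apart. A minimal sketch, assuming a hypothetical helper named fetch_status and the same 20-second defaults as above:

```ruby
require 'net/http'

# Hypothetical helper: return the status code as a String, or a Symbol
# naming the phase that timed out.
def fetch_status(host, port, path, open_timeout: 20, read_timeout: 20)
  http = Net::HTTP.new(host, port)
  http.open_timeout = open_timeout
  http.read_timeout = read_timeout
  http.request_get(path).code
rescue Net::OpenTimeout
  :open_timeout # the connection could not be established in time
rescue Net::ReadTimeout
  :read_timeout # connected, but the server did not answer in time
end

# Possible usage (needs network access, so it is left commented out):
#   fetch_status('www.cnblog.org', 80, '/blog/atom.xml')
```

Note that Net::HTTP may retry an idempotent request once after a read timeout, so the total wait can be longer than read_timeout.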

After that, you can process the response body according to the response status
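One way to branch on the response is to match on its class, since Net::HTTP groups status codes into classes such as Net::HTTPSuccess and Net::HTTPRedirection. The helper name handle_response and the shape of its return values are assumptions made for this sketch only.

```ruby
require 'net/http'

# Hypothetical helper: decide what to do with the value returned by get_response.
# Returns [:ok, body], [:redirect, location], or [:error, description].
def handle_response(response)
  case response
  when nil, Exception
    # get_response rescued an error, so there is no usable response
    [:error, 'no response (request failed or timed out)']
  when Net::HTTPSuccess
    [:ok, response.body]
  when Net::HTTPRedirection
    [:redirect, response['location']]
  else
    [:error, "#{response.code} #{response.message}"]
  end
end

# Possible usage (needs network access, so it is left commented out):
#   status, value = handle_response(HandleGetRequest.get_response('http://www.javaeye.com/topic/431217'))
```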

Handling gzip-compressed pages

05-26

Fetching a gzip-compressed page with Ruby

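require 'net/http'
require 'uri'
require 'zlib'
require 'stringio'

module Net
  class HTTP
    # GET uri (a String or a URI object) with optional request headers.
    def HTTP.get_with_headers(uri, headers = nil)
      uri = URI.parse(uri) if uri.respond_to? :to_str
      start(uri.host, uri.port) do |http|
        # request_uri keeps the query string, which uri.path would drop
        return http.get(uri.request_uri, headers)
      end
    end
  end
end

# Ask the server for a gzip-compressed response
gzipped = Net::HTTP.get_with_headers('http://www.qidian.com/', 'Accept-Encoding' => 'gzip')
puts gzipped.body.size # size of the compressed body
# Decompress the body in memory
body_io = StringIO.new(gzipped.body)
unzipped_body = Zlib::GzipReader.new(body_io).read
puts unzipped_body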
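A server is free to ignore Accept-Encoding and send an uncompressed body, in which case Zlib::GzipReader would raise an error. A more defensive variant checks the Content-Encoding response header first. The sketch below uses a hypothetical helper named decode_body and exercises it offline with a string compressed locally, so it does not depend on any particular site.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gunzip the body only when Content-Encoding says gzip.
def decode_body(body, content_encoding)
  if content_encoding.to_s.downcase == 'gzip'
    Zlib::GzipReader.new(StringIO.new(body)).read
  else
    body
  end
end

# Offline round trip: compress a string, then decode it again.
buffer = StringIO.new
writer = Zlib::GzipWriter.new(buffer)
writer.write('<html>hello</html>')
writer.close
puts decode_body(buffer.string, 'gzip')
```

With a real response this would be called as decode_body(gzipped.body, gzipped['Content-Encoding']).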


Original article: https://www.cnblogs.com/lexus/p/1936667.html