zoukankan      html  css  js  c++  java
  • 添加headers头文件反爬虫

    import urllib2
    import re

    def geturl():
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0'}
    request = urllib2.Request("http://www.69shu.com/19251/",headers = headers)
    html=urllib2.urlopen(request).read()
    html=html.decode('gbk')
    html=html.encode('utf-8')
    print html
    geturl()

    ip代理

    1
    2
    3
    4
    5
    6
    7
    8
    9
    import urllib2
    enable_proxy = True
    proxy_handler = urllib2.ProxyHandler({"http" : 'http://some-proxy.com:8080'})
    null_proxy_handler = urllib2.ProxyHandler({})
    if enable_proxy:
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(null_proxy_handler)
    urllib2.install_opener(opener)
    
    
  • 相关阅读:
    循环神经网络(Recurrent Neural Network)
    特征选择
    程序猿能挣多少钱
    python socket
    python 2 encode and decode
    pandas series
    source collection list
    pep8摘要
    python 正则表达式
    django显示图片
  • 原文地址:https://www.cnblogs.com/ZHANG576433951/p/6078252.html
Copyright © 2011-2022 走看看