  • python2.7 urllib2 crawler

    # -*- coding: utf-8 -*-

    import urllib2
    import cookielib
    import random
    import re
    from bs4 import BeautifulSoup
    import datetime

    dax = datetime.datetime.now().strftime('%Y-%m-%d')
    print(dax)

    url = 'http://ww=singlemessage&isappinstalled=0'

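    # Install a cookie-aware opener globally so cookies persist across requests.
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)
    request = urllib2.Request(url)
    # Pool of User-Agent strings; one is picked at random for each run.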
    headers = [
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)',
    'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
    ]

    hds = random.choice(headers)
    # print(hds)
    request.add_header('User-Agent', hds)
    #response = urllib2.urlopen("http://www.hn1m=singlemessage&isappinstalled=0")
    response = urllib2.urlopen(request)
    cont = response.read()
    #print(cont)

    soup = BeautifulSoup(cont,'html.parser',from_encoding='utf-8')
    # print(soup)
    # listyj = soup.find_all('dl')
    # for listyjx in listyj:
    #     print(listyjx.name, listyjx.attrs, listyjx.get_text())
    #     # if dax in listyjx.get_text():
    #     #     print(listyjx)
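
The request setup above can be wrapped in a small helper so the random User-Agent choice is reusable. This is only a sketch under assumptions: `build_request`, `USER_AGENTS`, and the `http://example.com/` placeholder URL are hypothetical names that do not appear in the original script, and the `urllib.request` fallback is added only so the snippet also runs outside Python 2.7. It builds the request without sending it.

```python
# -*- coding: utf-8 -*-
import random

try:
    import urllib2 as request_lib          # Python 2.7, as in the script above
except ImportError:
    import urllib.request as request_lib   # fallback for Python 3 (assumption)

# Shortened pool taken from the list in the script above.
USER_AGENTS = [
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)',
    'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11',
]


def build_request(url, user_agents=USER_AGENTS):
    """Return a Request for `url` carrying one randomly chosen User-Agent."""
    request = request_lib.Request(url)
    request.add_header('User-Agent', random.choice(user_agents))
    return request


if __name__ == '__main__':
    # Placeholder URL, not the page targeted by the original post.
    req = build_request('http://example.com/')
    # Request stores header names capitalized, hence 'User-agent' here.
    print(req.get_header('User-agent'))
```

The returned object can then be passed to `urlopen` in the same way `request` is used in the script.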

  • Original article: https://www.cnblogs.com/ruiy/p/9193940.html