Python 3 crawler: fetching


  “AttributeError: 'module' object has no attribute 'urlopen'”

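The cause is that the urllib module was reorganized in Python 3: urlopen now lives in urllib.request, so every urllib call in this script should be changed to urllib.request.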
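As a minimal sketch of the change described above, the snippet below imports the urllib.request submodule explicitly and calls urlopen through it. The data: URL is only a stand-in chosen so the example runs without network access; it is not part of the original script.

```python
import urllib.request  # import the submodule explicitly; a bare "import urllib" may not load it

# Assumption for illustration: a data: URL replaces the real Bing address,
# so the call works offline while still going through urlopen().
with urllib.request.urlopen('data:text/plain;charset=utf-8,hello') as response:
    body = response.read()

# read() hands back bytes, not str
print(type(body), body)

# The Python 2 style call no longer exists on the top-level package
print(hasattr(urllib, 'urlopen'))
```

Note that read() returns a bytes object here, which matters for any string processing that follows.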

After making that change and running the script again, a new message appears:

    TypeError: can't use a string pattern on a bytes-like object

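The cause is that in Python 3 urlopen(...).read() returns bytes, while the regular expression passed to findall is a str pattern, and the two cannot be mixed. The fix is to add html = html.decode('utf-8') before the regular expression is applied.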
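The mismatch can be reproduced without touching the network. In the sketch below, the bytes literal is a made-up stand-in for what read() would return (the file name in it is hypothetical); the pattern is the same one used in the script.

```python
import re

# Hypothetical response body, standing in for urlopen(url).read()
raw = b'{"images":[{"url":"/az/hprichbg/rb/Example_1366x768.jpg","urlbase":"/az/hprichbg/rb/Example"}]}'

# A str pattern, as in the script
pattern = re.compile('"url":"(.*?)","urlbase"', re.S)

# Applying a str pattern to bytes raises TypeError
try:
    pattern.findall(raw)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding first turns the bytes into str, and the same pattern then matches
html = raw.decode('utf-8')
urls = pattern.findall(html)
print(urls)
```

An alternative would be to keep the data as bytes and compile a bytes pattern (a b'...' literal), but decoding once up front keeps the rest of the string handling unchanged.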

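With this change the script runs successfully. Because of a limit on the site's side, though, it can still only save the 24 most recent background images. The final code is as follows: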

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# -*- author:arron ni-*-
# Download all background images from the Bing home page with Python 3
import os
import re
import sys
import urllib.request  # import the submodule explicitly so urllib.request is available

def get_bing_backphoto():
    # Create the output directory on the first run
    if not os.path.exists('photos'):
        os.mkdir('photos')
    # Compile the pattern once; it captures the image URL from the response
    reg = re.compile('"url":"(.*?)","urlbase"', re.S)
    for i in range(0, 30):
        url = 'http://cn.bing.com/HPImageArchive.aspx?format=js&idx=' + str(i) + '&n=1&nc=1361089515117&FORM=HYLH1'
        html = urllib.request.urlopen(url).read()
        # read() returns bytes, so compare against a bytes literal
        if html == b'null':
            print('open & read bing error!')
            sys.exit(-1)
        # Decode to str so the str pattern can be applied
        html = html.decode('utf-8')
        text = reg.findall(html)
        # Example: http://s.cn.bing.net/az/hprichbg/rb/LongJi_ZH-CN8658435963_1366x768.jpg
        for imgurl in text:
            # The file name is everything after the last '/'
            name = imgurl[imgurl.rindex('/') + 1:]
            savepath = 'photos/' + name
            urllib.request.urlretrieve(imgurl, savepath)
            print(name + ' save success!')

get_bing_backphoto()


Original article: https://www.cnblogs.com/zfquan/p/8057672.html