  • Download the images on a web page

    This uses the BeautifulSoup library, which must be downloaded and installed first. Download: http://www.crummy.com/software/BeautifulSoup/

    config.py

    url = "http://www.baidu.com"
    folder = r"d:\test"  # raw string, so "\t" is not read as a tab
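    A Windows path such as "d:\test" written as an ordinary string literal does not hold the text it appears to, because Python reads "\t" as a tab escape. The short check below illustrates this; the variable names are only for illustration, and either a raw string or forward slashes avoids the problem.

```python
# "\t" inside a normal string literal is a tab character, not a backslash + "t".
broken = "d:\test"
raw = r"d:\test"       # raw string keeps the backslash
forward = "d:/test"    # forward slashes are also accepted on Windows

print(repr(broken))            # shows the embedded tab escape
print(len(broken), len(raw))   # the broken form is one character shorter
```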

    downloadPictrues.py

    import config
    from bs4 import BeautifulSoup
    import urlparse
    from urllib2 import urlopen
    from urllib import urlretrieve
    import os
    
    ###########################################
    # Work around Python 2's default ASCII encoding (Python 2 only)
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')
    
    def main(url, out_folder):
        """Downloads all the images at 'url' to out_folder"""
        pageFile = urlopen(url)                         # pageFile: a file-like object
        soup = BeautifulSoup(pageFile, "html.parser")   # build a BeautifulSoup object with the built-in parser
        #print soup.prettify()                           # uncomment to dump the parsed HTML
        elements = urlparse.urlparse(url)        #parse url into a 6-tuple
        print elements
        parsed = list(elements)                  #new list initialized from iterable items
        for image in soup.findAll("img", src=True):          # every <img> tag that has a src attribute
            #print "Image: %(src)s" % image
            print image, image['src'], type(image)
            image_url = urlparse.urljoin(url, image['src'])  # build the absolute image URL
            filename = image["src"].split("/")[-1]           # last path segment is the file name
            outpath = os.path.join(out_folder, filename)     # local path to save to
            #print out_folder,filename,outpath
            urlretrieve(image_url, outpath)                  # download the picture
    
    if __name__ == "__main__":
        url = config.url
        folder = config.folder
        if not os.path.exists(folder):
            os.makedirs(folder)          # create the output folder on first run
        main(url, folder)
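
    The script above is written for Python 2 (print statements, urllib2, urlparse as a top-level module). As a rough sketch only, the same idea could look like this on Python 3. Note the assumptions: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra installs, and the names ImgSrcCollector, image_targets and download_images are hypothetical, not part of the original post. It also takes the file name from the URL path, so a query string does not end up in the saved file name.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_targets(page_url, html, out_folder):
    """Return (absolute_image_url, local_path) pairs for the page's images."""
    collector = ImgSrcCollector()
    collector.feed(html)
    targets = []
    for src in collector.sources:
        image_url = urljoin(page_url, src)
        # Use only the URL path for the file name, ignoring any query string.
        filename = os.path.basename(urlparse(image_url).path)
        if filename:
            targets.append((image_url, os.path.join(out_folder, filename)))
    return targets


def download_images(page_url, out_folder):
    """Fetch page_url and save each image it references into out_folder."""
    os.makedirs(out_folder, exist_ok=True)
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for image_url, outpath in image_targets(page_url, html, out_folder):
        urlretrieve(image_url, outpath)


if __name__ == "__main__":
    # Offline demonstration on a literal HTML snippet; no network access needed.
    sample = '<p><img src="/img/logo.png?v=2"><img alt="no source"></p>'
    for pair in image_targets("http://www.baidu.com/index.html", sample, "out"):
        print(pair)
```

    To download for real, call download_images(config.url, config.folder). Keeping the URL and path logic in image_targets, separate from the network calls, makes it possible to check the results on a literal HTML string first.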
  • Original article: https://www.cnblogs.com/sdu20112013/p/3847527.html