  • [Python] Batch image download crawler for nvshens

    Code:

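    # Batch image download crawler for nvshens
    from bs4 import BeautifulSoup
    import requests
    import time
    import urllib.request
    
    user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers={'User-Agent':user_agent}
    
    # Image URLs collected so far
    pictures=[]
    
    # Keep following the "next page" link until there is none left
    def crawl(url,retries=3):
        print("Crawling page "+url)
        nextUrl=None
        found=[]
    
        try:
            rsp=requests.get(url,headers=headers,timeout=30)
            # rsp.text is already decoded to Unicode, so from_encoding is not passed;
            # supplying it only produces the UserWarning that appears in the sample output
            soup=BeautifulSoup(rsp.text,'html.parser')
    
            for divs in soup.find_all(class_="gallery_wrapper"):
                # Collect the image URLs found on this page
                for img in divs.find_all('img'):
                    print(img.get("src"))
                    found.append(img.get("src"))
    
                # Look for the next-page link, which the site labels '下一页'
                for link in divs.find_all('a',class_='a1'):
                    if link.string=='下一页' and link.get("href").find('.html')!=-1:
                        nextUrl='https://www.nvshens.com'+link.get("href")
        except Exception as e:
            # Whatever went wrong, retry the same page a few times so the crawl can reach the end
            if retries>0:
                print("An exception occurred; crawling this page again")
                time.sleep(1)
                crawl(url,retries-1)
                return
            print("Giving up on "+url+" after repeated failures: "+str(e))
    
        pictures.extend(found)
        if nextUrl is not None:
            print("Going to next page")
            crawl(nextUrl)
        else:
            print('Crawl finished, starting download...')
            downloadPics()
    
    # Download the collected images to the current directory
    def downloadPics():
        for pic in pictures:
            # Use the last path segment of the URL as the local file name
            name=pic.split('/')[-1]
    
            with urllib.request.urlopen(pic) as rsp:
                img=rsp.read()
            with open(name,'wb') as f:
                f.write(img)
            print('Image '+pic+' downloaded')
    
    # Kickoff
    crawl('https://www.nvshens.com/g/22210/')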
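
    The crawler above follows the next-page link by calling itself, so a very long gallery could in principle run into Python's recursion limit. Below is a minimal sketch of the same idea written as a loop. It is not the code used for the run shown in this post: `collect_all`, `max_pages`, and the `fetch` callable are hypothetical names introduced only for illustration, and `fetch` stands in for the requests/BeautifulSoup step by returning the image URLs on a page plus the next-page href (or None on the last page).

```python
from urllib.parse import urljoin


def collect_all(start_url, fetch, max_pages=1000):
    """Follow next-page links in a loop and return every image URL found.

    fetch(url) is a hypothetical callable returning (image_urls, next_href),
    where next_href is None on the last page.
    """
    pictures = []
    seen = set()  # pages already visited; guards against link loops
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        image_urls, next_href = fetch(url)
        pictures.extend(image_urls)
        # urljoin resolves relative hrefs such as '/g/22210/2.html'
        url = urljoin(url, next_href) if next_href else None
    return pictures


# Canned pages stand in for the network so the loop can be tried offline.
fake_site = {
    'https://www.nvshens.com/g/22210/': (['a.jpg', 'b.jpg'], '/g/22210/2.html'),
    'https://www.nvshens.com/g/22210/2.html': (['c.jpg'], None),
}
result = collect_all('https://www.nvshens.com/g/22210/', fake_site.get)
print(result)
```

    Using `urljoin` instead of string concatenation also keeps working if the site ever emits absolute links, and the `seen` set stops the loop if a page links back to one that was already visited.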

    Sample output:

    C:\Users\horn1\Desktop\python7>python downloadall.py
    Crawling page https://www.nvshens.com/g/22210/
    C:\Users\horn1\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\__init__.py:146: UserWarning: You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.
      warnings.warn("You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.")
    https://img.onvshen.com:85/gallery/23789/22210/s/0.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/001.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/002.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/003.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/004.jpg
    Going to next page
    Crawling page https://www.nvshens.com/g/22210/2.html
    https://img.onvshen.com:85/gallery/23789/22210/s/005.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/006.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/007.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/008.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/009.jpg
    Going to next page
    Crawling page https://www.nvshens.com/g/22210/3.html
    https://img.onvshen.com:85/gallery/23789/22210/s/010.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/011.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/012.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/013.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/014.jpg
    Going to next page
    Crawling page https://www.nvshens.com/g/22210/4.html
    https://img.onvshen.com:85/gallery/23789/22210/s/015.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/016.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/017.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/018.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/019.jpg
    Going to next page
    Crawling page https://www.nvshens.com/g/22210/5.html
    https://img.onvshen.com:85/gallery/23789/22210/s/020.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/021.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/022.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/023.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/024.jpg
    Going to next page
    Crawling page https://www.nvshens.com/g/22210/6.html
    https://img.onvshen.com:85/gallery/23789/22210/s/025.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/026.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/027.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/028.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/029.jpg
    Going to next page
    Crawling page https://www.nvshens.com/g/22210/7.html
    https://img.onvshen.com:85/gallery/23789/22210/s/030.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/031.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/032.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/033.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/034.jpg
    Going to next page
    Crawling page https://www.nvshens.com/g/22210/8.html
    https://img.onvshen.com:85/gallery/23789/22210/s/035.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/036.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/037.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/038.jpg
    https://img.onvshen.com:85/gallery/23789/22210/s/039.jpg
    Crawl finished, starting download...
    Image https://img.onvshen.com:85/gallery/23789/22210/s/0.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/001.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/002.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/003.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/004.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/005.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/006.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/007.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/008.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/009.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/010.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/011.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/012.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/013.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/014.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/015.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/016.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/017.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/018.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/019.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/020.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/021.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/022.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/023.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/024.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/025.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/026.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/027.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/028.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/029.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/030.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/031.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/032.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/033.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/034.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/035.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/036.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/037.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/038.jpg downloaded
    Image https://img.onvshen.com:85/gallery/23789/22210/s/039.jpg downloaded
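
    The download step names each file after the last segment of its URL and fetches every image again on each run. The following is a rough sketch, under stated assumptions, of how the file name could be derived more defensively and how already-downloaded files could be skipped. `local_name` and `pending_downloads` are hypothetical helpers that are not part of the script above, and the `'index'` fallback name is an arbitrary choice for illustration.

```python
import os
import posixpath
from urllib.parse import urlsplit


def local_name(url):
    """Return a file name for url: the last path segment, without any query string."""
    name = posixpath.basename(urlsplit(url).path)
    # Fall back to a placeholder when the URL path ends with '/' (assumed choice)
    return name if name else 'index'


def pending_downloads(urls, directory='.'):
    """Return (url, path) pairs for unique URLs whose target file does not exist yet."""
    todo = []
    seen = set()
    for url in urls:
        path = os.path.join(directory, local_name(url))
        if url in seen or os.path.exists(path):
            continue  # duplicate URL, or the file is already on disk
        seen.add(url)
        todo.append((url, path))
    return todo


print(local_name('https://img.onvshen.com:85/gallery/23789/22210/s/001.jpg'))  # prints 001.jpg
```

    A download loop could then iterate over `pending_downloads(pictures)` and write each response body to the returned path, so an interrupted run can be resumed without repeating finished downloads.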

    Writing a crawler in Python feels like somewhat less work than writing one in Node.js.

  • Original post: https://www.cnblogs.com/heyang78/p/8670860.html