  • Downloading Web Page Images with Python

    # coding: utf-8
    import requests
    from bs4 import BeautifulSoup

    # Folder where the downloaded images are saved (it must already exist)
    PWD = "/jiaoben/python/meizitu/pic/"
    head = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    TimeOut = 5
    PhotoName = 0
    c = '.jpeg'
    # Reuse one session for all list pages instead of creating a new one per request
    session = requests.Session()
    for x in range(1, 4):
        site = "http://www.meizitu.com/a/qingchun_3_%d.html" % x
        Page = session.get(site, headers=head, timeout=TimeOut)
        # Pass the raw bytes and name the parser explicitly
        ContentSoup = BeautifulSoup(Page.content, 'html.parser')
        # Lazily loaded images keep their real address in the data-original attribute
        jpg = ContentSoup.find_all('img', {'class': 'scrollLoading'})
        for photo in jpg:
            PhotoAdd = photo.get('data-original')
            if not PhotoAdd:
                continue  # skip tags without an image address
            PhotoName += 1
            Name = str(PhotoName) + c
            r = requests.get(PhotoAdd, headers=head, timeout=TimeOut, stream=True)
            with open(PWD + Name, 'wb') as fd:
                # Write in 8 KB chunks rather than one byte at a time
                for chunk in r.iter_content(chunk_size=8192):
                    fd.write(chunk)
    print("You have downloaded %d photos" % PhotoName)

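    # -*- coding:utf-8 -*-
    import urllib.request
    # Raw strings keep the backslashes literal; in a normal string "\1" is read as an escape sequence
    path = r"D:\Download"
    url = "http://pic2.sc.chinaz.com/files/pic/pic9/201309/apic520.jpg"
    name = r"D:\download\1.jpg"
    # When saving, make sure the types match: if the image is a jpg, the file you open must have a .jpg name, otherwise you get an invalid image
    conn = urllib.request.urlopen(url)
    with open(name, 'wb') as f:
        f.write(conn.read())
    print('Pic Saved!')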

    It is simple: open the URL, then save the response to a file in some folder.
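    The comment in the snippet above says the file extension has to match the image type. One way to avoid guessing is to look at the first bytes of the downloaded data, since JPEG, PNG and GIF files each start with a fixed signature. The following is only a minimal sketch; `guess_image_extension` is a hypothetical helper name, and the sample bytes are made up to stand in for `conn.read()`.

```python
def guess_image_extension(data):
    """Return a file extension based on the leading bytes of the image data."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return ".gif"
    return None  # unknown format: let the caller decide what to do


# Made-up bytes standing in for the result of conn.read()
sample = b"\xff\xd8\xff\xe0" + b"\x00" * 16
print(guess_image_extension(sample))
```

    With a check like this, the file name can be built from the detected extension instead of being hard-coded.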

    If you do not want to type the full path every time, use the os module to change the current working directory:

    import os
    os.chdir(r"D:\download")  # raw string, so the backslash is not treated as an escape
    os.getcwd()

    After that, a bare file name is enough when saving:

    f = open('1.jpg','wb')  
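    Changing the working directory affects every later relative path in the program. As an alternative, the destination path can be built once from a folder and a file name. This is a rough sketch under that assumption, not the original author's method; `save_bytes` is a hypothetical helper, and the demo writes to a temporary folder rather than to the download folder used in this post.

```python
import tempfile
from pathlib import Path


def save_bytes(folder, filename, data):
    """Write data to folder/filename and return the full path."""
    target = Path(folder) / filename
    # Create the folder if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


# Demo in a temporary folder with placeholder bytes instead of a real image
with tempfile.TemporaryDirectory() as demo_dir:
    saved = save_bytes(demo_dir, "1.jpg", b"fake image bytes")
    print(saved.name, saved.stat().st_size)
```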
    

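    The URL above is fixed, so only one image gets downloaded. To download in bulk, you need a loop that works through the different URLs.

    Below is an example I found elsewhere. It changes the image file name inside the URL and saves each image in a loop, although it still starts from one known base URL.

    Source: http://www.oschina.net/code/snippet_1016509_21961 (OSChina, the Open Source China community). My own change was to pull the repeated code out into a function.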

    import os
    import urllib.request

    def rename(name):
        # Zero-pad the number to three digits and append the extension, e.g. '7' -> '007.jpg'
        return name.zfill(3) + '.jpg'

    os.chdir(r"D:\download")
    base = 'http://bgimg1.meimei22.com/list/2012-5-24/2/sa'
    count = 1
    while count < 15:
        name = rename(str(count))
        url = base + name
        try:
            a = urllib.request.urlopen(url)
        except Exception as e:
            # Report the failure and carry on with the next number
            print(url + ' not found: ' + str(e))
        else:
            with open(name, "wb") as f:
                f.write(a.read())
            print(url + ' Saved!')
        count = count + 1
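    The loop above builds the file name and the URL in the same place where it downloads. Splitting the naming step out makes it possible to check the generated URLs without touching the network. The following is just an illustrative sketch; `numbered_urls` and its `width` and `ext` parameters are hypothetical names, not part of the original snippet.

```python
def numbered_urls(base, start, stop, width=3, ext=".jpg"):
    """Yield (file_name, url) pairs such as ('001.jpg', base + '001.jpg')."""
    for n in range(start, stop):
        file_name = str(n).zfill(width) + ext
        yield file_name, base + file_name


# Same base URL as in the example above; only the names are printed here
base = "http://bgimg1.meimei22.com/list/2012-5-24/2/sa"
for file_name, url in numbered_urls(base, 1, 4):
    print(file_name, url)
```

    Each pair can then be passed to whatever download code is in use, with the error handling kept around the single `urlopen` call.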
    

    Of course, you can also open the HTTP connection yourself and then pick out the .jpg images from the page dynamically:

    import http.client
    url = "desk.zol.com.cn"
    conn = http.client.HTTPConnection(url)
    conn.request("GET", "/dongman/")
    r = conn.getresponse()
    print(r.status, r.reason)
    data1 = r.read()  # .decode('utf-8')  # decode according to the page's actual encoding
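    The code above stops once the page body has been read into `data1`. To get from there to the image addresses, the HTML still has to be scanned for `<img>` tags. Here is a minimal sketch using only the standard library; `JpgCollector` is a hypothetical class name, the sample HTML is made up, and a real page would first need `data1` decoded with whatever encoding that page actually uses.

```python
from html.parser import HTMLParser


class JpgCollector(HTMLParser):
    """Collect the src of every <img> tag whose address ends in .jpg."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and src.lower().endswith(".jpg"):
            self.urls.append(src)


# Made-up HTML standing in for the decoded page body
sample_html = '<p><img src="http://example.com/a.jpg"><img src="/b.png"></p>'
collector = JpgCollector()
collector.feed(sample_html)
print(collector.urls)
```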

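    When I first wrote this, I kept getting the error that the target machine actively refused the connection. I later found that I had used HTTPSConnection(), which connects to port 443 by default, so the connection was refused. Keep this in mind: for a plain HTTP site like this one, use HTTPConnection().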
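    One way to avoid picking the wrong class by hand is to choose it from the URL scheme. This is only a sketch under that assumption; `connection_for` is a hypothetical helper, and building the connection object does not contact the server.

```python
import http.client
from urllib.parse import urlsplit


def connection_for(url, timeout=5):
    """Return an unopened HTTP or HTTPS connection that matches the URL scheme."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    if parts.scheme == "http":
        return http.client.HTTPConnection(parts.netloc, timeout=timeout)
    raise ValueError("unsupported scheme: %r" % parts.scheme)


# No network traffic happens until conn.request(...) is called
conn = connection_for("http://desk.zol.com.cn/dongman/")
print(type(conn).__name__)
```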

  • Original post: https://www.cnblogs.com/apexchu/p/5014616.html