  • XPath parsing: scraping 4K beauty images

    import os
    
    import requests
    from lxml import etree
    url = 'http://pic.netbian.com/4kmeinv/'
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0'
    }
    response = requests.get(url=url, headers=headers)
    # Manually set the encoding of the response data
    # response.encoding = 'utf-8'
    page_text = response.text
    tree = etree.HTML(page_text)
    li_list = tree.xpath('//div[@class="slist"]/ul/li')
    # Create the output folder if it does not exist yet
    if not os.path.exists('./imgLibs'):
        os.mkdir('./imgLibs')
    for li in li_list:
        img_src = li.xpath('./a/img/@src')[0]
    
        img_name = li.xpath('./a/img/@alt')[0] + '.jpg'
        # General-purpose fix for garbled text: recover the raw bytes via
        # ISO-8859-1, then decode them as GBK
        img_name = img_name.encode('iso-8859-1').decode('gbk')
    
        img_url = 'http://pic.netbian.com' + img_src
        img_path = './imgLibs/' + img_name
        img_data = requests.get(url=img_url, headers=headers).content
        print(img_name, img_url)
        with open(img_path, 'wb') as fp:
            fp.write(img_data)
            print(img_name, 'downloaded successfully!!')
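
The encoding repair inside the loop can be tried offline, without touching the site. The sketch below is only an illustration: `fix_mojibake` is a hypothetical helper name that does not appear in the original script, and the sample title is made up. It first produces the garbled form by decoding GBK bytes as ISO-8859-1, then reverses the damage with the same `encode('iso-8859-1').decode('gbk')` call the scraper uses.

```python
def fix_mojibake(text, wrong_encoding='iso-8859-1', real_encoding='gbk'):
    """Undo a wrong decode: recover the raw bytes, then decode them correctly."""
    return text.encode(wrong_encoding).decode(real_encoding)


# Simulate the failure: GBK bytes mistakenly decoded as ISO-8859-1.
original = '高清壁纸'  # made-up sample title, not taken from the site
garbled = original.encode('gbk').decode('iso-8859-1')

repaired = fix_mojibake(garbled)
print(repaired == original)
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, so the wrong decode throws no information away. Assuming the page really is GBK-encoded, setting `response.encoding = 'gbk'` before reading `response.text` would likely make the per-string repair unnecessary.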
    
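
The script builds both the image URL and the output path by plain string concatenation. As a hedged alternative, not what the original does, `urllib.parse.urljoin` can resolve the `src` value against the page URL, and a small filter can strip characters that many file systems reject in file names. `safe_filename`, `BASE_URL`, and the sample attribute values below are hypothetical and used for illustration only.

```python
import os
import re
from urllib.parse import urljoin

BASE_URL = 'http://pic.netbian.com/4kmeinv/'


def safe_filename(name, replacement='_'):
    """Replace characters that are invalid in file names on common systems."""
    return re.sub(r'[\\/:*?"<>|]', replacement, name).strip()


# Hypothetical values standing in for the @src and @alt attributes.
img_src = '/uploads/sample/example.jpg'
img_alt = 'sample: title?'

# A root-relative src replaces the path part of the base URL.
img_url = urljoin(BASE_URL, img_src)
img_path = os.path.join('./imgLibs', safe_filename(img_alt) + '.jpg')

print(img_url)
print(img_path)
```

Compared with concatenation, `urljoin` also copes with `src` values that are already absolute URLs or that are relative to the current directory.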
  • Original article: https://www.cnblogs.com/niucunguo/p/14410350.html