  • [Original] Python web scraping with BeautifulSoup: scrape the titles of all images on a web page and save the images to local files

    from bs4 import BeautifulSoup
    import requests
    import re
    import os
    r = requests.get("https://re.jd.com/search?keyword=%E6%B0%B4%E6%9E%9C%20%E7%BD%91&keywordid=44195495794&re_dcp=202m0QjIIg==&traffic_source=1004&test=1&enc=utf8&cu=true&utm_source=baidu-search&utm_medium=cpc&utm_campaign=t_262767352_baidusearch&utm_term=44195495794_0_32d58cbc7f0f40e08d64a09fbc8c95c4")
    result = r.content
    # print(result)
    soup = BeautifulSoup(result,"html.parser")
    # print(soup.script.text)
    souptext = soup.find(type='text/javascript').text
    # print(souptext)
    
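    # Capture the title, the image path, and the file extension (jpg or png) from the embedded script data
    pattern3 = re.compile(r'"ad_title_text":"(.*?)","image_url":"(.*?\.(jpg|png))"')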
    patternresult3 = pattern3.findall(souptext)
    print(patternresult3)
    
    j = 0
    for i in patternresult3:
        j = j+1
        title = i[0].replace(' ','').replace('"','1').replace('/','1')
        # Create a "jpg" directory before running this code; os.getcwd() returns the current working directory
        with open(os.path.join(os.getcwd(), 'jpg', title+str(j)+"."+i[2]), "wb") as f:
            f.write(requests.get("https://img1.360buyimg.com/n6/"+i[1]).content)
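
The extraction and file-naming steps can be tried offline before any network request is made. The following is a minimal sketch, not the author's code: `SAMPLE` is a made-up string standing in for the page's embedded script text, and `extract_items`, `safe_filename`, and `target_path` are hypothetical helper names. It also replaces unsafe characters with `_` rather than `1`, which is a substitution of my own.

```python
import os
import re

# Same idea as the pattern in the post, with the dot before the extension escaped.
PATTERN = re.compile(r'"ad_title_text":"(.*?)","image_url":"(.*?\.(jpg|png))"')

# Hypothetical sample of the embedded script data, made up for illustration.
SAMPLE = (
    '{"ad_title_text":"Fresh apples 5kg","image_url":"jfs/t1/abc.jpg"},'
    '{"ad_title_text":"Red/green grapes","image_url":"jfs/t1/def.png"}'
)


def extract_items(text):
    """Return a list of (title, image_path, extension) tuples found in text."""
    return PATTERN.findall(text)


def safe_filename(title, index, ext):
    """Drop spaces and replace characters that are unsafe in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title.replace(" ", ""))
    return f"{cleaned}{index}.{ext}"


def target_path(base_dir, title, index, ext):
    """Create base_dir if it is missing and return the full path for one image."""
    os.makedirs(base_dir, exist_ok=True)
    return os.path.join(base_dir, safe_filename(title, index, ext))


# Show which file name each extracted image path would be saved under.
for index, (title, image_path, ext) in enumerate(extract_items(SAMPLE), start=1):
    print(safe_filename(title, index, ext), "<-", image_path)
```

Calling `target_path(os.path.join(os.getcwd(), "jpg"), title, index, ext)` creates the `jpg` directory on demand, so it would not have to be created by hand before running the script.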

    Running the script produces the following result:

  • Original post: https://www.cnblogs.com/lelexiong/p/10869451.html