  • [Python Web Scraping in Practice] Why am I so hooked on Python? Because I like looking at pictures of pretty girls

    Scraping target

    Site: 绝对领域 (www.jdlingyu.com)


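    Tools

    Development environment: Windows 10, Python 3.7
    Development tools: PyCharm, Chrome
    Packages: requests, lxml

    Project approach

    Pick the image category you want.

    From the category page, extract the hyperlinks rather than the images themselves: the target address (href) of each A tag and the title of each image set.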

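    def get_url(start_url):
        # Fetch the category listing page (requests, etree and headers come from the full source below)
        response = requests.get(start_url, headers=headers).text
        data = etree.HTML(response)
        # Each thumbnail block wraps an <a> tag whose href points to a detail page
        new_url = data.xpath('//div[@class="post-module-thumb"]/a/@href')
        for url in new_url:
            yield url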
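The step above mentions collecting a title along with each link, while the snippet only yields the href. Here is a minimal sketch of returning both, assuming the title sits in a `title` attribute on the same A tag. That is an assumption, since the site's real markup is not shown in this post. `SAMPLE_HTML` and `get_links_with_titles` are hypothetical names made up for illustration.

```python
from lxml import etree

# Hypothetical markup standing in for a category page; the real page may differ.
SAMPLE_HTML = """
<html><body>
  <div class="post-module-thumb"><a href="https://example.com/post/1" title="Set one"></a></div>
  <div class="post-module-thumb"><a href="https://example.com/post/2" title="Set two"></a></div>
</body></html>
"""


def get_links_with_titles(html_text):
    """Yield (href, title) pairs for every thumbnail link in the page."""
    tree = etree.HTML(html_text)
    for a in tree.xpath('//div[@class="post-module-thumb"]/a'):
        href = a.get("href")
        # Assumption: the title is stored in the <a> tag's title attribute.
        title = a.get("title", "")
        if href:
            yield href, title


pairs = list(get_links_with_titles(SAMPLE_HTML))
print(pairs)
```

If the title turns out to live somewhere else (for example in a nearby heading), only the line that reads `title` needs to change.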

    Open each detail page and use XPath to extract the addresses of all the images on it.
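As a rough illustration of this step, the sketch below runs the XPath against a small inline page instead of the live site. It differs from the full source in two ways, both of which are assumptions on my part: it uses `//img` so that images nested deeper inside the content div are also matched, and it passes each `src` through `urljoin` in case the page uses relative addresses. `SAMPLE_DETAIL_HTML` and `extract_image_urls` are hypothetical names.

```python
from urllib.parse import urljoin

from lxml import etree

# Hypothetical detail page; the real markup may be different.
SAMPLE_DETAIL_HTML = """
<html><body>
  <div class="entry-content">
    <img src="https://example.com/img/2021/a.jpg">
    <p><img src="/img/2021/b.jpg"></p>
  </div>
  <div class="sidebar"><img src="/ads/banner.png"></div>
</body></html>
"""


def extract_image_urls(html_text, page_url):
    """Return absolute URLs of all images inside the entry-content div."""
    tree = etree.HTML(html_text)
    # "//img" matches images at any depth under the div; "/img" would match direct children only.
    srcs = tree.xpath('//div[@class="entry-content"]//img/@src')
    # Resolve relative addresses against the URL of the page they came from.
    return [urljoin(page_url, src) for src in srcs]


image_urls = extract_image_urls(SAMPLE_DETAIL_HTML, "https://example.com/post/1")
print(image_urls)
```

The sidebar image is skipped because the XPath is anchored to the `entry-content` div.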

    Send a request for each image and save the data that comes back, and that's it. Super simple, isn't it? Hehe (*╹▽╹*)
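The saving half of this step can be sketched on its own, without touching the network. The helper names `filename_from_url` and `save_image` are hypothetical, and joining the two path segments with an underscore is my own choice for readability (the full source simply concatenates them). The demo at the bottom writes placeholder bytes into a temporary directory rather than real image data.

```python
import os
import tempfile
from urllib.parse import urlparse


def filename_from_url(img_url):
    """Build a file name from the last two path segments of the image URL."""
    # urlparse(...).path drops any query string such as "?size=large".
    parts = [p for p in urlparse(img_url).path.split("/") if p]
    return "_".join(parts[-2:])


def save_image(data, img_url, out_dir="图片"):
    """Write the image bytes to out_dir and return the path of the new file."""
    # Create the directory first; open() fails if it does not exist.
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename_from_url(img_url))
    with open(path, "wb") as f:
        f.write(data)
    return path


# Demo with placeholder bytes in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b"fake-bytes", "https://example.com/img/2021/a.jpg?size=large", out_dir=tmp)
    saved_name = os.path.basename(saved)
    print(saved_name)
```

In a real run, `data` would be the `.content` of the response returned by `requests.get` for the image address.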

    Simple source code

    import os

    import requests
    from lxml import etree

    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
    }


    def get_url(start_url):
        """Yield the detail-page URLs found on one category listing page."""
        response = requests.get(start_url, headers=headers).text
        data = etree.HTML(response)
        new_url = data.xpath('//div[@class="post-module-thumb"]/a/@href')
        for url in new_url:
            yield url


    def get_img(url):
        """Download every image on one detail page into the 图片/ directory."""
        response = requests.get(url, headers=headers).text
        img_data = etree.HTML(response)
        img_urls = img_data.xpath('//div[@class="entry-content"]/img/@src')
        # Create the output directory if needed; open() fails when it does not exist
        os.makedirs("图片", exist_ok=True)
        for img_url in img_urls:
            # File name = last directory segment + original file name
            name = img_url.split("/")[-2] + img_url.split("/")[-1]
            # Reuse the browser-like headers for the image request as well
            result = requests.get(img_url, headers=headers).content
            with open("图片/" + name, "wb") as f:
                f.write(result)
            print("Downloaded", name)


    if __name__ == '__main__':
        # range(1, 3) covers listing pages 1 and 2
        for i in range(1, 3):
            start_url = "https://www.jdlingyu.com/tuji/hentai/gctt/page/{}".format(i)
            html_url = get_url(start_url)
            for url in html_url:
                get_img(url)
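One detail of the main loop is easy to miss: `range(1, 3)` stops before 3, so only listing pages 1 and 2 are visited. The sketch below pulls the paging into a small helper with an inclusive upper bound and adds a pause between requests. Both `page_urls` and `crawl` are hypothetical helpers, and the pause is my own addition rather than something the original script does.

```python
import time


def page_urls(template, first, last):
    """Yield listing-page URLs for pages first..last, both inclusive."""
    for page in range(first, last + 1):
        yield template.format(page)


def crawl(urls, handle, delay=1.0):
    """Call handle(url) for each URL, pausing between calls."""
    results = []
    for url in urls:
        results.append(handle(url))
        # Pause so the server is not hit in a tight loop; the default value is arbitrary.
        time.sleep(delay)
    return results


template = "https://www.jdlingyu.com/tuji/hentai/gctt/page/{}"
urls = list(page_urls(template, 1, 2))
# Stand-in handler that just returns the page number; a real run would fetch and parse the page.
visited = crawl(urls, handle=lambda u: u.rsplit("/", 1)[-1], delay=0)
print(urls)
print(visited)
```

To plug this into the script, `handle` could be a function that loops over `get_url(url)` and calls `get_img` on each result.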

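    I'm 南鹤-, a female programmer who loves sharing knowledge ❤️

    If you have never done any programming and get stuck while following this post, just leave a comment. [Many thanks for your likes, bookmarks, follows, and comments; the one-click combo of all four is much appreciated.]

    I share something every day. If you enjoyed this, please give it plenty of likes, bookmarks, and follows~~ Thanks!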
  • Original article: https://www.cnblogs.com/nanhe/p/15070249.html