  • Scraping Autohome (汽车之家)

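    This is a well-worn exercise for anyone learning web scraping.

    The script below uses requests to fetch the Autohome news page and BeautifulSoup to parse it. For each article it prints the headline, link, and summary, and saves the thumbnail image to disk.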

    import os

    import requests
    from bs4 import BeautifulSoup

    # Fetch the news index page. Use the encoding detected from the response
    # body, in case the declared charset is missing or wrong.
    ret = requests.get(url="https://www.autohome.com.cn/news/")
    ret.encoding = ret.apparent_encoding

    soup = BeautifulSoup(ret.text, 'html.parser')

    # The article list sits inside this container div.
    div = soup.find(name='div', id='auto-channel-lazyload-article')
    li_list = div.find_all(name='li')

    # open() does not create directories, so make sure ./image exists.
    os.makedirs('./image', exist_ok=True)

    for it in li_list:
        h3 = it.find(name='h3')
        if not h3:
            # Skip <li> items that carry no headline.
            continue
        p = it.find(name='p')
        a = it.find(name='a')
        img = it.find(name='img')
        # src is expected to be protocol-relative (//host/path),
        # hence the 'https:' prefix below.
        src = img.get('src')

        # Use the text after the last '__' as the local file name.
        file_name = './image/' + src.rsplit('__', maxsplit=1)[1]

        ret_img = requests.get(
            url='https:' + src
        )

        with open(file_name, 'wb') as fw:
            fw.write(ret_img.content)

        print(h3.text, a.get('href'))
        print(p.text)
        print('=' * 15)
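
    The script assumes every image src is protocol-relative and contains '__'; if either assumption fails, the URL is malformed or rsplit raises an IndexError. A minimal sketch of a more forgiving version of that step follows, using only the standard library. The helper name image_target, its parameters, and the sample src are hypothetical and not part of the original script.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

PAGE_URL = "https://www.autohome.com.cn/news/"


def image_target(src, page_url=PAGE_URL, out_dir="./image"):
    """Return (absolute_url, local_path) for an <img> src value.

    Hypothetical helper for illustration only.
    """
    # urljoin resolves protocol-relative (//host/x.jpg), root-relative
    # and already-absolute URLs against the page URL.
    absolute_url = urljoin(page_url, src)

    # Take the last path segment, ignoring any query string.
    base = posixpath.basename(urlsplit(absolute_url).path)

    # Mirror the original naming rule (text after the last '__'),
    # but fall back to the whole base name when '__' is absent.
    name = base.rsplit("__", maxsplit=1)[-1]
    return absolute_url, posixpath.join(out_dir, name)


# The sample src below is made up for illustration.
url, path = image_target("//img.example.com/news/120x90_0_autohomecar__abc123.jpg")
print(url)
print(path)
```

    Inside the loop, the two values could replace the hand-built 'https:' + src and file_name, so an item with an unusual src no longer stops the whole run.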
  • Original article: https://www.cnblogs.com/zllwxm123/p/10180060.html