  • Part 2: Scraping resources from the web

     

    1. Fetching a web page

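    from bs4 import BeautifulSoup
    import requests

    url = 'https://www.tripadvisor.cn/Attractions-g60763-Activities-New_York_City_New_York.html'
    # Download the page, then parse the HTML with the lxml parser
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # Print the whole parsed document to confirm the request worked
    print(soup)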
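
    The request above can fail quietly: a blocked or failed request still returns a body, and BeautifulSoup will parse an error page without complaint. A minimal sketch of a more defensive fetch follows. The helper name `fetch_soup`, its injectable `get` parameter, the 10-second timeout, and the User-Agent string are all assumptions made for illustration, and it uses the standard-library `html.parser` instead of `lxml` so that it runs without an extra dependency.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, get=requests.get, timeout=10):
    """Download url and return a parsed BeautifulSoup tree (hypothetical helper)."""
    # Browser-like User-Agent; the exact string is an arbitrary placeholder.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # html.parser ships with Python; the listing above uses 'lxml'.
    return BeautifulSoup(response.text, "html.parser")
```

    Calling `fetch_soup(url)` would then stand in for the `requests.get` and `BeautifulSoup` lines above. The `get` parameter exists only so the function can be exercised without network access.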

    2. Extracting the data you want

    from bs4 import BeautifulSoup
    import requests

    url = 'https://www.tripadvisor.cn/Attractions-g60763-Activities-New_York_City_New_York.html'
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # select() takes a CSS selector and returns a list of matching tags
    titles = soup.select('#taplc_attraction_coverpage_attraction_0 > div:nth-of-type(1) > div > div > div.shelf_item_container > div:nth-of-type(1) > div.poi > div > div.item.name > a')
    print(titles)
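
    The effect of the positional `:nth-of-type(1)` filters can be seen on a small inline document. This is only a sketch: the markup below is made up to loosely imitate a listing and is not the real TripAdvisor page, and `html.parser` is used so the example runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Made-up markup that loosely imitates a listing; not the real page structure.
SAMPLE_HTML = """
<div class="shelf_item_container">
  <div class="poi"><div class="item name"><a href="/a">Central Park</a></div></div>
  <div class="poi"><div class="item name"><a href="/b">Brooklyn Bridge</a></div></div>
  <div class="poi"><div class="item name"><a href="/c">High Line</a></div></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# :nth-of-type(1) pins the match to the first item, so only one link is found.
first_only = soup.select(
    "div.shelf_item_container > div:nth-of-type(1) > div.item.name > a"
)
# Without the positional filter, every item matches.
all_links = soup.select("div.shelf_item_container div.item.name > a")

print([a.get_text() for a in first_only])
print([a.get_text() for a in all_links])
```

    The first selector yields a single title and the second yields all three, which is why a selector without positional filters is needed to collect every item.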

    That selector matches only part of the list, not every item. Use the broader selectors below to get all of them.

    from bs4 import BeautifulSoup
    import requests

    url = 'https://www.tripadvisor.cn/Attractions-g60763-Activities-New_York_City_New_York.html'
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # Broad selectors, so that every item on the page matches
    titles = soup.select('div.item.name')
    imgs = soup.select('img[width="200"]')
    cates = soup.select('div.poi > div > div:nth-of-type(3)')
    # Full selector for one category element, kept for reference:
    # #taplc_attraction_coverpage_attraction_0 > div:nth-child(1) > div > div > div.shelf_item_container > div:nth-child(4) > div.poi > div > div:nth-child(3)
    # print(titles, imgs, cates, sep='\n-----------\n')
    # Quick check of each list:
    # for title in titles:
    #     print(title.get_text())
    # for img in imgs:
    #     print(img.get('src'))
    # for cate in cates:
    #     print(cate.get_text())
    for title,img,cate in zip(titles,imgs,cates):
        data={
            'title':title.get_text(),
            'img':img.get('src'),
            'cate':list(cate.stripped_strings),
        }
        print(data)
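
    One caveat with the `zip` approach: the three lists are selected independently, and `zip` stops at the shortest one, so a single item without an image would shift every later title onto the wrong picture. A possible alternative, sketched below, walks each item container and looks up its fields inside it. The sample markup, the `cate` class, and the `extract_items` helper are hypothetical and chosen only for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item deliberately has no image.
SAMPLE_HTML = """
<div class="poi">
  <img width="200" src="https://example.invalid/park.jpg">
  <div class="item name"><a>Central Park</a></div>
  <div class="cate">Parks, Nature</div>
</div>
<div class="poi">
  <div class="item name"><a>Brooklyn Bridge</a></div>
  <div class="cate">Bridges</div>
</div>
"""


def extract_items(html):
    """Return one dict per item, looking up each field inside its own container."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for card in soup.select("div.poi"):
        # select_one() returns None when nothing matches inside this container.
        title = card.select_one("div.item.name")
        img = card.select_one('img[width="200"]')
        cate = card.select_one("div.cate")
        items.append({
            "title": title.get_text(strip=True) if title else None,
            "img": img.get("src") if img else None,
            "cate": list(cate.stripped_strings) if cate else [],
        })
    return items


for item in extract_items(SAMPLE_HTML):
    print(item)
```

    Because every field is looked up inside its own container, a missing image shows up as `None` for that item alone and the other items stay aligned.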

    3. Faking a login

    4. Crawling multiple pages

    5. Handling JavaScript by scraping the mobile site

  • Original article: https://www.cnblogs.com/Michael2397/p/7748049.html