  • 03. Bookstore Treasure Hunt (Part 2)

     
        Task: scrape the title, rating, and price of every book in the Travel category of the online bookstore Books to Scrape, and print the extracted information.
     
     
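    # 3. Bookstore Treasure Hunt (Part 2)
    #    Task: scrape the title, rating, and price of every book in the Travel category
    #    of the online bookstore Books to Scrape, and print the extracted information.
    #    Page URL: http://books.toscrape.com/catalogue/category/books/travel_2/index.html

    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://books.toscrape.com/catalogue/category/books/travel_2/index.html')
    # The page is UTF-8; set it explicitly so characters such as £ and ’ are not
    # garbled if requests guesses a different encoding from the response headers.
    res.encoding = 'utf-8'
    html = res.text
    soup = BeautifulSoup(html, 'html.parser')
    # Each book sits in an <article class="product_pod"> element.
    items = soup.find_all('article', class_='product_pod')
    for item in items:
        title = item.find('h3').find('a')['title']
        # The first <p> has class="star-rating Two"; the second class value is the rating.
        rating = item.find('p')['class'][1]
        price = item.find('p', class_='price_color').text
        print(title + '\t' + rating, '\t', price)
    # Alternatively, print each field on its own line:
    #    print(title)
    #    print(rating)
    #    print(price)
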
    '''
    Output:
    It's Only the Himalayas    Two    £45.17
    Full Moon over Noah’s Ark: An Odyssey to Mount Ararat and Beyond    Four    £49.43
    See America: A Celebration of Our National Parks & Treasured Sites    Three    £48.87
    Vagabonding: An Uncommon Guide to the Art of Long-Term World Travel    Two    £36.94
    Under the Tuscan Sun    Three    £37.33
    A Summer In Europe    Two    £44.34
    The Great Railway Bazaar    One    £30.54
    A Year in Provence (Provence #1)    Four    £56.88
    The Road to Little Dribbling: Adventures of an American in Britain (Notes From a Small Island #2)    One    £23.21
    Neither Here nor There: Travels in Europe    Three    £38.95
    1,000 Places to See Before You Die    Five    £26.08
    '''
    32 
    '''
    The teacher's code

    import requests
    from bs4 import BeautifulSoup

    res_bookstore = requests.get('http://books.toscrape.com/catalogue/category/books/travel_2/index.html')
    bs_bookstore = BeautifulSoup(res_bookstore.text, 'html.parser')
    list_books = bs_bookstore.find_all(class_='product_pod')
    for tag_books in list_books:
        # Reaching the <a> tag takes two find() calls: first <h3>, then <a>.
        tag_name = tag_books.find('h3').find('a')
        # This <p> tag's class attribute has two values: "star-rating" and the rating word, e.g. "Two".
        # We search on "star-rating", the value that every book has.
        list_star = tag_books.find('p', class_="star-rating")
        # The price is easy to find: search by attribute alone, or by tag and attribute together.
        tag_price = tag_books.find('p', class_="price_color")
        # tag['attribute_name'] extracts an attribute's value.
        print(tag_name['title'])
        # Again, the attribute value is extracted by attribute name.
        # list_star['class'] is a list of two values, e.g. ['star-rating', 'Two'];
        # the value we want is the element at index 1: "Two".
        # Why a list? Because the class attribute has two values here. In effect,
        # we use the first class value to locate the tag and then read off the second.
        print('star-rating:', list_star['class'][1])
        # A newline and a separator line are appended so the records are clearly
        # separated; this is optional.
        print('Price:', tag_price.text, end='\n' + '------' + '\n')
    '''
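
Both versions print the rating as a word taken from a class list such as ['star-rating', 'Two'] and the price as text such as '£45.17'. If numeric values are needed for sorting or filtering, a conversion along the following lines might help. This is only a sketch: RATING_WORDS, rating_from_classes, and price_from_text are hypothetical names that do not appear in the original code, and the five rating words are assumed to be One through Five.

```python
from decimal import Decimal

# Hypothetical lookup table: rating words as they appear in the class attribute.
RATING_WORDS = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}


def rating_from_classes(classes):
    """Return the star count for a class list such as ['star-rating', 'Two']."""
    for name in classes:
        if name in RATING_WORDS:
            return RATING_WORDS[name]
    return None  # no recognizable rating word in the list


def price_from_text(text):
    """Convert a price string such as '£45.17' to a Decimal."""
    # Keep only digits and the decimal point, which drops the currency symbol.
    cleaned = ''.join(ch for ch in text if ch.isdigit() or ch == '.')
    return Decimal(cleaned)


print(rating_from_classes(['star-rating', 'Two']))
print(price_from_text('£45.17'))
```

Scanning the whole class list, rather than reading index 1 directly, avoids an IndexError if a tag ever carries only the "star-rating" value.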
    Each Tag in items has the following content:
     
    <article class="product_pod">
        <div class="image_container">
            <a href="../../../its-only-the-himalayas_981/index.html"><img alt="It's Only the Himalayas" class="thumbnail"
                    src="../../../../media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg" /></a>
        </div>
        <p class="star-rating Two">
            <i class="icon-star"></i>
            <i class="icon-star"></i>
            <i class="icon-star"></i>
            <i class="icon-star"></i>
            <i class="icon-star"></i>
        </p>
        <h3><a href="../../../its-only-the-himalayas_981/index.html" title="It's Only the Himalayas">It's Only the
                Himalayas</a></h3>
        <div class="product_price">
            <p class="price_color">£45.17</p>
            <p class="instock availability">
                <i class="icon-ok"></i>


                In stock


            </p>
            <form>
                <button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
            </form>
        </div>
    </article>
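
To see how the three fields map onto this markup without installing anything, the same extraction can be sketched with the standard library's html.parser instead of BeautifulSoup. This is a swapped-in technique for illustration, not the approach used in the code above. BookParser and SAMPLE are hypothetical names, and SAMPLE is a shortened copy of the snippet above.

```python
from html.parser import HTMLParser

# Shortened copy of one <article class="product_pod"> element (assumption: trimmed for brevity).
SAMPLE = '''<article class="product_pod">
    <p class="star-rating Two"><i class="icon-star"></i></p>
    <h3><a href="../../../its-only-the-himalayas_981/index.html" title="It's Only the Himalayas">It's Only the Himalayas</a></h3>
    <div class="product_price">
        <p class="price_color">£45.17</p>
    </div>
</article>'''


class BookParser(HTMLParser):
    """Collect title, rating, and price from product_pod articles."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None    # dict for the article being parsed
        self._in_h3 = False     # True while inside <h3>
        self._in_price = False  # True while inside <p class="price_color">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A class attribute may hold several space-separated values.
        classes = (attrs.get('class') or '').split()
        if tag == 'article' and 'product_pod' in classes:
            self._current = {'title': None, 'rating': None, 'price': None}
        elif self._current is None:
            return  # ignore everything outside a product_pod article
        elif tag == 'p' and 'star-rating' in classes:
            words = [c for c in classes if c != 'star-rating']
            self._current['rating'] = words[0] if words else None
        elif tag == 'h3':
            self._in_h3 = True
        elif tag == 'a' and self._in_h3:
            self._current['title'] = attrs.get('title')
        elif tag == 'p' and 'price_color' in classes:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price and self._current is not None:
            self._current['price'] = data.strip()

    def handle_endtag(self, tag):
        if tag == 'h3':
            self._in_h3 = False
        elif tag == 'p':
            self._in_price = False
        elif tag == 'article' and self._current is not None:
            self.books.append(self._current)
            self._current = None


parser = BookParser()
parser.feed(SAMPLE)
print(parser.books)
```

The state flags show what BeautifulSoup's nested find() calls do implicitly: the title comes from the <a> inside <h3>, the rating from the second class value of the star-rating <p>, and the price from the text of the price_color <p>.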
  • Original article: https://www.cnblogs.com/www1707/p/10692316.html