  • China University MOOC: Study Notes (4)

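    Targeted crawler for comparing Taobao product prices

    Goal: fetch Taobao search result pages and extract the product names and prices from them.

    Program structure:

    1. Submit the product search request and fetch the result pages in a loop.
    2. For each page, extract the product names and prices.
    3. Print the results to the screen.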
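
    As a minimal sketch of step 1, the snippet below only builds the URL for each result page and sends no request. It assumes, as the full program does, that results are paged through the `s` query parameter in steps of 44; `build_page_urls` and its `page_size` parameter are hypothetical names used for illustration, and percent-encoding the keyword with `quote` is an addition not found in the notes.

```python
from urllib.parse import quote


def build_page_urls(goods, depth, page_size=44):
    """Return one search URL per result page (hypothetical helper)."""
    # Percent-encode the keyword so non-ASCII text is safe inside a URL.
    base = 'https://s.taobao.com/search?q=' + quote(goods)
    # Assumption: `s` is the offset of the first item shown on a page.
    return [base + '&s=' + str(page_size * i) for i in range(depth)]


for url in build_page_urls('书包', 2):
    print(url)
```

    Keeping URL construction separate from fetching makes the paging logic easy to check before any network traffic is involved.
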
    import requests
    import re
    
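    def getHTMLText(url):
        """Fetch url and return its text, or "" if the request fails."""
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()  # raise on 4xx/5xx responses
            r.encoding = r.apparent_encoding  # guess the encoding from the content
            return r.text
        except requests.RequestException:
            return ""
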
    def parsePage(ilt, html):
        """Append the [price, title] pairs found in html to ilt."""
        try:
            # Capture the quoted values directly instead of eval()-ing page content.
            plt = re.findall(r'"view_price":"([\d.]*)"', html)
            tlt = re.findall(r'"raw_title":"(.*?)"', html)
            for price, title in zip(plt, tlt):
                ilt.append([price, title])
        except Exception:
            print("")
    
    def printGoodList(ilt):
        """Print the collected goods as a numbered table."""
        tplt = "{:4}\t{:8}\t{:16}"
        print(tplt.format("No.", "Price", "Item name"))
        for count, g in enumerate(ilt, start=1):
            print(tplt.format(count, g[0], g[1]))
    
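    def main():
        goods = '书包'  # search keyword ("schoolbag")
        depth = 2  # number of result pages to crawl
        start_url = 'https://s.taobao.com/search?q=' + goods
        infoList = []
        for i in range(depth):
            try:
                # `s` is the item offset; this code steps it by 44 per page.
                url = start_url + '&s=' + str(44 * i)
                html = getHTMLText(url)
                parsePage(infoList, html)
            except Exception:
                continue  # skip a page that fails and move on to the next
        printGoodList(infoList)

    if __name__ == "__main__":
        main()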
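
    The extraction step can be tried without any network access. The sketch below runs the two regular expressions on a small made-up fragment; `sample_html` and `extract_goods` are hypothetical, and the fragment only imitates the `"raw_title"` and `"view_price"` fields the program looks for, so a real page may be laid out differently.

```python
import re

# Hypothetical fragment; the titles and prices are invented for illustration.
sample_html = (
    '{"raw_title":"Canvas backpack","view_price":"59.00"},'
    '{"raw_title":"School bag: blue","view_price":"128.50"}'
)


def extract_goods(html):
    """Return the [price, title] pairs found in html."""
    # Capture groups return only the text between the quotes.
    prices = re.findall(r'"view_price":"([\d.]*)"', html)
    titles = re.findall(r'"raw_title":"(.*?)"', html)
    return [[price, title] for price, title in zip(prices, titles)]


for number, (price, title) in enumerate(extract_goods(sample_html), start=1):
    print("{:4}\t{:8}\t{:16}".format(number, price, title))
```

    The second sample title contains a colon on purpose: capturing the quoted value keeps such a title whole, whereas splitting the match on ':' would cut it short.
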
  • Original article: https://www.cnblogs.com/BeautifulSoup/p/8455143.html