Scraping Taobao product information (from a MOOC course)

import re

import requests


def getTHMLText(url):
    """Fetch a page and return its text, or an empty string on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def parsePage(ilt, html):
    """Append [price, title] pairs found in html to ilt."""
    # \d matches a digit; a bare [d.] would match only the letter "d" and "."
    plt = re.findall(r'"view_price":"([\d.]*)"', html)
    # .*? is non-greedy, so each match stops at the first closing quote
    tlt = re.findall(r'"raw_title":"(.*?)"', html)
    # The capture groups return the bare values, so eval() is not needed;
    # zip stops at the shorter list if the two counts differ
    for price, title in zip(plt, tlt):
        ilt.append([price, title])


def printGodeList(ilt):
    """Print the collected items as a numbered, tab-separated table."""
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Title"))
    for count, g in enumerate(ilt, start=1):
        print(tplt.format(count, g[0], g[1]))


def main():
    goods = "书包"  # search keyword ("schoolbag")
    depth = 3  # number of result pages to fetch
    start_url = "https://s.taobao.com/search?q=" + goods
    infoList = []
    for i in range(depth):
        # s is the item offset; this script advances it by 44 per page
        url = start_url + "&s=" + str(44 * i)
        html = getTHMLText(url)
        parsePage(infoList, html)
    printGodeList(infoList)


if __name__ == "__main__":
    main()

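This crawler follows the requests-re route to do a targeted price-comparison crawl of Taobao products, rather than the requests-BeautifulSoup approach. Extracting the information with regular expressions is simpler here than using the bs4 library. The main difficulty is likewise the use of regular expressions.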

Inspecting the page shows that the price is stored as a key-value pair, so we search for "view_price" to find the price.
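
As a minimal sketch of this lookup, the snippet below runs the price pattern against a made-up fragment (the `sample` string is an assumption for illustration, not real Taobao output). It also shows why the character class has to be written `[\d.]`: with a bare `d` the class matches only the letter d and a dot, so no price is ever captured.

```python
import re

# Hypothetical fragment shaped like the key-value pairs described above.
sample = '{"raw_title":"Canvas backpack","view_price":"129.00"}'

# [\d.]* matches a run of digits and dots; the parentheses capture only the value.
prices = re.findall(r'"view_price":"([\d.]*)"', sample)

# Without the backslash, [d.] means "the letter d or a dot", so nothing matches.
broken = re.findall(r'"view_price":"([d.]*)"', sample)

print(prices)
print(broken)
```

Using a capture group returns the bare value, so there is no need to split on ":" or strip the quotes afterwards.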

The product name is likewise stored as a key-value pair, so we search for "raw_title" to find the name.
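
A small sketch of the title lookup, again on an invented fragment (the `sample` string and both titles are assumptions). It illustrates why the pattern uses the non-greedy `.*?`: a greedy `.*` runs to the last quotation mark in the text and swallows every field in between.

```python
import re

# Hypothetical fragment holding two items as key-value pairs.
sample = ('{"raw_title":"Canvas backpack","view_price":"129.00"},'
          '{"raw_title":"Leather satchel","view_price":"259.50"}')

# Non-greedy: each match ends at the first closing quote after the title.
titles = re.findall(r'"raw_title":"(.*?)"', sample)

# Greedy: a single match that runs on to the last quote in the string.
greedy = re.findall(r'"raw_title":"(.*)"', sample)

# Pair prices with titles by position, as the crawler does.
prices = re.findall(r'"view_price":"([\d.]*)"', sample)
pairs = list(zip(prices, titles))

print(titles)
print(len(greedy))
print(pairs)
```

Pairing by position assumes every item carries both keys in the same order; `zip` simply stops at the shorter list if that assumption fails.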

Original article: https://www.cnblogs.com/tianqianlan/p/9446578.html