
A Targeted Crawler Example for Taobao Product Information

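Feature description:
1) Goal: fetch Taobao search result pages and extract the product names and prices from them.
2) Things to understand: Taobao's search interface and how paging is handled.
3) Technical approach: requests + re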
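The search interface and paging can be sketched on their own before any request is sent. The snippet below is a minimal illustration, not Taobao's documented API: the `q` and `s` parameters and the step of 44 items per page are taken from the code in this article, while `build_search_url`, `BASE_URL`, and `ITEMS_PER_PAGE` are hypothetical names introduced here.

```python
from urllib.parse import urlencode

BASE_URL = 'https://s.taobao.com/search'
ITEMS_PER_PAGE = 44  # assumption: matches the 44 * i step used in this article's code


def build_search_url(keyword, page):
    """Return the search URL for a zero-based page index."""
    # urlencode percent-encodes the keyword, so non-ASCII search terms are safe
    query = urlencode({'q': keyword, 's': ITEMS_PER_PAGE * page})
    return BASE_URL + '?' + query


for page in range(2):
    print(build_search_url('书包', page))
```

Building the query with `urlencode` rather than string concatenation keeps the keyword correctly escaped; the offset `s` simply advances by one page of items per request.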


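import re

import requests

"""
1. Submit the product search request and fetch the result pages in a loop
2. For each page, extract the product names and prices
3. Print the results to the screen
"""


def getHtmlText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Any network or HTTP error yields an empty page
        return ''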


def parsePage(ilt, html):
    try:
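        # Prices consist of digits and dots
        plt = re.findall(r'"view_price":"[\d.]*"', html)
        tlt = re.findall(r'"raw_title":".*?"', html)  # *? is a non-greedy (minimal) match
        for i in range(len(plt)):
            # Split on the first colon only, so titles containing ":" stay intact
            price = eval(plt[i].split(':', 1)[1])
            title = eval(tlt[i].split(':', 1)[1])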
            ilt.append([price, title])
    except:
        print("")


def printGoodList(ilt):
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format('No.', 'Price', 'Product name'))
    count = 0
    for g in ilt:
        count = count + 1
        print(tplt.format(count, g[0], g[1]))


def main():
    goods = '书包'  # search keyword ("schoolbag")
    depth = 2  # number of result pages to fetch
    start_url = 'https://s.taobao.com/search?q=' + goods
    info_list = []
    for i in range(depth):
        try:
            # s is the item offset; each page advances it by 44
            url = start_url + '&s=' + str(44 * i)
            html = getHtmlText(url)
            parsePage(info_list, html)
        except:
            continue
    printGoodList(info_list)


main()
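
The extraction step can also be tried offline on a hand-written string. The sketch below is an alternative to the `eval` calls in `parsePage`, not the author's method: it uses capture groups so the quoted values come straight out of `re.findall`. The `sample` string is made up for illustration, and `extract_items`, `PRICE_RE`, and `TITLE_RE` are hypothetical names; only the `view_price` and `raw_title` keys come from the code above.

```python
import re

# Capture groups return only the text between the quotes
PRICE_RE = re.compile(r'"view_price":"([\d.]*)"')
TITLE_RE = re.compile(r'"raw_title":"(.*?)"')


def extract_items(html):
    """Return a list of (price, title) pairs found in the page text."""
    prices = PRICE_RE.findall(html)
    titles = TITLE_RE.findall(html)
    # zip stops at the shorter list, so a missing title cannot raise IndexError
    return list(zip(prices, titles))


# Made-up sample data in the same key/value shape the regexes expect
sample = ('{"raw_title":"Canvas backpack","view_price":"59.00"},'
          '{"raw_title":"Kids bag: blue","view_price":"128.50"}')

items = extract_items(sample)
print(items)
# [('59.00', 'Canvas backpack'), ('128.50', 'Kids bag: blue')]
```

Because nothing from the page is evaluated as Python code, this variant is safer on untrusted input, and a title that contains a colon is kept whole.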


Original article: https://www.cnblogs.com/wangyue0925/p/11231898.html