  • 16.ajax_case02

    # Scrape book reviews from Dangdang (dangdang.com)
    # http://product.dangdang.com/25340451.html
    
    import json
    import requests
    from lxml import etree
    
    
    # Fetch review pages 1 through 4 from the AJAX comment endpoint
    for i in range(1, 5):
        # Minimal form of the same endpoint:
        # url = 'http://product.dangdang.com/index.php?r=comment/list&productId=25340451&pageIndex=1'
        url = 'http://product.dangdang.com/index.php?r=comment/list&productId=25340451&categoryPath=01.07.07.04.00.00&mainProductId=25340451&mediumId=0&pageIndex={}'.format(i)
    
        # Browser-like request headers
        header = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
        }
    
        response = requests.get(url, headers=header, timeout=5)
        # print(response.text)
    
        # The endpoint returns JSON; the rendered review list is an HTML
        # fragment stored under data -> list -> html
        result = json.loads(response.text)
        comment_html = result['data']['list']['html']
        tree = etree.HTML(comment_html)
    
        # Each review sits in a <div class="items_right"> block
        comments = tree.xpath('//div[@class="items_right"]')
        for item in comments:
            comment_time = item.xpath('./div[contains(@class,"starline")]/span[1]/text()')[0]
            comment_content = item.xpath('./div[contains(@class,"describe_detail")]/span[1]//text()')[0]
            print(comment_time)
            print(comment_content)
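To check the parsing logic without hitting the live site, the request and parsing steps can be split into plain functions. This is a sketch under stated assumptions, not the original script: `build_page_url` rebuilds the same page URL with `urllib.parse.urlencode` (query parameters copied from the script above), and `parse_comments` parses a JSON payload assumed to have the `data -> list -> html` shape, using the standard library's `xml.etree.ElementTree` in place of lxml. ElementTree supports only a subset of XPath (no `contains()`), so class matching is done by hand and the fragment must be well-formed markup. The helper names and the `SAMPLE` payload are hypothetical; unlike the script, which keeps only the first text node, this version joins all text inside the review `<span>`.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = 'http://product.dangdang.com/index.php'


def build_page_url(product_id, page, category_path='01.07.07.04.00.00'):
    """Build the comment-list URL for one page (hypothetical helper)."""
    params = {
        'r': 'comment/list',
        'productId': product_id,
        'categoryPath': category_path,
        'mainProductId': product_id,
        'mediumId': 0,
        'pageIndex': page,
    }
    # safe='/' keeps "comment/list" readable instead of encoding it as %2F
    return BASE + '?' + urlencode(params, safe='/')


def has_class(el, name):
    # True if `name` is one of the whitespace-separated tokens in el's class attribute
    return name in (el.get('class') or '').split()


def parse_comments(payload_text):
    """Return (time, content) pairs from a payload shaped like {'data': {'list': {'html': ...}}}."""
    result = json.loads(payload_text)
    fragment = result['data']['list']['html']
    # Wrap the fragment so ElementTree sees a single root element
    root = ET.fromstring('<root>' + fragment + '</root>')
    pairs = []
    for item in root.iter('div'):
        if not has_class(item, 'items_right'):
            continue
        time_text = content_text = ''
        for child in item.findall('div'):
            span = child.find('span')
            if span is None:
                continue
            if has_class(child, 'starline'):
                time_text = (span.text or '').strip()
            elif has_class(child, 'describe_detail'):
                content_text = ''.join(span.itertext()).strip()
        pairs.append((time_text, content_text))
    return pairs


# Hypothetical sample payload; real responses will differ
SAMPLE = json.dumps({'data': {'list': {'html': (
    '<div class="items_right">'
    '<div class="starline clearfix"><span>2019-04-01 10:00:00</span></div>'
    '<div class="describe_detail"><span>Great <b>book</b></span></div>'
    '</div>'
)}}})

if __name__ == '__main__':
    print(build_page_url('25340451', 1))
    for t, c in parse_comments(SAMPLE):
        print(t, c)
```

With these pieces, the network loop reduces to calling `requests.get(build_page_url(...), headers=header, timeout=5)` and passing `response.text` to `parse_comments`. For real pages, which are often not well-formed XML, lxml's `etree.HTML` as used in the script remains the more forgiving parser.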
  • Original article: https://www.cnblogs.com/hankleo/p/10646541.html