  • Scraping Tmall and Taobao review data

    import json
    import re
    
    import pandas
    import requests
    
    
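    class base():
        """Scraper for paginated Taobao/Tmall review endpoints."""
    
        def __init__(self,url):
            # url must end with the page parameter, e.g. '...&currentPage='
            self.url = url
    
        def all_url(self):
            # Build the URLs for pages 1 through 99 by appending the page number.
            return [self.url + "%s" % i for i in range(1,100)]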
    
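        def loads_jsonp(self,_jsonp):
            # Strip the JSONP wrapper, e.g. callback({...}), and parse the JSON object inside.
            try:
                return json.loads(re.match(".*?({.*}).*",_jsonp,re.S).group(1))
            except (AttributeError, ValueError):
                # AttributeError: no {...} found; ValueError: the body is not valid JSON.
                raise ValueError('Invalid Input')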
    
        def url_req(self,url):
            # Fetch one page and return the parsed payload; the timeout avoids hanging forever.
            content = requests.get(url, timeout=10).text
            return self.loads_jsonp(content)
    
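        def taobao_comment(self,data):
            # Taobao payload: reviews are listed under 'comments'.
            for i in data['comments']:
                row = {}  # separate name so the 'data' argument is not shadowed
                row['昵称']=i['user']['nick']  # column: nickname
                row['评论']=i['content']  # column: review text
                info_list.append(row)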
    
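        def tianmao_comment(self,data):
            # Tmall payload: reviews are listed under 'rateList'.
            for i in data['rateList']:
                row = {}  # separate name so the 'data' argument is not shadowed
                row['昵称']=i['displayUserNick']  # column: nickname
                row['评论']=i['rateContent']  # column: review text
                info_list.append(row)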
    
        def comment(self,url):
            # Pick the parser by host: Tmall and Taobao use different field names.
            data = self.url_req(url)
            if 'tmall' in url:
                self.tianmao_comment(data)
            else:
                self.taobao_comment(data)
                
    
    def main(url):
        scraper = base(url)
        for i in scraper.all_url():
            scraper.comment(i)
            print(len(info_list))  # running total of collected reviews
    
    
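    if __name__ == "__main__":
        # The URL must end with 'currentPage=' so the page number can be appended.
        url = 'https://rate.tmall.com/list_detail_rate.htm?itemId=39258348512&spuId=250685252&sellerId=2106913388&order=3&currentPage='
        info_list = []  # module-level list that the parser methods append to
        main(url)
        df = pandas.DataFrame(info_list)
        df.to_excel('comments.xlsx',index=False)  # writing .xlsx needs an Excel engine such as openpyxl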
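
The JSONP handling in `loads_jsonp` can be tried offline, without hitting the review endpoint. The sketch below is not part of the original script. `extract_rows`, the English column names, and the `SAMPLE` string are hypothetical, and the sample payload only mirrors the field names the script reads (`rateList`, `displayUserNick`, `rateContent`); a real response may be shaped differently.

```python
import json
import re


def loads_jsonp(text):
    """Strip a JSONP wrapper such as callback({...}) and parse the JSON object inside."""
    # Greedy match from the first '{' to the last '}' in the response body.
    match = re.search(r"\{.*\}", text, re.S)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))


def extract_rows(payload):
    """Flatten a Tmall-style payload into one dict per review (hypothetical helper)."""
    return [
        {"nickname": item["displayUserNick"], "comment": item["rateContent"]}
        for item in payload.get("rateList", [])
    ]


# Made-up response text for illustration only; not captured from the real endpoint.
SAMPLE = (
    'jsonp128({"rateList": ['
    '{"displayUserNick": "t***1", "rateContent": "Fast delivery"}, '
    '{"displayUserNick": "a***9", "rateContent": "Good quality"}'
    ']})'
)

rows = extract_rows(loads_jsonp(SAMPLE))
print(rows)
```

Returning the rows instead of appending to a global list makes each step easy to check on its own; the resulting list can be passed to `pandas.DataFrame` in the same way as `info_list`.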
  • Original article: https://www.cnblogs.com/Erick-L/p/8000637.html