  • Python Scrapy: decoding the response and converting time formats

    import json
    from datetime import datetime
    
    import scrapy
    
    
    class BianSpider(scrapy.Spider):
        name = 'bian'
        # allowed_domains = ['163.com']
        start_urls = ['http://tech.163.com/special/00097UHL/tech_datalist.js?callback=data_callback']
    
        def parse(self, response):
            # The feed is GBK-encoded JSONP: data_callback([...])
            text = response.body.decode('gbk').strip()
            # Keep only what lies between the first '(' and the last ')'.
            # str.strip('data_callback(') strips a set of characters, not a prefix.
            payload = text[text.index('(') + 1:text.rindex(')')]
            for item in json.loads(payload):
                print(item['title'])
                print(item['label'])
                # Timestamp layout: month/day/year hour:minute:second
                time_str = item['time']
                print(datetime.strptime(time_str, '%m/%d/%Y %H:%M:%S'))
                print(','.join(kw['keyname'] for kw in item['keywords']))
                # Follow the article URL and parse the page in show()
                desc_href = item['docurl']
                yield scrapy.Request(desc_href, callback=self.show)
    
        def show(self, response):
            # Breadcrumb ("post_crumb"): once as a single string, once as joined text nodes
            types = response.xpath("string(//div[@class='post_crumb'])").get(default='').strip()
            weizhi = ' '.join(response.xpath("//div[@class='post_crumb']//text()").getall()).strip()
    
            print(types)
            print(weizhi)
            # Article source, plus one span inside #endText
            print(response.xpath('//*[@id="ne_article_source"]/text()').getall())
            print(response.xpath('//*[@id="endText"]/div[2]/span[2]/text()').getall())
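The decoding step in the spider above can be tried without Scrapy. This is a minimal sketch that assumes the feed is GBK-encoded JSONP shaped like `data_callback([...])`. `unwrap_jsonp` is a hypothetical helper name, and the sample payload is made-up data, not a real response.

The sketch also shows why removing the wrapper with `str.strip('data_callback(')` is fragile. `strip()` treats its argument as a set of characters rather than a prefix. A trailing `;` or newline after the call therefore survives, and `json.loads` would fail on it.

```python
import json


def unwrap_jsonp(body: bytes, encoding: str = "gbk"):
    """Decode a JSONP body such as b'data_callback([...])' and parse the JSON inside.

    Hypothetical helper; assumes the payload sits between the first '(' and the last ')'.
    """
    text = body.decode(encoding).strip()
    # Slice between the first '(' and the last ')' instead of using str.strip(),
    # so a trailing ';' or newline after the call does not break parsing.
    start = text.index("(") + 1
    end = text.rindex(")")
    return json.loads(text[start:end])


# Hypothetical sample shaped like the feed (made-up data), GBK-encoded.
sample = 'data_callback([{"title": "示例", "time": "10/24/2018 19:41:35"}]);\n'.encode("gbk")

items = unwrap_jsonp(sample)
print(items[0]["title"])

# A strip()-based chain only removes characters from the given set at both ends,
# so the trailing ');\n' is still there and json.loads(fragile) would raise.
fragile = sample.decode("gbk").strip("data_callback(").strip(")")
print(repr(fragile[-3:]))
```

Slicing by position keeps the JSON intact even if the callback name changes. It does assume the wrapper is a single function call.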
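The time conversion in the spider is a round trip: `strptime` parses a string into a `datetime`, and `strftime` formats it back into any layout. The sketch below assumes the feed's `time` field uses the same `'%m/%d/%Y %H:%M:%S'` layout as the spider. `convert_time` and the sample timestamp are hypothetical.

```python
from datetime import datetime

# Layout used in the spider: month/day/year hour:minute:second (assumed feed format).
FEED_TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def convert_time(raw: str, out_format: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Parse a feed timestamp and re-emit it in another layout (hypothetical helper)."""
    parsed = datetime.strptime(raw, FEED_TIME_FORMAT)  # str -> datetime
    return parsed.strftime(out_format)                # datetime -> str


# Hypothetical timestamp in the assumed feed layout.
print(convert_time("10/24/2018 19:41:35"))
print(convert_time("10/24/2018 19:41:35", "%d/%m/%Y"))

# strptime raises ValueError when the string does not match the format exactly.
try:
    datetime.strptime("2018-10-24 19:41:35", FEED_TIME_FORMAT)
except ValueError as exc:
    print("format mismatch:", exc)
```

If the site ever changes its timestamp layout, only `FEED_TIME_FORMAT` needs to change.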
  • Original post: https://www.cnblogs.com/duanlinxiao/p/9847701.html