Crawling every Zhihu Daily article with Python + Scrapy

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class ZhihudailyItem(scrapy.Item):
    # Fields collected for each Zhihu Daily article.
    date = scrapy.Field()     # publication date
    title = scrapy.Field()    # article headline
    url = scrapy.Field()      # article URL
    content = scrapy.Field()  # article body text

#!/usr/bin/python
#coding:utf-8
import scrapy

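class ZhihudailySpider(scrapy.Spider):
    name = 'zhihudaily'
    # List the index site's host as well: with only 'zhihu.com' here, the
    # offsite filter would drop the follow-up index page requests.
    allowed_domains = ['zhihu.com', 'zhihudaily.ahorn.me']
    start_urls = ["http://zhihudaily.ahorn.me/page/1"]
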
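    def parse(self, response):
        # Each post block on an index page lists its articles in nested divs.
        for sel in response.xpath("//div[@class='post']"):
            for sub in sel.xpath("./div/div"):
                url = sub.xpath("./a/@href").extract_first()
                if url:  # skip blocks without a link instead of raising IndexError
                    yield scrapy.Request(response.urljoin(url), callback=self.parse_url)

        # Queue index pages 2..499. Every index page runs this same loop, and
        # Scrapy's duplicate filter drops the repeated requests.
        for page in range(2, 500):
            yield scrapy.Request("http://zhihudaily.ahorn.me/page/" + str(page), callback=self.parse)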

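    def parse_url(self, response):
        # extract_first() returns None rather than raising when the headline is missing.
        title = response.xpath("//h1[@class='headline-title']/text()").extract_first()
        print("Title: %s" % title)
        print("*************************************************************************")
        # Print the text of every paragraph in the article body.
        for p in response.xpath("//div[@class='content']/p/text()").extract():
            print(p)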
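
The spider above only prints what it finds, so the fields declared on ZhihudailyItem (date, title, url, content) are never filled in. The following is a minimal sketch of how one article record might be assembled from the values parse_url already extracts. The helper name build_item is hypothetical and not part of the original post, and it returns a plain dict keyed by the item's field names, on the assumption that the installed Scrapy version accepts dicts as items. The original code never extracts a date, so that field stays None here.

```python
def build_item(url, title, paragraphs, date=None):
    """Assemble one article record using the ZhihudailyItem field names.

    Hypothetical helper: `paragraphs` is the list of strings that
    parse_url gets from the //div[@class='content']/p/text() query.
    """
    # Drop whitespace-only text nodes and trim the remaining paragraphs.
    cleaned = [p.strip() for p in paragraphs if p and p.strip()]
    return {
        "date": date,  # not extracted anywhere in the original spider
        "title": title.strip() if title else None,
        "url": url,
        "content": "\n".join(cleaned),
    }


# Stand-alone demonstration with made-up values.
record = build_item(
    url="http://example.com/story/1",  # placeholder URL, not a real article
    title="  Sample headline ",
    paragraphs=["First paragraph. ", "   ", "Second paragraph."],
)
print(record["title"])
print(record["content"])
```

Inside the spider, parse_url could then end with `yield build_item(response.url, title, paragraphs)` in place of the print calls, which would let Scrapy's feed exports and item pipelines handle the output.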

Original article: https://www.cnblogs.com/tmyyss/p/4551974.html