  • 2017.08.04 Python Web Scraping: Scrapy Spider in Practice, Part 2: Weather Forecast

    1. Project preparation. Target site: http://quanzhou.tianqi.com/

    2. Create and edit the Scrapy spider:

    scrapy startproject weather

    scrapy genspider HQUSpider quanzhou.tianqi.com

    The generated project file structure (the original screenshot is not reproduced here):
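    For reference, the typical layout produced by scrapy startproject weather followed by scrapy genspider (exact files may vary slightly by Scrapy version):

```
weather/
├── scrapy.cfg            # deploy configuration
└── weather/
    ├── __init__.py
    ├── items.py          # item definitions
    ├── middlewares.py    # spider/downloader middlewares
    ├── pipelines.py      # item pipelines
    ├── settings.py       # project settings
    └── spiders/
        ├── __init__.py
        └── HQUSpider.py  # generated by scrapy genspider
```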

    3. Modify items.py:

    4. Modify the spider file HQUSpider.py:

    (1) First run the command: scrapy shell http://quanzhou.tianqi.com/   to test and work out the selectors:

    (2) Experiment with selectors: open the page in Chrome and view the page source:

    (3) Run commands in the shell to inspect the response:
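    To illustrate what those shell experiments do, the same XPath logic can be tried with just the standard library. The markup below is a simplified, assumed stand-in for one forecast block; the real tianqi.com source may differ:

```python
# Assumed, simplified fragment of the page; not the real tianqi.com markup.
import xml.etree.ElementTree as ET

html = """<div>
  <div class="tqshow1">
    <h3>Quanzhou <b>Aug 04</b></h3>
    <p>Friday</p>
    <ul>
      <li><img src="http://example.com/sunny.png"/></li>
      <li>26~34</li>
      <li>Sunny</li>
      <li>Breeze</li>
    </ul>
  </div>
</div>"""

root = ET.fromstring(html)
# Equivalent of //div[@class="tqshow1"]
sub = root.find(".//div[@class='tqshow1']")
# Equivalent of ./h3//text(), joined into one string
city_date = ''.join(sub.find('h3').itertext())
# Equivalent of ./p//text()
week = sub.find('p').text
# Equivalent of ./ul/li[1]/img/@src
img = sub.find('./ul/li[1]/img').get('src')
print(city_date, week, img)
```

    In the real scrapy shell, the same experiments are done with response.xpath(...) against the live page.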

    (4) Write HQUSpider.py:

    # -*- coding: utf-8 -*-
    import scrapy
    from weather.items import WeatherItem

    class HquspiderSpider(scrapy.Spider):
        name = 'HQUSpider'
        allowed_domains = ['tianqi.com']
        citys = ['quanzhou', 'datong']
        start_urls = []
        for city in citys:
            start_urls.append('http://' + city + '.tianqi.com/')

        def parse(self, response):
            subSelector = response.xpath('//div[@class="tqshow1"]')
            items = []
            for sub in subSelector:
                item = WeatherItem()
                cityDates = ''
                for cityDate in sub.xpath('./h3//text()').extract():
                    cityDates += cityDate
                item['cityDate'] = cityDates
                item['week'] = sub.xpath('./p//text()').extract()[0]
                item['img'] = sub.xpath('./ul/li[1]/img/@src').extract()[0]
                temps = ''
                for temp in sub.xpath('./ul/li[2]//text()').extract():
                    temps += temp
                item['temperature'] = temps
                item['weather'] = sub.xpath('./ul/li[3]//text()').extract()[0]
                item['wind'] = sub.xpath('./ul/li[4]//text()').extract()[0]
                items.append(item)
            return items


    (5) Modify pipelines.py to process the items returned by the Spider:
    # -*- coding: utf-8 -*-

    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
    import time
    import os.path
    import urllib2
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')

    class WeatherPipeline(object):
        def process_item(self, item, spider):
            today = time.strftime('%Y%m%d', time.localtime())
            fileName = today + '.txt'
            with open(fileName, 'a') as fp:
                fp.write(item['cityDate'].encode('utf-8') + ' ')
                fp.write(item['week'].encode('utf-8') + ' ')
                imgName = os.path.basename(item['img'])
                fp.write(imgName + ' ')
                # Download the weather icon if it has not been saved yet.
                # Use a separate handle (imgFp) so the outer text-file
                # handle fp is not shadowed and closed by this block.
                if not os.path.exists(imgName):
                    with open(imgName, 'wb') as imgFp:
                        response = urllib2.urlopen(item['img'])
                        imgFp.write(response.read())
                fp.write(item['temperature'].encode('utf-8') + ' ')
                fp.write(item['weather'].encode('utf-8') + ' ')
                fp.write(item['wind'].encode('utf-8') + ' ')
                time.sleep(1)
            return item


    (6) Modify settings.py to tell Scrapy which pipeline handles the scraped items:
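    The post omits the actual edit; the conventional Scrapy way is to register the pipeline in settings.py via ITEM_PIPELINES. The value 300 is an arbitrary order in the 0-1000 range; lower values run first:

```python
# settings.py fragment: enable WeatherPipeline for this project.
# The number is the pipeline's order (0-1000); lower runs first.
ITEM_PIPELINES = {
    'weather.pipelines.WeatherPipeline': 300,
}
```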

    (7) Run the spider: scrapy crawl HQUSpider

    With that, a complete Scrapy spider is finished.

  • Original article: https://www.cnblogs.com/hqutcy/p/7284302.html