  • Python full stack (historical weather visualization system for China)

    In an earlier post I wrote a Python crawler that scraped historical weather for a number of cities and visualized the data. It felt rather rough, so this article uses that example as a starting point for a much more complete Python full-stack project.

    Project preview page: the nationwide city weather visualization system

    Scraping historical weather for all cities with Scrapy

    Creating the project and preparation

    Create the project:

    scrapy startproject lishitianqi

    Create the spider script (tianqihoubao.com is far too slow to access, so we switch to lishi.tianqi.com):

    cd lishitianqi
    scrapy genspider weath lishi.tianqi.com

    Test whether the URL can be crawled directly:

    scrapy crawl weath
    ...
    2019-09-09 13:22:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 http://lishi.tianqi.com/>: HTTP status code is not handled or not allowed
    ...

    Then test whether the 403 is caused by the User-Agent:

    In [15]: import requests
    
    In [16]: requests.get("https://lishi.tianqi.com/")
    Out[16]: <Response [403]>
    
    In [17]: requests.get("https://lishi.tianqi.com/",headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl
        ...: eWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'})
    Out[17]: <Response [200]>

    Set the User-Agent in settings.py:

    USER_AGENT='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'

    Writing the spider to fetch weather data

    Requesting http://lishi.tianqi.com/ returns a page listing every city in China; select all city URLs with a CSS selector and feed them back into the spider:

        def parse(self, response):
            # select every Chinese city name and its URL
            citys = response.css(".bcity a[target='_blank']::text").getall()
            cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall()

            # feed the city URLs back into the crawler
            for city, url in zip(citys, cityurls):
                yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city)

        def city(self, response):
            pass

    Requesting http://lishi.tianqi.com/beijing/index.html returns every month page for Beijing from January 2011 to now; select all month URLs for the city with a CSS selector and feed them back into the spider:

        def city(self, response):
            # select every year/month and its URL for this city
            months = response.css(".tqtongji1 a::text").getall()
            monthurls = response.css(".tqtongji1 a::attr(href)").getall()

            # feed the month URLs back into the crawler
            for url in monthurls:
                yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month)

        def month(self, response):
            pass

    Requesting http://lishi.tianqi.com/beijing/201101.html returns the Beijing weather page for January 2011; select each day's weather with a CSS selector:

        def month(self, response):
            # select each day's weather data for the month
            weaths = response.css(".tqtongji2 ul[class!='t1'] li::text").getall()
            # dates
            date = response.css(".tqtongji2 ul[class!='t1'] a::text").getall()
            # max temperatures
            maxt = weaths[0::5]
            # min temperatures
            mint = weaths[1::5]
            # weather
            weath = weaths[2::5]
            # wind directions
            windd = weaths[3::5]
            # wind forces
            windp = weaths[4::5]

            # print the data
            for date, maxt, mint, weath, windd, windp in zip(date, maxt, mint, weath, windd, windp):
                print(response.meta['city'], date, maxt, mint, weath, windd, windp)
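    The step slicing above can be sketched in isolation (the sample values below are hypothetical). Note how it assumes exactly five fields per day:

    ```python
    # Flat list as the selector returns it: five fields per day
    weaths = ["5", "-3", "晴", "北风", "3-4级",
              "6", "-2", "多云", "南风", "2-3级"]

    maxt = weaths[0::5]   # every 5th element starting at index 0
    mint = weaths[1::5]

    print(maxt)  # ['5', '6']
    print(mint)  # ['-3', '-2']

    # If a single field is missing, every later column shifts and the slices scramble:
    broken = weaths[:3] + weaths[4:]   # drop one wind-direction entry
    print(broken[0::5])  # ['5', '-2'] -- a min temperature leaked into the max column
    ```

    This fragility is exactly what the data-reliability section later addresses.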

    Storing the Scrapy data in MariaDB

    Create the database and table:

    MariaDB [(none)]> create database lishitianqi;
    Query OK, 1 row affected (0.209 sec)
    
    MariaDB [(none)]> use lishitianqi;
    Database changed
    MariaDB [lishitianqi]> CREATE TABLE `lishitianqi` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `city` varchar(16) NOT NULL,
      `date` date NOT NULL,
      `maxt` int(11) NOT NULL,
      `mint` int(11) NOT NULL,
      `weath` varchar(20) NOT NULL,
      `windd` varchar(20) NOT NULL,
      `windp` varchar(40) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `city` (`city`,`date`)
    ) ENGINE=InnoDB AUTO_INCREMENT=30673 DEFAULT CHARSET=utf8;
    Query OK, 0 rows affected (2.030 sec)
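    The UNIQUE KEY on (city, date) is what lets the later insert ignore statements skip duplicate days. A minimal sketch of the same constraint, using sqlite3 as a stand-in for MariaDB:

    ```python
    import sqlite3

    # In-memory stand-in for the MariaDB table, with the same unique constraint
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE lishitianqi (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        city TEXT NOT NULL,
        date TEXT NOT NULL,
        maxt INTEGER,
        mint INTEGER,
        UNIQUE (city, date))""")

    row = ("北京", "2011-01-01", 2, -7)
    # INSERT OR IGNORE is sqlite's spelling of MariaDB's INSERT IGNORE
    for _ in range(2):
        conn.execute("INSERT OR IGNORE INTO lishitianqi(city, date, maxt, mint) "
                     "VALUES (?, ?, ?, ?)", row)

    count = conn.execute("SELECT COUNT(*) FROM lishitianqi").fetchone()[0]
    print(count)  # 1 -- the duplicate day was silently dropped
    ```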

    Write the item model in items.py:

    class LishitianqiItem(scrapy.Item):
        # define the fields for your item here like:
        city = scrapy.Field()
        date = scrapy.Field()
        maxt = scrapy.Field()
        mint = scrapy.Field()
        weath = scrapy.Field()
        windd = scrapy.Field()
        windp = scrapy.Field()

    Modify the spider file weath.py:

     1 # -*- coding: utf-8 -*-
     2 import scrapy
     3 from lishitianqi.items import LishitianqiItem
     4 
     5 class WeathSpider(scrapy.Spider):
     6     name = 'weath'
     7     allowed_domains = ['lishi.tianqi.com']
     8     start_urls = ['http://lishi.tianqi.com/']
     9 
    10     def parse(self, response):
    11         # select every Chinese city name and its URL
    12         citys = response.css(".bcity a[target='_blank']::text").getall()
    13         cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall()
    14 
    15         # feed the city URLs back into the crawler
    16         for city,url in zip(citys[:1],cityurls[:1]):
    17             yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city)
    18 
    19     def city(self, response):
    20         # select every year/month and its URL for this city
    21         months = response.css(".tqtongji1 a::text").getall()
    22         monthurls = response.css(".tqtongji1 a::attr(href)").getall()
    23 
    24         # feed the month URLs back into the crawler
    25         for url in monthurls:
    26             yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month)
    27 
    28     def month(self, response):
    29         # iterate over each day of the month
    30         for ul in response.css(".tqtongji2 ul[class!='t1']"):
    31             # build an item and hand the data to the pipeline
    32             item = LishitianqiItem()
    33             # city name
    34             item['city'] = response.meta['city']
    35 
    36             a,b,c,d,e,f = ul.css('li')
    37             # date
    38             item['date'] = a.css("::text").get()
    39             # max temperature
    40             item['maxt'] = b.css("::text").get()
    41             # min temperature
    42             item['mint'] = c.css("::text").get()
    43             # weather
    44             item['weath'] = d.css("::text").get()
    45             # wind direction
    46             item['windd'] = e.css("::text").get()
    47             # wind force
    48             item['windp'] = f.css("::text").get()
    49 
    50             yield item

    Write the item pipeline in pipelines.py:

    import pymysql.cursors
    
    class LishitianqiPipeline(object):
    
        def __init__(self):
            # connect to the database
            self.connect = pymysql.connect(
                host='localhost',
                port=3306,
                db='lishitianqi',
                user='root',
                passwd='123',
                charset='utf8',
                use_unicode=True)
    
            # run CRUD statements through this cursor
            self.cursor = self.connect.cursor()
    
        def process_item(self, item, spider):
    
            # insert one day of weather data
            self.cursor.execute(
                """insert into lishitianqi(city, date, maxt, mint, weath, windd, windp)
                values (%s, %s, %s, %s, %s, %s, %s)""",
                (item['city'],
                item['date'],
                item['maxt'],
                item['mint'],
                item['weath'],
                item['windd'],
                item['windp']))
    
            # commit the statement
            self.connect.commit()
    
            # the item must be returned
            return item

    Enable the MySQL pipeline in settings.py:

    ITEM_PIPELINES = {
        'lishitianqi.pipelines.LishitianqiPipeline': 300,
        # format is 'project.module.ClassName': priority (lower values run first)
    }

    Crawler performance optimization

    First, size up the target data and estimate how long a full crawl will take.

    Count the cities:

    #scrapy shell https://lishi.tianqi.com/
    In [1]: citys = response.css(".bcity a[target='_blank']::text").getall()
    
    In [2]: len(citys)
    Out[2]: 3147

    Change line 16 of weath.py to for city,url in zip(citys[:1],cityurls[:1]):, run the spider, and measure how long one city takes:

    2019-09-09 17:37:52 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
    ...
    2019-09-09 17:41:54 [scrapy.core.engine] INFO: Spider closed (finished)

    One city took 4 minutes 2 seconds, so all 3147 cities would take about 8.7 days. When I wrote about MySQL logging earlier I enabled the binary log, and it has a noticeable cost on write performance.

    Turn off the binary log, then restart the service:

    [mysqld]
    #log_bin=bin_log_file
    #binlog_format=row
    #sync_binlog=1
    #server_id=7  

    Confirm it took effect:

    MariaDB [(none)]> show variables like "log_bin";
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | log_bin       | OFF   |
    +---------------+-------+
    1 row in set (0.001 sec)

    Run the spider again and time one city with the binary log disabled:

    2019-09-09 17:50:14 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
    ...
    2019-09-09 17:53:01 [scrapy.core.engine] INFO: Spider closed (finished)

    2 minutes 47 seconds is better, but still a lot. To measure pure crawling time without database writes, comment out line 50 of weath.py:

    			#yield item

    Run the spider and time one city with no database writes:

    2019-09-09 18:40:07 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
    2019-09-09 18:40:12 [scrapy.core.engine] INFO: Spider closed (finished)

    Without database writes it takes only 5 seconds, so the inserts are clearly the bottleneck.

    The code above calls the insert interface once per day; batching the inserts by month should raise throughput.
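    The effect of batching can be sketched with sqlite3 (standing in for MariaDB here): one executemany call and a single commit replace a commit per day.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE w (city TEXT, date TEXT, maxt INT, mint INT)")

    # one month of rows (hypothetical data), inserted as a single batch
    month = [("北京", "2011-01-%02d" % d, 2, -7) for d in range(1, 32)]
    conn.executemany("INSERT INTO w VALUES (?, ?, ?, ?)", month)
    conn.commit()  # one commit for the whole month instead of one per day

    n = conn.execute("SELECT COUNT(*) FROM w").fetchone()[0]
    print(n)  # 31
    ```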

    The modified weath.py:

    # -*- coding: utf-8 -*-
    import scrapy
    from lishitianqi.items import LishitianqiItem
    
    class WeathSpider(scrapy.Spider):
        name = 'weath'
        allowed_domains = ['lishi.tianqi.com']
        start_urls = ['http://lishi.tianqi.com/']
    
        def parse(self, response):
            # select every Chinese city name and its URL
            citys = response.css(".bcity a[target='_blank']::text").getall()
            cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall()
    
            # feed the city URLs back into the crawler
            for city,url in zip(citys[:1],cityurls[:1]):
                yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city)
    
        def city(self, response):
            # select every year/month and its URL for this city
            months = response.css(".tqtongji1 a::text").getall()
            monthurls = response.css(".tqtongji1 a::attr(href)").getall()
    
            # feed the month URLs back into the crawler
            for url in monthurls:
                yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month)
    
        def month(self, response):
            data = []
            # collect each day of the month
            for ul in response.css(".tqtongji2 ul[class!='t1']"):
                data.append(tuple([response.meta['city']] + [li.css("::text").get() for li in ul.css('li')]))
    
            # build one item per month and hand the batch to the pipeline
            item = LishitianqiItem()
            item['data'] = data
            yield item

    The modified items.py:

    # -*- coding: utf-8 -*-
    
    # Define here the models for your scraped items
    #
    # See documentation in:
    # https://docs.scrapy.org/en/latest/topics/items.html
    
    import scrapy
    
    class LishitianqiItem(scrapy.Item):
        # define the fields for your item here like:
        city = scrapy.Field()
        date = scrapy.Field()
        maxt = scrapy.Field()
        mint = scrapy.Field()
        weath = scrapy.Field()
        windd = scrapy.Field()
        windp = scrapy.Field()
        data = scrapy.Field()

    The modified pipelines.py:

    # -*- coding: utf-8 -*-
    
    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    
    import pymysql.cursors
    
    class LishitianqiPipeline(object):
    
        def __init__(self):
            # connect to the database
            self.connect = pymysql.connect(
                host='localhost',
                port=3306,
                db='lishitianqi',
                user='root',
                passwd='123',
                charset='utf8',
                use_unicode=True)
    
            # run CRUD statements through this cursor
            self.cursor = self.connect.cursor()
    
        def process_item(self, item, spider):
    
            # batch-insert one month of weather data
            self.cursor.executemany(
                """insert ignore into lishitianqi(city, date, maxt, mint, weath, windd, windp)
                values (%s, %s, %s, %s, %s, %s, %s)""", item['data'])
    
            # commit the statement
            self.connect.commit()
    
            # the item must be returned
            return item

    Time to crawl one city:

    2019-09-09 19:30:45 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
    2019-09-09 19:30:56 [scrapy.core.engine] INFO: Spider closed (finished)

    11 seconds per city, about 9.6 hours for the full dataset. That is acceptable: kick it off before bed and it is done the next morning.

    Data reliability and repair

    Data scraped from a website is not inherently reliable. You may have noticed that the first version of the code selected each day's weather with a single CSS selection and then sliced the flat list into the individual fields. That approach depends strictly on the data layout: a single missing value shifts everything after it and scrambles the slices. The later version selects each field separately, which fixes the problem.

    Besides a missing field within a day, entire days can be missing. For example:

    MariaDB [lishitianqi]> select * from lishitianqi where city="苏右旗" and date like "2011-01%";
    +-------+--------+------------+------+------+-----------+---------------+-------------+
    | id    | city   | date       | maxt | mint | weath     | windd         | windp       |
    +-------+--------+------------+------+------+-----------+---------------+-------------+
    | 20905 | 苏右旗 | 2011-01-01 |  -13 |  -23 | 晴        | 东北风~东风   | 4-5级       |
    | 20906 | 苏右旗 | 2011-01-02 |  -12 |  -23 | 多云      | 东风~西南风   | 5-6级~4-5级 |
    | 20907 | 苏右旗 | 2011-01-03 |  -13 |  -24 | 多云      | 西南风~西北风 | 4-5级~5-6级 |
    | 20908 | 苏右旗 | 2011-01-04 |  -15 |  -25 | 小雪~多云 | 西北风        | 5-6级~4-5级 |
    | 20909 | 苏右旗 | 2011-01-05 |  -17 |  -24 | 多云      | 西北风        | 5-6级~4-5级 |
    | 20910 | 苏右旗 | 2011-01-06 |  -15 |  -22 | 晴        | 西北风~西风   | 5-6级~4-5级 |
    | 20911 | 苏右旗 | 2011-01-07 |  -16 |  -24 | 晴        | 西北风        | 5-6级~4-5级 |
    | 20912 | 苏右旗 | 2011-01-08 |  -18 |  -27 | 晴        | 西风          | 5-6级~4-5级 |
    | 20913 | 苏右旗 | 2011-01-09 |  -12 |  -23 | 多云      | 西风          | 5-6级~4-5级 |
    | 20914 | 苏右旗 | 2011-01-10 |  -16 |  -23 | 晴        | 西风          | 5-6级~4-5级 |
    | 20915 | 苏右旗 | 2011-01-11 |  -16 |  -28 | 晴~多云   | 西风          | 4-5级       |
    | 20916 | 苏右旗 | 2011-01-12 |  -17 |  -27 | 小雪~多云 | 西北风        | 5-6级~4-5级 |
    | 20917 | 苏右旗 | 2011-01-13 |  -18 |  -26 | 多云      | 西北风        | 4-5级~5-6级 |
    | 20918 | 苏右旗 | 2011-01-14 |  -14 |  -28 | 晴        | 北风~东北风   | 4-5级       |
    | 20919 | 苏右旗 | 2011-01-15 |  -15 |  -25 | 多云~晴   | 西北风~北风   | 5-6级~4-5级 |
    | 20920 | 苏右旗 | 2011-01-16 |  -18 |  -30 | 小雪~多云 | 西北风~西风   | 5-6级~4-5级 |
    | 20921 | 苏右旗 | 2011-01-26 |  -12 |  -26 | 晴        | 西北风        | 5-6级       |
    | 20922 | 苏右旗 | 2011-01-29 |  -14 |  -25 | 多云~晴   | 西北风        | 4-5级~5-6级 |
    | 20923 | 苏右旗 | 2011-01-30 |  -10 |  -19 | 晴        | 西北风        | 4-5级~5-6级 |
    | 20924 | 苏右旗 | 2011-01-31 |   -6 |  -19 | 晴        | 西风          | 4-5级~5-6级 |
    +-------+--------+------------+------+------+-----------+---------------+-------------+
    20 rows in set (0.002 sec)

    Days 17-25 are missing. My approach: when data is missing, try to fetch it from a backup website; if the backup site lacks the city or is missing the same days, fill the gap with the average over the missing interval.
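    The average-fill fallback could look something like this. This is a hypothetical sketch: fill_gap and the sample temperatures are illustrative, not project code.

    ```python
    from datetime import date, timedelta

    def fill_gap(known, start, end):
        """Fill every missing day between start and end with the mean of the endpoint values."""
        avg = round((known[start] + known[end]) / 2)
        filled = dict(known)
        d = start + timedelta(days=1)
        while d < end:
            filled.setdefault(d, avg)  # only fill days that are actually missing
            d += timedelta(days=1)
        return filled

    # max temperatures on the days bounding the missing 17th-25th interval
    known = {date(2011, 1, 16): -18, date(2011, 1, 26): -12}
    filled = fill_gap(known, date(2011, 1, 16), date(2011, 1, 26))
    print(filled[date(2011, 1, 20)])  # -15
    ```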

    The backup site is tianqihoubao.com. Create a data-repair spider:

    scrapy genspider recovery tianqihoubao.com

    Save the list of every city queryable on https://lishi.tianqi.com/ to citys_primary.txt:

    #scrapy shell https://lishi.tianqi.com/
    ...
    In [1]: citys = response.css(".bcity a[target='_blank']::text").getall()
    
    In [2]: f = open("citys_primary.txt","+w",encoding="utf-8")
    
    In [3]: f.write(str(citys))
    Out[3]: 19326
    
    In [4]: f.close()

    Save the list of every city queryable on http://www.tianqihoubao.com/lishi/ to citys_backup.txt:

    scrapy shell http://www.tianqihoubao.com/lishi/
    ...
    In [9]: text = response.body.decode("GB2312")
    
    In [10]: from scrapy.selector import Selector
    
    In [11]: sel = Selector(text=text)
    
    In [12]: citys = sel.css(".citychk dd a::text").getall()

    In [13]: citys = [city.strip(' ') for city in citys]
    
    In [14]: f = open("citys_backup.txt","+w",encoding="utf-8")
    
    In [15]: f.write(str(citys))
    Out[15]: 2601
    
    In [16]: f.close()

    The spider file recovery.py:

    # -*- coding: utf-8 -*-
    import scrapy
    import re
    from lishitianqi.items import LishitianqiItem
    
    class RecoverySpider(scrapy.Spider):
        name = 'recovery'
        allowed_domains = ['tianqihoubao.com']
    
        def parse(self, response):
            tableTitle = response.css(".wdetail h1::text").get()
            # extract the city name; join avoids a KeyError on an empty match list, since ''.join([]) == ''
            city = ''.join(re.findall(r"([\u4E00-\u9FA5]+)历史天气预报", tableTitle))
    
            data = []
            for tr in response.css(".wdetail table tr")[1:]:
                tds = tr.css("td")
                # date
                date = tds[0].css("a::text").get()
                dl = re.findall(r"(\d{4})年(\d{2})月(\d{2})日", date)
                date = '-'.join(dl[0]) if dl != [] else '1990-00-00'
    
                # weather
                weath = tds[1].css("::text").get()
                wl = re.findall(r"[\u4E00-\u9FA5]+", weath)
                weath = '~'.join(wl) if len(wl) == 2 else '未知~未知'
    
                # min and max temperature
                tem = tds[2].css("::text").get()
                tl = re.findall(r"(\d+|-\d+)℃", tem)
                maxt, mint = tl if len(tl) == 2 else [0, 0]
    
                # wind direction and wind force
                win = tds[3].css("::text").get()
                wil = re.findall(r"[\u4E00-\u9FA50-9\-\u2265\u2264]+", win)
                wil = [re.sub('[≤]', '小于等于', w) for w in wil]
                wil = [re.sub('[≥]', '大于等于', w) for w in wil]
                windd, windp = ('~'.join(wil[0::2]), '~'.join(wil[1::2])) if len(wil) == 4 else ("未知", "未知")
    
                data.append([city, date, maxt, mint, weath, windd, windp])
    
            # build the item and hand the data to the pipeline
            item = LishitianqiItem()
            item['data'] = data
            yield item
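    The date regex in recovery.py can be checked in isolation; tianqihoubao.com renders dates like 2011年01月17日:

    ```python
    import re

    raw = "2011年01月17日"

    # same pattern as recovery.py: capture year, month, day and join into ISO form
    dl = re.findall(r"(\d{4})年(\d{2})月(\d{2})日", raw)
    iso = '-'.join(dl[0]) if dl != [] else '1990-00-00'
    print(iso)  # 2011-01-17
    ```

    A non-matching cell falls through to the '1990-00-00' sentinel, which makes broken rows easy to spot in the database later.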

    Modify pipelines.py:

    # -*- coding: utf-8 -*-
    
    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    
    import pymysql.cursors
    
    # used to look up how many days a given month has
    import calendar
    
    # used to romanize Chinese city names, pip install xpinyin
    from xpinyin import Pinyin
    
    
    class LishitianqiPipeline(object):
    
        def __init__(self):
            # connect to the database
            self.connect = pymysql.connect(
                host='localhost',
                port=3306,
                db='lishitianqi',
                user='root',
                passwd='123',
                charset='utf8',
                use_unicode=True)
    
            # run CRUD statements through this cursor
            self.cursor = self.connect.cursor()
    
        def process_item(self, item, spider):
    
            # batch-insert the weather data
            self.cursor.executemany(
                """insert ignore into lishitianqi(city, date, maxt, mint, weath, windd, windp)
                values (%s, %s, %s, %s, %s, %s, %s)""", item['data'])
    
            # commit the statement
            self.connect.commit()
    
            # the item must be returned
            return item
    
        # runs before the spider starts
        def open_spider(self, spider):
    
            if spider.name == 'recovery' and True:
                # read the city lists supported by https://lishi.tianqi.com/ and http://www.tianqihoubao.com/lishi/;
                # eval turns the stored string back into a Python object, and set() enables intersection
                citys_primary = set(eval(open("citys_primary.txt","r",encoding="utf-8").read()))
                citys_backup = set(eval(open("citys_backup.txt","r",encoding="utf-8").read()))
                startyear = 2011
                endyear = 2019
                endmonth = 9
                pin = Pinyin()
    
                def check_or_appendurl(city, year, month):
                    # number of days in year/month according to the calendar
                    stmr = calendar.monthrange(year, month)[1]
                    # number of days stored in the database for this city and month
                    sql = "select count(city) from lishitianqi where city='%s' and date like '%d-%02d%%'" % (city,year,month)
                    self.cursor.execute(sql)
                    apmr = self.cursor.fetchall()[0][0]
    
                    # if the stored day count differs from the calendar day count, data for this
                    # city/month is missing; build a URL for the backup site to fetch it
                    if stmr != apmr:
                        url = "http://www.tianqihoubao.com/lishi/%s/month/%d%02d.html" % (pin.get_pinyin(city, ''),year,month)
                        print(stmr, apmr, 'append url: %s' % url)
                        spider.start_urls.append(url)
    
                # citys_primary holds the cities queryable on https://lishi.tianqi.com/, citys_backup those on
                # http://www.tianqihoubao.com/lishi/. Ideally the primary site alone would yield everything,
                # but scraped data is not reliable: a few days of a month may be missing, so the gaps are
                # filled from the backup site. The two city sets do not fully overlap, so the repair
                # only runs over their intersection.
                for city in citys_primary & citys_backup:
                    for year in range(startyear, endyear):
                        for month in range(1, 13):
                            check_or_appendurl(city, year, month)
    
                    for month in range(1, endmonth):
                        check_or_appendurl(city, endyear, month)
    
    
        def close_spider(self, spider):
            pass
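    The completeness test in open_spider boils down to calendar.monthrange. For the 苏右旗 example above, 20 stored rows against 31 calendar days flags 11 missing days (missing_days below is an illustrative helper, not project code):

    ```python
    import calendar

    def missing_days(stored_count, year, month):
        # monthrange returns (weekday of the 1st, number of days in the month)
        expected = calendar.monthrange(year, month)[1]
        return expected - stored_count

    print(calendar.monthrange(2011, 1)[1])  # 31
    print(missing_days(20, 2011, 1))        # 11 -- this city/month needs a repair URL
    ```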

    Keeping the data up to date

    New data arrives every month, and re-running the scripts above would be too slow, so create a dedicated update spider:

    scrapy genspider update lishi.tianqi.com

    ...  

    A definitive fix for garbled pages

    If a page shows up garbled, open it with scrapy shell. response.text is the body already decoded to a string (garbled when the wrong codec was used), while response.body is the raw, undecoded bytes.

    The Python library chardet guesses the encoding of a byte stream; the longer the stream, the more accurate the guess.

    pip install chardet  

    Detect the encoding of response.body:

    In [53]: import chardet

    In [54]: chardet.detect(response.body)
    Out[54]: {'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}

    GB2312 with 99% confidence. Decode response.body as GB2312 and the text comes out correctly:

    In [63]: text = response.body.decode("GB2312")
    
    In [64]: from scrapy.selector import Selector
    
    In [65]: sel = Selector(text=text)
    
    In [66]: sel.css("h1::text").get()
    Out[66]: '全国历史天气查询'
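    The same effect can be reproduced offline without the site: encode a string as GB2312, decode it with the wrong codec, and the garbage appears.

    ```python
    # raw bytes as the site would serve them
    raw = "全国历史天气查询".encode("gb2312")

    # decoding with the wrong codec yields mojibake
    print(raw.decode("latin-1"))  # garbled nonsense

    # decoding with the detected codec recovers the text
    print(raw.decode("gb2312"))  # 全国历史天气查询
    ```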
    

      

    Building a single-page application with Django

    Requirements

    The site must support the following visualizations (city is a city, citys a set of cities, year a year, years a set of years, month a month; all of these are user inputs):

    • monthly average temperatures for citys over years (line chart)
    • weather-type ratio for city in year (pie chart)
    • daily average temperatures for citys over years in month (line chart)
    • weather-type ratio for city in year-month (pie chart)

    Live preview: pengliangyuan.club

    Creating the Django project and app

    Create the Django project:

    django-admin startproject webapps
    

    Create the web app:

    cd webapps
    python manage.py startapp history_weath

    Hooking Django up to MySQL and the existing database

    The spiders above already populated the lishitianqi table in the lishitianqi database; now bring that data into Django by generating a Django model from the table.

    Edit settings.py:

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
        'NAME': 'lishitianqi',   # database name
            'USER': 'root',
            'PASSWORD':'123',
            'HOST':'localhost',
            'PORT':'3306',
        }
    } 

    Edit __init__.py:

    import pymysql
    pymysql.install_as_MySQLdb()

    Run python manage.py inspectdb to generate a Django model from the MySQL table:

    #python manage.py inspectdb
    ...
    class Lishitianqi(models.Model):
        city = models.CharField(max_length=16)
        date = models.DateField()
        maxt = models.IntegerField()
        mint = models.IntegerField()
        weath = models.CharField(max_length=20)
        windd = models.CharField(max_length=20)
        windp = models.CharField(max_length=40, blank=True, null=True)
    
        class Meta:
            managed = False
            db_table = 'lishitianqi'
            unique_together = (('city', 'date'),)

    Copy the model above into the app's models.py, then run python manage.py makemigrations to create migrations for the change:

    #python manage.py makemigrations history_weath
    Migrations for 'history_weath':
  history_weath\migrations\0001_initial.py
        - Create model Lishitianqi

    Run python manage.py migrate to apply the change to the database:

    #python manage.py migrate
    ...
    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, history_weath, sessions
    Running migrations:
      Applying history_weath.0001_initial... OK

    Creating the city table and flagging popular cities

    Create a citys table listing every city the web app supports, with a flag for popular cities:

    CREATE TABLE `citys` (
      `city` varchar(16) NOT NULL,
      `hot` tinyint(1) DEFAULT 0,
      PRIMARY KEY (`city`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    Find all cities in lishitianqi and insert them into citys; the DISTINCT keyword deduplicates:

    insert into citys(city) select distinct city from lishitianqi;

    Flag the popular cities:

    update citys set hot=1 where city in ("三亚","上海","乌鲁木齐","北京","南京","厦门","合肥","天津","广州","成都","拉萨","昆明","杭州","武汉","济南","深圳","西安","贵阳","郑州","重庆","长沙","青岛");

    Generate the Django model, paste it into history_weath/models.py, and run the migration commands (same procedure as the previous section).

    Routing

    history_weath is a Django app, so its URLs take the form DomainName/history_weath/xxx.

    Edit webapps/urls.py:

    from django.contrib import admin
    from django.urls import path,include
    from history_weath.views import index 
    
    
    urlpatterns = [
        path('', index),                                          #redirect DomainName to DomainName/history_weath via the index view in history_weath/views.py (shown in a later section)
        path('history_weath/', include('history_weath.urls')),    #pull in history_weath/urls.py; full url = this prefix + the included url
        path('admin/', admin.site.urls),
    ]

    Create history_weath/urls.py:

    from django.urls import path
    from . import views
    
    app_name = 'history_weath'
    urlpatterns = [
        path('', views.home, name='home'),                     #home page of the history_weath app
        path('temperature_year', views.temperature_year),      #url: DomainName/history_weath/temperature_year, ajax(POST) monthly average temperatures for citys, years
        path('temperature_month', views.temperature_month),    #url: DomainName/history_weath/temperature_month, ajax(POST) daily average temperatures for citys, years, month
        path('weath_year_ratio', views.weath_year_ratio),      #url: DomainName/history_weath/weath_year_ratio, ajax(POST) weather ratio for city, year
        path('weath_month_ratio', views.weath_month_ratio),    #url: DomainName/history_weath/weath_month_ratio, ajax(POST) weather ratio for city, year, month
    ]

    Views

    history_weath is a single-page application: index redirects the site root to the app home page, home serves that page, and the remaining views exist solely to handle ajax(POST) requests.

     history_weath/views.py

    from django import views
    from django.http import HttpResponse
    from django.shortcuts import render, redirect,reverse
    from .models import Lishitianqi, Citys
    import django.utils.timezone as timezone
    from xpinyin import Pinyin
    import string
    import json
    from django.http import Http404
    from django.db.models import Avg
    
    
    #redirect DomainName to DomainName/history_weath
    def index(request):
        return redirect(reverse("history_weath:home"))
    
    #home page of the history_weath app
    def home(request):
        hot_citys = Citys.objects.filter(hot=1)
        
        #Chinese-to-pinyin converter
        pin = Pinyin()
    
        #build {'A':[], 'B':[], 'C':[] ...}; [[] for i in range(26)] is used instead of [[]] * 26
        #because the latter repeats references to a single list
        citys = dict(zip(list(string.ascii_uppercase),[[] for i in range(26)])) 
        
        #convert each city name to pinyin, take the first letter, and group cities by it
        for item in Citys.objects.all():
            city = item.city
            first_letter = pin.get_pinyin(city)[:1].upper()
            citys[first_letter].append(city)
        
        #years from 2011 up to the current year
        years = [year for year in range(2011,timezone.now().year + 1)]
        context = {
            'hot_citys':hot_citys,
            'citys':citys,
            'years':years
        } 
    
        #render the data through the Django template and return it to the user
        return render(request, 'history_weath/home.html', {'context':context})
        
    
    #url: DomainName/history_weath/temperature_year, handles ajax(POST) requests for monthly average temperatures of citys over years
    def temperature_year(request):
        post_data = json.loads(request.body)
        citys = post_data.get('citys',[])
        years = post_data.get('years',[])
        print("REQUEST:",citys,years)
        if citys == [] or years == []:
            raise Http404("Question does not exist")
        else:
            result = []
            for city in citys:
                for year in years:
                    city_data = {'name':'%s(%s年)'%(city,year), 'data':[]}
                    for month in range(1,13):
                        #average the daily max temperatures and the daily min temperatures over the month
                        month_avg = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month).aggregate(Avg('maxt'),Avg('mint'))
                        #then average those two monthly averages together
                        mavg = round((month_avg['maxt__avg'] + month_avg['mint__avg'])/2) if month_avg['maxt__avg'] is not None else 0
                        city_data['data'].append(mavg)
                    result.append(city_data)
    
            #line chart based on https://jshare.com.cn/demos/hhhhxu; series.data is the payload returned to this ajax request
            print("RESPONSE:",result)
            return HttpResponse(json.dumps(result))
    
    #url: DomainName/history_weath/temperature_month, handles ajax(POST) requests for daily average temperatures of citys over years in a given month
    def temperature_month(request):
        post_data = json.loads(request.body)
        citys = post_data.get('citys',[])
        years = post_data.get('years',[])
        month = post_data.get('month',"")
        print("REQUEST",citys, years, month)
        if citys == [] or years == [] or month == "":
            raise Http404("Question does not exist")
        else:
            result = []
            for city in citys:
                for year in years:
                    month_obj = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month)
                    #here, daily average temperature = (daily max + daily min) / 2
                    month_data = [round((day.maxt + day.mint)/2,1) for day in month_obj]
                    city_data = {'name':'%s(%s年%s月)'%(city,year,month), 'data':month_data}
                    result.append(city_data)
            
            #line chart based on https://jshare.com.cn/demos/hhhhxu; series.data is the payload returned to this ajax request
            print("RESPONSE",result)
            return HttpResponse(json.dumps(result))
    
    #url: DomainName/history_weath/weath_year_ratio, handles ajax(POST) requests for the weather-type ratio of a city over a year
    def weath_year_ratio(request):
        post_data = json.loads(request.body)
        city = post_data.get('city',"")
        year = post_data.get('year',"")
        print("REQUEST",city, year)
        if city == "" or year == "":
            raise Http404("Question does not exist")
        else:
            
            weath_year = Lishitianqi.objects.filter(city=city,date__year=year)
            weath_cnts = [d.weath for d in weath_year]
            weath_mods = set(weath_cnts)
            data=[]
            for wd in weath_mods:
                cnt = weath_cnts.count(wd)
                data.append({
                    'name':wd,                        #weather type, e.g. 晴, 多云, 多云转晴, 小雨
                    'y': cnt / len(weath_cnts)        #share of this weather type over the year
                    })
            
            #sort by weather share
            data.sort(key=lambda x:x['y'])
    
            #slice the most frequent weather type out of the pie
            data[-1]['sliced'] = 1
            #and pre-select it
            data[-1]['selected'] = 1
    
            #pie chart based on https://jshare.com.cn/demos/hhhhDX; series.data is the payload returned to this ajax request
            result = {'name':'%s(%s年)'%(city,year), 'data':data}
            print("RESPONSE",result)
            return HttpResponse(json.dumps(result))
    
    #url: DomainName/history_weath/weath_month_ration, 用于ajax(POST)请求city,year,month天气比率
    def weath_month_ratio(request):
        post_data = json.loads(request.body)
        city = post_data.get('city',"")
        year = post_data.get('year',"")
        month = post_data.get('month',"")
        print("REQUEST",city, year,month)
        if city == "" or year == "" or month == "":
            raise Http404("Question does not exist")
        else:
            
            weath_year = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month)
            weath_cnts = [d.weath for d in weath_year]
            weath_mods = set(weath_cnts)
            data=[]
            for wd in weath_mods:
                cnt = weath_cnts.count(wd)
                data.append({
                'name':wd,                    #weather type, e.g. sunny, cloudy, cloudy to sunny, light rain
                'y': cnt / len(weath_cnts)    #fraction of the month's days with this weather
                    })
    
        #sort by ratio, ascending
            data.sort(key=lambda x:x['y'])
    
        #slice the most common weather type out of the pie
            data[-1]['sliced'] = 1
        #pre-select the most common weather type
            data[-1]['selected'] = 1
    
        #this pie chart follows https://jshare.com.cn/demos/hhhhDX; series.data there is exactly what this AJAX response returns
            result = {'name':'%s(%s年%s月)'%(city,year,month), 'data':data}
            print("RESPONSE",result)
            return HttpResponse(json.dumps(result))
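    The count/len logic shared by the two views above can be sketched on its own, without Django. This illustrative helper (`weather_ratios` is a hypothetical name, not part of the project) reproduces the ratio computation and the `sliced`/`selected` flags with `collections.Counter`:

```python
from collections import Counter

def weather_ratios(conditions):
    """Turn a list of daily weather strings into Highcharts pie-slice dicts,
    mirroring the count/len logic of the views above."""
    total = len(conditions)
    data = [{'name': w, 'y': c / total} for w, c in Counter(conditions).items()]
    data.sort(key=lambda d: d['y'])   # ascending, so the most common type is last
    data[-1]['sliced'] = 1            # pull the biggest slice out of the pie
    data[-1]['selected'] = 1          # and pre-select it
    return data

slices = weather_ratios(['sunny', 'sunny', 'rain', 'cloudy'])
```

    `Counter` tallies every type in one pass, whereas `list.count` in the views rescans the list once per distinct weather type; over a full year of rows the single pass is the cheaper choice.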

    Templates

    Create the directory

    history_weath\templates\history_weath

    Since this is a single-page application, there is only one template, home.html.

    home.html

    {% load staticfiles %}
    
    <!doctype html>
    <html lang="en">
      <head>
          <script src="{% static 'history_weath/jquery.min.js' %}"></script>
          <link href="{% static 'history_weath/bootstrap.min.css' %}" rel="stylesheet">
          <link href="{% static 'history_weath/home.css' %}" rel="stylesheet">
    
      </head>
      <body>
      <nav class="navbar navbar-dark fixed-top bg-dark flex-md-nowrap p-0 shadow">
        <a class="navbar-brand col-sm-3 col-md-2 mr-0" href="{% url 'history_weath:home' %}">全国各城市天气可视化系统</a>
      </nav>
    
    <div class="container-fluid">
      <div class="row">
        <nav class="col-md-2 d-none d-md-block bg-light sidebar">
          <div class="sidebar-sticky">
            <ul class="nav flex-column">
              <li class="nav-item ">
                <a class="active nav-link" id="page1_index" href="#">
                  <span data-feather="bar-chart-2"></span>
                  数据可视化
                </a>
              </li>
              <li class="nav-item ">
                <a class="nav-link" href="#" id="page2_index">
                  <span data-feather="layers"></span>
                  数据分析
                </a>
              </li>
            </ul>
    
         
          </div>
        </nav>
    
        <main role="main" class="col-md-9 ml-sm-auto col-lg-10 px-4" style="padding-top:70px">
    
          <div id="page1">
            
                  <div class="container">
                      <div class="row">
                      <div class="col-md-3">
    
                       <div class="input-group mb-3">
                        <div class="input-group-prepend" id="page1_iter">
                          <button type="button" class="btn btn-outline-secondary" id="page2_city">城市</button>
                          <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                            <span class="sr-only">Toggle Dropdown</span>
                          </button>
                          <div class="dropdown-menu">
                         {% for city in context.hot_citys %}
                              <button class="dropdown-item page1_add_city">{{ city.city }}</button>
                         {% endfor %}
                         {% for k,v in context.citys.items %}
                            <div role="separator" class="dropdown-divider"></div><p class="text-center">{{ k }}</p>
                            {% for city in v %}
                              <button class="dropdown-item page1_add_city">{{ city }}</button>
                            {% endfor %}
                         {% endfor %}
                          </div>
                        </div>
                        <input type="text" class="form-control" aria-label="Text input with segmented dropdown button">
                       
                    </div>
    
    
                      <div class="input-group mb-3">
                          <div class="input-group-prepend" id="page1_iter">
                            <button type="button" class="btn btn-outline-secondary" id="page1_year">年份</button>
                            <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                              <span class="sr-only">Toggle Dropdown</span>
                            </button>
                            <div class="dropdown-menu">
                           {% for year in context.years %}
                            <button class="dropdown-item page1_add_year">{{ year }}</button>
                           {% endfor %}
                            </div>
                          </div>
                      </div>
                      </div>
                          
    
                      <div class="col-md-1 ">
                            <div class="card " style='height: 100%; background-color: #e5e5e5'>
                                  <span class="badge badge-info page1_city_list" ></span>
                                  <div id="page1_city_label"></div>
                            </div>
    
                      </div>
                      <div class="col-md-1">
                            <div class="card " style='height: 100%; background-color: #e5e5e5'>
                                  <span class="badge badge-success page1_year_list" ></span>
                                  <div id="page1_year_label"></div>
                            </div>
    
                      </div>
                    </div>
                  </div>
    
    
                      <button type="button" class="btn btn-secondary btn-lg btn-block" style="margin-top: 10px" id="page1_send">开始</button>
                     
    
    
      
                    <div id="page1_year_content" style="display: none">
    
                  <div id="page1_chart1" style="min-width:400px;height:400px;width:100%"></div>
    
    
                       <div class="input-group mb-3">
                          <div class="input-group-prepend" id="page1_iter">
                            <button type="button" class="btn btn-outline-secondary" id="page1_year_weath">气候</button>
                            <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                              <span class="sr-only">Toggle Dropdown</span>
                            </button>
                            <div class="dropdown-menu">
                                <button class="dropdown-item page1_add_year_weath">关闭</button>
                                <div id="page1_year_weath_label"></div>
                            </div>
                          </div>
                      </div>
                      
                    <div id="page1_chart3" style="min-width:400px;height:400px;width:100%; display: none;"></div>
                   
    
    
                    <div class="btn-group mr-2" role="group" aria-label="First group" style="width:100%">
                            <button type="button" class="btn btn-secondary page1_month" month=1>一月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=2>二月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=3>三月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=4>四月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=5>五月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=6>六月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=7>七月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=8>八月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=9>九月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=10>十月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=11>十一月</button>
                            <button type="button" class="btn btn-secondary page1_month" month=12>十二月</button>
                        </div>
    
                      <div id="page1_month_content" style="display: none">
                    <div id="page1_chart2" style="min-width:400px;height:400px;width:100%"></div>
    
    
                        <div class="input-group mb-3">
                          <div class="input-group-prepend" id="page1_iter">
                            <button type="button" class="btn btn-outline-secondary" id="page1_month_weath">气候</button>
                            <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                              <span class="sr-only">Toggle Dropdown</span>
                            </button>
                            <div class="dropdown-menu">
                                <button class="dropdown-item page1_add_month_weath">关闭</button>
                                <div id="page1_month_weath_label"></div>
                            </div>
                          </div>
                      </div>
                      
                    <div id="page1_chart4" style="min-width:400px;height:400px;width:100%; display: none;"></div>
                      </div>
                    </div>
    
    
            </div>
    
    
    
            <div id="page2" style="display: none">
                  
            </div>
    
          
        </main>
      </div>
    </div>
        <script src="{% static 'history_weath/highcharts.js' %}"></script>
        <script src="{% static 'history_weath/exporting.js' %}"></script>
        <script src="{% static 'history_weath/oldie.js' %}"></script>
        <script src="{% static 'history_weath/highcharts-zh_CN.js' %}"></script>


        <script src="{% static 'history_weath/popper.min.js' %}"></script>
        <script src="{% static 'history_weath/bootstrap.min.js' %}"></script>
        <script src="{% static 'history_weath/feather.min.js' %}"></script>
        <script src="{% static 'history_weath/home.js' %}"></script>
           
    </body>
    </html>

    Static files

    Create the directory

    history_weath\static\history_weath

    home.css

    body {
      font-size: .875rem;
    }
    
    .feather {
      width: 16px;
      height: 16px;
      vertical-align: text-bottom;
    }
    
    /*
     * Sidebar
     */
    
    .sidebar {
      position: fixed;
      top: 0;
      bottom: 0;
      left: 0;
      z-index: 100; /* Behind the navbar */
      padding: 48px 0 0; /* Height of navbar */
      box-shadow: inset -1px 0 0 rgba(0, 0, 0, .1);
    }
    
    .sidebar-sticky {
      position: relative;
      top: 0;
      height: calc(100vh - 48px);
      padding-top: .5rem;
      overflow-x: hidden;
      overflow-y: auto; /* Scrollable contents if viewport is shorter than content. */
    }
    
    @supports ((position: -webkit-sticky) or (position: sticky)) {
      .sidebar-sticky {
        position: -webkit-sticky;
        position: sticky;
      }
    }
    
    .sidebar .nav-link {
      font-weight: 500;
      color: #333;
    }
    
    .sidebar .nav-link .feather {
      margin-right: 4px;
      color: #999;
    }
    
    .sidebar .nav-link.active {
      color: #007bff;
    }
    
    .sidebar .nav-link:hover .feather,
    .sidebar .nav-link.active .feather {
      color: inherit;
    }
    
    .sidebar-heading {
      font-size: .75rem;
      text-transform: uppercase;
    }
    
    /*
     * Content
     */
    
    [role="main"] {
      padding-top: 133px; /* Space for fixed navbar */
    }
    
    @media (min-width: 768px) {
      [role="main"] {
        padding-top: 48px; /* Space for fixed navbar */
      }
    }
    
    /*
     * Navbar
     */
    
    .navbar-brand {
      padding-top: .75rem;
      padding-bottom: .75rem;
      font-size: 1rem;
      background-color: rgba(0, 0, 0, .25);
      box-shadow: inset -1px 0 0 rgba(0, 0, 0, .25);
    }
    
    .navbar .form-control {
      padding: .75rem 1rem;
      border-width: 0;
      border-radius: 0;
    }
    
    .form-control-dark {
      color: #fff;
      background-color: rgba(255, 255, 255, .1);
      border-color: rgba(255, 255, 255, .1);
    }
    
    .form-control-dark:focus {
      border-color: transparent;
      box-shadow: 0 0 0 3px rgba(255, 255, 255, .25);
    }

    home.js

    (function () {
      'use strict'
    
    feather.replace()
        
    var page1_chart1 = Highcharts.chart('page1_chart1', {
                chart: {
                    type: 'line'
                },
                title: {
                    text: ''
                },
                subtitle: {
                    text: ''
                },
                xAxis: {
                    categories: ['一月', '二月', '三月', '四月', '五月', '六月', '七月', '八月', '九月', '十月', '十一月', '十二月']
                },
                yAxis: {
                    title: {
                        text: '气温 (°C)'
                    }
                },
                plotOptions: {
                    line: {
                        dataLabels: {
                            // enable data labels
                            enabled: true          
                        },
                        // disable mouse tracking; tooltips and click events will not fire
                        enableMouseTracking: false
                    }
                },
                series: []
    });
    
    var page1_chart2 = Highcharts.chart('page1_chart2', {
                chart: {
                    type: 'line'
                },
                title: {
                    text: ''
                },
                subtitle: {
                    text: ''
                },
                xAxis: {
                    categories: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
                },
                yAxis: {
                    title: {
                        text: '气温 (°C)'
                    }
                },
                plotOptions: {
                    line: {
                        dataLabels: {
                            // enable data labels
                            enabled: true          
                        },
                        // disable mouse tracking; tooltips and click events will not fire
                        enableMouseTracking: false
                    }
                },
                series: []
    });
    
    
    var page1_chart3 = Highcharts.chart('page1_chart3', {
        chart: {
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
        },
        title: {
            text: ''
        },
        tooltip: {
            pointFormat: '{series.name}: <b>{point.percentage:.1f}%</b>'
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.1f} %',
                    style: {
                        color: (Highcharts.theme && Highcharts.theme.contrastTextColor) || 'black'
                    }
                }
            }
        },
        series: []
    });
    
    var page1_chart4 = Highcharts.chart('page1_chart4', {
        chart: {
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
        },
        title: {
            text: ''
        },
        tooltip: {
            pointFormat: '{series.name}: <b>{point.percentage:.1f}%</b>'
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.1f} %',
                    style: {
                        color: (Highcharts.theme && Highcharts.theme.contrastTextColor) || 'black'
                    }
                }
            }
        },
        series: []
    });
    
    $("#page1_index").click(function(){
    
            $("#page1").show()
            $("#page2").hide()
            $("#page2_index").removeClass("active")
            $("#page1_index").addClass("active")
    });
    
    $("#page2_index").click(function(){
    
            $("#page2").show()
            $("#page1").hide()
            $("#page1_index").removeClass("active")
            $("#page2_index").addClass("active")
    });
    
    
    $(".page1_add_city").click(function(){
            if($(".page1_city_list").text().indexOf($(this).text()) == -1){
                var insert = $(".page1_city_list").first().clone().text($(this).text())
                insert.bind("click", function(){
                    $(this).remove()
                })
                $("#page1_city_label").before(insert)
            }
    });
    
    
    
    $(".page1_add_year").click(function(){
            if($(".page1_year_list").text().indexOf($(this).text()) == -1){
                var insert = $(".page1_year_list").first().clone().text($(this).text())
                insert.bind("click", function(){
                    $(this).remove()
                })
                $("#page1_year_label").before(insert)
            }
    });
    
    
    $(".page1_add_year_weath").first().click(function(){
            $("#page1_chart3").hide()
            $("#page1_year_weath").text("气候")
    });
    
    
    $(".page1_add_month_weath").first().click(function(){
            $("#page1_chart4").hide()
            $("#page1_month_weath").text("气候")
    });
    
    
    $("#page1_send").click(function(){
    
        var city_list = $(".page1_city_list")
        var citys = []
        for(var i = 1,p = city_list.first().next(); i < city_list.length; i++, p=p.next())
            citys[i-1] = p.text()
        var year_list = $(".page1_year_list")
        var years = []
        for(var i = 1,p = year_list.first().next(); i < year_list.length; i++, p=p.next())
            years[i-1] = p.text()
        if(citys.length > 0 && years.length > 0)
        {
    
        //remove the old weather-ratio menu entries
            for(var z = $(".page1_add_year_weath").first().next(); z.text() != ""; z = $(".page1_add_year_weath").first().next())
            {
                z.remove()
            }
    
            for(var i = 0; i < citys.length; i++)
            {
                for(var j = 0; j < years.length; j++)
                {
    
                // let gives each click handler its own city/year binding;
                // with var every handler would capture the last loop iteration's values
                let city = citys[i];
                let year = years[j];
                    var text = city + year + "年各种天气所占比例"
    
                //add a menu entry for this city/year
                    var insert = $(".page1_add_year_weath").first().clone().text(text)
                        insert.bind("click", function(){
                            $("#page1_chart3").show()
                            $("#page1_year_weath").text($(this).text())
                            $.ajax({
                                type:"POST",
                                url:"/history_weath/weath_year_ratio",
                                data:JSON.stringify({city:city, year:year}),
                                success:function(result){
    
                                //remove existing series
                                    var len = page1_chart3.series.length;
                                    for(var i = len - 1; i >= 0; i--){
                                        page1_chart3.series[i].remove();
                                    }
                            
    
                                //add the new series
                                    var data = JSON.parse(result)
                                    page1_chart3.addSeries(data);
                                    
    
                                    var title = city + "-" + year +"年各种天气所占比例饼图";
                                    page1_chart3.setTitle({text:title,useHTML:true})
                                }
                            });
                        })
                        $("#page1_year_weath_label").before(insert)
        
                }
            }
    
            $.ajax({
                type:"POST",
                url:"/history_weath/temperature_year",
                data:JSON.stringify({citys:citys, years:years}),
                success:function(result){
    
                //remove existing series
                    var len = page1_chart1.series.length;
                    for(var i = len - 1; i >= 0; i--){
                        page1_chart1.series[i].remove();
                    }
    
                //add the new series
                    var data = JSON.parse(result)
                    for(var i = 0; i < data.length; i++){
                        page1_chart1.addSeries(data[i]);
                    }
    
                    var title = citys.toString() + "-" + years.toString() +"年月平均气温曲线图";
                    page1_chart1.setTitle({text:title,useHTML:true})
                    $("#page1_year_content").show()    
                }
            });
        }
    });
    
    
    $(".page1_month").click(function(){
        var city_list = $(".page1_city_list")
        var citys = []
        for(var i = 1,p = city_list.first().next(); i < city_list.length; i++, p=p.next())
            citys[i-1] = p.text()
        var year_list = $(".page1_year_list")
        var years = []
        for(var i = 1,p = year_list.first().next(); i < year_list.length; i++, p=p.next())
            years[i-1] = p.text()
        var month = $(this).attr('month')
        var month_name = $(this).text()
        if(citys.length > 0 && years.length > 0)
        {
        //remove the old weather-ratio menu entries
            for(var z = $(".page1_add_month_weath").first().next(); z.text() != "";z = $(".page1_add_month_weath").first().next())
            {
                z.remove()
            }
            
            for(var i = 0; i < citys.length; i++)
            {
                for(var j = 0; j < years.length; j++)
                {
    
                // let gives each click handler its own city/year binding;
                // with var every handler would capture the last loop iteration's values
                let city = citys[i];
                let year = years[j];
                    var text = city + year + "年" + month + "月各种天气所占比例"
    
                //add a menu entry for this city/year/month
                    var insert = $(".page1_add_month_weath").first().clone().text(text)
                        insert.bind("click", function(){
                            $("#page1_chart4").show()
                            $("#page1_month_weath").text($(this).text())
                            $.ajax({
                                type:"POST",
                                url:"/history_weath/weath_month_ratio",
                                data:JSON.stringify({city:city, year:year, month:month}),
                                success:function(result){
    
                                //remove existing series
                                    var len = page1_chart4.series.length;
                                    for(var i = len - 1; i >= 0; i--){
                                        page1_chart4.series[i].remove();
                                    }
                            
    
                                //add the new series
                                    var data = JSON.parse(result)
                                    page1_chart4.addSeries(data);
                                    
    
                                    var title = city + "-" + year +"年" + month + "月" + "各种天气所占比例饼图";
                                    page1_chart4.setTitle({text:title,useHTML:true})
                                }
                            });
                        })
                        $("#page1_month_weath_label").before(insert)
        
                }
            }
    
    
            $.ajax({
                type:"POST",
                url:"/history_weath/temperature_month",
                data: JSON.stringify({citys:citys, years:years, month:month}),
                dataType:'json',
                success:function(result){
                    //remove existing series
                        var len = page1_chart2.series.length;
                        for(var i = len - 1; i >= 0; i--){
                            page1_chart2.series[i].remove();
                        }
    
                    //add the new series
                        var data = result
                        for(var i = 0; i < data.length; i++){
                            page1_chart2.addSeries(data[i]);
                        }
    
                        var title = citys.toString() + "-" + years.toString() +"年" + month  + "月日平均气温曲线图";
                        page1_chart2.setTitle({text:title,useHTML:true})
                        $("#page1_month_content").show()    
    
                }
            });
        }
    
    });
    
    
    
    
    }())

    jquery.min.js

    popper.min.js

    bootstrap.min.css

    bootstrap.min.js

    feather.min.js

    highcharts.js

    highcharts-zh_CN.js

    oldie.js

    exporting.js

    About MySQL errors when using Django

    Error: django.core.exceptions.ImproperlyConfigured: mysqlclient 1.3.13 or newer is required; you have 0.9.3.

    Fix: in your Python library path, edit base.py under /Lib/site-packages/django/db/backends/mysql and comment out the two lines that raise the error when the version is below 1.3.13:

    #if version < (1, 3, 13):
    #    raise ImproperlyConfigured('mysqlclient 1.3.13 or newer is required; you have %s.' % Database.__version__)
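    Patching an installed copy of Django is fragile (the edit is lost on upgrade). A common alternative, sketched here on the assumption that PyMySQL is the driver in use (requirements.txt lists PyMySQL==0.9.3), is to register PyMySQL as MySQLdb in the project package's `__init__.py` and spoof the version tuple so the check passes. The `webapps/__init__.py` location and the `pymysql_available` flag are illustrative:

```python
# webapps/__init__.py (hypothetical location) - let PyMySQL stand in for mysqlclient
try:
    import pymysql
    pymysql.version_info = (1, 4, 0, "final", 0)  # satisfy Django's minimum-version check
    pymysql.install_as_MySQLdb()
    pymysql_available = True
except ImportError:
    # PyMySQL is not installed in this environment; Django would then need mysqlclient
    pymysql_available = False
```
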

    Error: AttributeError: 'str' object has no attribute 'decode'

    Fix: in the same directory, edit operations.py so that decode only runs when query is of type bytes:

            if query is not None:
                if isinstance(query,bytes):
                    query = query.decode(errors='replace')
            return query 

      

    Deploying with Django, uWSGI and nginx

    Development so far was done on Windows; the project now needs to be deployed to a personal cloud server.

    MySQL database backup and restore

    The data the spider spent nine hours crawling now has to be migrated from my PC to the cloud server.

    Back up the database

    mysqldump -u root -p lishitianqi > d:\backup.sql

    Copy backup.sql to the cloud server, then restore the database:

    MariaDB [(none)]> create database lishitianqi;
    ...
    [root@VM_0_12_centos mysql]# mysql -u root -p --default-character-set=utf8 lishitianqi <  backup.sql
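    The backup/restore pair can also be scripted. This sketch only builds the argv lists and leaves execution to `subprocess.run`; `build_dump_cmd` and `build_restore_cmd` are illustrative names, not part of the project:

```python
def build_dump_cmd(user, database, outfile):
    """argv for mysqldump; a bare -p makes the client prompt for the password."""
    return ["mysqldump", "-u", user, "-p", database, "--result-file=" + outfile]

def build_restore_cmd(user, database, charset="utf8"):
    """argv for mysql; feed the backup file to stdin when running it."""
    return ["mysql", "-u", user, "-p", "--default-character-set=" + charset, database]

dump_cmd = build_dump_cmd("root", "lishitianqi", "backup.sql")
restore_cmd = build_restore_cmd("root", "lishitianqi")
# e.g. subprocess.run(dump_cmd, check=True)
#      subprocess.run(restore_cmd, stdin=open("backup.sql"), check=True)
```

    `--result-file` avoids shell redirection, which on Windows can mangle the dump's encoding compared with `>`.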

    Django deployment checklist

    Check settings.py

     python manage.py check --deploy
    System check identified some issues:
    
    WARNINGS:
    ?: (security.W003) You don't appear to be using Django's built-in cross-site request forgery protection via the middleware ('django.middleware.csrf.CsrfViewMiddleware' is not in your MIDDLEWARE). Enabling the middleware is the safest approach to ensure you don't leave any holes.
    ?: (security.W004) You have not set a value for the SECURE_HSTS_SECONDS setting. If your entire site is served only over SSL, you may want to consider setting a value and enabling HTTP Strict Transport Security. Be sure to read the documentation first; enabling HSTS carelessly can cause serious, irreversible problems.
    ?: (security.W006) Your SECURE_CONTENT_TYPE_NOSNIFF setting is not set to True, so your pages will not be served with an 'x-content-type-options: nosniff' header. You should consider enabling this header to prevent the browser from identifying content types incorrectly.
    ?: (security.W007) Your SECURE_BROWSER_XSS_FILTER setting is not set to True, so your pages will not be served with an 'x-xss-protection: 1; mode=block' header. You should consider enabling this header to activate the browser's XSS filtering and help prevent XSS attacks.
    ?: (security.W008) Your SECURE_SSL_REDIRECT setting is not set to True. Unless your site should be available over both SSL and non-SSL connections, you may want to either set this setting True or configure a load balancer or reverse-proxy server to redirect all connections to HTTPS.
    ?: (security.W012) SESSION_COOKIE_SECURE is not set to True. Using a secure-only session cookie makes it more difficult for network traffic sniffers to hijack user sessions.
    ?: (security.W018) You should not have DEBUG set to True in deployment.
    ?: (security.W019) You have 'django.middleware.clickjacking.XFrameOptionsMiddleware' in your MIDDLEWARE, but X_FRAME_OPTIONS is not set to 'DENY'. The default is 'SAMEORIGIN', but unless there is a good reason for your site to serve other parts of itself in a frame, you should change it to 'DENY'.
    
    System check identified 8 issues (0 silenced).

    Work through the issues listed above one by one.
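    One possible set of settings.py values that addresses those warnings is sketched below. The host name is a placeholder, and the SSL-related flags assume the site really is served over HTTPS; leave them off otherwise:

```python
# settings.py - deployment hardening for the `check --deploy` warnings
DEBUG = False                        # W018
ALLOWED_HOSTS = ["example.com"]      # placeholder domain
SECURE_CONTENT_TYPE_NOSNIFF = True   # W006: adds x-content-type-options: nosniff
SECURE_BROWSER_XSS_FILTER = True     # W007: adds x-xss-protection: 1; mode=block
SECURE_SSL_REDIRECT = True           # W008: redirect plain HTTP to HTTPS
SESSION_COOKIE_SECURE = True         # W012: session cookie over HTTPS only
SECURE_HSTS_SECONDS = 3600           # W004: start small; HSTS is hard to undo
X_FRAME_OPTIONS = "DENY"             # W019
```
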

    Python virtual environment

    pip install virtualenv
    pip install virtualenvwrapper
    
    #create a directory to hold the virtual environments
    mkdir $HOME/.virtualenvs
    
    #add these lines to ~/.bashrc
    export WORKON_HOME=$HOME/.virtualenvs
    export VIRTUALENVWRAPPER_VIRTUALENV=/usr/local/python3/bin/virtualenv
    source /usr/local/python3/bin/virtualenvwrapper.sh
    
    
    #make the configuration take effect
    source ~/.bashrc
    
    #the mkvirtualenv, workon, deactivate and rmvirtualenv commands are now available
    
    #create a virtual environment
    mkvirtualenv -p python3 django_env_py3
    
    #activate the virtual environment
    workon django_env_py3
    
    #deactivate the virtual environment
    deactivate

    #delete the virtual environment
    rmvirtualenv django_env_py3

    Project migration

    Activate the Django virtual environment

    workon django_env_py3

    Copy the project to the cloud server, install its dependencies, and adjust the database connection parameters (host: localhost must become 127.0.0.1; plus port, user name and password) so the project runs normally. The MySQL connection errors were covered above.

    spider/lishitianqi/pipelines.py

                #connect to the database
                    self.connect = pymysql.connect(
                            host='127.0.0.1',
                            port=8809,
                            db='lishitianqi',
                            user='root',
                            passwd='123',
                            charset='utf8',
                            use_unicode=True)

    webapps/webapps/settings.py

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': 'lishitianqi',
            'USER': 'root',
            'PASSWORD':'123',
            'HOST':'127.0.0.1',
            'PORT':'8809',
        }
    }
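    Before blaming Django or the settings, it is worth confirming that MySQL is actually reachable on the configured host and port. A minimal stdlib probe (`port_open` is an illustrative helper; in deployment you would probe `("127.0.0.1", 8809)`):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# demo against a throwaway listener so the sketch is self-contained
srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # let the OS pick a free port
srv.listen(1)
reachable = port_open("127.0.0.1", srv.getsockname()[1])
srv.close()
```
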

    Generate requirements.txt

    pip freeze > requirements.txt
    asn1crypto==0.24.0
    attrs==19.1.0
    Automat==0.7.0
    cffi==1.12.3
    constantly==15.1.0
    cryptography==2.7
    cssselect==1.1.0
    Django==2.2.5
    hyperlink==19.0.0
    idna==2.8
    incremental==17.5.0
    lxml==4.4.1
    parsel==1.5.2
    pyasn1==0.4.7
    pyasn1-modules==0.2.6
    pycparser==2.19
    PyDispatcher==2.0.5
    PyHamcrest==1.9.0
    PyMySQL==0.9.3
    pyOpenSSL==19.0.0
    pytz==2019.2
    queuelib==1.5.0
    Scrapy==1.7.3
    service-identity==18.1.0
    six==1.12.0
    sqlparse==0.3.0
    Twisted==19.7.0
    uWSGI==2.0.18
    w3lib==1.21.0
    xpinyin==0.5.6
    zope.interface==4.6.0

    Install the Python libraries from requirements.txt

    pip install -r requirements.txt 

    uWSGI server deployment

    Install uwsgi

    pip install uwsgi

    In the directory containing the Django project's settings.py, add a configuration file uwsgi.ini:

    [uwsgi]
    #socket is used when running behind nginx
    socket=127.0.0.1:9090
    #use http instead to serve as a web server directly
    #http=127.0.0.1:9090
    #project directory
    chdir=/root/project/fullstack_history_weath/webapps
    #path of the project's wsgi.py file, relative to the project directory
    wsgi-file=webapps/wsgi.py
    # number of worker processes to start
    processes=4
    # number of threads per worker process
    threads=2
    master=True
    # save the master process pid after startup
    pidfile=uwsgi.pid
    # run uwsgi in the background, logging to uwsgi.log
    daemonize=uwsgi.log
    # path of the virtual environment
    virtualenv=/root/.virtualenvs/django_env_py3

    uwsgi commands

    #after starting from xxx.ini, uWSGI writes an xxx.pid file in the same directory whose only line is the master process pid.
    
    #start uWSGI:
    uwsgi --ini xxx.ini
    
    #reload uwsgi:
    uwsgi --reload xxx.pid
    
    #stop uwsgi:
    uwsgi --stop xxx.pid

    #force kill:
    pkill uwsgi
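    Under the hood, `--reload` and `--stop` just send SIGHUP and SIGINT to the pid stored in the pidfile. The same mechanism can be sketched in Python (`uwsgi_signal` is an illustrative name); the demo reads a pidfile without actually sending the signal:

```python
import os
import signal
import tempfile

SIGNALS = {"reload": signal.SIGHUP, "stop": signal.SIGINT}  # uWSGI's conventions

def uwsgi_signal(pidfile, action, send=False):
    """Read the master pid from pidfile and optionally signal it."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    sig = SIGNALS[action]
    if send:
        os.kill(pid, sig)   # same effect as uwsgi --reload / --stop
    return pid, sig

# demo with our own pid and send=False, so nothing is killed
with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as f:
    f.write(str(os.getpid()))
    path = f.name
pid, sig = uwsgi_signal(path, "reload")
os.unlink(path)
```
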

    nginx server deployment

    Install nginx

    wget http://nginx.org/download/nginx-1.8.0.tar.gz
    tar -xvf nginx-1.8.0.tar.gz
    cd nginx-1.8.0
    ./configure
    make
    make install  
    ln -s /usr/local/nginx/sbin/nginx /usr/bin/nginx

Run nginx

    /usr/local/nginx/sbin/nginx
    

Open a browser and visit this machine's IP; if the page shows Welcome to nginx!, nginx has been installed and is running.

Common nginx commands:

# Reload:
nginx -s reload

# Stop:
nginx -s stop

# Test whether the configuration file is valid:
nginx -t

# Force kill:
pkill nginx

Configure nginx

            location / {
                include  uwsgi_params;
                uwsgi_pass  127.0.0.1:9090;
            }
    
            location /static/ {
                autoindex on;
                alias  /root/project/fullstack_history_weath/webapps/static/;
            }
    

      

Problems encountered during deployment

Problem: the uwsgi log reports "ModuleNotFoundError: No module named 'encodings'"

Solution: the virtualenv path in the uwsgi config file is wrong. You can find the correct path with:

    In [1]: import sys
    
    In [2]: sys.path
    Out[2]:
    ['/root/.virtualenvs/django_env_py3/lib/python3.6/site-packages',
     '/usr/local/python3/bin',
     '/usr/local/python3/lib/python36.zip',
     '/usr/local/python3/lib/python3.6',
     '/usr/local/python3/lib/python3.6/lib-dynload',
     '',
     '/usr/local/python3/lib/python3.6/site-packages',
     '/usr/local/python3/lib/python3.6/site-packages/IPython/extensions',
     '/root/.ipython']

Since my virtualenv is named django_env_py3, the setting should be:

# Path to the virtualenv
    virtualenv=/root/.virtualenvs/django_env_py3

Problem: the uwsgi log reports "--- no python application found, check your startup logs for errors ---"

Problem: the uwsgi log reports "chdir(): No such file or directory [core/uwsgi.c line 2623]"

Solution: the first error means chdir is not set at all; the second means the chdir path is wrong. The correct setting is:

# Django project root directory
    chdir=/root/project/fullstack_history_weath/webapps
    

      

Problem: nginx serves the HTML, but static files return 404 (Not Found)

Possible causes:

1. The alias /root/project/fullstack_history_weath/webapps/static line is missing the trailing /

2. STATIC_ROOT in Django's settings.py does not match the path used in the nginx location /static/ block

3. python manage.py collectstatic was never run, so the static files were not collected into STATIC_ROOT
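Causes 2 and 3 both come down to STATIC_ROOT and the nginx alias pointing at the same directory. A minimal settings.py-style sketch; BASE_DIR and the deployment path are assumptions matching this project's layout, not verified values:

```python
# Minimal sketch of the relevant Django settings. BASE_DIR and the
# deployment path are assumptions matching this project's layout.
import os

BASE_DIR = "/root/project/fullstack_history_weath/webapps"

STATIC_URL = "/static/"
# Must be the same directory as the nginx alias in location /static/,
# and is where "python manage.py collectstatic" copies all static files.
STATIC_ROOT = os.path.join(BASE_DIR, "static")

print(STATIC_ROOT)
```

With these values, the nginx alias of /root/project/fullstack_history_weath/webapps/static/ and STATIC_ROOT agree, so files collected by collectstatic are exactly the files nginx serves.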

Problem: nginx serves the HTML, but static files return 403 (Forbidden)

Solution: the static files, or one of their parent directories, lack sufficient permissions. For example, given the static path /root/project/fullstack_history_weath/webapps/static/, if the project directory's permissions are 644, all files under static are 644, and the remaining directories on the path (root, fullstack_history_weath, webapps) are 655, then the following changes are needed:

    chmod 655 /root/project
    chmod -R 655 /root/project/fullstack_history_weath/webapps/static
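To see which directory on the path is actually blocking nginx, you can walk the ancestors and check the execute (search) bit for "other", since the nginx workers usually run as an unprivileged user. A minimal sketch, demonstrated on a temporary directory; in practice you would point it at the static file's real path:

```python
# Minimal sketch: find ancestors of a path that an unprivileged user
# (like an nginx worker) cannot traverse, i.e. that lack the o+x bit.
# Demonstrated on a temp directory; use your static path in practice.
import os
import stat
import tempfile

def blocked_ancestors(path):
    """Return directories on the way to path missing the o+x (search) bit."""
    blocked = []
    current = os.path.abspath(path)
    while True:
        if os.path.isdir(current):
            mode = os.stat(current).st_mode
            if not mode & stat.S_IXOTH:
                blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return blocked

# Demo: a directory with mode 644 cannot be traversed by "other".
base = tempfile.mkdtemp()
locked = os.path.join(base, "locked")
os.mkdir(locked)
os.chmod(locked, 0o644)
print(blocked_ancestors(os.path.join(locked, "style.css")))
```

Any directory the function reports needs at least the o+x bit (for example via chmod 655 or 755) before nginx can serve files beneath it.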

Getting the project code

At this point the whole project is complete; the full code can be obtained from the links below.

nginx file download: http://pengliangyuan.club/files/Chinese-historical-weather-visualization-system/

GitHub link: https://github.com/pengliangyuan/Chinese-historical-weather-visualization-system

Extra: prettier nginx directory listings

1. Add the fancyindex plugin to nginx: download the ngx-fancyindex source
2. Extract the source into the nginx source directory
3. Add the module to the build options: ./configure --add-module=ngx-fancyindex-0.4.3 (the name of the extracted directory)
4. Run make, then make install, then ln -s /usr/local/nginx/sbin/nginx /usr/bin/nginx
5. In /usr/local/nginx/html, download the fancyindex theme: git clone https://github.com/TheInsomniac/Nginx-Fancyindex-Theme.git
6. Edit /usr/local/nginx/conf/nginx.conf as follows, then run nginx -s reload
            location /files {
                include /usr/local/nginx/html/Nginx-Fancyindex-Theme/fancyindex.conf; 
                alias  /root/files/;
                charset utf-8,gbk; 
            }
    
            location /Nginx-Fancyindex-Theme/ {
                alias /usr/local/nginx/html/Nginx-Fancyindex-Theme/;
            }
    

      

• Original post: https://www.cnblogs.com/plyonfly/p/11493115.html