在前面博客中写过python爬虫爬取各个城市历史天气及数据可视化,感觉还是有点粗糙,本文章是借这个例子抛砖引玉,写出个更NB的python全栈项目。
此项目预览页面全国各城市天气可视化系统
scrapy抓取全国各城市的历史天气
创建项目及准备工作
创建项目
scrapy startproject lishitianqi
创建爬虫脚本文件(考虑到tianqihoubao.com访问速度实在是太慢,现在换用lishi.tianqi.com)
cd lishitianqi scrapy genspider weath lishi.tianqi.com
测试该url是否可以直接爬取
scrapy crawl weath ... 2019-09-09 13:22:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 http://lishi.tianqi.com/>: HTTP status code is not handled or not allowed ...
在测试是否是user-agent导致
In [15]: import requests In [16]: requests.get("https://lishi.tianqi.com/") Out[16]: <Response [403]> In [17]: requests.get("https://lishi.tianqi.com/",headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl ...: eWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'}) Out[17]: <Response [200]>
在setings.py中设置user-agent的值
USER_AGENT='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
编写爬虫文件获取天气数据
请求http://lishi.tianqi.com/后将返回全国所有城市页面,通过css选择器选择所有城市url,并添加到爬虫中
def parse(self, response): #选择中国所有城市名及对应url citys = response.css(".bcity a[target='_blank']::text").getall() cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall() #将url添加到爬虫中, for city,url in zip(citys,cityurls): yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city) def city(self, response): pass
请求http://lishi.tianqi.com/beijing/index.html,返回北京2011年1月-至今的所有月份页面,通过css选择器选择该城市所有月份url,并添加到爬虫中
def city(self, response): #选择城市所有年月份及对应url months = response.css(".tqtongji1 a::text").getall() monthurls = response.css(".tqtongji1 a::attr(href)").getall() #将url添加到爬虫中 for url in monthurls: yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month) def month(self, response): pass
请求http://lishi.tianqi.com/beijing/201101.html,返回北京2011年1月天气页面,通过css选择器选择该月的每天的天气
def month(self, response): #选择该月份中每天的天气数据 weaths = response.css(".tqtongji2 ul[class!='t1'] li::text").getall() #日期 date = response.css(".tqtongji2 ul[class!='t1'] a::text").getall() #最大温度 maxt = weaths[0::5] #最小温度 mint = weaths[1::5] #天气 weath = weaths[2::5] #风向 windd = weaths[3::5] #风力 windp = weaths[4::5] #打印该数据 for date, maxt, mint, weath, windd, windp in zip(date, maxt, mint, weath, windd, windp): print(response.meta['city'],date, maxt, mint, weath, windd, windp)
scrapy数据存储到MariaDB
创建数据库和表
MariaDB [(none)]> create database lishitianqi; Query OK, 1 row affected (0.209 sec) MariaDB [(none)]> use lishitianqi; Database changed MariaDB [lishitianqi]> CREATE TABLE `lishitianqi` ( `id` int(11) NOT NULL AUTO_INCREMENT, `city` varchar(16) NOT NULL, `date` date NOT NULL, `maxt` int(11) NOT NULL, `mint` int(11) NOT NULL, `weath` varchar(20) NOT NULL, `windd` varchar(20) NOT NULL, `windp` varchar(40) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `city` (`city`,`date`) ) ENGINE=InnoDB AUTO_INCREMENT=30673 DEFAULT CHARSET=utf8; Query OK, 0 rows affected (2.030 sec)
编写模型文件items.py
class LishitianqiItem(scrapy.Item): # define the fields for your item here like: city = scrapy.Field() date = scrapy.Field() maxt = scrapy.Field() mint = scrapy.Field() weath = scrapy.Field() windd = scrapy.Field() windp = scrapy.Field()
修改爬虫文件weath.py
1 # -*- coding: utf-8 -*- 2 import scrapy 3 from lishitianqi.items import LishitianqiItem 4 5 class WeathSpider(scrapy.Spider): 6 name = 'weath' 7 allowed_domains = ['lishi.tianqi.com'] 8 start_urls = ['http://lishi.tianqi.com/'] 9 10 def parse(self, response): 11 #选择中国所有城市名及对应url 12 citys = response.css(".bcity a[target='_blank']::text").getall() 13 cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall() 14 15 #将url添加到爬虫中, 16 for city,url in zip(citys[:1],cityurls[:1]): 17 yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city) 18 19 def city(self, response): 20 #选择城市所有年月份及对应url 21 months = response.css(".tqtongji1 a::text").getall() 22 monthurls = response.css(".tqtongji1 a::attr(href)").getall() 23 24 #将url添加到爬虫中 25 for url in monthurls: 26 yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month) 27 28 def month(self, response): 29 #选择该月份中每天 30 for ul in response.css(".tqtongji2 ul[class!='t1']"): 31 #创建item,将数据传递个pipeline 32 item = LishitianqiItem() 33 #获取城市名 34 item['city'] = response.meta['city'] 35 36 a,b,c,d,e,f = ul.css('li') 37 #获取日期 38 item['date'] = a.css("::text").get() 39 #获取日期 40 item['maxt'] = b.css("::text").get() 41 #获取日期 42 item['mint'] = c.css("::text").get() 43 #获取日期 44 item['weath'] = d.css("::text").get() 45 #获取日期 46 item['windd'] = e.css("::text").get() 47 #获取日期 48 item['windp'] = f.css("::text").get() 49 50 yield item
编写项目管道中间件pipeline.py
import pymysql.cursors class LishitianqiPipeline(object): def __init__(self): #连接数据库 self.connect = pymysql.connect( host='localhost', port=3306, db='lishitianqi', user='root', passwd='123', charset='utf8', use_unicode=True) #通过cursor执行增删该查 self.cursor = self.connect.cursor() def process_item(self, item, spider): #插入天气数据 self.cursor.execute( """insert into lishitianqi(city, date, maxt, mint, weath, windd, windp) values (%s, %s, %s, %s, %s, %s, %s)""", (item['city'], item['date'], item['maxt'], item['mint'], item['weath'], item['windd'], item['windp'])) #提交sql语句 self.connect.commit() #必须实现返回 return item
在settings.py中启用mysqlPipeline
ITEM_PIPELINES = { 'lishitianqi.pipelines.LishitianqiPipeline': 300, #格式为:'项目名.文件名.类名':优先级(越小越大) }
爬虫性能优化
首先看一下目标数据有多大,估计要多长时间能抓取完
查看有多少个城市
#scrapy shell https://lishi.tianqi.com/ In [1]: citys = response.css(".bcity a[target='_blank']::text").getall() In [2]: len(citys) Out[2]: 3147
将weath.py的16行代码修改为for city,url in zip(citys[:1],cityurls[:1]):
,运行爬虫,验证爬取一个城市数据耗时
2019-09-09 17:37:52 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
... 2019-09-09 17:41:54 [scrapy.core.engine] INFO: Spider closed (finished)
一个城市耗时4分2s,共需要8.7天才能抓取到所有数据。在我前面写MySQL日志详解时,将二进制日志打开了,这个对性能影响比较大
关掉二进制日志,在重启服务
[mysqld] #log_bin=bin_log_file #binlog_format=row #sync_binlog=1 #server_id=7
查看是否生效
MariaDB [(none)]> show variables like "log_bin"; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | log_bin | OFF | +---------------+-------+ 1 row in set (0.001 sec)
运行爬虫,验证关闭了二进制日志后抓取一个城市数据耗时
2019-09-09 17:50:14 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi) ... 2019-09-09 17:53:01 [scrapy.core.engine] INFO: Spider closed (finished)
耗时2分47秒,这个数据还是有点大,验证不插入数据库纯爬取网页的耗时,将weath.py第50行代码注释掉
#yield item
运行爬虫,验证不写入数据库,抓取一个城市数据耗时
2019-09-09 18:40:07 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi) 2019-09-09 18:40:12 [scrapy.core.engine] INFO: Spider closed (finished)
不写入数据库只需要5s,看来还是要在插入数据库效率上找原因。
在上面的代码实现,是以天为单位调用插入接口,可以按月批量插入来提上效率
修改后的weath.py
1 # -*- coding: utf-8 -*- 2 import scrapy 3 from lishitianqi.items import LishitianqiItem 4 5 class WeathSpider(scrapy.Spider): 6 name = 'weath' 7 allowed_domains = ['lishi.tianqi.com'] 8 start_urls = ['http://lishi.tianqi.com/'] 9 10 def parse(self, response): 11 #选择中国所有城市名及对应url 12 citys = response.css(".bcity a[target='_blank']::text").getall() 13 cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall() 14 15 #将url添加到爬虫中, 16 for city,url in zip(citys[:1],cityurls[:1]): 17 yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city) 18 19 def city(self, response): 20 #选择城市所有年月份及对应url 21 months = response.css(".tqtongji1 a::text").getall() 22 monthurls = response.css(".tqtongji1 a::attr(href)").getall() 23 24 #将url添加到爬虫中 25 for url in monthurls: 26 yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month) 27 28 def month(self, response): 29 data = [] 30 #选择该月份中每天 31 for ul in response.css(".tqtongji2 ul[class!='t1']"): 32 data.append(tuple([response.meta['city']] + [li.css("::text").get() for li in ul.css('li')])) 33 34 #创建item,将数据传递个pipeline 35 item = LishitianqiItem() 36 item['data'] = data 37 yield item
修改后的items.py
1 # -*- coding: utf-8 -*- 2 3 # Define here the models for your scraped items 4 # 5 # See documentation in: 6 # https://docs.scrapy.org/en/latest/topics/items.html 7 8 import scrapy 9 10 class LishitianqiItem(scrapy.Item): 11 # define the fields for your item here like: 12 city = scrapy.Field() 13 date = scrapy.Field() 14 maxt = scrapy.Field() 15 mint = scrapy.Field() 16 weath = scrapy.Field() 17 windd = scrapy.Field() 18 windp = scrapy.Field() 19 data = scrapy.Field()
修改后的pipelines.py
1 # -*- coding: utf-8 -*- 2 3 # Define your item pipelines here 4 # 5 # Don't forget to add your pipeline to the ITEM_PIPELINES setting 6 # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html 7 8 import pymysql.cursors 9 10 class LishitianqiPipeline(object): 11 12 def __init__(self): 13 #连接数据库 14 self.connect = pymysql.connect( 15 host='localhost', 16 port=3306, 17 db='lishitianqi', 18 user='root', 19 passwd='123', 20 charset='utf8', 21 use_unicode=True) 22 23 #通过cursor执行增删该查 24 self.cursor = self.connect.cursor() 25 26 def process_item(self, item, spider): 27 28 #插入天气数据 29 self.cursor.executemany( 30 """insert ignore into lishitianqi(city, date, maxt, mint, weath, windd, windp) 31 values (%s, %s, %s, %s, %s, %s, %s)""", item['data']) 32 33 #提交sql语句 34 self.connect.commit() 35 36 #必须实现返回 37 return item
抓取一个城市数据耗时
2019-09-09 19:30:45 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi) 2019-09-09 19:30:56 [scrapy.core.engine] INFO: Spider closed (finished)
耗时11s,抓取所有数据共计9.6小时,这个时间可以接受了,睡个觉第二天就好了
数据可靠性优化及修复
从网站抓取来的数据并非可靠,在前面代码中或许你们发现了,在抓取每天天气数据只使用了一次css选择,然后切片分出各种类型的天气,
但是这种方式严格依赖数据格式,如果少了一个数据就会导致数据整体偏移,切片出来的结果混乱,后面我多次css选择解决了这个问题。
除了上面描述的一天中某种类型天气数据缺失,还可能有某天数据缺失,比如:
MariaDB [lishitianqi]> select * from lishitianqi where city="苏右旗" and date like "2011-01%"; +-------+--------+------------+------+------+-----------+---------------+-------------+ | id | city | date | maxt | mint | weath | windd | windp | +-------+--------+------------+------+------+-----------+---------------+-------------+ | 20905 | 苏右旗 | 2011-01-01 | -13 | -23 | 晴 | 东北风~东风 | 4-5级 | | 20906 | 苏右旗 | 2011-01-02 | -12 | -23 | 多云 | 东风~西南风 | 5-6级~4-5级 | | 20907 | 苏右旗 | 2011-01-03 | -13 | -24 | 多云 | 西南风~西北风 | 4-5级~5-6级 | | 20908 | 苏右旗 | 2011-01-04 | -15 | -25 | 小雪~多云 | 西北风 | 5-6级~4-5级 | | 20909 | 苏右旗 | 2011-01-05 | -17 | -24 | 多云 | 西北风 | 5-6级~4-5级 | | 20910 | 苏右旗 | 2011-01-06 | -15 | -22 | 晴 | 西北风~西风 | 5-6级~4-5级 | | 20911 | 苏右旗 | 2011-01-07 | -16 | -24 | 晴 | 西北风 | 5-6级~4-5级 | | 20912 | 苏右旗 | 2011-01-08 | -18 | -27 | 晴 | 西风 | 5-6级~4-5级 | | 20913 | 苏右旗 | 2011-01-09 | -12 | -23 | 多云 | 西风 | 5-6级~4-5级 | | 20914 | 苏右旗 | 2011-01-10 | -16 | -23 | 晴 | 西风 | 5-6级~4-5级 | | 20915 | 苏右旗 | 2011-01-11 | -16 | -28 | 晴~多云 | 西风 | 4-5级 | | 20916 | 苏右旗 | 2011-01-12 | -17 | -27 | 小雪~多云 | 西北风 | 5-6级~4-5级 | | 20917 | 苏右旗 | 2011-01-13 | -18 | -26 | 多云 | 西北风 | 4-5级~5-6级 | | 20918 | 苏右旗 | 2011-01-14 | -14 | -28 | 晴 | 北风~东北风 | 4-5级 | | 20919 | 苏右旗 | 2011-01-15 | -15 | -25 | 多云~晴 | 西北风~北风 | 5-6级~4-5级 | | 20920 | 苏右旗 | 2011-01-16 | -18 | -30 | 小雪~多云 | 西北风~西风 | 5-6级~4-5级 | | 20921 | 苏右旗 | 2011-01-26 | -12 | -26 | 晴 | 西北风 | 5-6级 | | 20922 | 苏右旗 | 2011-01-29 | -14 | -25 | 多云~晴 | 西北风 | 4-5级~5-6级 | | 20923 | 苏右旗 | 2011-01-30 | -10 | -19 | 晴 | 西北风 | 4-5级~5-6级 | | 20924 | 苏右旗 | 2011-01-31 | -6 | -19 | 晴 | 西风 | 4-5级~5-6级 | +-------+--------+------------+------+------+-----------+---------------+-------------+ 20 rows in set (0.002 sec)
其中缺失了17日-25日数据,我的解决方案是:若有缺失数据,尝试从备用网站获取,将数据补全,若备用网站没有该城市或同样缺失该数据的按缺失区间取平均值补全。
备用网站tianqihoubao.com,创建一个数据修复蜘蛛
scrapy genspider recovery tianqihoubao.com
将https://lishi.tianqi.com/支持查询的所有城市列表存入citys_primary.txt
#scrapy shell https://lishi.tianqi.com/ ... In [1]: citys = response.css(".bcity a[target='_blank']::text").getall() In [2]: f = open("citys_primary.txt","+w",encoding="utf-8") In [3]: f.write(str(citys)) Out[3]: 19326 In [4]: f.close()
将http://www.tianqihoubao.com/lishi/支持查询的所有城市列表存入citys_backup.txt
scrapy shell http://www.tianqihoubao.com/lishi/ ... In [9]: text = response.body.decode("GB2312") In [10]: from scrapy.selector import Selector In [11]: sel = Selector(text=text) In [12]: citys = sel.css(".citychk dd a::text").getall()
In [13]: citys = [city.strip(' ') for city in citys] In [14]: f = open("citys_backup.txt","+w",encoding="utf-8") In [15]: f.write(str(citys)) Out[15]: 2601 In [16]: f.close()
爬虫文件recovery.py
1 # -*- coding: utf-8 -*- 2 import scrapy 3 import re 4 from lishitianqi.items import LishitianqiItem 5 6 class RecoverySpider(scrapy.Spider): 7 name = 'recovery' 8 allowed_domains = ['tianqihoubao.com'] 9 10 def parse(self, response): 11 tableTitle = response.css(".wdetail h1::text").get() 12 #获取城市名,为了防止出现空列表导致KeyError故用join,''.join([]) == '' 13 city = ''.join(re.findall("([u4E00-u9FA5]+)历史天气预报",tableTitle)) 14 15 data = [] 16 for tr in response.css(".wdetail table tr")[1:]: 17 tds = tr.css("td") 18 #获取日期 19 date = tds[0].css("a::text").get() 20 dl = re.findall("(d{4})年(d{2})月(d{2})日",date) 21 date = '-'.join(dl[0]) if dl!= [] else '1990-00-00' 22 23 #获取天气 24 weath = tds[1].css("::text").get() 25 wl = re.findall("[u4E00-u9FA5]+", weath) 26 weath = '~'.join(wl) if len(wl) == 2 else '未知~未知' 27 28 #获取最低、最高气温 29 tem = tds[2].css("::text").get() 30 tl = re.findall("(d+|-d+)℃", tem) 31 maxt,mint = tl if len(tl) == 2 else [0,0] 32 33 #获取风向,风速度 34 win = tds[3].css("::text").get() 35 wil = re.findall("[u4E00-u9FA50-9-u2265u2264]+", win) 36 wil = [re.sub('[≤]','小于等于', w) for w in wil] 37 wil = [re.sub('[≥]','大于等于', w) for w in wil] 38 windd,windp = ('~'.join(wil[0::2]),'~'.join(wil[1::2])) if len(wil) == 4 else ("未知","未知") 39 40 data.append([city, date, maxt, mint, weath, windd, windp]) 41 42 #创建item,将数据传递个pipeline 43 item = LishitianqiItem() 44 item['data'] = data 45 yield item
修改pipelines.py文件
1 # -*- coding: utf-8 -*- 2 3 # Define your item pipelines here 4 # 5 # Don't forget to add your pipeline to the ITEM_PIPELINES setting 6 # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html 7 8 import pymysql.cursors 9 10 #用于获取某年某月有多少天, 11 import calendar 12 13 #用于获取汉字的拼音, pip install xpinyin 14 from xpinyin import Pinyin 15 16 17 class LishitianqiPipeline(object): 18 19 def __init__(self): 20 #连接数据库 21 self.connect = pymysql.connect( 22 host='localhost', 23 port=3306, 24 db='lishitianqi', 25 user='root', 26 passwd='123', 27 charset='utf8', 28 use_unicode=True) 29 30 #通过cursor执行增删该查 31 self.cursor = self.connect.cursor() 32 33 def process_item(self, item, spider): 34 35 #插入天气数据 36 self.cursor.executemany( 37 """insert ignore into lishitianqi(city, date, maxt, mint, weath, windd, windp) 38 values (%s, %s, %s, %s, %s, %s, %s)""", item['data']) 39 40 #提交sql语句 41 self.connect.commit() 42 43 #必须实现返回 44 return item 45 46 #在爬虫运行前执行这个函数 47 def open_spider(self, spider): 48 49 if spider.name == 'recovery' and True: 50 #读取https://lishi.tianqi.com/和http://www.tianqihoubao.com/lishi/支持查询的所有城市列表存, eval将字符串变为python语句, 转换为set用于取交集 51 citys_primary = set(eval(open("citys_primary.txt","r",encoding="utf-8").read())) 52 citys_backup = set(eval(open("citys_backup.txt","r",encoding="utf-8").read())) 53 startyear = 2011 54 endyear = 2019 55 endmonth = 9 56 pin = Pinyin() 57 58 def check_or_appendurl(city,year,month): 59 #计算year年month月有多少天 60 stmr = calendar.monthrange(year, month)[1] 61 #通过sql获取city城市year年month月有多少天 62 sql = "select count(city) from lishitianqi where city='%s' and date like '%d-%02d%%'" % (city,year,month) 63 self.cursor.execute(sql) 64 apmr = self.cursor.fetchall()[0][0] 65 66 #若数据库中某城市某月的天数不等于该月的实际天数,说明这个城市这个月的数据有丢失,需要构造url到备用网站获取天气数据 67 if stmr != apmr: 68 url = "http://www.tianqihoubao.com/lishi/%s/month/%d%02d.html" % (pin.get_pinyin(city, ''),year,month) 69 print(stmr,apmr,'append url: %s' % url) 70 spider.start_urls.append(url) 71 72 #citys_primary为在https://lishi.tianqi.com/支持查询的所有城市列表存,citys_backup为http://www.tianqihoubao.com/lishi/支持查询的所有城市列表 73 #理想情况下爬虫https://lishi.tianqi.com/就能获取到所有需要的数据,但是从网站抓取来的数据并非可靠,有可能某个月的某几天数据丢失, 74 #所有需要到备用网站将数据库缺失数据补齐 75 #备用网站数据集与主用网站数据集不重合,所以在做数据修复的时候要取交集 76 for city in citys_primary & citys_backup: 77 for year in range(startyear,endyear): 78 for month in range(1,13): 79 check_or_appendurl(city,year,month) 80 81 for month in range(1,endmonth): 82 check_or_appendurl(city,endyear,month) 83 84 85 def close_spider(self, spider): 86 pass
数据实时更新
每个月都会产生新的数据,如果重新运行前面较本过于耗时,需要创建一个更新数据的蜘蛛
scrapy genspider update lishi.tianqi.com
...
关于乱码的终极解决方案
如果发现某个页面出现乱码,用scrapy shell登入该页面,respons.text为已经解码为字符串,由于解码格式不对会显示为乱码,response.body为未解码的字节流。
python库chardet可以根据字节流预测该字节流的编码格式,给定字节流越长,判断越准确
pip install chardet
预测response.body编码格式
In [53]: import chardet
In [54]: chardet.detect(response.body) Out[54]: {'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}
GB2312编码的概率为99%,将response.body用GB2312格式解码,就可以看到正确的结果了。
In [63]: text = response.body.decode("GB2312") In [64]: from scrapy.selector import Selector In [65]: sel = Selector(text=text) In [66]: sel.css("h1::text").get() Out[66]: '全国历史天气查询'
django实现单页应用
需求分析
网站需要支持如下可视化功能(city
代表城市,citys
代表城市集,year
代表年份,years
代表年份集,month代表月份。这些字段都为输入项):
- citys-years月平均气温(曲线图)
- city-year各天气比率(饼图)
- citys-years-month日平均气温(曲线图)
- city-year-month各天气比率(饼图)
效果预览:pengliangyuan.club
创建django工程和app
创建django项目
django-admin startproject webapps
创建webapp
cd webapps python .manage.py startapp history_weath
django使用mysql并引入现有数据库
在前面蜘蛛已经将databaselishitianqi
,tablelishitianqi
插入了数据,现在将这些数据引入到django中(根据数据库表生成django模型)
修改settings.py文件
DATABASES = { 'default': { 'ENGINE': 'django.db.backends.mysql', 'NAME': 'lishitianqi', #数据库的名字 'USER': 'root', 'PASSWORD':'123', 'HOST':'localhost', 'PORT':'3306', } }
修改__init__.py
import pymysql pymysql.install_as_MySQLdb()
执行python manage.py inspectdb
命令,根据mysql数据表生成django模型
#python manage.py inspectdb ... class Lishitianqi(models.Model): city = models.CharField(max_length=16) date = models.DateField() maxt = models.IntegerField() mint = models.IntegerField() weath = models.CharField(max_length=20) windd = models.CharField(max_length=20) windp = models.CharField(max_length=40, blank=True, null=True) class Meta: managed = False db_table = 'lishitianqi' unique_together = (('city', 'date'),)
将上述模型copy到app的模型文件中,运行python manage.py makemigrations
创建这些更改的迁移
#python manage.py makemigrations history_weath Migrations for 'history_weath': history_weathmigrations 001_initial.py - Create model Lishitianqi
运行python manage.py migrate
,将这些更改应用到数据库
#python manage.py migrate ... Operations to perform: Apply all migrations: admin, auth, contenttypes, history_weath, sessions Running migrations: Applying history_weath.0001_initial... OK
创建城市数据库表,并标记热门城市
创建数据表citys
,用于表示该webapp支持的所有城市,以及热门城市
CREATE TABLE `citys` ( `city` varchar(16) NOT NULL, `hot` tinyint(1) DEFAULT 0, PRIMARY KEY (`city`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
在lishitianqi
中查找大所有城市,插入到citys
表中,字段distinct
用于去重
insert into citys(city) select distinct city from lishitianqi;
添加热门城市
update citys set hot=1 where city in ("三亚","上海","乌鲁木齐","北京","南京","厦门","合肥","天津","广州","成都","拉萨","昆明","杭州","武汉","济南","深圳","西安","贵阳","郑州"," 重庆","长沙","青岛");
生成django模型,粘贴到history_weath/models.py,并执行迁移命令(操作方式同一小节)
路由
history_weath作为一个django app,访问该app的url格式是这样的 DomainName/history_weath/xxx,
编辑webapps/urls.py
from django.contrib import admin from django.urls import path,include from history_weath.views import index urlpatterns = [ path('', index) #将DomainName重定向到DomainName/history_weath,见history_weath/views.py中的index函数(后面小节会附上) path('history_weath/', include('history_weath.urls')), #引用history_weath/urls.py,访问url = 引用url + 被引用url path('admin/', admin.site.urls),, ]
新建history_weath/urls.py
from django.urls import path from . import views app_name = 'history_weath' urlpatterns = [ path('', views.home, name='home'), #history_weath这个app的主页 path('temperature_year', views.temperature_year), #url: DomainName/history_weath/temperature_year, 用于ajax(POST)请求citys,years的月平均气温 path('temperature_month', views.temperature_month), #url: DomainName/history_weath/temprature_month, 用于ajax(POST)请求citys,years,month日平均气温 path('weath_year_ratio', views.weath_year_ratio), #url: DomainName/history_weath/weath_year_ratio, 用于ajax(POST)请求city,year天气比率 path('weath_month_ratio', views.weath_month_ratio), #url: DomainName/history_weath/weath_month_ration, 用于ajax(POST)请求city,year,month天气比率 ]
视图
history_weath为单页应用,index用于网站根url到app主页的重定向,home为单页应用的主页,其他的视图个为只专门处理ajax(POST)请求的。
history_weath/views.py
from django import views from django.http import HttpResponse from django.shortcuts import render, redirect,reverse from .models import Lishitianqi, Citys import django.utils.timezone as timezone from xpinyin import Pinyin import string import json from django.http import Http404 from django.db.models import Avg #将DomainName重定向到DomainName/history_weath def index(request): return redirect(reverse("history_weath:home")) #history_weath这个app的主页 def home(request): hot_citys = Citys.objects.filter(hot=1) #汉字转换拼音 pin = Pinyin() #生成{'A':[], 'B':[], 'C':[] ...}, 为什么用[[] for i in range(26)]而不用[] * 26, 因为前者是浅拷贝,后者是引用 citys = dict(zip(list(string.ascii_uppercase),[[] for i in range(26)])) #将城市名转换为拼音,提取首字母,按不同首字母插入到citys中 for item in Citys.objects.all(): city = item.city first_letter = pin.get_pinyin(city)[:1].upper() citys[first_letter].append(city) #生成2011至今的年份列表 years = [year for year in range(2011,timezone.now().year + 1)] context = { 'hot_citys':hot_citys, 'citys':citys, 'years':years } #将这些数据放入到django模板渲染后显示给用户 return render(request, 'history_weath/home.html', {'context':context}) #url: DomainName/history_weath/temperature_year, 用于ajax(POST)请求citys,years的月平均气温 def temperature_year(request): post_data = json.loads(request.body) citys = post_data.get('citys',[]) years = post_data.get('years',[]) print("REQUEST:",citys,years) if citys == [] or years == []: raise Http404("Question does not exist") else: result = [] for city in citys: for year in years: city_data = {'name':'%s(%s年)'%(city,year), 'data':[]} for month in range(1,13): #分别计算一个月的每天最高温度平均温度,一个月的每天最低温度平均温度 month_avg = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month).aggregate(Avg('maxt'),Avg('mint')) #在将一个月的每天最高温度平均温度,一个月的每天最低温度平均温度求平均 mavg = round((month_avg['maxt__avg'] + month_avg['mint__avg'])/2) if month_avg['maxt__avg'] is not None else 0 city_data['data'].append(mavg) result.append(city_data) #该曲线图参考https://jshare.com.cn/demos/hhhhxu,其中series.data即为ajax请求返回的数据 print("RESPONSE:",result) return HttpResponse(json.dumps(result)) #url: DomainName/history_weath/temprature_month, 用于ajax(POST)请求citys,years,month日平均气温 def temperature_month(request): post_data = json.loads(request.body) citys = post_data.get('citys',[]) years = post_data.get('years',[]) month = post_data.get('month',"") print("REQUEST",citys, years, month) if citys == [] or years == [] or month == "": raise Http404("Question does not exist") else: result = [] for city in citys: for year in years: month_obj = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month) #在这里,日平均温度 = 日最高温度 + 日最低温度 / 2 month_data = [round((day.maxt + day.mint)/2,1) for day in month_obj] city_data = {'name':'%s(%s年%s月)'%(city,year,month), 'data':month_data} result.append(city_data) #该曲线图参考https://jshare.com.cn/demos/hhhhxu,其中series.data即为ajax请求返回的数据 print("RESPONSE",result) return HttpResponse(json.dumps(result)) #url: DomainName/history_weath/weath_year_ratio, 用于ajax(POST)请求city,year天气比率 def weath_year_ratio(request): post_data = json.loads(request.body) city = post_data.get('city',"") year = post_data.get('year',"") print("REQUEST",city, year) if city == "" or year == "": raise Http404("Question does not exist") else: weath_year = Lishitianqi.objects.filter(city=city,date__year=year) weath_cnts = [d.weath for d in weath_year] weath_mods = set(weath_cnts) data=[] for wd in weath_mods: cnt = weath_cnts.count(wd) data.append({ 'name':wd, #天气类型,如晴,多云,多云转晴,小雨 'y': cnt / len(weath_cnts) #该类天气在一年天气中所占比例 }) #按天气比例排序 data.sort(key=lambda x:x['y']) #将比例最高天气从饼中切出 data[-1]['sliced'] = 1 #选择比例最高的天气 data[-1]['selected'] = 1 #该饼图参考https://jshare.com.cn/demos/hhhhDX,其中series.data即为ajax请求返回的数据 result = {'name':'%s(%s年)'%(city,year), 'data':data} print("RESPONSE",result) return HttpResponse(json.dumps(result)) #url: DomainName/history_weath/weath_month_ration, 用于ajax(POST)请求city,year,month天气比率 def weath_month_ratio(request): post_data = json.loads(request.body) city = post_data.get('city',"") year = post_data.get('year',"") month = post_data.get('month',"") print("REQUEST",city, year,month) if city == "" or year == "" or month == "": raise Http404("Question does not exist") else: weath_year = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month) weath_cnts = [d.weath for d in weath_year] weath_mods = set(weath_cnts) data=[] for wd in weath_mods: cnt = weath_cnts.count(wd) data.append({ 'name':wd, #天气类型,如晴,多云,多云转晴,小雨 'y': cnt / len(weath_cnts) #该类天气在一个月天气中所占比例 }) #按天气比例排序 data.sort(key=lambda x:x['y']) #将比例最高天气从饼中切出 data[-1]['sliced'] = 1 #选择比例最高的天气 data[-1]['selected'] = 1 #该饼图参考https://jshare.com.cn/demos/hhhhDX,其中series.data即为ajax请求返回的数据 result = {'name':'%s(%s年%s月)'%(city,year,month), 'data':data} print("RESPONSE",result) return HttpResponse(json.dumps(result))
模板
创建目录
history_weath emplateshistory_weath
由于是单页应用,只有home.html。
home.html
load staticfiles
<!doctype html> <html lang="en"> <head> <script src="static 'history_weath/jquery.min.js'
"></script> <link href="static 'history_weath/bootstrap.min.css'
" rel="stylesheet"> <link href="static 'history_weath/home.css'
" rel="stylesheet"> </head> <body> <nav class="navbar navbar-dark fixed-top bg-dark flex-md-nowrap p-0 shadow"> <a class="navbar-brand col-sm-3 col-md-2 mr-0" href="url 'history_weath:home'
">全国各城市天气可视化系统</a> </nav> <div class="container-fluid"> <div class="row"> <nav class="col-md-2 d-none d-md-block bg-light sidebar"> <div class="sidebar-sticky"> <ul class="nav flex-column"> <li class="nav-item "> <a class="active nav-link" id="page1_index" href="#"> <span data-feather="bar-chart-2"></span> 数据可视化 </a> </li> <li class="nav-item "> <a class="nav-link" href="#" id="page2_index"> <span data-feather="layers"></span> 数据分析 </a> </li> </ul> </div> </nav> <main role="main" class="col-md-9 ml-sm-auto col-lg-10 px-4" style="padding-top:70px"> <div id="page1"> <div clsss="container"> <div class="row"> <div class="col-md-3"> <div class="input-group mb-3"> <div class="input-group-prepend" id="page1_iter"> <button type="button" class="btn btn-outline-secondary" id="page2_city">城市</button> <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> <span class="sr-only">Toggle Dropdown</span> </button> <div class="dropdown-menu">for city in context.hot_citys
<button class="dropdown-item page1_add_city">{{ city.city }}</button>endfor
for k,v in context.citys.items
<div role="separator" class="dropdown-divider"></div><p class="text-center">{{ k }}</p>for city in v
<button class="dropdown-item page1_add_city">{{ city }}</button>endfor
endfor
</div> </div> <input type="text" class="form-control" aria-label="Text input with segmented dropdown button"> </div> <div class="input-group mb-3"> <div class="input-group-prepend" id="page1_iter"> <button type="button" class="btn btn-outline-secondary" id="page1_year">年份</button> <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> <span class="sr-only">Toggle Dropdown</span> </button> <div class="dropdown-menu">for year in context.years
<button class="dropdown-item page1_add_year">{{ year }}</button>endfor
</div> </div> </div> </div> <div class="col-md-1 "> <div class="card " style='height: 100%; background-color: #e5e5e5'> <span class="badge badge-info page1_city_list" ></span> <div id="page1_city_label"></div> </div> </div> <div class="col-md-1"> <div class="card " style='height: 100%; background-color: #e5e5e5'> <span class="badge badge-success page1_year_list" ></span> <div id="page1_year_label"></div> </div> </div> </div> </div> <button type="button" class="btn btn-secondary btn-lg btn-block" style="margin-top: 10px" id="page1_send">开始</button> <div id="page1_year_content" style="display: none"> <div id="page1_chart1" style="min-400px;height:400px; 100%"></div> <div class="input-group mb-3"> <div class="input-group-prepend" id="page1_iter"> <button type="button" class="btn btn-outline-secondary" id="page1_year_weath">气候</button> <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> <span class="sr-only">Toggle Dropdown</span> </button> <div class="dropdown-menu"> <button class="dropdown-item page1_add_year_weath">关闭</button> <div id="page1_year_weath_label"></div> </div> </div> </div> <div id="page1_chart3" style="min-400px;height:400px; 100%; display: none;"></div> <div class="btn-group mr-2" role="group" aria-label="First group" style=" 100%"> <button type="button" class="btn btn-secondary page1_month" month=1>一月</button> <button type="button" class="btn btn-secondary page1_month" month=2>二月</button> <button type="button" class="btn btn-secondary page1_month" month=3>三月</button> <button type="button" class="btn btn-secondary page1_month" month=4>四月</button> <button type="button" class="btn btn-secondary page1_month" month=5>五月</button> <button type="button" class="btn btn-secondary page1_month" month=6>六月</button> <button type="button" class="btn btn-secondary page1_month" month=7>七月</button> <button type="button" class="btn btn-secondary page1_month" month=8>八月</button> <button type="button" class="btn btn-secondary page1_month" month=9>九月</button> <button type="button" class="btn btn-secondary page1_month" month=10>十月</button> <button type="button" class="btn btn-secondary page1_month" month=11>十一月</button> <button type="button" class="btn btn-secondary page1_month" month=12>十二月</button> </div> <div id="page1_month_content" style="display: none"> <div id="page1_chart2" style="min-400px;height:400px; 100%"></div> <div class="input-group mb-3"> <div class="input-group-prepend" id="page1_iter"> <button type="button" class="btn btn-outline-secondary" id="page1_month_weath">气候</button> <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> <span class="sr-only">Toggle Dropdown</span> </button> <div class="dropdown-menu"> <button class="dropdown-item page1_add_month_weath">关闭</button> <div id="page1_month_weath_label"></div> </div> </div> </div> <div id="page1_chart4" style="min-400px;height:400px; 100%; display: none;"></div> </div> </div> </div> <div id="page2" style="display: none"> </div> </main> </div> </div> <script src="static 'history_weath/highcharts.js'
"></script> <script src="static 'history_weath/exporting.js'
"></script> <script src="static 'history_weath/oldie.js'
"></script> <script src="static 'history_weath/highcharts-zh_CN.js'
"></script> <script src="static 'history_weath/popper.min.js'
"></script> <script src="static 'history_weath/bootstrap.min.js'
" ></script> <script src="static 'history_weath/feather.min.js'
"></script> <script src="static 'history_weath/home.js'
"></script> </body> </html>
静态文件
创建目录
history_weathstatichistory_weath
home.css
body { font-size: .875rem; } .feather { width: 16px; height: 16px; vertical-align: text-bottom; } /* * Sidebar */ .sidebar { position: fixed; top: 0; bottom: 0; left: 0; z-index: 100; /* Behind the navbar */ padding: 48px 0 0; /* Height of navbar */ box-shadow: inset -1px 0 0 rgba(0, 0, 0, .1); } .sidebar-sticky { position: relative; top: 0; height: calc(100vh - 48px); padding-top: .5rem; overflow-x: hidden; overflow-y: auto; /* Scrollable contents if viewport is shorter than content. */ } @supports ((position: -webkit-sticky) or (position: sticky)) { .sidebar-sticky { position: -webkit-sticky; position: sticky; } } .sidebar .nav-link { font-weight: 500; color: #333; } .sidebar .nav-link .feather { margin-right: 4px; color: #999; } .sidebar .nav-link.active { color: #007bff; } .sidebar .nav-link:hover .feather, .sidebar .nav-link.active .feather { color: inherit; } .sidebar-heading { font-size: .75rem; text-transform: uppercase; } /* * Content */ [role="main"] { padding-top: 133px; /* Space for fixed navbar */ } @media (min- 768px) { [role="main"] { padding-top: 48px; /* Space for fixed navbar */ } } /* * Navbar */ .navbar-brand { padding-top: .75rem; padding-bottom: .75rem; font-size: 1rem; background-color: rgba(0, 0, 0, .25); box-shadow: inset -1px 0 0 rgba(0, 0, 0, .25); } .navbar .form-control { padding: .75rem 1rem; border-width: 0; border-radius: 0; } .form-control-dark { color: #fff; background-color: rgba(255, 255, 255, .1); border-color: rgba(255, 255, 255, .1); } .form-control-dark:focus { border-color: transparent; box-shadow: 0 0 0 3px rgba(255, 255, 255, .25); }
home.js
(function () { 'use strict' feather.replace() var page1_chart1 = Highcharts.chart('page1_chart1', { chart: { type: 'line' }, title: { text: '' }, subtitle: { text: '' }, xAxis: { categories: ['一月', '二月', '三月', '四月', '五月', '六月', '七月', '八月', '九月', '十月', '十一月', '十二月'] }, yAxis: { title: { text: '气温 (°C)' } }, plotOptions: { line: { dataLabels: { // 开启数据标签 enabled: true }, // 关闭鼠标跟踪,对应的提示框、点击事件会失效 enableMouseTracking: false } }, series: [] }); var page1_chart2 = Highcharts.chart('page1_chart2', { chart: { type: 'line' }, title: { text: '' }, subtitle: { text: '' }, xAxis: { categories: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31] }, yAxis: { title: { text: '气温 (°C)' } }, plotOptions: { line: { dataLabels: { // 开启数据标签 enabled: true }, // 关闭鼠标跟踪,对应的提示框、点击事件会失效 enableMouseTracking: false } }, series: [] }); var page1_chart3 = Highcharts.chart('page1_chart3', { chart: { plotBackgroundColor: null, plotBorderWidth: null, plotShadow: false, type: 'pie' }, title: { text: '' }, tooltip: { pointFormat: '{series.name}: <b>{point.percentage:.1f}%</b>' }, plotOptions: { pie: { allowPointSelect: true, cursor: 'pointer', dataLabels: { enabled: true, format: '<b>{point.name}</b>: {point.percentage:.1f} %', style: { color: (Highcharts.theme && Highcharts.theme.contrastTextColor) || 'black' } } } }, series: [] }); var page1_chart4 = Highcharts.chart('page1_chart4', { chart: { plotBackgroundColor: null, plotBorderWidth: null, plotShadow: false, type: 'pie' }, title: { text: '' }, tooltip: { pointFormat: '{series.name}: <b>{point.percentage:.1f}%</b>' }, plotOptions: { pie: { allowPointSelect: true, cursor: 'pointer', dataLabels: { enabled: true, format: '<b>{point.name}</b>: {point.percentage:.1f} %', style: { color: (Highcharts.theme && Highcharts.theme.contrastTextColor) || 'black' } } } }, series: [] }); $("#page1_index").click(function(){ $("#page1").show() $("#page2").hide() $("#page2_index").removeClass("active") $("#page1_index").addClass("active") }); $("#page2_index").click(function(){ $("#page2").show() $("#page1").hide() $("#page1_index").removeClass("active") $("#page2_index").addClass("active") }); $(".page1_add_city").click(function(){ if($(".page1_city_list").text().indexOf($(this).text()) == -1){ var insert = $(".page1_city_list").first().clone().text($(this).text()) insert.bind("click", function(){ $(this).remove() }) $("#page1_city_label").before(insert) } }); $(".page1_add_year").click(function(){ if($(".page1_year_list").text().indexOf($(this).text()) == -1){ var insert = $(".page1_year_list").first().clone().text($(this).text()) insert.bind("click", function(){ $(this).remove() }) $("#page1_year_label").before(insert) } }); $(".page1_add_year_weath").first().click(function(){ $("#page1_chart3").hide() $("#page1_year_weath").text("气候") }); $(".page1_add_month_weath").first().click(function(){ $("#page1_chart4").hide() $("#page1_month_weath").text("气候") }); $("#page1_send").click(function(){ var city_list = $(".page1_city_list") var citys = [] for(var i = 1,p = city_list.first().next(); i < city_list.length; i++, p=p.next()) citys[i-1] = p.text() var year_list = $(".page1_year_list") var years = [] for(var i = 1,p = year_list.first().next(); i < year_list.length; i++, p=p.next()) years[i-1] = p.text() if(citys.length > 0 && years.length > 0) { //删除 for(var z = $(".page1_add_year_weath").first().next(); z.text() != ""; z = $(".page1_add_year_weath").first().next()) { z.remove() } for(var i = 0; i < citys.length; i++) { for(var j = 0; j < years.length; j++) { var city = citys[i]; var year = years[j]; var text = city + year + "年各种天气所占比例" //添加 var insert = $(".page1_add_year_weath").first().clone().text(text) insert.bind("click", function(){ $("#page1_chart3").show() $("#page1_year_weath").text($(this).text()) $.ajax({ type:"POST", url:"/history_weath/weath_year_ratio", data:JSON.stringify({city:city, year:year}), success:function(result){ //删除数据 var len = page1_chart3.series.length; for(var i = len - 1; i >= 0; i--){ page1_chart3.series[i].remove(); } //新增数据 var data = JSON.parse(result) page1_chart3.addSeries(data); var title = city + "-" + year +"年各种天气所占比例饼图"; page1_chart3.setTitle({text:title,useHTML:true}) } }); }) $("#page1_year_weath_label").before(insert) } } $.ajax({ type:"POST", url:"/history_weath/temperature_year", data:JSON.stringify({citys:citys, years:years}), success:function(result){ //删除数据 var len = page1_chart1.series.length; for(var i = len - 1; i >= 0; i--){ page1_chart1.series[i].remove(); } //新增数据 var data = JSON.parse(result) for(var i = 0; i < data.length; i++){ page1_chart1.addSeries(data[i]); } var title = citys.toString() + "-" + years.toString() +"年月平均气温曲线图"; page1_chart1.setTitle({text:title,useHTML:true}) $("#page1_year_content").show() } }); } }); $(".page1_month").click(function(){ var city_list = $(".page1_city_list") var citys = [] for(var i = 1,p = city_list.first().next(); i < city_list.length; i++, p=p.next()) citys[i-1] = p.text() var year_list = $(".page1_year_list") var years = [] for(var i = 1,p = year_list.first().next(); i < year_list.length; i++, p=p.next()) years[i-1] = p.text() var month = $(this).attr('month') var month_name = $(this).text() if(citys.length > 0 && years.length > 0) { //删除 for(var z = $(".page1_add_month_weath").first().next(); z.text() != "";z = $(".page1_add_month_weath").first().next()) { z.remove() } for(var i = 0; i < citys.length; i++) { for(var j = 0; j < years.length; j++) { var city = citys[i]; var year = years[j]; var text = city + year + "年" + month + "月各种天气所占比例" //添加 var insert = $(".page1_add_month_weath").first().clone().text(text) insert.bind("click", function(){ $("#page1_chart4").show() $("#page1_month_weath").text($(this).text()) $.ajax({ type:"POST", url:"/history_weath/weath_month_ratio", data:JSON.stringify({city:city, year:year, month:month}), success:function(result){ //删除数据 var len = page1_chart4.series.length; for(var i = len - 1; i >= 0; i--){ page1_chart4.series[i].remove(); } //新增数据 var data = JSON.parse(result) page1_chart4.addSeries(data); var title = city + "-" + year +"年" + month + "月" + "各种天气所占比例饼图"; page1_chart4.setTitle({text:title,useHTML:true}) } }); }) $("#page1_month_weath_label").before(insert) } } $.ajax({ type:"POST", url:"/history_weath/temperature_month", data: JSON.stringify({citys:citys, years:years, month:month}), dataType:'json', success:function(result){ //删除数据 var len = page1_chart2.series.length; for(var i = len - 1; i >= 0; i--){ page1_chart2.series[i].remove(); } //新增数据 var data = result for(var i = 0; i < data.length; i++){ page1_chart2.addSeries(data[i]); } var title = citys.toString() + "-" + years.toString() +"年" + month + "月日平均气温曲线图"; page1_chart2.setTitle({text:title,useHTML:true}) $("#page1_month_content").show() } }); } }); }())
关于django使用mysql报错的问题
错误:django.core.exceptions.ImproperlyConfigured: mysqlclient 1.3.13 or newer is required; you have 0.9.3.
解决办法:修改python库路径下/Lib/site-packages/django/db/backends/mysql的base.py文件,注释掉版本低于1.3.13抛出错误这两行代码
#if version < (1, 3, 13): # raise ImproperlyConfigured('mysqlclient 1.3.13 or newer is required; you have %s.' % Database.__version__)
错误:AttributeError: 'str' object has no attribute 'decode'
解决办法:修改python库路径下/Lib/site-packages/django/db/backends/mysql的operations.py文件,判断query为bytes类型时才执行decode
if query is not None: if isinstance(query,bytes): query = query.decode(errors='replace') return query
django,uWSGI,nginx部署
前面的部分都是在windows上完成的开发,现在需要部署到个人云服务器
mysql数据库备份与还原
现在要将前面爬虫花了9小时爬取的数据从个人电脑迁移到云服务器。
数据库备份
mysqldump -u root -p lishitianqi > d:ackup.sql
将backup.sql拷贝的云服务器,数据库还原
MariaDB [(none)]> create database lishitianqi; ... [root@VM_0_12_centos mysql]# mysql -u root -p --default-character-set=utf8 lishitianqi < backup.sql
django部署清单
检查settings.py
python manage.py check --deploy System check identified some issues: WARNINGS: ?: (security.W003) You don't appear to be using Django's built-in cross-site request forgery protection via the middleware ('django.middleware.csrf.CsrfViewMiddleware' is not in your MIDDLEWARE). Enabling the middleware is the safest approach to ensure you don't leave any holes. ?: (security.W004) You have not set a value for the SECURE_HSTS_SECONDS setting. If your entire site is served only over SSL, you may want to consider setting a value and enabling HTTP Strict Transport Security. Be sure to read the documentation first; enabling HSTS carelessly can cause serious, irreversible problems. ?: (security.W006) Your SECURE_CONTENT_TYPE_NOSNIFF setting is not set to True, so your pages will not be served with an 'x-content-type-options: nosniff' header. You should consider enabling this header to prevent the browser from identifying content types incorrectly. ?: (security.W007) Your SECURE_BROWSER_XSS_FILTER setting is not set to True, so your pages will not be served with an 'x-xss-protection: 1; mode=block' header. You should consider enabling this header to activate the browser's XSS filtering and help prevent XSS attacks. ?: (security.W008) Your SECURE_SSL_REDIRECT setting is not set to True. Unless your site should be available over both SSL and non-SSL connections, you may want to either set this setting True or configure a load balancer or reverse-proxy server to redirect all connections to HTTPS. ?: (security.W012) SESSION_COOKIE_SECURE is not set to True. Using a secure-only session cookie makes it more difficult for network traffic sniffers to hijack user sessions. ?: (security.W018) You should not have DEBUG set to True in deployment. ?: (security.W019) You have 'django.middleware.clickjacking.XFrameOptionsMiddleware' in your MIDDLEWARE, but X_FRAME_OPTIONS is not set to 'DENY'. The default is 'SAMEORIGIN', but unless there is a good reason for your site to serve other parts of itself in a frame, you should change it to 'DENY'. System check identified 8 issues (0 silenced).
按照上述描述的问题一一解决
python虚拟环境
pip install virtualenv pip install virtualenvwrapper #创建目录用来存放虚拟环境 mkdir $HOME/.virtualenvs #在~/.bashrc中添加行 export WORKON_HOME=$HOME/.virtualenvs export VIRTUALENVWRAPPER_VIRTUALENV=/usr/local/python3/bin/virtualenv source /usr/local/python3/bin/virtualenvwrapper.sh #让配置文件生效 source ~/.bashrc #现在mkvirtualenv, workon, deactive,rmvirtualenv命令可以用了 #创建虚拟环境 mkvirtualenv -p python3 django_env_py3 #进入虚拟环境 workon django_env_py3 #退出虚拟环境 deactivate
#删除虚拟环境
revirtualenv django_env_py3
工程迁移
进入django虚拟环境
workon django_env_py3
将此工程拷贝到云服务器,安装依赖库,修改连接数据库参数:主机(localhost需要改为127.0.0.1),端口号,用户名,密码,让工程能正常运行。关于连接mysql Error问题上文提到过。
spider/lishitianqi/pipelines.py
#连接数据库 self.connect = pymysql.connect( host='127.0.0.1', port=8809, db='lishitianqi', user='root', passwd='123', charset='utf8', use_unicode=True)
webapps/webapps/settings.py
DATABASES = { 'default': { 'ENGINE': 'django.db.backends.mysql', 'NAME': 'lishitianqi', 'USER': 'root', 'PASSWORD':'123', 'HOST':'127.0.0.1', 'PORT':'8809', } }
生成requirements.txt
pip freeze > requirements.txt
asn1crypto==0.24.0 attrs==19.1.0 Automat==0.7.0 cffi==1.12.3 constantly==15.1.0 cryptography==2.7 cssselect==1.1.0 Django==2.2.5 hyperlink==19.0.0 idna==2.8 incremental==17.5.0 lxml==4.4.1 parsel==1.5.2 pyasn1==0.4.7 pyasn1-modules==0.2.6 pycparser==2.19 PyDispatcher==2.0.5 PyHamcrest==1.9.0 PyMySQL==0.9.3 pyOpenSSL==19.0.0 pytz==2019.2 queuelib==1.5.0 Scrapy==1.7.3 service-identity==18.1.0 six==1.12.0 sqlparse==0.3.0 Twisted==19.7.0 uWSGI==2.0.18 w3lib==1.21.0 xpinyin==0.5.6 zope.interface==4.6.0
通过requirements.txt安装python库
pip install -r requirements.txt
uWSIG服务器部署
安装uwsgi
pip install uwsgi
在django项目的setting.py文件的同级目录下,增加一个配置文件uwsgi.ini
[uwsgi] #使用nginx连接时使用 socket=127.0.0.1:9090 #直接做web服务器使用 #http=127.0.0.1:9090 #项目目录 chdir=/root/project/fullstack_history_weath/webapps #项目中wsgi.py文件的目录,相对于项目目录 wsgi-file=webapps/wsgi.py # 指定启动的工作进程数 processes=4 # 指定工作进程中的线程数 threads=2 master=True # 保存启动之后主进程的pid pidfile=uwsgi.pid # 设置uwsgi后台运行,用uwsgi.log保存日志信息 daemonize=uwsgi.log # 设置虚拟环境的路径 virtualenv=/root/.virtualenvs/django_env_py3
uwsgi相关命令
#uWSGI 通过 xxx.ini 启动后会在相同目录下生成一个 xxx.pid 的文件,里面只有一行内容是 uWSGI 的主进程的进程号。 #uWSGI 启动: uwsgi --ini xxx.ini #uwsgi 重启: uwsgi --reload xxx.pid #uwsgi 停止: uwsgi --stop xxx.pid
#强制关闭
pkill uwsgi
nginx服务器部署
安装nginx
wget http://nginx.org/download/nginx-1.8.0.tar.gz tar -xvf nginx-1.8.0.tar.gz cd nginx-1.8.0 ./configure make
make install
ln -s /usr/local/nginx/sbin/nginx /usr/bin/nginx
运行nginx
/usr/local/nginx/sbin/nginx
打开浏览器访问此机器的 IP,如果浏览器出现 Welcome to nginx! 则表示 Nginx 已经安装并运行成功。
nginx相关命令:
#重启: nginx –s reload #停止: nginx –s stop #测试配置文件是否正常: nginx –t #强制关闭: pkill nginx
配置nginx
location / { include uwsgi_params; uwsgi_pass 127.0.0.1:9090; } location /static/ { autoindex on; alias /root/project/fullstack_history_weath/webapps/static/; }
部署出现的问题
问题:uwsgi日志报错“ModuleNotFoundError: No module named 'encodings'”
解决办法:uwsgi配置文件中虚拟环境路径设置错误,可以通过
In [1]: import sys In [2]: sys.path Out[2]: ['/root/.virtualenvs/django_env_py3/lib/python3.6/site-packages', '/usr/local/python3/bin', '/usr/local/python3/lib/python36.zip', '/usr/local/python3/lib/python3.6', '/usr/local/python3/lib/python3.6/lib-dynload', '', '/usr/local/python3/lib/python3.6/site-packages', '/usr/local/python3/lib/python3.6/site-packages/IPython/extensions', '/root/.ipython']
因为我的虚拟环境名为django_env_py3,所以
# 设置虚拟环境的路径 virtualenv=/root/.virtualenvs/django_env_py3
问题:uwsgi日志报错.“--- no python application found, check your startup logs for errors ---”
问题:uwsgi日志报错“chdir(): No such file or directory [core/uwsgi.c line 2623]”
解决办法: 上面两个问题第一个是没有设置chdir,第二个是chdir路径设置错误导致,正确写法如下
#django项目根目录 chdir=/root/project/fullstack_history_weath/webapps
问题:nginx能加载html,加载静态文件显示 404(Not Found)
可能的原因:
1. alias /root/project/fullstack_history_weath/webapps/static
末尾没加/
会导致
2. python setting.py中的STATIC_ROOT
和nginx配置文件中静态文件规则location /static/
上的路径不一致
3. python没有执行命令python manage.py collectstatic
收集所有静态文件到STATIC_ROOT
路径下
问题:nginx能加载html,加载静态文件显示403(Forbidden)
解决方法:静态文件以及他们的所有父目录权限不足,例如有静态文件路径/root/project/fullstack_history_weath/webapps/static/
,若目录project的权限为644,
static中所有的文件权限为644,该路径中的其余文件夹(root, fullstack_history_weath,webapps)权限为655,则需要做如下修改
chmod 655 /root/project chmod -R 655 /root/project/fullstack_history_weath/webapps/static
项目代码获取
至此,整个项目已经全部完成,完整代码获取见下面连接。
nginx文件下载:http://pengliangyuan.club/files/Chinese-historical-weather-visualization-system/
github连接:https://github.com/pengliangyuan/Chinese-historical-weather-visualization-system
番外篇-nginx目录美化
- 给nginx添加fanyindex插件,下载源码ngx-fancyindex
- 将源码解压后放在nginx源码目录下
- 在编译选项中添加这个插件
./configure --add-module=ngx-fancyindex-0.4.3
(插件解压后的目录名) make
,make install
,ln -s /usr/local/nginx/sbin/nginx /usr/bin/nginx
- 在目录
/usr/local/nginx/html
下,下载fancyindex主题git clone https://github.com/TheInsomniac/Nginx-Fancyindex-Theme.git
- 修改
/usr/local/nginx/conf/nginx.conf
,配置如下,nginx -s reload
location /files { include /usr/local/nginx/html/Nginx-Fancyindex-Theme/fancyindex.conf; alias /root/files/; charset utf-8,gbk; } location /Nginx-Fancyindex-Theme/ { alias /usr/local/nginx/html/Nginx-Fancyindex-Theme/; }