Our teacher asked us to build a visualization of the epidemic data, but since we had no ready-made data to work with, we had to find it somewhere else.
So today my lovely teammate and I worked out how to scrape Tencent's real-time COVID-19 monitoring page with Python!
Step one: open the page and take a look. It is really just one big JSON string, plain text only, so we first import the third-party requests library plus the json module for parsing JSON strings.
import requests
import json
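In case json.loads is new to you: it turns a JSON string into Python objects. A tiny made-up example (the string here is mine, just for illustration):

sample = '{"name": "example", "total": {"confirm": 100}}'  # made-up JSON string
parsed = json.loads(sample)                                # now a nested Python dict
print(parsed["total"]["confirm"])                          # prints 100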
Then we use requests to fetch the page and grab its text (which, for this page, is that JSON string):
# The endpoint below is the one commonly used for this Tencent page at the time;
# treat it as an assumption and adjust if the page has moved.
url = "https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5"
headers = {"User-Agent": "Mozilla/5.0"}  # a browser-like UA so the request isn't rejected
r = requests.get(url, headers=headers)   # headers must be passed as a keyword argument
res = json.loads(r.text)                 # outer layer: metadata plus a "data" field
data_all = json.loads(res["data"])       # "data" is itself a JSON string, so parse again
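Why call json.loads twice? The response is double-encoded: the outer JSON carries a "data" field whose value is itself a JSON-encoded string. Roughly (a sketch of the shape as I read it, not captured output):

# r.text looks something like:
#   {"ret": 0, "data": "{\"lastUpdateTime\": \"...\", \"areaTree\": [...]}"}
# so res["data"] is a string, and the second json.loads turns it into a real dict.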
Then we pull the fields we need out of the parsed data and assemble them into rows, one list per city:
details = []
update_time = data_all["lastUpdateTime"]
data_country = data_all["areaTree"]          # areaTree[0] is the country level
data_province = data_country[0]["children"]  # its children are the provinces
for pro_infos in data_province:
    province = pro_infos["name"]
    for city_infos in pro_infos["children"]:  # each province's children are cities
        city = city_infos["name"]
        confirm = city_infos["total"]["confirm"]
        confirm_add = city_infos["today"]["confirm"]
        heal = city_infos["total"]["heal"]
        dead = city_infos["total"]["dead"]
        suspect = city_infos["total"]["suspect"]
        details.append([update_time, province, city, confirm, confirm_add, heal, dead, suspect])
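A quick sanity check before touching the database (my own addition, not part of the original steps):

print(len(details))  # number of city rows scraped
print(details[0])    # one row: [update_time, province, city, confirm, confirm_add, heal, dead, suspect]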
Next, import pymysql, which we use to talk to MySQL:
import pymysql
Open a connection and create a cursor:
conn = pymysql.connect(host="127.0.0.1", port=3306, user="root",
                       password="260702266", database="virus", charset="utf8")
cursor = conn.cursor()
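The insert below assumes a table named newebd already exists in the virus database. The original write-up never shows the schema, so here is a sketch: the column names come from the insert statement, but the types are my own guesses.

# Hypothetical DDL; only the column names are taken from the insert below.
cursor.execute("""
    create table if not exists newebd (
        time       datetime,
        province   varchar(50),
        city       varchar(50),
        nowConfirm int,
        Confirm    int,
        heal       int,
        dead       int,
        suspect    int
    )
""")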
Finally, insert all the rows in details into the database with a single executemany call, which runs the insert statement once per row (the %s placeholders let pymysql escape the values):
sql = ("insert into newebd(time,province,city,nowConfirm,Confirm,heal,dead,suspect) "
       "values(%s,%s,%s,%s,%s,%s,%s,%s)")
try:
    cursor.executemany(sql, details)  # one batched insert covering every row in details
    conn.commit()
except Exception:
    conn.rollback()  # undo the partial batch if anything goes wrong
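To confirm the rows actually landed (again my own addition), count them and then close up:

cursor.execute("select count(*) from newebd")
print(cursor.fetchone()[0], "rows in newebd")
cursor.close()
conn.close()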
And with that, the data has been scraped and loaded into the database.
Of course, I don't fully understand the underlying details yet, so I'll be scraping plenty of other people's sites over the next while to learn.
Okay, that's it for today!