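  • Python: Crawling Population Migration Data (Using Tencent Migration as an Example)

    Notes:

    1. The migration volumes are values that Tencent has adjusted, so their accuracy cannot be verified.

    2. While this code was being run, Tencent Migration had no IP blocking or browser detection in place, so the code below is only guaranteed to work around the time of publication.

    3. What the code does: for one specified day, it crawls the migration volumes (inbound and outbound) of roughly forty cities. The figure of forty simply reflects my own city list; yours can be longer or shorter, and there is no limit.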
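
    The request pattern behind note 3 can be sketched as a small helper that produces one URL per city and direction. This is only an illustration: `build_urls`, `BASE_URL` and `DIRECTIONS` are hypothetical names, the URL layout (date, city code, direction digit, then `6.js`) is copied from the script below, and the endpoint may no longer respond.

```python
# Hypothetical helper; the URL layout is taken from the article's script.
BASE_URL = "https://lbs.gtimg.com/maplbs/qianxi/"
DIRECTIONS = {"0": "inbound", "1": "outbound"}  # direction digit used in the file name


def build_urls(date, city_codes):
    """Return (city_code, direction_label, url) tuples for one day."""
    urls = []
    for code in city_codes:
        for digit, label in DIRECTIONS.items():
            urls.append((code, label, f"{BASE_URL}{date}/{code}{digit}6.js"))
    return urls


if __name__ == "__main__":
    # Two city codes taken from the comments in the article's script.
    for code, label, url in build_urls("20171016", ["360100", "360200"]):
        print(code, label, url)
```

    Each city therefore costs two requests, so a list of forty cities means eighty downloads per day of data.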

    import re
    import urllib.request

    import xlrd
    import xlwt

    date = "20171016"
    # city.xls: column 0 holds city names, column 1 holds city codes stored as text; row 0 is a header row.
    citySheet = xlrd.open_workbook("E:/city.xls").sheet_by_index(0)
    cityList = citySheet.col_values(0)      # ['city', '南昌', '景德镇', '萍乡', ...
    cityCodeList = citySheet.col_values(1)  # ['cityCode', '360100', '360200',...
    direction = ["0", "1"]  # "0" inbound: result -> city; "1" outbound: city -> result
    header = ["from", "to", "number", "car", "train", "plane"]
    # Matches one record such as ["重庆",198867,0.000,0.300,0.700]
    ptRow = re.compile(r'(\[".*?\])')

    for cityIndex in range(1, len(cityCodeList)):  # start at 1 to skip the header row
        for dInd in range(2):
            url = "https://lbs.gtimg.com/maplbs/qianxi/" + date + "/" + cityCodeList[cityIndex] + direction[dInd] + "6.js"
            workbook = xlwt.Workbook()
            sheet = workbook.add_sheet("result")
            for i in range(len(header)):
                sheet.write(0, i, header[i])
            try:
                data = urllib.request.urlopen(url).read().decode("utf8")  # JSONP_LOADER&&JSONP_LOADER([["重庆",198867,0.000,0.300,0.700],["上海",174152,0.160,0.390,0.450],[...
                dataList = ptRow.findall(data)  # ['["重庆",198867,0.000,0.300,0.700]', '["上海",174152,0.160,0.390,0.450]',[...
                for i in range(len(dataList)):
                    colList = dataList[i].strip("[]").split(",")  # ['"重庆"', '198867', '0.000', '0.300', '0.700']
                    otherCity = colList[0].strip('"')
                    if direction[dInd] == "0":
                        sheet.write(i + 1, 0, otherCity)            # from
                        sheet.write(i + 1, 1, cityList[cityIndex])  # to
                    else:
                        sheet.write(i + 1, 0, cityList[cityIndex])  # from
                        sheet.write(i + 1, 1, otherCity)            # to
                    sheet.write(i + 1, 2, int(colList[1]))    # number
                    sheet.write(i + 1, 3, float(colList[2]))  # car
                    sheet.write(i + 1, 4, float(colList[3]))  # train
                    sheet.write(i + 1, 5, float(colList[4]))  # plane
            except Exception as e:
                # On failure, report the URL and still save the workbook with whatever was written.
                print(url, e)
            workbook.save("E:/qianxi/" + str(cityList[cityIndex]) + direction[dInd] + date + ".xls")
    print("Done!")
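
    As an alternative to the regular expression and `split(",")`, the payload could also be parsed as JSON, since the comments in the script show a JSON array wrapped in a `JSONP_LOADER(...)` call. The sketch below assumes every response has exactly that shape. `parse_jsonp_rows` and `to_records` are hypothetical names, and the sample string is assembled from the two records visible in the comments rather than from a real response.

```python
import json


def parse_jsonp_rows(payload):
    """Return the rows of the JSON array between the first '(' and the last ')'."""
    start = payload.index("(")
    end = payload.rindex(")")
    return json.loads(payload[start + 1:end])


def to_records(rows, city, inbound):
    """Attach the queried city as destination (inbound) or origin (outbound)."""
    records = []
    for name, number, car, train, plane in rows:
        origin, dest = (name, city) if inbound else (city, name)
        records.append({"from": origin, "to": dest, "number": number,
                        "car": car, "train": train, "plane": plane})
    return records


if __name__ == "__main__":
    # Sample built from the records shown in the script's comments (assumed shape).
    sample = ('JSONP_LOADER&&JSONP_LOADER([["重庆",198867,0.000,0.300,0.700],'
              '["上海",174152,0.160,0.390,0.450]])')
    for record in to_records(parse_jsonp_rows(sample), "南昌", inbound=True):
        print(record)
```

    Parsing with `json` returns numbers as `int` and `float` directly and does not break if a city name ever contains a comma, but it raises an error if the server changes the wrapper format.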

    Results:

  • Original article: https://www.cnblogs.com/shadrach/p/7687602.html