
py3 + requests + json + xlwt: scraping Lagou job listings

When you search for a position on Lagou, you can capture the underlying request with Chrome DevTools (F12).

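The search turns out to be a POST request. Its form parameters are `first`, `pn` (the page number), and `kd` (the search keyword), as used in the code below.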

The response is JSON data.

With that in hand, we can construct the request ourselves.
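As a rough sketch of what such a request contains, the snippet below builds (but does not send) the same POST using only the standard library, so it runs without network access. The helper name `build_search_request` is hypothetical; the URL and the `first`, `pn`, and `kd` fields are the ones used in the full implementation in this post. Passing a dict as `data` to `requests.post` produces a form-encoded body of the same shape.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'

def build_search_request(keyword, page):
    """Build, without sending, a form-encoded POST for one page of results.

    Hypothetical helper, for illustration only.
    """
    form = {'first': 'true', 'pn': page, 'kd': keyword}
    body = urlencode(form).encode('utf-8')  # dict -> form-encoded bytes
    return Request(URL, data=body, method='POST')

req = build_search_request('python', 1)
print(req.get_method())  # POST
print(req.data)          # b'first=true&pn=1&kd=python'
```

In practice the site may also expect browser-like headers or cookies; that is an assumption on my part and is not covered by the original code.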

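Then we deserialize the response. That gives us the job postings as nested Python dicts and lists, which we can process however we like, for example by pulling out a single field.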
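To make the deserialization step concrete, here is a minimal sketch that runs offline. The sample text is hand-written and its values are placeholders; only the nesting (`content` → `positionResult` → `result`) mirrors the keys read by the script in this post. The helper `extract_rows` is hypothetical.

```python
import json

# Hand-written stand-in for the response body. The values are placeholders;
# only the nesting mirrors the keys the script reads.
sample_text = '''
{"content": {"positionResult": {"result": [
  {"positionName": "Example Position", "companyFullName": "Example Co.",
   "salary": "10k-20k", "city": "Example City"}
]}}}
'''

def extract_rows(payload, fields):
    """Return one list per posting, holding the requested fields in order."""
    results = payload.get('content', {}).get('positionResult', {}).get('result', [])
    return [[job.get(field) for field in fields] for job in results]

payload = json.loads(sample_text)  # JSON text -> nested dicts and lists
rows = extract_rows(payload, ['positionName', 'salary'])
print(rows)  # [['Example Position', '10k-20k']]
```

Using `.get()` with defaults means an unexpected response (for example an error payload without `content`) yields an empty list rather than a `KeyError`.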

Finally, we write the rows to an Excel file in a loop.
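One detail worth sketching before the write loop: one of the extracted fields (`companyLabelList` in the code in this post) is a list, and as far as I know xlwt's `write` only accepts scalar values such as strings and numbers, which is why the script joins that column into a single string. The helpers `flatten_cell` and `flatten_row` below are hypothetical and only illustrate that preparation step. Treating a missing value as an empty string is my assumption.

```python
def flatten_cell(value, sep=','):
    """Turn a list-valued field into one string; pass scalars through."""
    if value is None:
        return ''  # assumption: a missing field becomes an empty cell
    if isinstance(value, (list, tuple)):
        return sep.join(str(v) for v in value)
    return value

def flatten_row(row):
    """Apply flatten_cell to every cell so the row holds only scalars."""
    return [flatten_cell(cell) for cell in row]

row = ['Example Position', '10k-20k', ['label A', 'label B'], None]
print(flatten_row(row))
# ['Example Position', '10k-20k', 'label A,label B', '']
```

With rows prepared this way, the write loop would no longer need a special case for one column index.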

The full implementation:

import requests
import json
import xlwt

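items = []  # collected job postings
pn = 1  # page number to request
# Fetch one page of results
def get_content(pn):
    # Nationwide search (no city filter)
    url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
    data = {
        'first': 'true',
        'pn': pn,
        'kd': 'python'
    }
    # Send a POST request to url with data as the form body
    html = requests.post(url, data=data).text  # response body as text
    # print(type(html))  # <class 'str'>
    html = json.loads(html)  # deserialize the JSON text into a dict
    print(html)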

    # Iterate over the results actually returned rather than assuming 15 per page
    for result in html['content']['positionResult']['result']:
        item = []
        # Nested dicts: position, company, salary, city, benefits, company labels, job type
        item.append(result['positionName'])
        item.append(result['companyFullName'])
        item.append(result['salary'])
        item.append(result['city'])
        item.append(result['positionAdvantage'])
        item.append(result['companyLabelList'])
        item.append(result['firstType'])
        items.append(item)
    return items

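# Write the collected rows to an Excel file
def excel_write(items):
    newTable = 'test1.xls'
    wb = xlwt.Workbook(encoding='utf-8')  # create the workbook
    ws = wb.add_sheet('test1')  # create the sheet
    headData = ['Position', 'Company', 'Salary', 'City', 'Benefits', 'Company labels', 'Job type']
    for hd in range(7):
        ws.write(0, hd, headData[hd], xlwt.easyxf('font:bold on'))  # bold header row
    # Write the data rows
    index = 1  # current row
    for item in items:
        for i in range(7):
            # print(type(item[i]))
            if i == 5:
                # companyLabelList is a list, so join it into one string
                # (fall back to an empty list in case it is missing)
                ws.write(index, i, ','.join(item[i] or []))
            else:
                ws.write(index, i, item[i])
        index += 1
    wb.save(newTable)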


if __name__ == '__main__':
    items = get_content(pn)
    print(items)
    excel_write(items)


Original article: https://www.cnblogs.com/uncleyong/p/6960044.html