  • Python: scraping software-testing job listings from Lagou

    The code is as follows:

    # coding: utf-8
    import time
    import urllib.request

    from bs4 import BeautifulSoup


    def get_url(i):
        # Page i of Lagou's software-testing ("ceshi") listings.
        url = 'https://www.lagou.com/zhaopin/ceshi/%s/?filterOption=%s' % (i, i)
        return url


    def get_html(i):
        # Send a browser-like User-Agent with the request.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
        }
        request = urllib.request.Request(url=get_url(i), headers=headers)
        html = urllib.request.urlopen(request).read().decode('utf-8')
        # Name the parser explicitly so bs4 does not have to guess one.
        soup = BeautifulSoup(html, 'html.parser')
        return soup


    def parse(i, file):
        soup = get_html(i)
        me = soup.find_all(class_='money')                    # salary
        me1 = soup.find_all(class_='format-time')             # posting time
        me2 = soup.find_all(class_='li_b_r')                  # benefits
        me3 = soup.find_all(attrs={'data-lg-tj-id': '8F00'})  # company name
        # The n-th entry of each list belongs to the same job posting.
        for salary, posted, benefits, company in zip(me, me1, me2, me3):
            meitu = {
                'salary': salary.text,
                'posted': posted.text,
                'benefits': benefits.text,
                'company': company.text,
            }
            print(meitu)
            file.write(str(meitu))
            file.write("\n")


    if __name__ == '__main__':
        # The with block closes the file once all pages are written.
        with open(r'meituancde.txt', 'w', encoding='utf-8') as file:
            for i in range(1, 31):
                parse(i, file)
                time.sleep(1)  # brief pause between page requests
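
The script writes each record to the text file as `str(dict)`, which is easy to read but awkward to load back into Python. One possible alternative, sketched below with the standard library only, is to write one JSON object per line. The helper names `save_records` and `load_records`, the file name `jobs.jsonl`, and the sample record are hypothetical and not part of the original post.

```python
import json


def save_records(records, path):
    """Write each dict in `records` as one JSON object per line (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as f:
        for record in records:
            # ensure_ascii=False keeps Chinese text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + '\n')


def load_records(path):
    """Read the file back into a list of dicts (hypothetical helper)."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == '__main__':
    # A made-up sample record, for illustration only.
    sample = [{'salary': '10k-15k', 'company': 'ExampleCo'}]
    save_records(sample, 'jobs.jsonl')
    print(load_records('jobs.jsonl'))
```

With this layout each line can be parsed on its own, so a partially written file is still usable up to its last complete line.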

    Result screenshot:

  • Original article: https://www.cnblogs.com/leiziv5/p/6533437.html