zoukankan      html  css  js  c++  java
  • Python爬取51job招聘网Python工作薪资

    python岗位的薪资情况如何?用python爬虫来收集数据

    一,需求内容

    51job招聘网站的python岗位的薪资情况

     二,准备工具

    1,urllib库

    2,beautifulsoup库

    #导入库
    from
    bs4 import BeautifulSoup from urllib.request import urlopen

    三,分析源码

    <div class="el">
            <p class="t1 ">
                <em class="check" name="delivery_em" onclick="checkboxClick(this)">'l</em>
                <input class="checkbox" type="checkbox" name="delivery_jobid" value="118428015" jt="0" style="display:none" />
                <span>
                    <a target="_blank" title="Python开发工程师实习生" href="https://jobs.51job.com/hefei-ssq/118428015.html?s=01&t=0"  onmousedown="">
                        Python开发工程师实习生                </a>
                </span>
                                                                        </p>
            <span class="t2"><a target="_blank" title="合肥合和信息科技有限公司" href="https://jobs.51job.com/all/co5410918.html">合肥合和信息科技有限公司</a></span>
            <span class="t3">合肥-蜀山区</span>
            <span class="t4">4.5-6千/月</span>
            <span class="t5">11-10</span>
        </div>

    从源码中可以看到所需的内在<p>标签和<span>标签中

    soup = BeautifulSoup(html,"html.parser")
    #通过标签选择
    titles=soup.select("p[class='t1'] a")
    salaries=soup.select("span[class='t4']") # CSS 选择器

     四,输出结果

    for i in range(len(titles)):
        print("{:30}{}".format(titles[i].get('title'),salaries[i+1].get_text()))

     

     五,模拟请求头

    header ={    "Connection": "keep-alive",    "Upgrade-Insecure-Requests": "1",    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",    "Accept":" text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",    "Accept-Encoding": "gzip,deflate",    "Accept-Language": "zh-CN,zh;q=0.8"};

    六,完整代码

    from bs4 import BeautifulSoup
    from urllib.request import urlopen
    header ={    "Connection": "keep-alive",    "Upgrade-Insecure-Requests": "1",    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",    "Accept":" text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",    "Accept-Encoding": "gzip,deflate",    "Accept-Language": "zh-CN,zh;q=0.8"};
    
    html = urlopen("https://search.51job.com/list/000000,000000,0000,00,9,99,Python,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=").read().decode('GBK')
    soup = BeautifulSoup(html,"html.parser")
    titles=soup.select("p[class='t1'] a")
    salaries=soup.select("span[class='t4']") # CSS 选择器
    
    
    for i in range(len(titles)):
        print("{:30}{}".format(titles[i].get('title'),salaries[i+1].get_text()))
  • 相关阅读:
    postman使用
    web应用服务器性能监控及调优
    软件测试的相关网站
    web测试点梳理
    HTTP协议详解
    Fidder详解之get和post请求
    浅谈HTTPS协议
    APP测试基本流程
    Python 拷贝对象(深拷贝deepcopy与浅拷贝copy)
    为学日益,为道日损
  • 原文地址:https://www.cnblogs.com/chenchang-rjgc/p/11830244.html
Copyright © 2011-2022 走看看