  • Scraping 51job with Java, saving to MySQL, and analyzing the data

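    For the final assignment of my second-year practical training course I decided to scrape job listings. I had planned to use Python, but then thought I would try Java instead.

    I had only been teaching myself Java for about a month and wanted to practice object-oriented thinking.

    After looking around, I found that sites such as Lagou generate their pages dynamically, while 51job serves static pages, which are easier to scrape, so I settled on 51job.

    This is based on https://blog.csdn.net/qq_42982169/article/details/83155040, with many changes to make it more modular, plus a save feature.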

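    Prerequisites:

    Create a Maven project to make dependency management easier.

    Use httpclient 3.1 and jsoup 1.8.3 to fetch pages and extract information; these two versions are widely used.

    mysql-connector-java 8.0.13 loads the data into the database and supports MySQL 8.0+.

    tablesaw for the analysis (optional; any tool you are comfortable with will do).

    The URL for "大数据 (big data) + Shanghai" is the example here; any URL of the same form should work.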

    https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=
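
    The keyword segment of this URL looks like "大数据" percent-encoded twice as UTF-8 (each "%" of the first pass becomes "%25"). The sketch below builds the path part of such a URL under that assumption. SearchUrlBuilder and its method names are hypothetical, and the query string from the sample URL would still need to be appended unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the original project.
final class SearchUrlBuilder {

    // Percent-encode the keyword twice, matching the pattern seen in the sample URL.
    static String encodeKeyword(String keyword) {
        try {
            String once = URLEncoder.encode(keyword, "UTF-8");
            return URLEncoder.encode(once, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform
            throw new IllegalStateException(e);
        }
    }

    // Path part of a first-page search URL; "020000" is the area code the sample URL uses for Shanghai.
    static String firstPagePath(String areaCode, String keyword) {
        return "https://search.51job.com/list/" + areaCode
                + ",000000,0000,00,9,99," + encodeKeyword(keyword) + ",2,1.html";
    }

    public static void main(String[] args) {
        System.out.println(firstPagePath("020000", "大数据"));
    }
}
```

    A plain ASCII keyword such as java passes through both encoding steps unchanged, which is consistent with the Java + Shanghai URL used later in this post.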

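    I sketched a rough design and revised it several times. In the end this layout seemed the clearest: a list of JobBean objects (List<JobBean>) is the medium passed between all the features.

    Start with fetching the pages and saving the results locally.

    Create the JobBean class: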

    public class JobBean {
        private String jobName;
        private String company;
        private String address;
        private String salary;
        private String date;
        private String jobURL;
        
        public JobBean(String jobName, String company, String address, String salary, String date, String jobURL) {
            this.jobName = jobName;
            this.company = company;
            this.address = address;
            this.salary = salary;
            this.date = date;
            this.jobURL = jobURL;
        }
        
        
        
        @Override
        public String toString() {
            return "jobName=" + jobName + ", company=" + company + ", address=" + address + ", salary=" + salary
                    + ", date=" + date + ", jobURL=" + jobURL;
        }
    
    
    
        public String getJobName() {
            return jobName;
        }
        public void setJobName(String jobName) {
            this.jobName = jobName;
        }
        public String getCompany() {
            return company;
        }
        public void setCompany(String company) {
            this.company = company;
        }
        public String getAddress() {
            return address;
        }
        public void setAddress(String address) {
            this.address = address;
        }
        public String getSalary() {
            return salary;
        }
        public void setSalary(String salary) {
            this.salary = salary;
        }
        public String getDate() {
            return date;
        }
        public void setDate(String date) {
            this.date = date;
        }
        public String getJobURL() {
            return jobURL;
        }
        public void setJobURL(String jobURL) {
            this.jobURL = jobURL;
        }
    }

    Next, write a utility class for saving the list, so that it can be saved at any stage:

    import java.io.*;
    import java.util.*;
    
    /**Provides
     * 1. Saving a JobBean list to a local file
     * 2. Loading a local file back into a JobBean list (with filtering)
     * @author PowerZZJ
     *
     */
    public class JobBeanUtils {
        
        /**Append one JobBean to the local file
         * @param job the JobBean to save
         */
        public static void saveJobBean(JobBean job) {
            try(BufferedWriter bw =
                    new BufferedWriter(
                            new FileWriter("JobInfo.txt",true))){
                String jobInfo = job.toString();
                bw.write(jobInfo);
                bw.newLine();
                bw.flush();
            }catch(Exception e) {
                System.out.println("Failed to save JobBean");
                e.printStackTrace();
            }
        }
        
        /**Save a JobBean list to the local file
         * @param jobBeanList the JobBean list
         */
        public static void saveJobBeanList(List<JobBean> jobBeanList) {
            System.out.println("Backing up the list to a local file");
            for(JobBean jobBean : jobBeanList) {
                saveJobBean(jobBean);
            }
            System.out.println("Backup finished, "+jobBeanList.size()+" records in total");
        }
        
        /**Load the local file into a JobBean list (with filtering)
         * @return the JobBean list
         */
        public static List<JobBean> loadJobBeanList(){
            List<JobBean> jobBeanList = new ArrayList<>();
            try(BufferedReader br = 
                    new BufferedReader(
                            new FileReader("JobInfo.txt"))){
                String str = null;
                while((str=br.readLine())!=null) {
                    //Filter: lines that do not split cleanly (for example, a company name containing ",") are skipped
                    try {
                        String[] datas = str.split(","); 
                        //Strip the "label=" prefix, plus the space that follows each comma
                        String jobName = datas[0].substring(8);
                        String company = datas[1].substring(9);
                        String address = datas[2].substring(9);
                        String salary = datas[3].substring(8);
                        String date = datas[4].substring(6);
                        String jobURL = datas[5].substring(8);
                        //Filter: build a JobBean only if no field is empty, the salary is a range, and the URL starts with http
                        if (jobName.equals("") || company.equals("") || address.equals("") || salary.equals("")
                                || !(salary.contains("-"))|| date.equals("") || !(jobURL.startsWith("http")))
                            continue;
                        JobBean jobBean = new JobBean(jobName, company, address, salary, date, jobURL);
                        //Add it to the list
                        jobBeanList.add(jobBean);
                    }catch(Exception e) {
                        System.out.println("Local load filter: skipping malformed line: "+str);
                        continue;
                    }
                }
                System.out.println("Load finished, "+jobBeanList.size()+" records read");
                return jobBeanList;
            }catch(Exception e) {
                System.out.println("Failed to load the JobBean list");
                e.printStackTrace();
            }
            return jobBeanList;
        }
    }
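
    Because loadJobBeanList splits each line on ",", a company name that itself contains a comma shifts the fields and the line is usually dropped. One possible alternative, sketched here, is to match on the "label=" markers that JobBean.toString writes. JobLineParser is a hypothetical class, the sample line in main is made up, and the sketch assumes that no field value contains one of the ", label=" markers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original project.
final class JobLineParser {

    // Mirrors the layout produced by JobBean.toString()
    private static final Pattern LINE = Pattern.compile(
            "^jobName=(.*), company=(.*), address=(.*), salary=(.*), date=(.*), jobURL=(.*)$");

    // Returns the six field values in order, or null if the line does not match the layout.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[6];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line with a comma inside the company name
        String line = "jobName=数据分析师, company=Foo, Bar 科技有限公司, address=上海-浦东新区, "
                + "salary=1-1.5万/月, date=05-25, jobURL=https://example.com/job/1";
        String[] fields = parse(line);
        System.out.println(fields[1]);
    }
}
```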

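    Now for the key part, the scraping itself.

    The information we need sits in the elements with class el. The first el holds the overall header information rather than a job, so it has to be dropped later.

    Each el contains t1, t2, t3, t4 and t5 elements; take them out one by one, in order.

    Then look at the "next page" element, which has class bk. Note that there are two bk elements: the first is "previous page" and only the second is "next page".

    I got this wrong at first and the crawler went into an infinite loop.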
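
    Picking the wrong link makes the crawl bounce between two pages forever. As a safeguard against that kind of mistake, the page loop can remember which URLs it has already visited. This is a minimal sketch rather than the Spider class used in this post: PageWalker is a hypothetical name, and the nextPage function stands in for "fetch the page and read its next-page link".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the original project.
final class PageWalker {

    // Follows next-page links from startUrl and returns the URLs in visiting order.
    // Stops when the link is null or empty, or when a URL repeats.
    static List<String> walk(String startUrl, Function<String, String> nextPage) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        String url = startUrl;
        // Set.add returns false for a URL that was already seen, which ends the loop
        while (url != null && !url.isEmpty() && visited.add(url)) {
            order.add(url);
            url = nextPage.apply(url);
        }
        return order;
    }

    public static void main(String[] args) {
        // Simulates the bug: page 2 links back to page 1
        Map<String, String> links = new HashMap<>();
        links.put("page1", "page2");
        links.put("page2", "page1");
        System.out.println(walk("page1", links::get));
    }
}
```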

    Finally, a single Spider class brings together the extraction and the iteration over the next pages:

    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;
    
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    
    /**Scrapes the job listing pages
     * @author PowerZZJ
     *
     */
    public class Spider {
        //Counts how many pages have been crawled
        private static int pageCount = 1;
        
        private String strURL;
        private String nextPageURL;
        private Document document;//the parsed page
        private List<JobBean> jobBeanList;
        
        public Spider(String strURL) {
            this.strURL = strURL;
            nextPageURL = strURL;//start with the current URL as the "next page" to simplify the loop
            jobBeanList = new ArrayList<JobBean>();
            
        }
        
        /**Fetch and parse a page
         * @param strURL the page URL
         * @return the parsed page, or null on failure
         */
        public Document getDom(String strURL) {
            try {
                URL url = new URL(strURL);
                //Parse, with a timeout in milliseconds
                document = Jsoup.parse(url, 4000);
                return document;
            }catch(Exception e) {
                System.out.println("getDom failed");
                e.printStackTrace();
            }
            return null;
        }
        
    
        /**Extract the jobs on the current page, turn them into JobBean objects and add them to the list
         * @param document the parsed page
         */
        public void getPageInfo(Document document) {
            //Use the CSS selector "#resultList .el" to get the el elements
            Elements elements = document.select("#resultList .el");
            //Drop the first one, which is the header rather than a job
            elements.remove(0);
            //Extract the fields
            for(Element element: elements) {
                Elements elementsSpan = element.select("span");
                String jobURL = elementsSpan.select("a").attr("href");
                String jobName = elementsSpan.get(0).select("a").attr("title");
                String company = elementsSpan.get(1).select("a").attr("title");
                String address = elementsSpan.get(2).text();
                String salary = elementsSpan.get(3).text();
                String date = elementsSpan.get(4).text();
                //Build the JobBean
                JobBean jobBean = new JobBean(jobName, company, address, salary, date, jobURL);
                //Add it to the list
                jobBeanList.add(jobBean);
            }
        }
        
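        /**Get the URL of the next page
         * @param document the parsed page
         * @return the next-page URL, "" if the page has no next link, or null on failure
         */
        public String getNextPageURL(Document document) {
            try {
                Elements elements = document.select(".bk");
                //The second bk is the next-page link
                Element element = elements.get(1);
                //attr returns "" (never null) when there is no link
                nextPageURL = element.select("a").attr("href");
                System.out.println("---------"+(pageCount++)+"--------");
                return nextPageURL;
            }catch(Exception e) {
                System.out.println("Failed to get the next-page URL");
                e.printStackTrace();
            }
            return null;
        }
        
        
        /**Start crawling
         * 
         */
        public void spider() {
            //Stop on an empty URL (last page) or on null (failure)
            while(nextPageURL != null && !nextPageURL.isEmpty()) {
                //Fetch the page
                document = getDom(nextPageURL);
                if(document == null) {
                    break;
                }
                //Add this page's jobs to the list
                getPageInfo(document);
                //Find the URL of the next page
                nextPageURL = getNextPageURL(document);
            }
        }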
        
        //Get the JobBean list
        public List<JobBean> getJobBeanList() {
            return jobBeanList;
        }
    }

    Then test the scraping and saving:

    import java.util.ArrayList;
    import java.util.List;
    
    public class Test1 {
        public static void main(String[] args) {
            List<JobBean> jobBeanList = new ArrayList<>();
            //大数据 (big data) + Shanghai
            String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=1&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";        
    
            //Test the Spider and saving
            Spider spider = new Spider(strURL);
            spider.spider();
            //Get the JobBean list after crawling
            jobBeanList = spider.getJobBeanList();
            
            //Use the JobBean utility class to save the list locally
            JobBeanUtils.saveJobBeanList(jobBeanList);
        
            //Use the JobBean utility class to load the local file, with filtering, back into a list
            jobBeanList = JobBeanUtils.loadJobBeanList();
            
        }
    }

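    After that, JobInfo.txt exists locally.

    Next, load the JobBean list into MySQL. My database is named 51job and the table is jobInfo. Every column is a string, which is good enough for now.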

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    
    public class ConnectMySQL {
        //Database settings
        private static final String DBaddress = "jdbc:mysql://localhost/51job?serverTimezone=UTC";
        private static final String userName = "root";
        //Placeholder: replace with your own password
        private static final String password = "your_password";
        
        private Connection conn;
        
        //Load the driver and connect to the database
        public ConnectMySQL() {
            LoadDriver();
            //Connect to the database
            try {
                conn = DriverManager.getConnection(DBaddress, userName, password);
            } catch (SQLException e) {
                System.out.println("Database connection failed");
            }
        }
        
        //Load the driver
        private void LoadDriver() {
            try {
                Class.forName("com.mysql.cj.jdbc.Driver");
                System.out.println("Driver loaded");
            } catch (Exception e) {
                System.out.println("Failed to load the driver");
            }
        }
        
        //Get the connection
        public Connection getConn() {
            return conn;
        }
    }

    Next comes the utility class for the database operations:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;
    
    
    public class DBUtils {
        
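        /**Store a JobBean list in the database (rows that fail to insert are skipped)
         * @param conn the database connection
         * @param jobBeanList the JobBean list
         */
        public static void insert(Connection conn, List<JobBean> jobBeanList) {
            System.out.println("Inserting data");
            //Parameterized statement, so quotes in the scraped text cannot break the SQL
            String command = "insert into jobInfo values(?,?,?,?,?,?)";
            try (PreparedStatement ps = conn.prepareStatement(command)) {
                for(JobBean j: jobBeanList) {
                    try {
                        ps.setString(1, j.getJobName());
                        ps.setString(2, j.getCompany());
                        ps.setString(3, j.getAddress());
                        ps.setString(4, j.getSalary());
                        ps.setString(5, j.getDate());
                        ps.setString(6, j.getJobURL());
                        ps.executeUpdate();
                    } catch (Exception e) {
                        System.out.println("Skipped a row that could not be inserted: "+j.getJobName());
                    }
                }
            } catch (Exception e) {
                System.out.println("Failed to prepare the insert statement");
                e.printStackTrace();
            }
            System.out.println("Insert finished");
    
        }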
        
        /**Read all rows back into a JobBean list
         * @param conn the database connection
         * @return the JobBean list (empty if the query fails)
         */
        public static List<JobBean> select(Connection conn){
            List<JobBean> jobBeanList  = new ArrayList<JobBean>();
    
            String command = "select * from jobInfo";
            //try-with-resources closes the statement and the result set
            try (PreparedStatement ps = conn.prepareStatement(command);
                 ResultSet rs = ps.executeQuery()) {
                while(rs.next()) {
                    JobBean jobBean = new JobBean(rs.getString(1), 
                                rs.getString(2), 
                                rs.getString(3), 
                                rs.getString(4),
                                rs.getString(5),
                                rs.getString(6));
    
                    jobBeanList.add(jobBean);
                }
            } catch (Exception e) {
                System.out.println("Database query failed");
            }
            //Return the list rather than null, so callers can iterate over it safely
            return jobBeanList;
        }
    }

    Then test it:

    import java.sql.Connection;
    import java.util.ArrayList;
    import java.util.List;
    
    public class Test2 {
        public static void main(String[] args) {
            List<JobBean> jobBeanList = new ArrayList<>();
            jobBeanList = JobBeanUtils.loadJobBeanList();
    
            //Database test
            ConnectMySQL cm = new ConnectMySQL();
            Connection conn = cm.getConn();
            
            //Insert test
            DBUtils.insert(conn, jobBeanList);
            //Select test
            jobBeanList = DBUtils.select(conn);
            for(JobBean j: jobBeanList) {
                System.out.println(j);
            }
        }
    }

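    The output shows that even though the search was "大数据 + Shanghai", unrelated postings such as operations engineer (运维工程师) still appear. They will be filtered out later; for now everything goes into the database.

    Now an end-to-end test of all the features. Delete JobInfo.txt and recreate the database first.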

    import java.sql.Connection;
    import java.util.ArrayList;
    import java.util.List;
    
    
    public class TestMain {
        public static void main(String[] args) {
            List<JobBean> jobBeanList = new ArrayList<>();
            //大数据 (big data) + Shanghai
            String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=1&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";        
    //        //Java + Shanghai
    //        String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,java,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";
            
            //Test of all features
            //The crawler object
            Spider jobSpider = new Spider(strURL);
            jobSpider.spider();
            //The JobBean list after crawling
            jobBeanList = jobSpider.getJobBeanList();
            
            //Use the JobBean utility class to save the list locally
            JobBeanUtils.saveJobBeanList(jobBeanList);
        
            //Use the JobBean utility class to load the local file, with filtering, back into a list
            jobBeanList = JobBeanUtils.loadJobBeanList();
        
            //Connect to the database and get the connection
            ConnectMySQL cm = new ConnectMySQL();
            Connection conn = cm.getConn();
            
            //Use the database utility class to store the JobBean list in the database
            DBUtils.insert(conn, jobBeanList);
            
    //        //Use the database utility class to query the database and return a JobBean list
    //        jobBeanList = DBUtils.select(conn);
    //        
    //        for(JobBean j: jobBeanList) {
    //            System.out.println(j);
    //        }
        }
    }

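    Each of these features can be used on its own; they do not have to be chained together like this.

    Next comes reading from the database, some simple filtering, and then the analysis.

    First, the mind map.

    Start with filtering by keyword and date: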

    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.List;
    
    public class BaseFilter {
        private List<JobBean> jobBeanList;
        //Removing elements inside a for-each loop throws ConcurrentModificationException,
        //so collect the items to delete in a separate list and call removeAll afterwards
        private List<JobBean> removeList;
        
        public BaseFilter(List<JobBean> jobBeanList) {
            removeList =  new ArrayList<JobBean>();
            //Shares the caller's list, so the filters modify it in place and calling getJobBeanList is optional
            this.jobBeanList = jobBeanList;
            printNum();
        }
        
        //Print how many JobBeans the list holds
        public void printNum() {
            System.out.println("Now "+jobBeanList.size()+" records in total");
        }
        
    
        /**Filter by job title
         * @param containJobName keyword that a title must contain to be kept
         */
        public void filterJobName(String containJobName) {
            for(JobBean j: jobBeanList) {
                if(!j.getJobName().contains(containJobName)) {
                    removeList.add(j);
                }
            }
            jobBeanList.removeAll(removeList);
            removeList.clear();
            printNum();
        }
        
        /**Filter by date, keeping only the postings published today
         * 
         */
        public void filterDate() {
            Calendar now=Calendar.getInstance();
            int nowMonth = now.get(Calendar.MONTH)+1;
            int nowDay = now.get(Calendar.DATE);
            
            for(JobBean j: jobBeanList) {
                String[] date = j.getDate().split("-");
                int jobMonth = Integer.valueOf(date[0]);
                int jobDay = Integer.valueOf(date[1]);
                if(!(jobMonth==nowMonth && jobDay==nowDay)) {
                    removeList.add(j);
                }
            }
            jobBeanList.removeAll(removeList);
            removeList.clear();
            printNum();
        }
        
        public List<JobBean> getJobBeanList(){
            return jobBeanList;
        }
        
    }
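
    Since Java 8, the collect-then-removeAll pattern can also be written with Collection.removeIf, and java.time.MonthDay can compare a "month-day" string against a given day. The sketch below shows both on plain strings. FilterSketch and its method names are hypothetical, and it assumes the date text has the form "05-25", as the split("-") in filterDate implies.

```java
import java.time.MonthDay;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the original project.
final class FilterSketch {

    // Keeps only the titles that contain the keyword, removing the rest in place.
    static void keepContaining(List<String> titles, String keyword) {
        titles.removeIf(title -> !title.contains(keyword));
    }

    // True if a "month-day" string such as "05-25" falls on the given day.
    static boolean isOn(String monthDay, MonthDay day) {
        String[] parts = monthDay.split("-");
        if (parts.length != 2) {
            return false;
        }
        try {
            int month = Integer.parseInt(parts[0].trim());
            int dayOfMonth = Integer.parseInt(parts[1].trim());
            return MonthDay.of(month, dayOfMonth).equals(day);
        } catch (RuntimeException e) {
            // Not a number, or not a valid month/day combination
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>(
                Arrays.asList("大数据开发", "运维工程师", "数据分析师"));
        keepContaining(titles, "数据");
        System.out.println(titles);
        System.out.println(isOn("05-25", MonthDay.now()));
    }
}
```

    In BaseFilter the same idea would read jobBeanList.removeIf(j -> !j.getJobName().contains(containJobName)), which makes removeList unnecessary.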

    Test the filtering:

    import java.sql.Connection;
    import java.util.ArrayList;
    import java.util.List;
    
    
    public class Test3 {
        public static void main(String[] args) {
            List<JobBean> jobBeanList = new ArrayList<>();
            //Read the JobBean list from the database
            ConnectMySQL cm = new ConnectMySQL();
            Connection conn = cm.getConn();
            jobBeanList = DBUtils.select(conn);
            
            BaseFilter bf = new BaseFilter(jobBeanList);
            //Filter by date
            bf.filterDate();
            //Filter by keyword
            bf.filterJobName("数据");
            bf.filterJobName("分析");
            
            for(JobBean j: jobBeanList) {
                System.out.println(j);
            }
        }
    }

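    Up to this point the features are generic. The analysis that follows depends on the occupation and on what you need, although the approach is mostly the same.

    What is analyzed here is the data under "大数据 + Shanghai". To get a larger sample, any job title containing "数据" (data) is kept, which leaves 247 postings.

    I used the tablesaw package because I had seen it recommended. When I ran into problems, searching Baidu turned up almost nothing, so I had only the official documentation and read it over and over. It also cannot draw charts on its own

    and needs additional dependencies, so I settled for tables. I no longer felt like looking into visualization (why didn't I just use Python...).

    The analysis has little that calls for object-oriented design; it is basically one long main method. See the official documentation for the API details and treat this as a look at the results.

    Salaries are normalized to 万/月 (units of 10,000 yuan per month).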

    import static tech.tablesaw.aggregate.AggregateFunctions.*;
    
    import java.sql.Connection;
    import java.util.ArrayList;
    import java.util.List;
    
    import tech.tablesaw.api.*;
    
    public class Analyze {
        public static void main(String[] args) {
            List<JobBean> jobBeanList = new ArrayList<>();
    
            ConnectMySQL cm = new ConnectMySQL();
            Connection conn = cm.getConn();
            jobBeanList = DBUtils.select(conn);
            
            BaseFilter bf = new BaseFilter(jobBeanList);
            bf.filterDate();
            bf.filterJobName("数据");
            int nums = jobBeanList.size();
            
            //Analysis
            //Build the columns (the rows are sorted by salary later)
            String[] jobNames = new String[nums];
            String[] companys = new String[nums];
            String[] addresss = new String[nums];
            double[] salarys = new double[nums];
            String[] jobURLs = new String[nums];
            for(int i=0; i<nums; i++) {
                JobBean j = jobBeanList.get(i);
                String jobName = j.getJobName();
                String company = j.getCompany();
                //Extract the district name from the address
                String address;
                if(j.getAddress().contains("-")) {
                    address = j.getAddress().split("-")[1];
                }else{
                    address = j.getAddress();
                }
                
                //Normalize the salary to 万/月, using the lower bound of the range
                String sSalary = j.getSalary();
                double dSalary;
                if(sSalary.contains("万/月")) {
                    dSalary = Double.valueOf(sSalary.split("-")[0]);
                }else if(sSalary.contains("千/月")) {
                    dSalary = Double.valueOf(sSalary.split("-")[0])/10;
                    dSalary = (double) Math.round(dSalary * 100) / 100;
                }else if(sSalary.contains("万/年")) {
                    dSalary = Double.valueOf(sSalary.split("-")[0])/12;
                    dSalary = (double) Math.round(dSalary * 100) / 100;
                }else {
                    //Unknown unit: keep the row with salary 0 instead of leaving null entries in the columns
                    dSalary = 0;
                    System.out.println("Salary conversion failed: "+sSalary);
                }
                String jobURL = j.getJobURL();
                
                jobNames[i] = jobName;
                companys[i] = company;
                addresss[i] = address;
                salarys[i] = dSalary;
                jobURLs[i] = jobURL;
            }
            
            Table jobInfo = Table.create("Job Info")
                    .addColumns(
                        StringColumn.create("jobName", jobNames),
                        StringColumn.create("company", companys),
                        StringColumn.create("address", addresss),
                        DoubleColumn.create("salary", salarys),
                        StringColumn.create("jobURL", jobURLs)
                            );
            
    //        System.out.println("All of Shanghai");
    //        System.out.println(salaryInfo(jobInfo));
            
            
            List<Table> addressJobInfo = new ArrayList<>();
            //Split by district
            Table ShanghaiJobInfo = chooseByAddress(jobInfo, "上海");
            Table jingAnJobInfo = chooseByAddress(jobInfo, "静安区");
            Table puDongJobInfo = chooseByAddress(jobInfo, "浦东新区");
            Table changNingJobInfo = chooseByAddress(jobInfo, "长宁区");
            Table minHangJobInfo = chooseByAddress(jobInfo, "闵行区");
            Table xuHuiJobInfo = chooseByAddress(jobInfo, "徐汇区");
            //Too few postings in these districts
    //        Table songJiangJobInfo = chooseByAddress(jobInfo, "松江区");
    //        Table yangPuJobInfo = chooseByAddress(jobInfo, "杨浦区");
    //        Table hongKouJobInfo = chooseByAddress(jobInfo, "虹口区");
    //        Table OtherInfo = chooseByAddress(jobInfo, "异地招聘");
    //        Table puTuoJobInfo = chooseByAddress(jobInfo, "普陀区");
            
            addressJobInfo.add(jobInfo);
            //Postings whose address is just 上海, with no district
            addressJobInfo.add(ShanghaiJobInfo);
            addressJobInfo.add(jingAnJobInfo);
            addressJobInfo.add(puDongJobInfo);
            addressJobInfo.add(changNingJobInfo);
            addressJobInfo.add(minHangJobInfo);
            addressJobInfo.add(xuHuiJobInfo);
    //        addressJobInfo.add(songJiangJobInfo);
    //        addressJobInfo.add(yangPuJobInfo);
    //        addressJobInfo.add(hongKouJobInfo);
    //        addressJobInfo.add(puTuoJobInfo);
    //        addressJobInfo.add(OtherInfo);
    
            for(Table t: addressJobInfo) {
                System.out.println(salaryInfo(t));
            }
            
            for(Table t: addressJobInfo) {
                System.out.println(sortBySalary(t).first(10));
            }
            
        }
        
        //Salary mean, standard deviation, median, max and min
        public static Table salaryInfo(Table t) {        
            return t.summarize("salary",mean,stdDev,median,max,min).apply();
        }
        
        //Sort by salary in descending order
        public static Table sortBySalary(Table t) {
            return t.sortDescendingOn("salary");
        }
        
        //Select the rows of one district
        public static Table chooseByAddress(Table t, String address) {
            Table t2 = Table.create(address)
                    .addColumns(
                        StringColumn.create("jobName"),
                        StringColumn.create("company"),
                        StringColumn.create("address"),
                        DoubleColumn.create("salary"),
                        StringColumn.create("jobURL"));
            for(Row r: t) {
                if(r.getString(2).equals(address)) {
                    t2.addRow(r);
                }
            }
            return t2;
        }
    }
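
    For readers who would rather not pull in tablesaw, a per-district summary of this kind can also be approximated with the standard library. This is a sketch of a different technique (Collectors.groupingBy with summarizingDouble), not what the code above does: it gives count, min, max and mean, but no median or standard deviation. DistrictStats and Posting are hypothetical names, and the sample salaries in main are made up.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the original project.
final class DistrictStats {

    // Minimal stand-in for one table row: district name and salary in 万/月
    static final class Posting {
        final String district;
        final double salary;

        Posting(String district, double salary) {
            this.district = district;
            this.salary = salary;
        }
    }

    // Groups postings by district and summarizes the salaries of each group.
    static Map<String, DoubleSummaryStatistics> byDistrict(List<Posting> postings) {
        return postings.stream().collect(Collectors.groupingBy(
                p -> p.district,
                TreeMap::new,
                Collectors.summarizingDouble(p -> p.salary)));
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<Posting> postings = Arrays.asList(
                new Posting("浦东新区", 1.0),
                new Posting("浦东新区", 2.0),
                new Posting("静安区", 1.5));
        byDistrict(postings).forEach((district, stats) ->
                System.out.println(district + " " + stats));
    }
}
```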

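    The first half of the output is the summary for each district.

    The second half lists the 10 highest-paying postings in each district. As you can see, tablesaw's printed tables are about as ugly as they come.

    Each jobURL can be opened directly in a browser.

    Testing with a different URL

    Suppose I am looking for a Java development job.

    Replace strURL in TestMain with the Java + Shanghai URL: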

    https://search.51job.com/list/020000,000000,0000,00,9,99,java,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=

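    Delete JobInfo.txt and recreate the database.

    Run it: the crawl covered more than 270 pages, filling the local JobInfo.txt and the database.

    Then, in Analyze, change the keyword in bf.filterJobName("数据"); to "Java", add a second filter for "开发" (development), and run it.

    All the information comes out. As for the analysis, for now I will only comment briefly based on the tables.

    A possible extension is to go on and crawl each jobURL and compile statistics on the job requirements. That is not done yet; I will probably work on it over the summer break if I am still interested.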

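    The database schema could also be redesigned: split the salary into a minimum and a maximum and store both as double, which would make later analysis easier.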
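
    One way to sketch that split is to parse the salary text into both bounds, normalized to 万/月 with the same unit rules as the analysis code (千/月 divided by 10, 万/年 divided by 12, rounded to two decimals). SalaryRange is a hypothetical class, and it assumes the text always has the form "low-high" followed by one of those three units.

```java
// Hypothetical helper for illustration; not part of the original project.
final class SalaryRange {
    final double min; // lower bound in 万/月
    final double max; // upper bound in 万/月

    private SalaryRange(double min, double max) {
        this.min = min;
        this.max = max;
    }

    // Parses text such as "1-1.5万/月", "6-8千/月" or "10-20万/年".
    // Returns null when the unit or the range cannot be recognized.
    static SalaryRange parse(String text) {
        double divisor;
        if (text.endsWith("万/月")) {
            divisor = 1;
        } else if (text.endsWith("千/月")) {
            divisor = 10;
        } else if (text.endsWith("万/年")) {
            divisor = 12;
        } else {
            return null;
        }
        // Each unit suffix is exactly three characters long
        String[] bounds = text.substring(0, text.length() - 3).split("-");
        if (bounds.length != 2) {
            return null;
        }
        try {
            return new SalaryRange(
                    round2(Double.parseDouble(bounds[0]) / divisor),
                    round2(Double.parseDouble(bounds[1]) / divisor));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Rounds to two decimal places
    private static double round2(double value) {
        return Math.round(value * 100) / 100.0;
    }

    public static void main(String[] args) {
        SalaryRange r = parse("6-8千/月");
        System.out.println(r.min + " to " + r.max);
    }
}
```

    The two values could then be stored in separate double columns, with rows for which parse returns null either skipped or kept with empty salary fields.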

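    Author: PowerZZJ
    Most posts on this blog are original. When reposting, please include a clearly visible link to the original article on the page; otherwise I reserve the right to pursue legal action.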
  • Original article: https://www.cnblogs.com/powerzzjcode/p/10922261.html