  • Scraping page data with Java and Jsoup

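    // Fragment from a method body: map, this.split, ImageBean and getImageTime()
    // are defined elsewhere in the class.
    // Classes used: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
    // org.jsoup.select.Elements, java.util.regex.Pattern/Matcher, java.io.IOException,
    // and JSONObject (presumably com.alibaba.fastjson.JSONObject, given parseObject).
    List<ImageBean> imgList = new ArrayList<ImageBean>();
    for (Map.Entry<String, String> entry : map.entrySet()) {
        try {
            // The map key is the page URL; the value holds the delimited settings.
            Document doc = Jsoup.connect(entry.getKey()).get();
            Elements scripts = doc.select("script");

            String[] datas = entry.getValue().split(this.split);
            // Regular expression that matches the image data; compiled once per page.
            Pattern p = Pattern.compile(datas[3]);
            for (Element script : scripts) {
                // Text to search: the body of the current <script> element.
                Matcher m = p.matcher(script.html());
                while (m.find()) {
                    // Group 1 is expected to be a JSON object literal.
                    JSONObject obj = JSONObject.parseObject(m.group(1));
                    // Image URL = URL prefix (datas[1]) + path stored under the key datas[4].
                    String url = datas[1] + obj.getString(datas[4]);
                    String imageTime = getImageTime(url);

                    ImageBean image = new ImageBean();
                    image.setUrl(url);
                    image.setName(imageTime);
                    image.setType(datas[3]);
                    image.setImageType(datas[5]);
                    imgList.add(image);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }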
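
The matching step can be tried without a network connection or a JSON library. The following is a minimal sketch, not the original author's code: the class `ScriptJsonExtractor`, its methods, and the sample script text are made up for illustration, and the regular expression is an escaped pattern written for this sketch (the exact pattern used by the original is an assumption). It also assumes that each JSON object sits on one line and that keys and values are double-quoted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptJsonExtractor {
    // Assumed pattern: a literal "data.push(" call whose argument is a {...} object.
    // Group 1 captures the object text, as m.group(1) does in the fragment above.
    static final Pattern PUSH_CALL = Pattern.compile("data\\.push\\((\\{.*?\\})\\)");

    // Hypothetical helper: reads one double-quoted string field from a flat JSON object.
    // A real JSON parser is the better choice; this only keeps the sketch dependency-free.
    static String field(String json, String name) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Collects baseUrl + path for every data.push({...}) call found in the script text.
    static List<String> imageUrls(String scriptText, String baseUrl, String fieldName) {
        List<String> urls = new ArrayList<>();
        Matcher m = PUSH_CALL.matcher(scriptText);
        while (m.find()) {
            String path = field(m.group(1), fieldName);
            if (path != null) {
                urls.add(baseUrl + path);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up script body standing in for script.html().
        String script = "data.push({\"img_path\":\"/a/1.jpg\"});\n"
                      + "data.push({\"img_path\":\"/a/2.jpg\"});";
        System.out.println(imageUrls(script, "http://image.nmc.cn", "img_path"));
    }
}
```

Compiling the pattern once and reusing the `Matcher` loop mirrors the fragment above; only the JSON parsing is replaced by a small regex helper.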
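    Sample entry from the map configuration. The key is the page URL. The value is split on the delimiter (this.split, "~" in this entry): datas[0] is a label (高度场, "height field"), datas[1] the URL prefix, datas[3] the regular expression, datas[4] the JSON key that holds the image path, and datas[5] the image type.

    <entry key="http://www.nmc.cn/publish/nwp/t639/ea/500hPa-hgt.html">
        <value>高度场~http://image.nmc.cn~type~data.push(({*.*?}))~img_path~nmc_fore_t639_hgt</value>
    </entry>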
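
Indexing into `datas[...]` by position is easy to get wrong, so the value can be parsed once into named fields. This is a sketch under assumptions: the class `EntryConfig` and its field names are hypothetical, chosen from how the scraping code uses each position, and the delimiter is assumed to be `~` as in the sample entry.

```java
import java.util.regex.Pattern;

class EntryConfig {
    // Field names are illustrative; they follow how the scraping code uses each position.
    final String label, baseUrl, type, regex, jsonField, imageType;

    private EntryConfig(String[] f) {
        label = f[0];
        baseUrl = f[1];
        type = f[2];
        regex = f[3];
        jsonField = f[4];
        imageType = f[5];
    }

    static EntryConfig parse(String value, String delimiter) {
        // String.split takes a regex, so quote the delimiter to keep it literal;
        // the -1 limit keeps trailing empty fields instead of dropping them.
        String[] f = value.split(Pattern.quote(delimiter), -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length);
        }
        return new EntryConfig(f);
    }

    public static void main(String[] args) {
        // The sample value from the entry; the label is written with Unicode escapes.
        String value = "\u9ad8\u5ea6\u573a~http://image.nmc.cn~type"
                     + "~data.push(({*.*?}))~img_path~nmc_fore_t639_hgt";
        EntryConfig c = parse(value, "~");
        System.out.println(c.baseUrl + " / " + c.jsonField + " / " + c.imageType);
    }
}
```

Failing fast on a wrong field count surfaces a malformed entry at load time rather than as an `ArrayIndexOutOfBoundsException` in the middle of scraping.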
  • Original post: https://www.cnblogs.com/tiandi/p/6145957.html