  • Java Web Crawler

    WikiScraper.java

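    package master.haku.scrape;
    
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import java.net.*;
    import java.io.*;
    import java.nio.charset.StandardCharsets;
    
    public class WikiScraper {
        public static void main(String[] args) {
            scrapeTopic("/wiki/Python");
        }
    
        // Downloads a Wikipedia article and prints the text of its first paragraph.
        public static void scrapeTopic(String url) {
            String html = getUrl("https://en.wikipedia.org" + url);
            Document doc = Jsoup.parse(html);
            // first() returns null when the selector matches nothing
            // (for example, after a failed download), so check before use.
            Element firstParagraph = doc.select("#mw-content-text > p").first();
            if (firstParagraph == null) {
                System.out.println("No matching paragraph found.");
                return;
            }
            System.out.println(firstParagraph.text());
        }
    
        // Returns the body of the page at the given URL, or "" on failure.
        public static String getUrl(String url) {
            URL urlObj;
            try {
                urlObj = new URL(url);
            } catch (MalformedURLException e) {
                System.out.println("The url was malformed!");
                return "";
            }
    
            StringBuilder outputText = new StringBuilder();
    
            // try-with-resources closes the reader even if reading fails.
            // The response is assumed to be UTF-8 encoded.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    urlObj.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // readLine() strips the line terminator, so add it back.
                    outputText.append(line).append('\n');
                }
            } catch (IOException e) {
                System.out.println("There was an error connecting to the URL");
                return "";
            }
    
            return outputText.toString();
        }
    }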

    Output:

    A python is a constricting snake belonging to the Python (genus), or, more generally, any snake in the family Pythonidae (containing the Python genus).
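
    The line-reading loop in getUrl can be tried without a network connection by moving it into a helper that accepts any InputStream. The following is a minimal sketch under that assumption: the class name StreamText and the method readAll are hypothetical and not part of the original post, and the input is assumed to be UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original WikiScraper.
final class StreamText {
    // Reads the whole stream as UTF-8 text and ends every line with '\n'.
    static String readAll(InputStream stream) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader (and the stream) on exit.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() drops the terminator, so append one explicitly.
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stand-in for a downloaded page.
        byte[] page = "<p>first</p>\r\n<p>second</p>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(page)));
    }
}
```

    With a helper like this, getUrl would only need to open the connection and pass urlCon.getInputStream() to readAll, which keeps the networking and the text handling separate.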

  • Original article: https://www.cnblogs.com/davidgu/p/4836305.html