01_Regular Expressions_06_A Simple Crawler for Fetching Data

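【Overview】

This post presents a simple program that fetches a web page and extracts the email addresses it contains.

The target is a test page of my own on cnblogs (博客园): http://www.cnblogs.com/HigginCui/p/5809835.html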

【Code】

package com.Higgin.Regex;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;


public class SpilderDemo {
    
    @Test
    public void getEmail() throws Exception{
        URL url=new URL("http://www.cnblogs.com/HigginCui/p/5809835.html");
        URLConnection conn=url.openConnection();
        
        // In Java source every regex backslash must be doubled: \w is written "\\w"
        String mailreg="\\w+@\\w+(\\.\\w+)+";
        Pattern p=Pattern.compile(mailreg);
        
        // try-with-resources closes the reader when done; UTF-8 is assumed for the page
        try(BufferedReader bufIn=new BufferedReader(
                new InputStreamReader(conn.getInputStream(),StandardCharsets.UTF_8))){
            String line;
            // Scan the page line by line and print every match
            while((line=bufIn.readLine())!=null){
                Matcher m=p.matcher(line);
                while(m.find()){
                    System.out.println(m.group());
                }
            }
        }
    }
    
}
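
To check the matching logic without a network connection, the same pattern can be run against an in-memory string. The following is only a minimal sketch: the class name `EmailRegexSketch`, the helper `extractEmails`, and the sample line are made up for illustration and are not part of the original post. Note that in Java source each regex backslash has to be doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexSketch {

    // Same pattern as in the post, with the backslashes doubled for Java source.
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Hypothetical helper: collects every substring of text that matches the pattern.
    public static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample line, not taken from the real page.
        String line = "<p>contact: alice@example.com, bob_01@mail.example.org</p>";
        for (String email : extractEmails(line)) {
            System.out.println(email);
        }
    }
}
```

Because `\w` only covers letters, digits, and the underscore, this pattern is deliberately loose: an address such as `first.last@x.com` would be matched only from `last@x.com` onward. That is acceptable for a demo, but stricter extraction would need a more careful pattern.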

【Output】


Original article: https://www.cnblogs.com/HigginCui/p/5811631.html