  • Java Notes 27: Regular Expressions

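    Regular expressions are a powerful tool for processing strings.

    Uses: string matching (character matching), string searching, and string replacement.

    For example: checking whether an IP address is well formed, pulling email addresses out of a web page (as spammers do), or pulling links out of a web page.

    Classes involved: java.lang.String, java.util.regex.Pattern, java.util.regex.Matcher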

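    Example 1: a Pattern is a compiled regular expression; a Matcher is the object that applies the pattern to an input string and holds the result.

    A typical call sequence is:

     Pattern p = Pattern.compile("a*b");
     Matcher m = p.matcher("aaaaab");
     boolean b = m.matches();

    A complete program:

    import java.util.regex.*;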
    public class Test{
        public static void main(String args[]){
            System.out.println("abc".matches("..."));  
            System.out.println("a3435f".replaceAll("\\d","-"));
            Pattern p = Pattern.compile("[a-z]{3}");
            Matcher m = p.matcher("fgh");
            System.out.println(m.matches());
            System.out.println("fgha".matches("[a-z]{3}"));
        }
    }

    Output:

    true
    a----f
    true
    false
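
    String.matches compiles its pattern on every call, so when the same expression is applied to many inputs it can be worth compiling it once and reusing the Pattern. The following is a minimal sketch of that idea; the class name ReusePattern and the method isThreeLower are made up for illustration.

```java
import java.util.regex.Pattern;

class ReusePattern {
    // Compiled once; Pattern objects are immutable and can be shared.
    private static final Pattern THREE_LOWER = Pattern.compile("[a-z]{3}");

    static boolean isThreeLower(String s) {
        // A fresh Matcher is created per input; matches() must cover the whole string.
        return THREE_LOWER.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"fgh", "fgha", "FGH"}) {
            System.out.println(s + " -> " + isThreeLower(s));
        }
    }
}
```

    A Matcher, unlike a Pattern, carries state, which is why the sketch creates a new one for each input instead of sharing it.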

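    Example 2: quantifiers

    X? X, once or not at all
    X* X, zero or more times
    X+ X, one or more times
    X{n} X, exactly n times
    X{n,} X, at least n times
    X{n,m} X, at least n but not more than m times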
    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            //?={0,1},      *={0,},     +={1,}  
            System.out.println("a".matches("."));  
            System.out.println("aa".matches("aa"));  
            System.out.println("aaaa".matches("a*"));  
            System.out.println("aaaa".matches("a+"));  
            System.out.println("aaaa".matches("a?"));  //false
            System.out.println("".matches("a*"));  
            System.out.println("".matches("a?"));  
            System.out.println("a".matches("a?"));  
            System.out.println("2455668678".matches("\\d{3,100}"));
            System.out.println("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));//false
            System.out.println("192".matches("[0-2][0-9][0-9]"));  
        }
    }
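
    The pattern \d{1,3} above only checks the shape of each part, so it would also accept a string such as 999.1.1.1. One possible way to tighten the check, sketched here under the assumption that each part should be a decimal number from 0 to 255, is to capture the four parts with parentheses and test their values in code. Ipv4Check and isIpv4 are illustrative names, not part of the original notes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ipv4Check {
    // Four runs of 1-3 digits separated by literal dots; each run is captured.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    static boolean isIpv4(String s) {
        Matcher m = DOTTED_QUAD.matcher(s);
        if (!m.matches()) {
            return false; // wrong shape
        }
        for (int g = 1; g <= 4; g++) {
            // Assumption: a part is valid when its numeric value is at most 255.
            if (Integer.parseInt(m.group(g)) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.0.1"));
        System.out.println(isIpv4("192.168.0.aaa"));
        System.out.println(isIpv4("999.1.1.1"));
    }
}
```

    This sketch still accepts parts with leading zeros such as 01.02.03.004; whether that is acceptable depends on the application.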

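    Example 3: [] matches any one of the characters inside it; [^] matches any one character other than those listed.

    [abc] a, b, or c (simple class)
    [^abc] any character except a, b, or c (negation)
    [a-zA-Z] a through z or A through Z, inclusive (range)
    [a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
    [a-z&&[def]] d, e, or f (intersection)
    [a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
    [a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)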
    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            System.out.println("a".matches("[abc]"));  
            System.out.println("a".matches("[^abc]"));  //anything except a, b, c: false
            System.out.println("A".matches("[a-zA-Z]"));  
            System.out.println("A".matches("[a-z]|[A-Z]")); 
            System.out.println("A".matches("[a-z[A-Z]]")); 
            System.out.println("R".matches("[A-Z&&[RFG]]")); 
        }
    }
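
    To see what the union, intersection, and subtraction forms actually accept, one option is to run every lowercase letter through a class and collect the ones that match. The helper below is a small sketch for experimenting; CharClassOps and accepted are hypothetical names.

```java
class CharClassOps {
    // Returns the letters a-z that the given single-character class accepts.
    static String accepted(String charClass) {
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (String.valueOf(c).matches(charClass)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accepted("[a-d[m-p]]"));     // union
        System.out.println(accepted("[a-z&&[def]]"));   // intersection
        System.out.println(accepted("[a-z&&[^bc]]"));   // subtraction
        System.out.println(accepted("[a-z&&[^m-p]]"));  // subtraction
    }
}
```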

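    Example 4: predefined character classes

    \d a digit: [0-9]
    \D a non-digit: [^0-9]
    \s a whitespace character: [ \t\n\x0B\f\r]
    \S a non-whitespace character: [^\s]
    \w a word character: [a-zA-Z_0-9]
    \W a non-word character: [^\w]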
    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            System.out.println(" \n\r\t".matches("\\s{4}"));
            System.out.println(" ".matches("\\S"));  // false
            System.out.println("a_8".matches("\\w{3}"));
            System.out.println("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+"));
            System.out.println("\\".matches("\\\\"));
        }
    }

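    Note: to match a single backslash, the regular expression itself must be \\. When the regular expression is written as a Java string literal, each backslash in it takes two backslashes in the literal, so the literal is "\\\\".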
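
    The two levels of escaping can be checked by looking at string lengths, and Pattern.quote offers a way to match arbitrary text literally without escaping it by hand. This is only a sketch; BackslashDemo and its method names are invented for the example.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    static boolean isSingleBackslash(String s) {
        // The literal "\\\\" holds the two-character regex \\ which matches one backslash.
        return s.matches("\\\\");
    }

    static boolean equalsLiterally(String s, String literal) {
        // Pattern.quote turns the text into a pattern that matches it character for character.
        return s.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println("\\".length());    // length of the input string
        System.out.println("\\\\".length());  // length of the regex source
        System.out.println(isSingleBackslash("\\"));
        System.out.println(equalsLiterally("axb", "a.b"));
    }
}
```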

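    Example 5: POSIX character classes (rarely used)

    \p{Lower} a lowercase alphabetic character: [a-z]
    \p{Upper} an uppercase alphabetic character: [A-Z]
    \p{ASCII} all ASCII: [\x00-\x7F]
    \p{Alpha} an alphabetic character: [\p{Lower}\p{Upper}]
    \p{Digit} a decimal digit: [0-9]
    \p{Alnum} an alphanumeric character: [\p{Alpha}\p{Digit}]
    \p{Punct} punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    \p{Graph} a visible character: [\p{Alnum}\p{Punct}]
    \p{Print} a printable character: [\p{Graph}\x20]
    \p{Blank} a space or a tab: [ \t]
    \p{Cntrl} a control character: [\x00-\x1F\x7F]
    \p{XDigit} a hexadecimal digit: [0-9a-fA-F]
    \p{Space} a whitespace character: [ \t\n\x0B\f\r]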
    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            System.out.println("a".matches("\\p{Lower}"));
        }
    }
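
    Two of the classes in the table, \p{XDigit} and \p{Punct}, lend themselves to short examples. The sketch below assumes a "hex color" means a # followed by exactly six hexadecimal digits; PosixClassDemo and both method names are illustrative.

```java
class PosixClassDemo {
    static boolean isHexColor(String s) {
        // Assumed format: '#' followed by exactly six hexadecimal digits.
        return s.matches("#\\p{XDigit}{6}");
    }

    static String stripPunctuation(String s) {
        // Removes every ASCII punctuation character.
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(isHexColor("#1aF09c"));
        System.out.println(isHexColor("#12345g"));
        System.out.println(stripPunctuation("a,b.c!"));
    }
}
```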

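    Example 6: boundary matchers

    ^ the beginning of a line
    $ the end of a line
    \b a word boundary
    \B a non-word boundary
    \A the beginning of the input
    \G the end of the previous match
    \Z the end of the input but for the final terminator, if any
    \z the end of the input

    Note: inside [] a leading ^ means negation; outside [] it means the beginning of a line.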

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            System.out.println("hello sir".matches("^h.*")); 
            System.out.println("hello sir".matches(".*ir$"));  
            System.out.println("hello sir".matches("^h[a-z]{1,3}o\\b.*"));
            System.out.println("hellosir".matches("^h[a-z]{1,3}o\\b.*"));  //false
            System.out.println(" \n".matches("^[\\s&&[^\\n]]*\\n$"));//blank line
        }
    }
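
    In Java, ^ and $ by default match only at the beginning and end of the whole input; they match at the start and end of every line only when the Pattern.MULTILINE flag is set. The sketch below counts lines that start with the letter h in both modes; MultilineDemo and countLinesStartingWithH are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    static int countLinesStartingWithH(String text, boolean multiline) {
        // Flag value 0 means "no flags", i.e. the default behaviour of ^.
        int flags = multiline ? Pattern.MULTILINE : 0;
        Matcher m = Pattern.compile("^h\\w*", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "hello\nhi\noh\nhey";
        System.out.println(countLinesStartingWithH(text, false));
        System.out.println(countLinesStartingWithH(text, true));
    }
}
```

    Without the flag only the first line can match, because ^ is anchored to the start of the input; with the flag each line is tried.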

    Exercise 1: true or false?

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            System.out.println("aaa 8888c".matches(".*\\d{4}."));
            System.out.println("aaa 8888c".matches(".*\\b\\d{4}."));  //true!
            System.out.println("aaa 8888c".matches(".{3}\\b\\d{4}."));  //false
            System.out.println("aaa8888c".matches(".*\\d{4}."));
            System.out.println("aaa8888c".matches(".*\\b\\d{4}."));  //false
        }
    }

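    Example 7: matches, find, and lookingAt

    matches tries to match the entire string, while find looks for the next matching substring. The two share the matcher's internal state and so affect each other: each one consumes the part of the input it has already examined, so call reset() before switching from one to the other.

    find does not have to match from the beginning; it succeeds as soon as it finds a match anywhere.

    lookingAt always starts from the beginning of the input.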

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            String s = "123-34545-234-00";
            Pattern p = Pattern.compile("\\d{3,5}");
            Matcher m = p.matcher(s);
            System.out.println(m.matches());//false
            m.reset();
            System.out.println(m.find()); 
            System.out.println(m.find()); 
            System.out.println(m.find()); 
            System.out.println(m.find()); //false
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
        }
    }

    The same program with the reset() call commented out:

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            String s = "123-34545-234-00";
            Pattern p = Pattern.compile("\\d{3,5}");
            Matcher m = p.matcher(s);
            System.out.println(m.matches());//false
            //m.reset();
            System.out.println(m.find()); 
            System.out.println(m.find()); 
            System.out.println(m.find()); //false
            System.out.println(m.find()); //false
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
        }
    }
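
    One way to compare the three methods without worrying about shared state is to give each call its own Matcher. The sketch below does that for the same \d{3,5} pattern; MatchModes and the method names whole, prefix, and anywhere are illustrative only.

```java
import java.util.regex.Pattern;

class MatchModes {
    private static final Pattern DIGITS = Pattern.compile("\\d{3,5}");

    // The whole input must match.
    static boolean whole(String s) {
        return DIGITS.matcher(s).matches();
    }

    // The match must start at the beginning, but need not reach the end.
    static boolean prefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // The match may start anywhere in the input.
    static boolean anywhere(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12345", "123-45", "ab-1234", "12"}) {
            System.out.println(s + ": matches=" + whole(s)
                    + " lookingAt=" + prefix(s) + " find=" + anywhere(s));
        }
    }
}
```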

    Example 8: start() and end() describe the half-open range [start, end): start is included, end is not.

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            String s = "123--34545--234-00";
            Pattern p = Pattern.compile("\\d{3,5}");
            Matcher m = p.matcher(s);
            System.out.println(m.matches());//false
            m.reset();
            System.out.println(m.find()); 
            System.out.println(m.start()+"-"+m.end()); 
            System.out.println(m.find()); 
            System.out.println(m.start()+"-"+m.end()); 
            System.out.println(m.find()); 
            System.out.println(m.start()+"-"+m.end()); 
            System.out.println(m.find()); //false
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
            System.out.println(m.lookingAt()); 
        }
    } 

    Output:

    false
    true
    0-3
    true
    5-10
    true
    12-15
    false
    true
    true
    true
    true
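
    Because end() is exclusive, the pair can be passed straight to String.substring to recover the matched text. A small sketch of that relationship follows; SpanDemo and spans are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanDemo {
    // Returns one "start-end:text" entry per run of 3 to 5 digits.
    static List<String> spans(String s) {
        Matcher m = Pattern.compile("\\d{3,5}").matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // substring(start, end) also excludes end, so it equals m.group().
            String text = s.substring(m.start(), m.end());
            out.add(m.start() + "-" + m.end() + ":" + text);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans("123--34545--234-00"));
    }
}
```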

    Example 9: replacement

    (1)

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            Pattern p = Pattern.compile("java");
            Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
            while(m.find()){
                System.out.println(m.group()); 
            }
        }
    }

    Output:

    java
    java
    java
    java

    (2)

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
            while(m.find()){
                System.out.println(m.group()); 
            }
        }
    }

    Output:

    java
    Java
    JAva
    java
    JAVA
    java
    java

    (3)

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
            System.out.println(m.replaceAll("JAVA")); 
        }
    }

    Output:

    JAVA JAVA JAVA JAVA IloveJAVA YOUhateJAVAJAVA end

    (4)

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
            StringBuffer buf = new StringBuffer();
            int i = 0 ;
            while(m.find()){
                i++;
                if(i%2 == 0){
                    m.appendReplacement(buf,"java");
                }else{
                    m.appendReplacement(buf,"JAVA");
                }
            }
            m.appendTail(buf);
            System.out.println(buf); 
        }
    }

    Output:

    JAVA java JAVA java IloveJAVA YOUhatejavaJAVA end
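
    The appendReplacement/appendTail loop can be wrapped in a reusable method. The sketch below is one possible variation, not the original author's code: it derives each replacement from the matched text and passes it through Matcher.quoteReplacement, which matters when a replacement may contain $ or \ (both have special meaning in replacement strings). AlternateCase and alternate are illustrative names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternateCase {
    private static final Pattern JAVA = Pattern.compile("java", Pattern.CASE_INSENSITIVE);

    // Rewrites the 1st, 3rd, 5th... occurrence in upper case and the others in lower case.
    static String alternate(String text) {
        Matcher m = JAVA.matcher(text);
        StringBuffer buf = new StringBuffer();
        boolean upper = true;
        while (m.find()) {
            String found = m.group();
            String replacement = upper
                    ? found.toUpperCase(Locale.ROOT)
                    : found.toLowerCase(Locale.ROOT);
            // quoteReplacement makes the text literal, so '$' and '\' are not interpreted.
            m.appendReplacement(buf, Matcher.quoteReplacement(replacement));
            upper = !upper;
        }
        m.appendTail(buf); // copy whatever follows the last match
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(alternate("java Java JAva java IloveJAVA YOUhatejavajava end"));
    }
}
```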

    Example 10: groups. A group's number is its position when the opening parentheses are counted from left to right.

    import java.util.regex.*;
    public class Test{
        public static void main(String args[]){
            Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
            String s = "123aa-34556bb-456cc-00";
            Matcher m = p.matcher(s);
            while(m.find()){
                System.out.println(m.group(1));
            }
            
        }
    }

    Output:

    123
    34556
    456

    With group() instead of group(1), the output is:

    123aa
    34556bb
    456cc
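
    The left-parenthesis rule is easiest to see with nested groups. In the sketch below an extra pair of parentheses is wrapped around the whole expression, so group 1 is the outer pair, group 2 the digits, and group 3 the letters, while group 0 is always the entire match. GroupNumbering and groups are names invented for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Opening parentheses, left to right: 1 = outer pair, 2 = digits, 3 = letters.
    private static final Pattern TOKEN = Pattern.compile("((\\d{3,5})([a-z]{2}))");

    // Returns groups 0..groupCount() of the first match, or an empty array if none.
    static String[] groups(String s) {
        Matcher m = TOKEN.matcher(s);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("123aa-34556bb-456cc-00")));
    }
}
```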

    Exercise 1: extract the email addresses from a web page

    import java.util.regex.*;
    import java.io.*;
    public class Test{
        public static void main(String args[]){
            // try-with-resources closes the reader even if reading fails
            try(BufferedReader br = new BufferedReader(new FileReader("abc.htm"))){
                String s = null ;
                while((s = br.readLine())!= null){
                    parse(s);
                }
            }catch(FileNotFoundException e){
                e.printStackTrace();
            }catch(IOException e){
                e.printStackTrace();
            }
        }
        private static void parse(String s){
            Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
            Matcher m = p.matcher(s);
            while(m.find()){
                System.out.println(m.group());
            }
        }
    }

    Writing the results to a file:

    import java.util.regex.*;
    import java.io.*;
    public class Test{
        public static void main(String args[]){
            // try-with-resources closes both streams even if an exception is thrown
            try(BufferedReader br = new BufferedReader(new FileReader("abc.htm"));
                BufferedWriter bw = new BufferedWriter(new FileWriter("email.txt"))){
                String s = null ;
                while((s = br.readLine())!= null){
                    parse(s,bw);
                }
            }catch(FileNotFoundException e){
                e.printStackTrace();
            }catch(IOException e){
                e.printStackTrace();
            }
        }
        private static void parse (String s, BufferedWriter bw) throws IOException{
            Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
            Matcher m = p.matcher(s);
            while(m.find()){
                bw.write(m.group());
                bw.newLine();
            }
            bw.flush();
        }
    }
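
    To try the email pattern without a file on disk, the matching step can be separated from the I/O and made to return a list. This is a sketch under the same loose definition of an email address as above (it is not a full validator); EmailExtractor and extract are hypothetical names, and [\w.-] is used as an equivalent spelling of the union [\w[.-]].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailExtractor {
    // Loose pattern: name characters, '@', host characters, a dot, and a final word.
    private static final Pattern EMAIL = Pattern.compile("[\\w.-]+@[\\w.-]+\\.\\w+");

    // Returns every substring of the text that looks like an email address.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("mail bob.smith@example.com or alice@mail.example.org."));
    }
}
```

    Compiling the pattern once as a constant also avoids recompiling it for every line of input.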

    Exercise 2: count lines of code

    import java.util.regex.*;
    import java.io.*;
    public class CodeCounter{
        static long normalLines = 0;
        static long commentLines = 0;
        static long whiteLines = 0;
        public static void main(String args[]){
            File f = new File("E:/javacode/20140426");
            File[] codeFiles = f.listFiles();
            for(File child : codeFiles){
                if(child.getName().matches(".*\\.java$"))
                    parse(child);
            }
        System.out.println("normalLines: "+normalLines);
        System.out.println("commentLines: "+commentLines);
        System.out.println("whiteLines: "+whiteLines);
        }
    
        private static void parse(File f){
            BufferedReader br = null ;
            boolean comment = false;
            try{
                br = new BufferedReader(new FileReader(f));
                String line = "";
                while((line = br.readLine())!=null){
                    line = line.trim();
                    if(line.matches("^[\\s&&[^\\n]]*$")){
                        whiteLines++;
                    }else if(line.startsWith("/*")&&line.endsWith("*/")){
                        commentLines++;
                    }else if(line.startsWith("/*")&&!line.endsWith("*/")){
                        commentLines++;
                        comment=true;
                    }else if(true == comment){
                        commentLines++;
                        if(line.endsWith("*/")){
                            comment=false;
                        }
                    }else if(line.startsWith("//")){
                        commentLines++;
                    }else{
                        normalLines++;
                    }
                }
            }catch(FileNotFoundException e){
                e.printStackTrace();
            }catch(IOException e){
                e.printStackTrace();
            }finally{
                if(br!=null){
                    try{
                        br.close();
                        br=null;
                    }catch(IOException e){
                        e.printStackTrace();
                    }
                }
            }
        }
    }
  • Original article: https://www.cnblogs.com/seven7seven/p/3688576.html