  • Pitfalls to watch for when using String.split in Java

    We often use String's split() method to split strings. A few points are worth noting:

    1. When the delimiter is a period ("."), it must be escaped:

    String.split splits on a regular expression, and in a regular expression a period matches any character.

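    // Wrong: "." matches every character, so the result is an empty array
    // String[] words = tmp.split(".");

    // Correct: escape the period (the backslash must be doubled in a Java string literal)
    String[] words = tmp.split("\\.");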

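    So whenever the delimiter has a special meaning in regular expressions (for example . | * + ? ( ) [ ] { } ^ $ \), take extra care: it must be escaped for the split to work as intended.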
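
    When the delimiter is not known in advance, escaping it by hand is error-prone. One option is Pattern.quote, which turns any string into a regex that matches it literally. The sketch below is only an illustration; the class and method names (QuoteDelimiterDemo, splitLiteral) are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDelimiterDemo {
    // Hypothetical helper: splits on the delimiter taken literally,
    // regardless of any regex metacharacters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // "." and "|" are regex metacharacters, but Pattern.quote neutralizes them.
        System.out.println(Arrays.toString(splitLiteral("www.example.com", "."))); // [www, example, com]
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));           // [a, b, c]

        // Without escaping, "." matches every character: all tokens are empty
        // and are dropped as trailing empty strings, leaving an empty array.
        System.out.println("a.b.c".split(".").length); // 0
    }
}
```

    Pattern.quote is convenient for delimiters that come from configuration or user input; for a fixed delimiter, a hand-written escape such as "\\." works just as well.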

    2. If the string ends with several consecutive delimiters and the empty fields between them must be kept, call split(String regex, int limit) with a negative limit:

    String abc = "a,b,c,,,";
    String[] str = abc.split(",");
            
    System.out.println(Arrays.toString(str)+" "+str.length);
            
    String[] str2 = abc.split(",",-1);
            
    System.out.println(Arrays.toString(str2)+" "+str2.length);

    Output:

    [a, b, c] 3
    [a, b, c, , , ] 6

    This needs particular attention when the data is destined for a CSV file, where trailing empty fields matter.
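
    The limit argument has three distinct behaviours, which the following sketch puts side by side. The class and helper names (SplitLimitDemo, fields) are hypothetical, and the helper is not a full CSV parser: it ignores quoting and embedded commas.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: splits a simple comma-separated row,
    // keeping trailing empty fields.
    static String[] fields(String row) {
        return row.split(",", -1);
    }

    public static void main(String[] args) {
        String row = "a,b,c,,,";

        // limit == 0: split as often as possible, drop trailing empty strings
        System.out.println(Arrays.toString(row.split(",", 0)));  // [a, b, c]

        // limit < 0: split as often as possible, keep trailing empty strings
        System.out.println(Arrays.toString(fields(row)));        // [a, b, c, , , ]

        // limit > 0: at most `limit` elements; the last one holds the remainder
        System.out.println(Arrays.toString(row.split(",", 2)));  // [a, b,c,,,]
    }
}
```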

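    3. When strings must be split quickly, split() is not the most efficient approach. Internally, split() is implemented essentially as follows (JDK 7 and later try a fast path for simple single-character delimiters before falling back to this):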

    public String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(this, limit);
    }

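    Calling split() frequently with such an implementation creates a new Pattern object every time, so the Pattern can be compiled once and reused instead: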

    // Create the Pattern object once, outside the loop
    Pattern pattern = Pattern.compile(" ");
    List<String[]> list = new ArrayList<String[]>();

    for (int i = 0; i < 1000000; i++)
    {
        String[] split = pattern.split("Hello World", 0);
        list.add(split);
    }

    In addition, split() is often somewhat slower than splitting with a combination of indexOf() and substring(); see this post for details.

    In a test on my own machine, indexOf() + substring() was roughly twice as fast as split():

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.StringTokenizer;
    import java.util.regex.Pattern;

    public class SplitBenchmark {
        public static void main(String[] args) {
            // Build a sample of 60 six-digit numbers, each followed by a space
            StringBuilder sb = new StringBuilder();
            for (int i = 100000; i < 100000 + 60; i++)
                sb.append(i).append(' ');
            String sample = sb.toString();

            int runs = 100000;
            // Repeat the whole measurement 5 times so later rounds run on warmed-up code
            for (int i = 0; i < 5; i++) {
                {
                    // Variant 1: StringTokenizer
                    long start = System.nanoTime();
                    for (int r = 0; r < runs; r++) {
                        StringTokenizer st = new StringTokenizer(sample);
                        List<String> list = new ArrayList<String>();
                        while (st.hasMoreTokens())
                            list.add(st.nextToken());
                    }
                    long time = System.nanoTime() - start;
                    System.out.printf("StringTokenizer took an average of %.1f us%n", time / runs / 1000.0);
                }
                {
                    // Variant 2: precompiled Pattern
                    long start = System.nanoTime();
                    Pattern spacePattern = Pattern.compile(" ");
                    for (int r = 0; r < runs; r++) {
                        List<String> list = Arrays.asList(spacePattern.split(sample, 0));
                    }
                    long time = System.nanoTime() - start;
                    System.out.printf("Pattern.split took an average of %.1f us%n", time / runs / 1000.0);
                }
                {
                    // Variant 3: manual indexOf() + substring() loop
                    long start = System.nanoTime();
                    for (int r = 0; r < runs; r++) {
                        List<String> list = new ArrayList<String>();
                        int pos = 0, end;
                        while ((end = sample.indexOf(' ', pos)) >= 0) {
                            list.add(sample.substring(pos, end));
                            pos = end + 1;
                        }
                    }
                    long time = System.nanoTime() - start;
                    System.out.printf("indexOf loop took an average of %.1f us%n", time / runs / 1000.0);
                }
            }
        }
    }

    Results on JDK 1.7:

    StringTokenizer took an average of 7.2 us
    Pattern.split took an average of 7.9 us
    indexOf loop took an average of 3.5 us

    ------------------------------------------
    StringTokenizer took an average of 6.8 us
    Pattern.split took an average of 5.4 us
    indexOf loop took an average of 3.1 us

    ------------------------------------------
    StringTokenizer took an average of 6.0 us
    Pattern.split took an average of 5.5 us
    indexOf loop took an average of 3.1 us

    ------------------------------------------
    StringTokenizer took an average of 5.9 us
    Pattern.split took an average of 5.5 us
    indexOf loop took an average of 3.1 us

    ------------------------------------------
    StringTokenizer took an average of 6.4 us
    Pattern.split took an average of 5.5 us
    indexOf loop took an average of 3.2 us
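
    The indexOf() loop from the benchmark can be wrapped in a small reusable method. The version below is a sketch, not a drop-in replacement for split(): it only supports a single-character delimiter, and unlike the benchmark loop it also emits the text after the last delimiter, so it behaves like split(regex, -1). The names IndexOfSplitter and split are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfSplitter {
    // Splits on a single character without any regex machinery.
    // Empty tokens, including trailing ones, are kept.
    static List<String> split(String input, char delimiter) {
        List<String> tokens = new ArrayList<String>();
        int pos = 0;
        int end;
        while ((end = input.indexOf(delimiter, pos)) >= 0) {
            tokens.add(input.substring(pos, end));
            pos = end + 1;
        }
        // Remainder after the last delimiter (may be empty)
        tokens.add(input.substring(pos));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello World", ' ')); // [Hello, World]
        System.out.println(split("a,b,c,,,", ','));    // [a, b, c, , , ]
    }
}
```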

    End of article.

  • Original article: https://www.cnblogs.com/techyc/p/3709182.html