  • A collection of links for learning regular expressions

    1. Regular expressions: a 30-minute introductory tutorial

    www.jb51.net/tools/zhengze.html

    2. Regular expression handbook

    http://tool.oschina.net/uploads/apidocs/jquery/regexp.html
    It also lists commonly used regular expressions.

    3. Non-capturing groups

    https://www.cnblogs.com/pmars/archive/2011/12/30/2307507.html

    4. Capturing groups

    https://www.jianshu.com/p/5150863e7f7a

    5. Regular expression grouping (), non-capturing (?:) and assertions (?<=) explained in detail

    https://www.cnblogs.com/leezhxing/p/4333773.html

    6. In a regular expression, ?: means "do not capture". So what is the difference between a non-capturing group and a capturing group?

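    I have been learning regular expressions recently, ran into capturing and non-capturing groups, and went looking for an answer. After some effort I finally worked out where the difference lies. I will skip the terminology and use an example, which makes the point more clearly:
    To find the two words "program" and "project" in an article, the regular expression can be written as /program|project/ or as /pro(gram|ject)/. In the second form, however, the cached submatch ("gram" or "ject") serves no purpose, so /pro(?:gram|ject)/ can be used for a non-capturing match instead. That keeps the pattern concise without caching a submatch that has no real use.
    Author: 冰冻三寸
    Link: https://www.zhihu.com/question/19853431/answer/160306020
    Source: Zhihu
    Copyright belongs to the author. For commercial reuse, contact the author for permission; for non-commercial reuse, credit the source.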
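
To make the difference concrete, here is a minimal Java sketch. The class name NonCaptureDemo and its two helper methods are made up for illustration and do not come from the quoted answer. It counts the capturing groups each pattern defines and shows what the capturing version stores in group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the quoted answer.
class NonCaptureDemo {
    // Number of capturing groups the pattern defines (group 0 is not counted).
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text stored in group 1 for the first match, or null if nothing matches.
    // Only call this with a pattern that has at least one capturing group.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("pro(gram|ject)"));               // prints 1
        System.out.println(groupCount("pro(?:gram|ject)"));             // prints 0
        System.out.println(firstGroup("pro(gram|ject)", "my project")); // prints ject
    }
}
```

Both patterns match the same words. Only the capturing one keeps "gram" or "ject" retrievable through group(1).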

    An example I came across while using this in practice.

    /**
         * Get usernames mentioned in a list of tweets.
         * 
         * @param tweets
         *            list of tweets with distinct ids, not modified by this method.
         * @return the set of usernames who are mentioned in the text of the tweets.
         *         A username-mention is "@" followed by a Twitter username (as
         *         defined by Tweet.getAuthor()'s spec).
         *         The username-mention cannot be immediately preceded or followed by any
         *         character valid in a Twitter username.
         *         For this reason, an email address like bitdiddle@mit.edu does NOT 
         *         contain a mention of the username mit.
         *         Twitter usernames are case-insensitive, and the returned set may
         *         include a username at most once.
         */
        /*
         * author.length > 0
         *    all characters in author are drawn from {A..Z, a..z, 0..9, _, -}
         *    text.length <= 140
         * */
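        // Needs java.util.{HashSet, List, Set} and java.util.regex.{Matcher, Pattern}.
        public static Set<String> getMentionedUsers(List<Tweet> tweets) {
            Set<String> s = new HashSet<>();

            // (?:...) groups without capturing, so the username is group 1.
            // Leading group: start of text (^) or one character that is not a
            // username character. "@" must come next, then ([\w-]+) captures the
            // username. Trailing group: end of text ($) or one character that is
            // not a username character, which marks where the username stops.
            Pattern p = Pattern.compile("(?:^|[^\\w-])@([\\w-]+)(?:$|[^\\w-])");
            for (Tweet tweet : tweets)
            {
                // Turn every character other than a username character or "@" into
                // two spaces, so that neighbouring mentions never share a delimiter.
                String tmp = tweet.getText().replaceAll("[^\\w@-]", "  ");
                Matcher m = p.matcher(tmp);
                while (m.find())
                {
                    System.out.println(m.group(1));
                    s.add(m.group(1).toLowerCase());
                }
            }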
                /* Plain substring matching; it cannot tell apart names where one is a prefix of the other, such as sunny and sun
                String find=tweets.get(i).getAuthor().toLowerCase();
                for(int j=0;j<tweets.size();j++)
                {
                    if(tweets.get(j).getText().toLowerCase().indexOf("@"+find)!=-1)
                    {
                        s.add("@"+find);
                        System.out.println("@"+find);
                        break;
                    }
                }*/
    
            return s;
        }
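
The article linked in item 5 also covers assertions, and they offer another way to express the "not immediately preceded or followed by a username character" rule. The following is a sketch under assumptions, not the method used above: the class name MentionLookaroundDemo and the helper mentions are hypothetical, and it works on a plain String rather than on Tweet objects. Because a lookbehind (?<!...) and a lookahead (?!...) check the neighbouring characters without consuming them, two mentions separated by a single space can both match, so the replaceAll step that doubles the delimiters should not be needed here.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class illustrating a lookaround-based variant.
class MentionLookaroundDemo {
    // (?<![\w-]) : the "@" is not preceded by a username character
    // ([\w-]+)   : the username, captured as group 1
    // (?![\w-])  : the username is not followed by a username character
    private static final Pattern MENTION =
            Pattern.compile("(?<![\\w-])@([\\w-]+)(?![\\w-])");

    // Returns the lower-cased usernames mentioned in text, in order of appearance.
    static Set<String> mentions(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            found.add(m.group(1).toLowerCase());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(mentions("hi @Alice, cc @bob-1"));   // prints [alice, bob-1]
        System.out.println(mentions("mail bitdiddle@mit.edu")); // prints []
    }
}
```

The email address yields no mention because the "@" is directly preceded by the username character "e", which is exactly the case the method's specification rules out.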
  • Original article: https://www.cnblogs.com/hitWTJ/p/9865432.html