  • Removing HTML tags in C#

    (1) Method 1

      // Requires: using System; and using System.Text.RegularExpressions;
      public string RemoveHTMLTags(string htmlStream)
            {
                if (htmlStream == null)
                {
                    // Throwing ends the method here, so no return statement is needed afterwards.
                    throw new ArgumentNullException(nameof(htmlStream), "Your input html stream is null!");
                }

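                /*
                 * Ideally, collect all the special HTML entities, map each one to its corresponding
                 * Unicode character in a hash table, and then replace them all in one pass.
                 */

                // Test each pattern on its own first; merge the patterns once they work.

                // Note: these two must be handled separately, before the generic tag pattern.
                // Remove JavaScript, which may contain nested HTML tags: (<script)[\s\S]*?(</script>)
                // Remove CSS blocks: (<style)[\s\S]*?(</style>)
                // Remove CSS rules: \..*\{[\s\S]*\}
                // A verbatim string (@"...") is required; "\s" is not a valid escape in a regular C# string.
                htmlStream = Regex.Replace(htmlStream, @"(<script)[\s\S]*?(</script>)|(<style)[\s\S]*?(</style>)", " ", RegexOptions.IgnoreCase);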
                //htmlStream = RemoveTag(htmlStream, "script");
                //htmlStream = RemoveTag(htmlStream, "style");

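                // Remove ordinary HTML tags: <[^>]+>
                // Replace whitespace-like and other common entities with a space (they are dropped, not decoded):
                // &nbsp;|&amp;|&shy;|&#160;|&#173;|&bull;|&lt;|&gt;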
                htmlStream = Regex.Replace(htmlStream, "<[^>]+>|&nbsp;|&amp;|&shy;|&#160;|&#173;|&bull;|&lt;|&gt;", " ", RegexOptions.IgnoreCase);
                //htmlStream = RemoveTag(htmlStream);

                // Restore the left angle bracket
                //htmlStream = Regex.Replace(htmlStream, "&lt;", "<");

                // Restore the right angle bracket
                //htmlStream = Regex.Replace(htmlStream, "&gt;", ">");

                // Collapse blank lines and any run of whitespace (spaces, tabs, line breaks) into a single space
                htmlStream = Regex.Replace(htmlStream, @"\s+", " ");

                return htmlStream.Trim();
            }
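
    The comment in Method 1 suggests mapping HTML entities to their Unicode characters through a hash table instead of replacing them all with spaces. The following is a minimal sketch of that idea, not the author's implementation: the class name HtmlEntitySketch, the method DecodeEntities, and the small entity table are assumptions made for illustration, and a real table would need many more entries. It is meant to run after the tags have been stripped, since otherwise a decoded "<" could be mistaken for the start of a tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlEntitySketch
{
    // Hypothetical lookup table covering only a handful of named entities.
    private static readonly Dictionary<string, string> Entities =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "&nbsp;", " " },
            { "&amp;", "&" },
            { "&lt;", "<" },
            { "&gt;", ">" },
            { "&bull;", "\u2022" },
            { "&shy;", "" },   // soft hyphen: simply dropped here
        };

    // Matches named entities such as &amp; (numeric forms like &#160; are not handled).
    private static readonly Regex EntityPattern = new Regex(@"&[a-zA-Z]+;");

    public static string DecodeEntities(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        // A single pass over the input, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
        return EntityPattern.Replace(text, match =>
        {
            string replacement;
            return Entities.TryGetValue(match.Value, out replacement)
                ? replacement
                : match.Value; // unknown entities are left untouched
        });
    }
}
```

    For example, HtmlEntitySketch.DecodeEntities("a&nbsp;&lt;b&gt; &amp; c") returns "a <b> & c". If a complete table is needed, the built-in System.Net.WebUtility.HtmlDecode is probably a better choice than maintaining one by hand.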

    (2) A method found online

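     // Strip every tag from the HTML markup
        public static string striphtml(string strhtml)
        {
            string stroutput = strhtml;
            // <[^>]+> already matches closing tags such as </p>, so a separate </[^>]+> alternative is redundant.
            Regex regex = new Regex(@"<[^>]+>");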

            stroutput = regex.Replace(stroutput, "");
            return stroutput;

        }
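
    One difference between the two methods is worth seeing in action: a tag-only pattern removes the <script> and </script> tags but leaves the script body in the output, which is why Method 1 removes script and style blocks first. The sketch below illustrates this under simplified assumptions; the class StripHtmlDemo and its two methods are hypothetical names, not part of either original method.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripHtmlDemo
{
    // Same idea as striphtml: remove anything that looks like a tag.
    public static string StripTagsOnly(string html)
    {
        return Regex.Replace(html, @"<[^>]+>", "");
    }

    // Same idea as Method 1: drop whole script/style blocks, then the remaining tags.
    public static string StripScriptsThenTags(string html)
    {
        string withoutBlocks = Regex.Replace(
            html,
            @"<script\b[\s\S]*?</script>|<style\b[\s\S]*?</style>",
            "",
            RegexOptions.IgnoreCase);
        return Regex.Replace(withoutBlocks, @"<[^>]+>", "");
    }
}
```

    With the input "<p>Hi</p><script>var x = 1;</script>", StripTagsOnly returns "Hivar x = 1;" while StripScriptsThenTags returns "Hi". Neither approach is a real HTML parser, so malformed markup or a ">" inside an attribute value can still produce surprising output.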

  • Original article: https://www.cnblogs.com/sky-net/p/4442297.html