  • C#: TextNoHTML, a method for converting HTML text into plain text

    I can't remember where I came across this, but it's quite handy.

    using System.Text.RegularExpressions;

    /// <summary>
    /// TextNoHTML: converts HTML text into plain text content
    /// </summary>
    /// <param name="Htmlstring">The HTML text</param>
    /// <returns>The text with HTML markup removed</returns>
    public string TextNoHTML(string Htmlstring)
    {
        // Remove scripts (Singleline lets .*? match across line breaks)
        Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Remove HTML tags
        Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
        // Remove a line break together with the whitespace that follows it
        Htmlstring = Regex.Replace(Htmlstring, @"([\r\n])[\s]+", "", RegexOptions.IgnoreCase);
        // Remove leftover comment markers
        Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"<!--.*", "", RegexOptions.IgnoreCase);
        // Decode common character entities
        Htmlstring = Regex.Replace(Htmlstring, @"&(quot|#34);", "\"", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(lt|#60);", "<", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(gt|#62);", ">", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(nbsp|#160);", " ", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(iexcl|#161);", "\u00A1", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(cent|#162);", "\u00A2", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(pound|#163);", "\u00A3", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(copy|#169);", "\u00A9", RegexOptions.IgnoreCase);
        // Drop any other numeric character references
        Htmlstring = Regex.Replace(Htmlstring, @"&#(\d+);", "", RegexOptions.IgnoreCase);
        // Decode &amp; last, so that "&amp;lt;" ends up as the text "&lt;" instead of being decoded twice
        Htmlstring = Regex.Replace(Htmlstring, @"&(amp|#38);", "&", RegexOptions.IgnoreCase);
        // Strip any remaining < and > characters
        Htmlstring = Htmlstring.Replace("<", "");
        Htmlstring = Htmlstring.Replace(">", "");
        // Strip any remaining line breaks
        Htmlstring = Htmlstring.Replace("\r\n", "");
        Htmlstring = Htmlstring.Replace("\n", "");
        Htmlstring = Htmlstring.Replace("\r", "");
        // Return the string with HTML markup removed
        return Htmlstring;
    }
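The method above decodes only a handful of named entities, and chained replacements can decode text twice (for example `&amp;lt;`) depending on their order. A minimal alternative sketch, assuming .NET's `System.Net.WebUtility.HtmlDecode` is available, removes tags with regexes and then decodes every entity in one pass. This is a swapped-in technique, not the original author's code; `HtmlTextSketch` and `StripHtml` are hypothetical names, and like the original it is regex-based, so it is no substitute for a real HTML parser.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class: a regex-based sketch, not a full HTML parser.
public static class HtmlTextSketch
{
    public static string StripHtml(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Drop <script> and <style> blocks including their contents;
        // Singleline lets ".*?" cross line breaks.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Drop HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", "", RegexOptions.Singleline);

        // Drop all remaining tags.
        text = Regex.Replace(text, @"<[^>]+>", "");

        // Decode every named and numeric entity (&amp;, &nbsp;, &#169;, ...) in a
        // single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace (line breaks, U+00A0 from &nbsp;, ...) into one space.
        text = Regex.Replace(text, @"\s+", " ");

        return text.Trim();
    }
}
```

For example, `HtmlTextSketch.StripHtml("<p>Hello&nbsp;<b>world</b></p>")` returns `"Hello world"`: `&nbsp;` decodes to U+00A0, which `\s` matches, so it collapses into an ordinary space. Collapsing whitespace instead of deleting line breaks keeps words from adjacent lines from running together.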
    /// <summary>
    /// Gets the image paths (src values) of all img tags
    /// </summary>
    /// <param name="htmlText">The HTML string</param>
    /// <returns>The image paths, as an array</returns>
    public static string[] GetHtmlImageUrlList(string htmlText)
    {
        // Matches an img tag and captures its src value (quoted or unquoted) in the "imgUrl" group
        Regex regImg = new Regex(@"<img\b[^<>]*?src\s*=\s*[""']?\s*(?<imgUrl>[^\s""'<>]*)[^<>]*?/?\s*>", RegexOptions.IgnoreCase);
        // MatchCollection holding one match per img tag
        MatchCollection matches = regImg.Matches(htmlText);
        int i = 0;
        string[] sUrlList = new string[matches.Count];
        // Iterate over all matched img tags
        foreach (Match match in matches)
        {
            // Store each image's src path in the array
            sUrlList[i++] = match.Groups["imgUrl"].Value;
        }
        return sUrlList;
    }
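The regex above stops the captured path at the first whitespace or quote character, and its lazy `[^<>]*?src` part can also match inside an attribute such as `data-src`. A sketch of a variant, assuming the same regex-based approach is acceptable, handles double-quoted, single-quoted and unquoted `src` values separately and requires whitespace before `src`. `ImgSrcSketch` and `GetImageUrls` are hypothetical names, and returning a `List<string>` instead of an array is a design choice of this sketch.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class: a regex-based sketch, not a full HTML parser.
public static class ImgSrcSketch
{
    // The src value may be double-quoted, single-quoted or unquoted; each branch
    // captures into the same named group "url" (.NET allows reusing a group name).
    // Requiring whitespace before "src" keeps attributes such as data-src from matching.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase);

    public static List<string> GetImageUrls(string html)
    {
        var urls = new List<string>();
        if (string.IsNullOrEmpty(html)) return urls;

        foreach (Match m in ImgSrc.Matches(html))
        {
            string url = m.Groups["url"].Value.Trim();
            if (url.Length > 0) urls.Add(url); // skip empty src="" attributes
        }
        return urls;
    }
}
```

For instance, for `<img data-src="lazy.png" src=c.gif>` this sketch returns `c.gif`, whereas a pattern without the leading `\s` would pick up `lazy.png` first.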
  • Original article: https://www.cnblogs.com/zhyue93/p/Csharp_html_img.html