  • Reading the contents of Word, PPT, Excel, TXT, and PDF files in C#

    1. Reading file contents

    (1) Word

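        /// <summary>
        /// Reads the text of a .doc or .docx file.
        /// </summary>
        /// <param name="filepath">Path to the file</param>
        /// <returns>The document text as a string</returns>
        protected string contentdoc(string filepath)
        {
            Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
            Microsoft.Office.Interop.Word.Document doc = null;
            object unknow = Type.Missing;
            object noSave = Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges;
            object file = filepath;
            try
            {
                // The Word window does not need to be shown just to read text.
                app.Visible = false;
                doc = app.Documents.Open(ref file,
                    ref unknow, ref unknow, ref unknow, ref unknow,
                    ref unknow, ref unknow, ref unknow, ref unknow,
                    ref unknow, ref unknow, ref unknow, ref unknow,
                    ref unknow, ref unknow, ref unknow);
                // string temp = doc.Paragraphs[1].Range.Text.Trim(); // read paragraph by paragraph
                return doc.Content.Text;
            }
            finally
            {
                // Close the document and quit Word so that no WINWORD.EXE process is left running.
                if (doc != null)
                    ((Microsoft.Office.Interop.Word._Document)doc).Close(ref noSave, ref unknow, ref unknow);
                ((Microsoft.Office.Interop.Word._Application)app).Quit(ref noSave, ref unknow, ref unknow);
            }
        }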

    Notes:
    1: Add a project reference to the Microsoft Word 11.0 Object Library.
    2: Add using Word = Microsoft.Office.Interop.Word; to the program.
    3: Add the following to the program:
    Word.Application app = new Microsoft.Office.Interop.Word.Application(); // starts the Word application
    Word.Document doc = null; // will hold the document that Word opens

    Reference: http://www.cnblogs.com/no7dw/archive/2009/08/14/1546367.html
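
    The approach above needs Word installed on the machine. As a hedged alternative for .docx only (not .doc), the sketch below uses a different technique from the article: a .docx file is a ZIP archive whose main text lives in word/document.xml, so it can be read with System.IO.Compression and System.Xml.Linq. The class and method names (DocxTextSketch, ReadDocxText) are made up for illustration, and text inside nested structures such as text boxes may be repeated.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of the original article.
public static class DocxTextSketch
{
    // WordprocessingML main namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every paragraph, each followed by '\n'.
    public static string ReadDocxText(Stream docx)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                return string.Empty; // not a Word document package

            using (Stream s = entry.Open())
            {
                XDocument xml = XDocument.Load(s);
                var sb = new StringBuilder();
                foreach (XElement p in xml.Descendants(W + "p"))
                {
                    // w:t elements carry the visible text runs of a paragraph.
                    foreach (XElement t in p.Descendants(W + "t"))
                        sb.Append(t.Value);
                    sb.Append('\n');
                }
                return sb.ToString();
            }
        }
    }
}
```

    A caller would pass File.OpenRead(filepath) as the stream. On .NET Framework this assumes references to System.IO.Compression and System.Xml.Linq (version 4.5 or later).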

    (2) PPT

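        /// <summary>
        /// Reads the text content of a PowerPoint file.
        /// </summary>
        /// <param name="filepath">Path to the file</param>
        /// <returns>The text of all shapes on all slides</returns>
        protected string contentppt(string filepath)
        {
            Microsoft.Office.Interop.PowerPoint.Application pa = new Microsoft.Office.Interop.PowerPoint.Application();
            Microsoft.Office.Interop.PowerPoint.Presentation pp = null;
            try
            {
                // Arguments after the path: ReadOnly = true, Untitled = false, WithWindow = false.
                pp = pa.Presentations.Open(filepath,
                                Microsoft.Office.Core.MsoTriState.msoTrue,
                                Microsoft.Office.Core.MsoTriState.msoFalse,
                                Microsoft.Office.Core.MsoTriState.msoFalse);
                string pps = "";
                foreach (Microsoft.Office.Interop.PowerPoint.Slide slide in pp.Slides)
                {
                    foreach (Microsoft.Office.Interop.PowerPoint.Shape shape in slide.Shapes)
                    {
                        // Pictures, lines, and similar shapes have no text frame; reading TextFrame on them may throw.
                        if (shape.HasTextFrame == Microsoft.Office.Core.MsoTriState.msoTrue)
                            pps += shape.TextFrame.TextRange.Text;
                    }
                }
                return pps;
            }
            finally
            {
                // Close the presentation and quit PowerPoint so that no POWERPNT.EXE process is left running.
                if (pp != null)
                    pp.Close();
                pa.Quit();
            }
        }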

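    Notes:
    1: Add project references to the Microsoft PowerPoint 11.0 Object Library and to the Microsoft Office 11.0 Object Library (the latter provides Microsoft.Office.Core.MsoTriState).
    2: Add using PowerPoint = Microsoft.Office.Interop.PowerPoint; to the program.
    3: Add the following to the program:
    PowerPoint.Application app = new Microsoft.Office.Interop.PowerPoint.Application(); // starts the PowerPoint application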

    Reference: http://blog.sina.com.cn/s/blog_651ff6920100oi9u.html

    (3) PDF

        /// <summary>
        /// Reads a PDF that contains text (not scanned images).
        /// </summary>
        /// <param name="filepath">Path to the file</param>
        /// <returns>The extracted text as a string</returns>
        protected string contentpdf(string filepath)
        {
            StringBuilder text = new StringBuilder();
            string fileName = filepath;
            if (System.IO.File.Exists(fileName))
            {
                PdfReader pdfReader = new PdfReader(fileName);
                try
                {
                    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                    {
                        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                        // GetTextFromPage already returns a .NET string, so no re-encoding is needed.
                        // Round-tripping it through Encoding.Default would replace characters
                        // outside the system ANSI code page with '?'.
                        string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                        text.Append(currentText);
                    }
                }
                finally
                {
                    // Release the file even if extraction fails on some page.
                    pdfReader.Close();
                }
            }
            return text.ToString();
        }

    Note: add references to iTextSharp.dll and its companion DLLs (three DLLs in total).

    using iTextSharp.text.pdf;

    using iTextSharp.text.pdf.parser;

    Reference: http://www.codeproject.com/Tips/387327/Convert-PDF-file-content-into-string-using-Csharp

    (4) TXT

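        /// <summary>
        /// Reads a .txt file.
        /// </summary>
        /// <param name="filepath">Path to the file</param>
        /// <returns>The file content as a string</returns>
        protected string contenttxt(string filepath)
        {
            // Open read-only so that files marked read-only can still be read.
            using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
            {
                byte[] b = new byte[fs.Length];
                int offset = 0;
                // Read may return fewer bytes than requested, so loop until the buffer is full.
                while (offset < b.Length)
                {
                    int n = fs.Read(b, offset, b.Length - offset);
                    if (n == 0)
                        break; // end of file reached early
                    offset += n;
                }
                // Decode the bytes as GB2312; the file is assumed to use that encoding.
                return Encoding.GetEncoding("gb2312").GetString(b, 0, offset);
            }
        }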
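
    The manual read above can be shortened with File.ReadAllText, which opens the file, reads it to the end, and decodes it in one call. The sketch below is a minimal version that takes the encoding as a parameter; the names TextFileSketch and ReadText are made up for illustration. Note that File.ReadAllText also honors a byte order mark if the file has one, and that on .NET Core and .NET 5+ Encoding.GetEncoding("gb2312") is only available after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original article.
public static class TextFileSketch
{
    // Reads the whole file and decodes it with the given encoding.
    // The caller chooses the encoding, e.g. Encoding.UTF8 or
    // Encoding.GetEncoding("gb2312") where that code page is available.
    public static string ReadText(string path, Encoding encoding)
    {
        if (path == null) throw new ArgumentNullException("path");
        if (encoding == null) throw new ArgumentNullException("encoding");
        return File.ReadAllText(path, encoding);
    }
}
```

    For example, TextFileSketch.ReadText(filepath, Encoding.UTF8) reads a UTF-8 file; passing the wrong encoding does not fail but yields garbled characters.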

    (5) Excel

    For details, see: http://www.cnblogs.com/Tsong/archive/2013/02/21/2920941.html
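
    With one reader method per format, a caller still has to pick the right one for each file. A possible way to do that, sketched here under the assumption that the file extension reliably identifies the format, is a small lookup table from extension to reader. The class FileContentDispatcher and its members are hypothetical names, not part of the original article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper that maps file extensions to reader functions.
public class FileContentDispatcher
{
    // Extensions are compared case-insensitively, so ".TXT" matches ".txt".
    private readonly Dictionary<string, Func<string, string>> readers =
        new Dictionary<string, Func<string, string>>(StringComparer.OrdinalIgnoreCase);

    // Registers a reader for an extension that includes the dot, e.g. ".pdf".
    public void Register(string extension, Func<string, string> reader)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        readers[extension] = reader;
    }

    // Looks up the reader by the file's extension and runs it.
    public string Read(string filepath)
    {
        string ext = Path.GetExtension(filepath);
        Func<string, string> reader;
        if (!readers.TryGetValue(ext, out reader))
            throw new NotSupportedException("No reader registered for '" + ext + "'");
        return reader(filepath);
    }
}
```

    Inside the class that defines the methods above, registration might look like dispatcher.Register(".doc", contentdoc), dispatcher.Register(".docx", contentdoc), dispatcher.Register(".ppt", contentppt), dispatcher.Register(".pdf", contentpdf), and dispatcher.Register(".txt", contenttxt), after which dispatcher.Read(filepath) picks the reader.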

  • Original article: https://www.cnblogs.com/mydotnetforyou/p/3551905.html