  • Extracting the text content of a Word document in C#

    using System;
    using System.Text;
    using System.Xml;
    using DocumentFormat.OpenXml.Packaging;

    public static string TextFromWord(string path)
    {
        const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        StringBuilder textBuilder = new StringBuilder();

        // Open the package read-only.
        using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(path, false))
        {
            // Manage namespaces to perform XPath queries.
            NameTable nt = new NameTable();
            XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
            nsManager.AddNamespace("w", wordmlNamespace);

            // Get the document part from the package.
            // Load the XML in the document part into an XmlDocument instance.
            XmlDocument xdoc = new XmlDocument(nt);
            xdoc.Load(wdDoc.MainDocumentPart.GetStream());

            // Append the text runs of each paragraph, one paragraph per line.
            XmlNodeList paragraphNodes = xdoc.SelectNodes("//w:p", nsManager);
            foreach (XmlNode paragraphNode in paragraphNodes)
            {
                XmlNodeList textNodes = paragraphNode.SelectNodes(".//w:t", nsManager);
                foreach (XmlNode textNode in textNodes)
                {
                    textBuilder.Append(textNode.InnerText);
                }
                textBuilder.Append(Environment.NewLine);
            }
        }

        return textBuilder.ToString();
    }

    Error case: an exception is thrown with the message "File contains corrupted data".

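    Solution: convert the .doc file to a .docx file (online conversion sites can be found by searching), then pass the .docx file to the method instead.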
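    Since the error comes down to the .doc versus .docx distinction, it can help to check the file's container format before opening it. The sketch below is only an illustration, not part of the original post: the names WordFileSniffer, WordFileKind and Detect are hypothetical. It relies on two well-known file signatures: legacy binary .doc files are OLE2 compound files that begin with the bytes D0 CF 11 E0 A1 B1 1A E1, while .docx packages are ZIP archives that begin with PK\x03\x04. The signature identifies the container only, so a ZIP result does not prove that the file is a valid Word document.

```csharp
using System;
using System.IO;

public static class WordFileSniffer
{
    // First bytes of an OLE2 compound file, the container used by legacy binary .doc files.
    private static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // First bytes of a ZIP local file header; .docx packages are ZIP archives.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    public enum WordFileKind { Unknown, LegacyDoc, ZipPackage }

    // Hypothetical helper: classifies a file by its leading bytes only.
    public static WordFileKind Detect(string path)
    {
        byte[] header = new byte[8];
        int read = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until the header is full or EOF.
            int n;
            while (read < header.Length && (n = fs.Read(header, read, header.Length - read)) > 0)
            {
                read += n;
            }
        }

        if (StartsWith(header, read, Ole2Magic)) return WordFileKind.LegacyDoc;
        if (StartsWith(header, read, ZipMagic)) return WordFileKind.ZipPackage;
        return WordFileKind.Unknown;
    }

    // True when the first `length` valid bytes of `buffer` begin with `prefix`.
    private static bool StartsWith(byte[] buffer, int length, byte[] prefix)
    {
        if (length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (buffer[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

    A caller could run Detect(path) first and only call TextFromWord when the result is ZipPackage, reporting a "please convert to .docx" message for LegacyDoc instead of letting the open call fail.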

  • Original article: https://www.cnblogs.com/dayang12525/p/8675269.html