  • Word2CHM released

    Introduction

    Word2CHM is an open-source C# program that converts MS Word documents (in 2000/2003 format) to CHM documents. To learn more, visit http://www.sinoreport.net/Word2CHM_Details.aspx.

    [Screenshot of Word2CHM]

    Background

    Many people write customer help documents in MS Word, because Word is well suited to documents that combine text, images, and tables.

    However, many customers do not want to read help documents in MS Word format; they prefer CHM. Converting MS Word documents to CHM documents is therefore useful, and this is why I built Word2CHM.

    Word2CHM

    Word2CHM converts an MS Word document to a CHM document in three steps. First, it converts the Word document to a single HTML file. Second, it splits that HTML file into multiple HTML files. Third, it compiles the HTML files into a single CHM file.

    First, convert the MS Word document to a single HTML file

    MS Word supports OLE Automation, so a C# program can host a Word application instance, open a binary Word document, and save it as an HTML file.

    The following sample C# code hosts a Word application.
    private bool SaveWordToHtml(string docFileName, string htmlFileName)
    {
        // check doc file name
        if (System.IO.File.Exists(docFileName) == false )
        {
            this.Alert("File '" + docFileName + "' not exist!");
            return false;
        }
        // check output directory
        string dir = System.IO.Path.GetDirectoryName(htmlFileName);
        if (System.IO.Directory.Exists(dir) == false )
        {
            this.Alert("Directory '" + dir + "' not exist!");
            return false;
        }
        object trueValue = true;
        object falseValue = false;
        object missValue = System.Reflection.Missing.Value;
        object fileNameValue = docFileName;
        // create word application instance
        Microsoft.Office.Interop.Word.Application app =
            new Microsoft.Office.Interop.Word.ApplicationClass();
        // make the Word application visible, so that if something goes wrong
        // and this program quits, the user can close Word manually.
        app.Visible = true;
        // open document
        Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(
            ref fileNameValue,
            ref missValue,
            ref trueValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue);
        // save a html file
        object htmlFileNameValue = htmlFileName;
        object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatFilteredHTML;
        doc.SaveAs(
            ref htmlFileNameValue ,
            ref format,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue,
            ref missValue);
        // close document and release resource
        doc.Close(ref falseValue, ref missValue, ref missValue);
        app.Quit(ref falseValue, ref missValue, ref missValue);
        System.Runtime.InteropServices.Marshal.ReleaseComObject(doc);
        System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
        return true;
    }

    In this code, it is important to call ReleaseComObject, which releases the COM references that the program holds to the Word document and application.

    Many programs that host a Word application (or an Excel application) call the application's Quit function when they no longer need it. Sometimes the Word process nevertheless stays alive, which is a serious resource leak. Calling ReleaseComObject reduces this risk.
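
    The sample above only reaches ReleaseComObject if nothing throws between opening and closing the document. A minimal sketch of a more defensive pattern follows. The names ComCleanup, Release, and UseAndRelease are hypothetical helpers, not part of Word2CHM, and the sketch assumes that releasing every reference with Marshal.FinalReleaseComObject is acceptable for the caller.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of Word2CHM.
public static class ComCleanup
{
    // Releases a COM wrapper if the object really is one.
    // Returns true only when a release actually happened.
    public static bool Release(object comObject)
    {
        if (comObject == null || !Marshal.IsComObject(comObject))
        {
            return false;
        }
        Marshal.FinalReleaseComObject(comObject);
        return true;
    }

    // Runs the action, then releases the given objects even if the action throws.
    public static void UseAndRelease(Action action, params object[] comObjects)
    {
        try
        {
            action();
        }
        finally
        {
            if (comObjects != null)
            {
                // release in reverse order, e.g. the document before the application
                for (int i = comObjects.Length - 1; i >= 0; i--)
                {
                    Release(comObjects[i]);
                }
            }
        }
    }
}
```

    In SaveWordToHtml, the open, save, close, and quit calls could run inside the action, with app and doc passed as the objects to release. Quit still has to be called; the helper only guarantees that the references are dropped afterwards.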

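    Second, split the single HTML file into multiple HTML files

    The HTML file generated by Word contains all the content of the Word document. For example, suppose a Word document contains the following content.

    [Image: a Word document with a level-1 heading "Header1" followed by the paragraph "Content1", and a level-2 heading "Header2" followed by the paragraph "Content2"]

    Saving this document as filtered HTML produces the following source code.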

    <html>
           <head>
                  <meta http-equiv=Content-Type content="text/html; charset=gb2312">
                  <meta name=Generator content="Microsoft Word 11 (filtered)">
                  <title>Header1</title>
                  <style>
                   some style code
                  </style>
           </head>
           <body lang=ZH-CN style='text-justify-trim:punctuation'>
                  <div class=Section1 style='layout-grid:...'>
                         <h1><span lang=EN-US>Header1</span></h1>
                         <p class=MsoNormal><span lang=EN-US>Content1</span></p>
                         <h2><span lang=EN-US>Header2</span></h2>
                         <p class=MsoNormal><span lang=EN-US>Content2</span></p>
                  </div>
           </body>
    </html>

    In this HTML source, a single div tag wraps all the content. Word2CHM needs to split this file into two files, one for each heading.

    File0.html

    <html>
           <head>
                  <meta http-equiv=Content-Type content="text/html; charset=gb2312">
                  <meta name=Generator content="Microsoft Word 11 (filtered)">
                  <title>Header1</title>
           <style>
            --------------
           </style>
           </head>
           <body>
                  <h1>Header</h1><hr />
                  <p class=MsoNormal><span lang=EN-US>Content1</span></p>
                  <hr /><h1>Footer</h1>
           </body>
    </html>

    File1.html

    <html>
           <head>
                  <meta http-equiv=Content-Type content="text/html; charset=gb2312">
                  <meta name=Generator content="Microsoft Word 11 (filtered)">
                  <title>Header1</title>
           <style>
            --------------
           </style>
           </head>
           <body>
                  <h1>Header</h1><hr />
                  <p class=MsoNormal><span lang=EN-US>Content2</span></p>
                  <hr /><h1>Footer</h1>
           </body>
    </html>

    Here, the program inserts the HTML source "<h1>Header</h1><hr />" before the topic content and "<hr /><h1>Footer</h1>" after it. These additions serve as the page header and footer.
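
    As a minimal sketch of this wrapping step, the following helper builds one topic page from a content fragment. The class TopicPage and its Build method are hypothetical names used only for illustration. The "@Title" placeholder is the one that Word2CHM's own code replaces in its header template, and the sketch assumes the title is already HTML-safe because it was taken from HTML source.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Word2CHM.
public static class TopicPage
{
    // Wraps one topic's HTML fragment in a complete page.
    // headerTemplate may contain the placeholder "@Title".
    public static string Build(string title, string content,
                               string headerTemplate, string footerHtml)
    {
        var sb = new StringBuilder();
        sb.AppendLine("<html>");
        sb.AppendLine("<head><title>" + title + "</title></head>");
        sb.AppendLine("<body>");
        if (headerTemplate != null)
        {
            // substitute the topic title into the header
            sb.AppendLine(headerTemplate.Replace("@Title", title));
        }
        sb.AppendLine(content);
        if (footerHtml != null)
        {
            sb.AppendLine(footerHtml);
        }
        sb.AppendLine("</body>");
        sb.AppendLine("</html>");
        return sb.ToString();
    }
}
```

    For example, TopicPage.Build("Header1", "<p>Content1</p>", "<h1>@Title</h1><hr />", "<hr /><h1>Footer</h1>") yields a page whose body starts with the title as a heading and ends with the footer.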

    In Word2CHM, I use the following C# code to split the HTML file.
    string strDir = System.IO.Path.GetDirectoryName(fileName);
    string strHtml = null;
    System.Text.Encoding encoding = System.Text.Encoding.Default ;
    using (StreamReader reader = new StreamReader(fileName, encoding, true))
    {
        //set content encoding
        encoding = reader.CurrentEncoding;
        //read HTML source code
        strHtml = reader.ReadToEnd();
    }
    int index = strHtml.IndexOf("<body");
    string strHeader = strHtml.Substring(0, index);
    string strHeader1 = strHeader;
    string strHeader2 = null;
    index = strHeader.IndexOf("<title>");
    if (index > 0)
    {
        strHeader1 = strHeader.Substring(0, index);
        int indexEndTitle = strHeader.IndexOf("</title>");
        strHeader2 = strHeader.Substring(indexEndTitle + 8);
        // read title
        this.strTitle = strHeader.Substring(index + 7, indexEndTitle - index - 6 - 1);
    }
    else
    {
        strTitle = System.IO.Path.GetFileNameWithoutExtension(fileName);
    }
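    // find the end of the <body ...> opening tag; index no longer points at
    // <body here (it refers to <title>, or is -1 when there is no title)
    index = strHtml.IndexOf(">", strHtml.IndexOf("<body"));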
    string strBody = strHtml.Substring(index + 1);
    index = strBody.LastIndexOf("</body>");
    strBody = strBody.Substring(0, index);
    index = strBody.IndexOf("<div");
    if (index >= 0)
    {
        index = strBody.IndexOf(">", index+1);
        strBody = strBody.Substring(index + 1 );
        index = strBody.LastIndexOf("</div>");
        strBody = strBody.Substring(0, index);
    }
    // split the HTML document at the heading tags <h1>, <h2>, ...
    index = strBody.IndexOf("<h");
    if (index >= 0)
    {
        strBody = strBody.Substring(index);
    }
    else
    {
        strBody = "";
    }
    strBody = strBody.Trim();
    int lastLevel = 1;
    int lastNativeLevel = 1;
    while (strBody.Length > 0)
    {
        int Nativelevel = Convert.ToInt32(strBody.Substring(2, 1));
        int level = Nativelevel;
        if (lastNativeLevel == Nativelevel)
        {
            level = lastLevel;
        }
        else
        {
            if (level > lastLevel + 1)
            {
                level = lastLevel + 1;
            }
        }
        lastNativeLevel = Nativelevel;
        lastLevel = level;
        int index2 = strBody.IndexOf(">");
        int index3 = strBody.IndexOf("</h" + Nativelevel + ">");
        // read the text between <hN> and </hN> as the topic title
        string strTitle = strBody.Substring(index2 + 1, index3 - index2 - 1);
        while (strTitle.IndexOf("<") >= 0)
        {
            int index4 = strTitle.IndexOf("<");
            int index5 = strTitle.IndexOf(">", index4);
            strTitle = strTitle.Remove(index4, index5 - index4 + 1);
        }
        strBody = strBody.Substring(index3 + 5);
        index = strBody.IndexOf("<h");
        if (index == -1)
        {
            index = strBody.Length;
        }
        //read topic content
        string strContent = strBody.Substring(0, index);
        // add node to chm document DOM tree
        CHMNode currentNode = null;
        if (this.Nodes.Count == 0 || level == 1)
        {
            //create node
            currentNode = new CHMNode();
            this.Nodes.Add(currentNode);
        }
        else
        {
            CHMNode parentNode = this.Nodes.LastNode;
            while (true)
            {
                if (parentNode.Nodes.Count == 0)
                    break;
                if (parentNode.Level == level - 1)
                {
                    break;
                }
                parentNode = parentNode.Nodes.LastNode;
            }
            currentNode = new CHMNode();
            //add child node
            parentNode.Nodes.Add(currentNode);
        }
        //set node's name
        currentNode.Name = strTitle;
        strContent = strContent.Trim();
        if (strContent.Length > 0)
        {
            string strHtmlFileName = "";
            CHMNode node = currentNode;
            while (node != null)
            {
                int NodeIndex = node.Index;
                if (node.Parent == null)
                    NodeIndex = this.Nodes.IndexOf(node);
                if (strHtmlFileName.Length > 0)
                    strHtmlFileName = NodeIndex + "-" + strHtmlFileName;
                else
                    strHtmlFileName = NodeIndex.ToString();
                node = node.Parent;
            }
            strHtmlFileName = "File" + strHtmlFileName + ".html";
            currentNode.Local = strHtmlFileName;
            myFiles.Add(strHtmlFileName);
            strHtmlFileName = System.IO.Path.Combine(strDir, strHtmlFileName);
            //Generate topic html file
            using (StreamWriter writer = new StreamWriter(strHtmlFileName, false, encoding))
            {
                if (strHeader2 != null)
                {
                    //write header html source
                    writer.Write(strHeader1);
                    writer.Write("<title>" + strTitle + "</title>");
                    writer.Write(strHeader2);
                }
                else
                {
                    writer.Write(strHeader);
                }
                writer.WriteLine("<body style=' margin: 0px 0px 0px 0px; padding: 0px 0px 0px 0px;font-family: Verdana, Arial, Helvetica, sans-serif;' >");
                string header = this.HelpHeaderHtml;
                if (header != null)
                {
                    //write header html source code
                    header = header.Replace("@Title", strTitle);
                    writer.WriteLine(header);
                }
                //write html content
                writer.WriteLine(strContent);
                //write footer html source
                writer.WriteLine(this.HelpFooterHtml);
                writer.WriteLine("</body>");
                writer.WriteLine("</html>");
            }
        }
        if (index == strBody.Length)
        {
            break;
        }
        else
        {
            strBody = strBody.Substring(index);
        }
    }//while
    // add the resource files that Word saved in the "<name>.files" folder to the file list
    string strFilesDir = System.IO.Path.ChangeExtension(fileName, "files");
    if (System.IO.Directory.Exists(strFilesDir))
    {
        string dirName = System.IO.Path.GetFileName(strFilesDir);
        foreach (string name in System.IO.Directory.GetFiles(strFilesDir))
        {
            string name2 = System.IO.Path.GetFileName(name);
            name2 = System.IO.Path.Combine(dirName, name2);
            myFiles.Add(name2);
        }

    }

    With this code, I split the HTML file at the heading tags H1, H2, H3, ... Hn, and set each generated HTML document's title to the text inside its Hn tag.
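
    The same idea can be sketched more compactly with a regular expression. This is an alternative illustration, not the code Word2CHM uses: the Topic and HeadingSplitter names are hypothetical, and the sketch assumes the headings are h1 to h6 and are not nested. It keeps the level rule from the code above, where a heading may sit at most one level deeper than the previous one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical types, not part of Word2CHM.
public sealed class Topic
{
    public int NativeLevel;  // the N in <hN>
    public int TocLevel;     // level after clamping, used for the contents tree
    public string Title;
    public string Content;
}

public static class HeadingSplitter
{
    // Matches <h1>..</h1> through <h6>..</h6>, with optional attributes.
    private static readonly Regex Heading = new Regex(
        @"<h([1-6])\b[^>]*>(.*?)</h\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<Topic> Split(string body)
    {
        var topics = new List<Topic>();
        MatchCollection matches = Heading.Matches(body);
        int lastLevel = 1;
        int lastNativeLevel = 1;
        for (int i = 0; i < matches.Count; i++)
        {
            Match m = matches[i];
            int nativeLevel = int.Parse(m.Groups[1].Value);

            // same native level as before: keep the previous TOC level;
            // otherwise never go more than one level deeper than before
            int level = nativeLevel;
            if (nativeLevel == lastNativeLevel)
                level = lastLevel;
            else if (level > lastLevel + 1)
                level = lastLevel + 1;
            lastNativeLevel = nativeLevel;
            lastLevel = level;

            // the topic content runs up to the next heading or the end of the body
            int start = m.Index + m.Length;
            int end = (i + 1 < matches.Count) ? matches[i + 1].Index : body.Length;

            topics.Add(new Topic
            {
                NativeLevel = nativeLevel,
                TocLevel = level,
                // strip inline tags such as <span> from the heading text
                Title = Regex.Replace(m.Groups[2].Value, "<[^>]+>", "").Trim(),
                Content = body.Substring(start, end - start).Trim()
            });
        }
        return topics;
    }
}
```

    Applied to the sample document above, Split returns two topics, "Header1" and "Header2", whose contents are the Content1 and Content2 paragraphs.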

     
     

    Third, compile the HTML files into a single CHM file

    Word2CHM cannot compile the HTML files into a CHM file by itself; it calls "HTML Help Workshop" to generate the CHM file.

    HTML Help Workshop is a Microsoft product that compiles multiple HTML files into a single CHM file. It stores its settings in a help project file with the extension .hhp.

    Word2CHM generates the HHP file and the contents (HHC) file, and then runs the compiler, with the following C# source code.
    strOutputText = "";
    if (System.IO.File.Exists(compilerExeFileName) == false)
    {
        throw new System.IO.FileNotFoundException(compilerExeFileName);
    }
    string strHHP = System.IO.Path.Combine(this.WorkDirectory, strName + ".hhp");
    string strHHC = System.IO.Path.Combine(this.WorkDirectory, strName + ".hhc");
    string strCHM = System.IO.Path.Combine(this.WorkDirectory, strName + ".chm");
    if (System.IO.File.Exists(strCHM))
    {
        System.IO.File.Delete(strCHM);
    }
    string DefaultTopic = null;
    CHMNodeList nodes = this.GetAllNodes();
    foreach (CHMNode node in nodes)
    {
        if (HasContent(node.Local))
        {
            DefaultTopic = node.Local;
            break;
        }
    }
    // Generate hhp file
    using (System.IO.StreamWriter myWriter = new System.IO.StreamWriter(
               strHHP,
               false,
               System.Text.Encoding.GetEncoding(936)))
    {
        myWriter.WriteLine("[OPTIONS]");
        myWriter.WriteLine("Compiled file=" + System.IO.Path.GetFileName(strCHM));
        myWriter.WriteLine("Contents file=" + System.IO.Path.GetFileName(strHHC));
        // use the first topic that has content, as determined above
        myWriter.WriteLine("Default topic=" + DefaultTopic);
        myWriter.WriteLine("Default Window=main");
        myWriter.WriteLine("Display compile progress=yes");
        myWriter.WriteLine("Full-text search=" + (this.FullTextSearch ? "Yes" : "No"));
        myWriter.WriteLine("Binary TOC=" + (this.BinaryToc ? "Yes" : "No"));
        myWriter.WriteLine("Auto Index=" + (this.AutoIndex ? "Yes" : "No"));
        myWriter.WriteLine("Binary Index=" + (this.BinaryIndex ? "Yes" : "No"));
        //myWriter.WriteLine("Index file=" + System.IO.Path.GetFileName( strIndexFile ));
        myWriter.WriteLine("Title=" + this.Title);
        myWriter.WriteLine("[FILES]");
        foreach (CHMNode node in nodes)
        {
            if (HasContent(node.Local))
            {
                if (myFiles.Contains(node.Local) == false)
                {
                    myFiles.Add(node.Local);
                }
            }
        }
        foreach (string fileName in myFiles)
        {
            myWriter.WriteLine(fileName);
        }
    }
    // Generate hhc file
    System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
    doc.AppendChild(doc.CreateElement("hhc"));
    ToHHCXMLElement(this.myNodes, doc.DocumentElement);
    using (System.IO.StreamWriter myWriter = new System.IO.StreamWriter(
               strHHC,
               false,
               System.Text.Encoding.GetEncoding(936)))
    {
        myWriter.Write(doc.DocumentElement.InnerXml);
    }
    // Compile project , generate chm file
    ProcessStartInfo start = new ProcessStartInfo(compilerExeFileName, "\"" + strHHP + "\"");
    start.UseShellExecute = false;
    start.CreateNoWindow = true;
    start.RedirectStandardOutput = true;
    start.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
    System.Diagnostics.Process proc = System.Diagnostics.Process.Start(start);
    proc.PriorityClass = System.Diagnostics.ProcessPriorityClass.BelowNormal;
    this.strOutputText = proc.StandardOutput.ReadToEnd();
    // Delete temporary files
    if (deleteTempFile)
    {
        System.IO.File.Delete(strHHP);
        System.IO.File.Delete(strHHC);
    }
    if (System.IO.File.Exists(strCHM))
        return strCHM;
    else
        return null;
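
    The helper ToHHCXMLElement is not shown in this article. As a rough sketch of what a contents (HHC) file holds, the following code writes a tree of topics in the sitemap layout that HTML Help Workshop commonly produces: nested UL lists whose items are OBJECT elements of type "text/sitemap" with "Name" and "Local" parameters. TocNode and HhcWriter are hypothetical names, and this is not necessarily how Word2CHM builds the file.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical stand-in for the article's CHMNode.
public sealed class TocNode
{
    public string Name;                                // title shown in the contents tree
    public string Local;                               // topic file; may be null for a folder node
    public List<TocNode> Nodes = new List<TocNode>();  // child topics
}

public static class HhcWriter
{
    public static string Build(IEnumerable<TocNode> roots)
    {
        var sb = new StringBuilder();
        sb.AppendLine("<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">");
        sb.AppendLine("<HTML><HEAD></HEAD><BODY>");
        WriteList(sb, roots);
        sb.AppendLine("</BODY></HTML>");
        return sb.ToString();
    }

    // Writes one <UL> level and recurses into child nodes.
    private static void WriteList(StringBuilder sb, IEnumerable<TocNode> nodes)
    {
        sb.AppendLine("<UL>");
        foreach (TocNode node in nodes)
        {
            sb.AppendLine("<LI><OBJECT type=\"text/sitemap\">");
            sb.AppendLine("<param name=\"Name\" value=\""
                + WebUtility.HtmlEncode(node.Name) + "\">");
            if (!string.IsNullOrEmpty(node.Local))
            {
                sb.AppendLine("<param name=\"Local\" value=\""
                    + WebUtility.HtmlEncode(node.Local) + "\">");
            }
            sb.AppendLine("</OBJECT>");
            if (node.Nodes.Count > 0)
            {
                WriteList(sb, node.Nodes);
            }
        }
        sb.AppendLine("</UL>");
    }
}
```

    The resulting text would be saved as the .hhc file named on the "Contents file" line of the project file, using the same encoding as the topic files.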

    Word2CHM's user interface invokes this compile routine with the following C# code.

    string hhcPath = Word2CHM.Properties.Settings.Default.HHCExePath;
    if( System.IO.File.Exists( hhcPath ) == false )
    {
        MessageBox.Show("Cannot find the executable file '"
            + hhcPath + "' of 'HTML Help Workshop'!");
        return;
    }
    try
    {
        string name = System.IO.Path.ChangeExtension(
            this.myDocument.FileName , "hhp");
        this.Cursor = System.Windows.Forms.Cursors.WaitCursor;
        name = myDocument.CompileProject(
            hhcPath ,
            Word2CHM.Properties.Settings.Default.DeleteTempFile );
        this.Cursor = System.Windows.Forms.Cursors.Default;
        System.Diagnostics.Debug.WriteLine( myDocument.OutputText);
        if (name == null)
            Alert( "Compile error!");
        else
            Alert( "Generated file " + name);
    }
    catch (Exception ext)
    {
        Alert("App error:" + ext.Message);
    }

    After these three steps are complete, Word2CHM has converted the Word document to a CHM file.

  • Original article: https://www.cnblogs.com/xdesigner/p/1877785.html