Word2CHM is an open-source C# program that converts MS Word documents (in 2000/2003 format) to CHM documents.
This is a screen snapshot.
Many people write customer help documents in MS Word because it is well suited to documents that combine text, images, and tables.
However, many customers do not want to read help documents in MS Word format; they prefer the CHM format. It is therefore useful to convert MS Word documents to CHM documents, which is why I built Word2CHM.
Word2CHM converts an MS Word document to a CHM document in three steps. First, it converts the MS Word document to a single HTML file; second, it splits that HTML file into multiple HTML files; and third, it compiles those HTML files into a single CHM file.
First, convert the MS Word document to a single HTML file
The MS Word application supports OLE Automation, so a C# program can host a Word application instance, open a binary Word document, and save it as an HTML file.
The following sample C# code hosts a Word application.
private bool SaveWordToHtml(string docFileName, string htmlFileName)
{
    // check doc file name
    if (System.IO.File.Exists(docFileName) == false)
    {
        this.Alert("File '" + docFileName + "' not exist!");
        return false;
    }
    // check output directory
    string dir = System.IO.Path.GetDirectoryName(htmlFileName);
    if (System.IO.Directory.Exists(dir) == false)
    {
        this.Alert("Directory '" + dir + "' not exist!");
        return false;
    }
    object trueValue = true;
    object falseValue = false;
    object missValue = System.Reflection.Missing.Value;
    object fileNameValue = docFileName;
    // create word application instance
    Microsoft.Office.Interop.Word.Application app =
        new Microsoft.Office.Interop.Word.ApplicationClass();
    // make the Word application visible, so that if something
    // goes wrong the user can close Word manually
    app.Visible = true;
    // open document (the third argument opens it read-only)
    Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(
        ref fileNameValue,
        ref missValue,
        ref trueValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue);
    // save as a filtered html file
    object htmlFileNameValue = htmlFileName;
    object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatFilteredHTML;
    doc.SaveAs(
        ref htmlFileNameValue,
        ref format,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue);
    // close document and release resources
    doc.Close(ref falseValue, ref missValue, ref missValue);
    app.Quit(ref falseValue, ref missValue, ref missValue);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(doc);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
    return true;
}
In this C# source code, it is important to call the ReleaseComObject function. By calling ReleaseComObject, the program releases the resources held by the Word application.
Many programs that host a Word application (or an Excel application) call the application's Quit function when they no longer need it. Sometimes, however, the Word process stays alive, which causes a serious resource leak. Calling ReleaseComObject reduces this risk.
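The release step can be wrapped in a small guard so that it is safe to call from a finally block. The following is only a minimal sketch: the class name ComCleanup and the method Release are hypothetical and are not part of Word2CHM, while Marshal.IsComObject and Marshal.ReleaseComObject are the standard .NET interop calls.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComCleanup
{
    // Hypothetical helper (not part of Word2CHM).
    // Releases the COM reference held by a runtime callable wrapper.
    // Returns true only when a COM reference was actually released.
    public static bool Release(object comObject)
    {
        // Ignore null references and ordinary managed objects.
        if (comObject == null || !Marshal.IsComObject(comObject))
        {
            return false;
        }
        Marshal.ReleaseComObject(comObject);
        return true;
    }
}
```

One possible use is to put the Open and SaveAs calls in a try block and to call Close, Quit, ComCleanup.Release(doc), and ComCleanup.Release(app) in the matching finally block, so the Word process is also released when the conversion throws an exception.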
Second, split the single HTML file into multiple HTML files
The HTML file generated by the Word application contains all the content of the Word document. For example, suppose a Word document contains the following content.
When I save this document as a filtered HTML file, the HTML source code looks like the following.
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=gb2312">
<meta name=Generator content="Microsoft Word 11 (filtered)">
some style code
</head>
<body lang=ZH-CN style='text-justify-trim:punctuation'>
<div class=Section1 style='layout-grid'>
<h1><span lang=EN-US>Header1</span></h1>
<p class=MsoNormal><span lang=EN-US>Content1</span></p>
<h2><span lang=EN-US>Header2</span></h2>
<p class=MsoNormal><span lang=EN-US>Content2</span></p>
</div>
</body>
</html>
In this HTML source code, a single div tag contains all the content. Word2CHM needs to split this HTML file into two files, one for each heading.
<meta http-equiv=Content-Type content="text/html; charset=gb2312">
<meta name=Generator content="Microsoft Word 11 (filtered)">
<h1>Header</h1><hr />
<p class=MsoNormal><span lang=EN-US>Content1</span></p>
<hr /><h1>Footer</h1>

<meta http-equiv=Content-Type content="text/html; charset=gb2312">
<meta name=Generator content="Microsoft Word 11 (filtered)">
<h1>Header</h1><hr />
<p class=MsoNormal><span lang=EN-US>Content2</span></p>
<hr /><h1>Footer</h1>
Here, the program adds the HTML source “<h1>Header</h1><hr />” before the HTML content and “<hr /><h1>Footer</h1>” after it. This additional HTML source serves as the header and footer.
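The idea of splitting at headings and wrapping each part can be sketched in a few lines. This sketch uses a regular expression instead of the index arithmetic that Word2CHM uses, so it shows the technique and is not the author's implementation. The names HeadingSplitter, Topic, Split, and WrapPage are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeadingSplitter
{
    // One topic: heading level, plain-text title, and the HTML that follows the heading.
    public sealed class Topic
    {
        public int Level;
        public string Title;
        public string Content;
    }

    // Splits the inner HTML of the body at every <h1>..<h6> element.
    public static List<Topic> Split(string bodyHtml)
    {
        var topics = new List<Topic>();
        // \1 makes the closing tag match the level of the opening tag.
        var heading = new Regex(@"<h([1-6])\b[^>]*>(.*?)</h\1>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        MatchCollection matches = heading.Matches(bodyHtml);
        for (int i = 0; i < matches.Count; i++)
        {
            Match m = matches[i];
            // The content runs from the end of this heading to the start of the next one.
            int start = m.Index + m.Length;
            int end = (i + 1 < matches.Count) ? matches[i + 1].Index : bodyHtml.Length;
            topics.Add(new Topic
            {
                Level = int.Parse(m.Groups[1].Value),
                // Remove nested tags such as <span> from the title.
                Title = Regex.Replace(m.Groups[2].Value, "<[^>]+>", "").Trim(),
                Content = bodyHtml.Substring(start, end - start).Trim()
            });
        }
        return topics;
    }

    // Adds the header and footer markup shown in the article around a topic's content.
    public static string WrapPage(string content)
    {
        return "<h1>Header</h1><hr />" + content + "<hr /><h1>Footer</h1>";
    }
}
```

Applied to the sample document above, Split would be expected to yield two topics, titled Header1 and Header2, each of which can then be passed through WrapPage and written to its own file.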
In Word2CHM, I use the following C# code to split the HTML file.
string strDir = System.IO.Path.GetDirectoryName(fileName);
string strHtml = null;
System.Text.Encoding encoding = System.Text.Encoding.Default;
using (StreamReader reader = new StreamReader(fileName, encoding, true))
{
    // read HTML source code
    strHtml = reader.ReadToEnd();
    // set content encoding (the detected encoding is only known after the first read)
    encoding = reader.CurrentEncoding;
}
int index = strHtml.IndexOf("<body");
string strHeader = strHtml.Substring(0, index);
string strHeader1 = strHeader;
string strHeader2 = null;
index = strHeader.IndexOf("<title>");
if (index > 0)
{
    strHeader1 = strHeader.Substring(0, index);
    int indexEndTitle = strHeader.IndexOf("</title>");
    strHeader2 = strHeader.Substring(indexEndTitle + 8);
    // read title
    this.strTitle = strHeader.Substring(index + 7, indexEndTitle - index - 6 - 1);
}
else
{
    strTitle = System.IO.Path.GetFileNameWithoutExtension(fileName);
}
// skip past the opening <body ...> tag
index = strHtml.IndexOf(">", strHtml.IndexOf("<body"));
string strBody = strHtml.Substring(index + 1);
index = strBody.LastIndexOf("</body>");
strBody = strBody.Substring(0, index);
// remove the outer <div> that wraps all content
index = strBody.IndexOf("<div");
if (index >= 0)
{
    index = strBody.IndexOf(">", index + 1);
    strBody = strBody.Substring(index + 1);
    index = strBody.LastIndexOf("</div>");
    strBody = strBody.Substring(0, index);
}
// Split html document by tag <h>
index = strBody.IndexOf("<h");
if (index >= 0)
{
    strBody = strBody.Substring(index);
}
else
{
    strBody = "";
}
strBody = strBody.Trim();
int lastLevel = 1;
int lastNativeLevel = 1;
while (strBody.Length > 0)
{
    // strBody starts with "<hN", so the character at position 2 is the heading level
    int Nativelevel = Convert.ToInt32(strBody.Substring(2, 1));
    int level = Nativelevel;
    if (lastNativeLevel == Nativelevel)
    {
        level = lastLevel;
    }
    // never go more than one level deeper than the previous topic
    if (level > lastLevel + 1)
    {
        level = lastLevel + 1;
    }
    lastNativeLevel = Nativelevel;
    lastLevel = level;
    int index2 = strBody.IndexOf(">");
    int index3 = strBody.IndexOf("</h" + Nativelevel + ">");
    // read text in <h>...</h> as topic title
    string strTitle = strBody.Substring(index2 + 1, index3 - index2 - 1);
    // strip nested tags from the title
    while (strTitle.IndexOf("<") >= 0)
    {
        int index4 = strTitle.IndexOf("<");
        int index5 = strTitle.IndexOf(">", index4);
        strTitle = strTitle.Remove(index4, index5 - index4 + 1);
    }
    strBody = strBody.Substring(index3 + 5);
    index = strBody.IndexOf("<h");
    if (index == -1)
    {
        index = strBody.Length;
    }
    // read topic content
    string strContent = strBody.Substring(0, index);
    // add node to chm document DOM tree
    CHMNode currentNode = null;
    if (this.Nodes.Count == 0 || level == 1)
    {
        // create node
        currentNode = new CHMNode();
        // (call lost in the source; Add is assumed)
        this.Nodes.Add(currentNode);
    }
    else
    {
        // walk down the last branch to find the parent node
        CHMNode parentNode = this.Nodes.LastNode;
        while (true)
        {
            if (parentNode.Nodes.Count == 0)
            {
                break;
            }
            if (parentNode.Level == level - 1)
            {
                break;
            }
            parentNode = parentNode.Nodes.LastNode;
        }
        currentNode = new CHMNode();
        // add child node (call lost in the source; Add is assumed)
        parentNode.Nodes.Add(currentNode);
    }
    // set node's name
    currentNode.Name = strTitle;
    strContent = strContent.Trim();
    if (strContent.Length > 0)
    {
        // build the file name from the node indexes, root first, e.g. File0-1.html
        string strHtmlFileName = "";
        CHMNode node = currentNode;
        while (node != null)
        {
            int NodeIndex = node.Index;
            if (node.Parent == null)
            {
                NodeIndex = this.Nodes.IndexOf(node);
            }
            if (strHtmlFileName.Length > 0)
            {
                strHtmlFileName = NodeIndex + "-" + strHtmlFileName;
            }
            else
            {
                strHtmlFileName = NodeIndex.ToString();
            }
            node = node.Parent;
        }
        strHtmlFileName = "File" + strHtmlFileName + ".html";
        currentNode.Local = strHtmlFileName;
        strHtmlFileName = System.IO.Path.Combine(strDir, strHtmlFileName);
        // Generate topic html file
        using (StreamWriter writer = new StreamWriter(strHtmlFileName, false, encoding))
        {
            if (strHeader2 != null)
            {
                // write header html source
                writer.Write(strHeader1);
                writer.Write("<title>" + strTitle + "</title>");
                writer.Write(strHeader2);
            }
            writer.WriteLine("<body style=' margin: 0px 0px 0px 0px; padding: 0px 0px 0px 0px;font-family: Verdana, Arial, Helvetica, sans-serif;' >");
            string header = this.HelpHeaderHtml;
            if (header != null)
            {
                // write header html source code
                header = header.Replace("@Title", strTitle);
                writer.WriteLine(header);
            }
            // write html content
            writer.WriteLine(strContent);
            // write footer html source
            // ...
        }
    }
    if (index == strBody.Length)
    {
        break;
    }
    strBody = strBody.Substring(index);
}
// collect the files that Word saved next to the html file (images and so on)
string strFilesDir = System.IO.Path.ChangeExtension(fileName, "files");
if (System.IO.Directory.Exists(strFilesDir))
{
    string dirName = System.IO.Path.GetFileName(strFilesDir);
    foreach (string name in System.IO.Directory.GetFiles(strFilesDir))
    {
        string name2 = System.IO.Path.GetFileName(name);
        name2 = System.IO.Path.Combine(dirName, name2);
        // ...
    }
}
With this C# code, I split the HTML file at the heading tags H1, H2, H3, ..., Hn, and set each HTML document's title to the text inside its Hn tag.
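Two small rules in the listing are easy to miss: a topic never sits more than one level below the previous topic, and each file name is built from the node indexes along the path from the root. The sketch below isolates both rules so they can be tried on their own. The class and method names (TopicLayout, NormalizeLevels, BuildFileName) are hypothetical; the logic follows the listing above.

```csharp
using System;
using System.Collections.Generic;

public static class TopicLayout
{
    // Maps the heading levels found in the document (for example 1, 3, 3, 2)
    // to tree levels that never skip a level.
    public static List<int> NormalizeLevels(IEnumerable<int> nativeLevels)
    {
        var result = new List<int>();
        int lastLevel = 1;
        int lastNativeLevel = 1;
        foreach (int nativeLevel in nativeLevels)
        {
            int level = nativeLevel;
            // A heading with the same tag as the previous one stays on the same tree level.
            if (nativeLevel == lastNativeLevel)
            {
                level = lastLevel;
            }
            // A heading may be at most one level deeper than the previous topic.
            if (level > lastLevel + 1)
            {
                level = lastLevel + 1;
            }
            lastNativeLevel = nativeLevel;
            lastLevel = level;
            result.Add(level);
        }
        return result;
    }

    // Builds a topic file name from the node indexes, root first.
    public static string BuildFileName(IEnumerable<int> indexPath)
    {
        return "File" + string.Join<int>("-", indexPath) + ".html";
    }
}
```

For example, a document whose headings are H1, H3, H3, H2 would end up with tree levels 1, 2, 2, 2, and the second child of the first root node would be written to File0-1.html.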
Third, compile the multiple HTML files into a single CHM file
Word2CHM cannot compile multiple HTML files into a single CHM file by itself; it calls HTML Help Workshop to generate the CHM file.
HTML Help Workshop is a Microsoft product that compiles multiple HTML files into a CHM file. It saves its settings in a help project file with the extension .hhp.
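A help project file is plain text with an [OPTIONS] section for the settings and a [FILES] section that lists the topic files. The following minimal sketch builds such a file in memory. The class HhpSketch and its parameters are hypothetical, and only a few of the options that Word2CHM writes are included.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HhpSketch
{
    // Builds the text of a minimal .hhp project file.
    public static string Build(string chmFile, string hhcFile, string defaultTopic,
        string title, IEnumerable<string> topicFiles)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[OPTIONS]");
        sb.AppendLine("Compiled file=" + chmFile);
        sb.AppendLine("Contents file=" + hhcFile);
        sb.AppendLine("Default topic=" + defaultTopic);
        sb.AppendLine("Title=" + title);
        sb.AppendLine();
        sb.AppendLine("[FILES]");
        // List every topic file once, in the order it was first seen.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string file in topicFiles)
        {
            if (seen.Add(file))
            {
                sb.AppendLine(file);
            }
        }
        return sb.ToString();
    }
}
```

The returned text could then be written to disk with a StreamWriter, using an encoding that matches the topic files.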
In Word2CHM, the program generates the HHP file with the following C# source code.
strOutputText = "";
if (System.IO.File.Exists(compilerExeFileName) == false)
{
    throw new System.IO.FileNotFoundException(compilerExeFileName);
}
string strHHP = System.IO.Path.Combine(this.WorkDirectory, strName + ".hhp");
string strHHC = System.IO.Path.Combine(this.WorkDirectory, strName + ".hhc");
string strCHM = System.IO.Path.Combine(this.WorkDirectory, strName + ".chm");
if (System.IO.File.Exists(strCHM))
{
    // remove the previous output (statement lost in the source; deletion is assumed)
    System.IO.File.Delete(strCHM);
}
string DefaultTopic = null;
CHMNodeList nodes = this.GetAllNodes();
foreach (CHMNode node in nodes)
{
    if (HasContent(node.Local))
    {
        DefaultTopic = node.Local;
    }
}
// Generate hhp file
// (the constructor arguments were lost in the source; strHHP is assumed)
using (System.IO.StreamWriter myWriter = new System.IO.StreamWriter(strHHP))
{
    myWriter.WriteLine("[OPTIONS]");
    myWriter.WriteLine("Compiled file=" + System.IO.Path.GetFileName(strCHM));
    myWriter.WriteLine("Contents file=" + System.IO.Path.GetFileName(strHHC));
    myWriter.WriteLine("Default topic=" + this.DefaultTopic);
    myWriter.WriteLine("Default Window=main");
    myWriter.WriteLine("Display compile progress=yes");
    myWriter.WriteLine("Full-text search=" + (this.FullTextSearch ? "Yes" : "No"));
    myWriter.WriteLine("Binary TOC=" + (this.BinaryToc ? "Yes" : "No"));
    myWriter.WriteLine("Auto Index=" + (this.AutoIndex ? "Yes" : "No"));
    myWriter.WriteLine("Binary Index=" + (this.BinaryIndex ? "Yes" : "No"));
    //myWriter.WriteLine("Index file=" + System.IO.Path.GetFileName( strIndexFile ));
    myWriter.WriteLine("Title=" + this.Title);
    myWriter.WriteLine("[FILES]");
    // collect each topic file once
    foreach (CHMNode node in nodes)
    {
        if (HasContent(node.Local))
        {
            if (myFiles.Contains(node.Local) == false)
            {
                myFiles.Add(node.Local);
            }
        }
    }
    foreach (string fileName in myFiles)
    {
        myWriter.WriteLine(fileName);
    }
}
// Generate hhc file
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
// ...
ToHHCXMLElement(this.myNodes, doc.DocumentElement);
using (System.IO.StreamWriter myWriter = new System.IO.StreamWriter(strHHC))
{
    // ...
}
// Compile project, generate chm file
ProcessStartInfo start = new ProcessStartInfo(compilerExeFileName, "\"" + strHHP + "\"");
start.UseShellExecute = false;
start.CreateNoWindow = true;
start.RedirectStandardOutput = true;
start.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
System.Diagnostics.Process proc = System.Diagnostics.Process.Start(start);
proc.PriorityClass = System.Diagnostics.ProcessPriorityClass.BelowNormal;
this.strOutputText = proc.StandardOutput.ReadToEnd();
// Delete temporary files
if (deleteTempFile)
{
    // ...
}
if (System.IO.File.Exists(strCHM))
{
    return strCHM;
}
return null;
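The listing hands the contents file to ToHHCXMLElement, which is not shown. For orientation, an HHC contents file is an HTML document of nested UL lists in which every entry is an OBJECT of type text/sitemap with Name and Local parameters. The sketch below writes that structure for a small tree. TocNode and HhcSketch are hypothetical stand-ins for CHMNode and the author's own routine, not a reconstruction of them.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical stand-in for CHMNode: a title, a topic file, and child entries.
public sealed class TocNode
{
    public string Name;
    public string Local;
    public List<TocNode> Children = new List<TocNode>();
}

public static class HhcSketch
{
    // Builds the text of a table-of-contents (.hhc) file.
    public static string Build(IEnumerable<TocNode> roots)
    {
        var sb = new StringBuilder();
        sb.AppendLine("<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">");
        sb.AppendLine("<HTML><HEAD></HEAD><BODY>");
        AppendList(sb, roots);
        sb.AppendLine("</BODY></HTML>");
        return sb.ToString();
    }

    // Writes one <UL> level and recurses into child entries.
    private static void AppendList(StringBuilder sb, IEnumerable<TocNode> nodes)
    {
        sb.AppendLine("<UL>");
        foreach (TocNode node in nodes)
        {
            sb.AppendLine("<LI><OBJECT type=\"text/sitemap\">");
            sb.AppendLine("<param name=\"Name\" value=\"" + WebUtility.HtmlEncode(node.Name) + "\">");
            // Entries without their own topic file get no Local parameter.
            if (!string.IsNullOrEmpty(node.Local))
            {
                sb.AppendLine("<param name=\"Local\" value=\"" + WebUtility.HtmlEncode(node.Local) + "\">");
            }
            sb.AppendLine("</OBJECT>");
            if (node.Children.Count > 0)
            {
                AppendList(sb, node.Children);
            }
        }
        sb.AppendLine("</UL>");
    }
}
```

Encoding the attribute values keeps titles that contain quotes or angle brackets from breaking the markup.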
The user interface of Word2CHM then calls this routine with the following C# code to generate the CHM file.
try
{
    string hhcPath = Word2CHM.Properties.Settings.Default.HHCExePath;
    if (System.IO.File.Exists(hhcPath) == false)
    {
        MessageBox.Show("Can not find execute file '"
            + hhcPath + "' of 'HTML Help Workshop'!");
        return;
    }
    string name = System.IO.Path.ChangeExtension(
        this.myDocument.FileName, "hhp");
    this.Cursor = System.Windows.Forms.Cursors.WaitCursor;
    name = myDocument.CompileProject(
        hhcPath,
        Word2CHM.Properties.Settings.Default.DeleteTempFile);
    this.Cursor = System.Windows.Forms.Cursors.Default;
    System.Diagnostics.Debug.WriteLine(myDocument.OutputText);
    if (name == null)
    {
        Alert("Compile error!");
    }
    else
    {
        Alert("Generate file " + name);
    }
}
catch (Exception ext)
{
    Alert("App error:" + ext.Message);
}
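The compiler call itself can be separated from the project generation. The sketch below shows one way to start the command-line compiler and capture its output, assuming that the path to hhc.exe and the path to the .hhp file are already known. The class HelpCompilerRunner and its methods are hypothetical; the Process and ProcessStartInfo members are the standard System.Diagnostics API.

```csharp
using System;
using System.Diagnostics;

public static class HelpCompilerRunner
{
    // Prepares the start information; the project path is quoted because it may contain spaces.
    public static ProcessStartInfo CreateStartInfo(string compilerExe, string hhpPath)
    {
        var start = new ProcessStartInfo(compilerExe, "\"" + hhpPath + "\"");
        // Redirecting standard output requires UseShellExecute = false.
        start.UseShellExecute = false;
        start.CreateNoWindow = true;
        start.RedirectStandardOutput = true;
        return start;
    }

    // Runs the compiler, waits for it to finish, and returns everything it printed.
    public static string Run(ProcessStartInfo start, out int exitCode)
    {
        using (Process proc = Process.Start(start))
        {
            // Read the output before waiting, so a full pipe cannot block the child process.
            string output = proc.StandardOutput.ReadToEnd();
            proc.WaitForExit();
            exitCode = proc.ExitCode;
            return output;
        }
    }
}
```

Some users report that the exit code of hhc.exe does not follow the usual zero-for-success convention, so checking whether the .chm file exists afterwards, as Word2CHM does, may be the more reliable test.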
After completing these three steps, Word2CHM has converted a Word document into a CHM file.