使用C#转移WORDPRESS博客到为知笔记或EVERNOTE
注:文中的方法非常简单,使用MySQL导出表到XML,发送邮件到你的为知/Evernote邮箱账户即可,唯一目前没找到破的是SMTP常随机无响应.. 而且象GMAIL的SMTP有发送次数限制(一天一千左右吧,所以如果有超过的话可能需要更换账号再试,当然你也可以改良下,做个数组,遇到配额错误可以提取下一个SMTP信息),优点是利用邮件的方式可以将博客中的图片也能导入到笔记中(而不是用的图片原链接,是直接放进笔记)
日志导入Wiz的结果:
一、目的
很简单,我担心博客放在网上有一天会不见了,所以一直想存到本地,博客太多,手动已不可能
- 我这些年使用wordpress有上千篇博客,其中有8成都是私有,使用现在的一些读取FEED方式导出的方式不能读取私有和保护博客,Wordpress也有导出插件,我没有试太多,有一个导出xml的工具采用流式试过十多次没有一次是完整导出,有时导出100来K,有时几M,最后放弃使用WP插件导出。
- 数据源:使用MySQL导出XML。
首先原因是空间提供商不提供MySQL远程连接,(如果你能导出SQL在本地建MySQL再导入,就不在这个话题范围内)。我原本想导出为SQL,再转换为MSSQL,可在网上找的所有converter都是必须登陆远程MySQL服务器,这条路径也行不通。导出的CSV为乱码,试过UTF8, GB2312都不行,这条路径也行不通。最后发现只有导出XML时没有乱码。这就简单多了。 - 使用C#读取XML时,有几类重要的信息是个人觉得必需的
- 博客表 wp_posts
- 类别(category / tag等) 表 wp_term_taxonomy, wp_terms wp_term_relationships
SQL:
-- 列出全部日志 --
create view v_post as
SELECT p.*, tax.term_taxonomy_id,tax.term_id,category.name, tax.count
FROM
wp_term_relationships relation,
wp_terms category,
wp_term_taxonomy tax,
wp_posts p
WHERE
category.term_id = tax.term_id and
tax.term_taxonomy_id = relation.term_taxonomy_id and
relation.object_id = p.id
二、操作步骤
- 登陆你空间的phpAdmin,选中日志数据库后,选择Export,框选的是需要注意的,导出时不要选择导出TABLE/VIEW等SQL语句。UTF-8必选否则会有乱码
得到导出的XML将作为我们的数据源。请一定确保下载下来的XML是有效的(我刚试过,这一种也有可能会下载不完整,无解 /摊手) - 跑日志的运行结果

- 代码
View CodeusingSystem;usingSystem.Linq;usingSystem.Net.Mail;usingSystem.Text;usingSystem.Text.RegularExpressions;usingSystem.Threading.Tasks;usingSystem.Xml.Linq;usingSystem.Xml.XPath;namespaceWordpressExport{classProgram{staticSmtpClient smtpClient;staticMailMessage mailMessage;staticIOrderedEnumerable<post> list;staticboolsmtpConnected;staticbooltriedAfterEx =false;readonlystaticstringCONFIG_smtp_add = System.Configuration.ConfigurationSettings.AppSettings["smtp_add"];readonlystaticstringCONFIG_smtp_acct_name = System.Configuration.ConfigurationSettings.AppSettings["smtp_acct_name"];readonlystaticstringCONFIG_smtp_acct_pwd = System.Configuration.ConfigurationSettings.AppSettings["smtp_acct_pwd"];readonlystaticstringCONFIG_xml_path = System.Configuration.ConfigurationSettings.AppSettings["xml_path"];readonlystaticstringCONFIG_evernote_folder = System.Configuration.ConfigurationSettings.AppSettings["evernote_folder"];readonlystaticstringCONFIG_post_scope_start_date = System.Configuration.ConfigurationSettings.AppSettings["post_scope_start_date"];readonlystaticstringCONFIG_post_scope_end_date = System.Configuration.ConfigurationSettings.AppSettings["post_scope_end_date"];readonlystaticstringCONFIG_notebook_email = System.Configuration.ConfigurationSettings.AppSettings["notebook_email"];readonlystaticstringCONFIG_blog_addr = System.Configuration.ConfigurationSettings.AppSettings["blog_addr"];staticvoidSendMail(post post){boolisSuccess =true;if(post ==null)return;DateTime post_date;if(!DateTime.TryParse(post.post_date,outpost_date)){post_date = System.DateTime.MinValue;}stringstr_post_date = post_date == System.DateTime.MinValue ?"": post_date.ToString("yyyy-MM-dd");mailMessage.SubjectEncoding = Encoding.UTF8;mailMessage.Subject =string.Format("[{0}] {1} {2}", str_post_date, post.post_title, CONFIG_evernote_folder);//主題mailMessage.Body ="<b>创建时间:</b>"+ post.post_date +"<br/>";//內容mailMessage.Body +="<b>原目录或tag</b>:"+ post.post_tagcat +"<br/>";//內容mailMessage.Body +=string.Format("<b>原文见</b>:<a href={0}?p={1}>{0}?p={1}</a></a><br/><br/><br/>", CONFIG_blog_addr, post.ID);//內容mailMessage.Body += post.post_content;try{mailMessage.BodyEncoding = Encoding.UTF8;//正文編碼mailMessage.Priority = MailPriority.High;//優先級mailMessage.IsBodyHtml =true;Regex reg =newRegex(@"\n");mailMessage.Body = reg.Replace(mailMessage.Body,"<br/>");Console.WriteLine(System.DateTime.Now +" sending mail... id = "+ post.ID +" "+ post.post_title);// 你也可以使用异步发送,不过会导致发到邮箱的时间很乱,我们还是希望导入到笔记后的笔记自然顺序(即ID)是按时间顺序的// smtpClient.Send(mailMessage);// 發送郵件}catch(SmtpException ex){Console.WriteLine("failed in SMTP connection, try again...");// System.IO.IOException: Unable to write data to the transport connection: An established connection was aborted by the// software in your host machine.// ---> System.Net.Sockets.SocketException: An established connection was aborted by the software in your host machineif(triedAfterEx){isSuccess =false;Console.WriteLine(ex.Message);}// again, create another smtp instance for reconnecting SMTP// Parallel.Invoke(() => connectSmtp(), () => SendMail (post));isSuccess =false;Console.WriteLine(ex.Message);triedAfterEx =true;}catch(Exception ex2){isSuccess =false;Console.WriteLine(ex2.Message);}Console.WriteLine(System.DateTime.Now.ToString() + (isSuccess ?" completed":" completed with error(s)"));}staticvoidMain(string[] args){smtpConnected =false;Console.WriteLine(System.DateTime.Now +" start...");Parallel.Invoke(() => connectSmtp(), () => HandleWPExport());if(list.Count() == 0){thrownewException("ooops..");}intcnt = 0;while(1 == 1){if(smtpConnected =true|| cnt > 10000)break;cnt++;}Console.WriteLine(cnt);foreach(var pinlist){triedAfterEx =false;SendMail(p);}}privatestaticvoidconnectSmtp(){smtpClient =newSmtpClient();smtpClient.DeliveryMethod = SmtpDeliveryMethod.Network;//指定電子郵件發送方式smtpClient.Host ="smtp.gmail.com";//指定SMTP服務器smtpClient.EnableSsl =true;smtpClient.Credentials =newSystem.Net.NetworkCredential(CONFIG_smtp_acct_name, CONFIG_smtp_acct_pwd);smtpClient.Timeout = 100000;}privatestaticvoidHandleWPExport(){Console.WriteLine(System.DateTime.Now +" starting to read wp_posts.xml...");XDocument xmlDoc = XDocument.Load(CONFIG_xml_path);var queryTax = (from taxinxmlDoc.XPathSelectElements(@".//table[@name='wp_term_taxonomy']")wheretax.XPathSelectElement("column[@name='taxonomy']").Value.Trim("\n".ToCharArray()) == TaxonomyEnum.category.ToString() ||tax.XPathSelectElement("column[@name='taxonomy']").Value.Trim("\n".ToCharArray()) == TaxonomyEnum.link_category.ToString() ||tax.XPathSelectElement("column[@name='taxonomy']").Value.Trim("\n".ToCharArray()) == TaxonomyEnum.post_tag.ToString()selectnew{term_id = tax.XPathSelectElement("column[@name='term_id']").Value.Trim("\n".ToCharArray()),tax_id = tax.XPathSelectElement("column[@name='term_taxonomy_id']").Value.Trim("\n".ToCharArray())}).ToList();var queryCat = (from catinxmlDoc.XPathSelectElements(@".//table[@name='wp_terms']")selectnew{term_id = cat.XPathSelectElement("column[@name='term_id']").Value.Trim("\n".ToCharArray()),name = cat.XPathSelectElement("column[@name='name']").Value.Trim("\n".ToCharArray()),}).ToList();var queryRel = (from relinxmlDoc.XPathSelectElements(@".//table[@name='wp_term_relationships']")selectnew{object_id = rel.XPathSelectElement("column[@name='object_id']").Value.Trim("\n".ToCharArray()),tax_id = rel.XPathSelectElement("column[@name='term_taxonomy_id']").Value.Trim("\n".ToCharArray()),}).ToList();Console.WriteLine(System.DateTime.Now +" continuing ... ");var queryTagCat = (from taxinqueryTaxfrom catinqueryCatfrom relinqueryRelwhere cat.term_id == tax.term_id && tax.tax_id == rel.tax_idselectnew{name = cat.name,id = rel.object_id}).ToList();var query = from pinxmlDoc.XPathSelectElements(@".//table[@name='wp_posts' ]")where p.XPathSelectElement("column[@name='post_type']").Value.Trim("\n".ToCharArray()) =="post"// there're two types - post, attachment, we don't want attachment type of postsselectnewpost{ID = p.XPathSelectElement("column[@name='ID']").Value.Trim("\n".ToCharArray()),post_author = p.XPathSelectElement("column[@name='post_author']").Value.Trim("\n".ToCharArray()),post_date = p.XPathSelectElement("column[@name='post_date']").Value.Trim("\n".ToCharArray()),post_date_gmt = p.XPathSelectElement("column[@name='post_date_gmt']").Value.Trim("\n".ToCharArray()),post_content = p.XPathSelectElement("column[@name='post_content']").Value.Trim("\n".ToCharArray()),post_title = p.XPathSelectElement("column[@name='post_title']").Value.Trim("\n".ToCharArray()),post_excerpt = p.XPathSelectElement("column[@name='post_excerpt']").Value.Trim("\n".ToCharArray()),post_status = p.XPathSelectElement("column[@name='post_status']").Value.Trim("\n".ToCharArray()),comment_status = p.XPathSelectElement("column[@name='comment_status']").Value.Trim("\n".ToCharArray()),ping_status = p.XPathSelectElement("column[@name='ping_status']").Value.Trim("\n".ToCharArray()),post_password = p.XPathSelectElement("column[@name='post_password']").Value.Trim("\n".ToCharArray()),post_name = p.XPathSelectElement("column[@name='post_name']").Value.Trim("\n".ToCharArray()),to_ping = p.XPathSelectElement("column[@name='to_ping']").Value.Trim("\n".ToCharArray()),pinged = p.XPathSelectElement("column[@name='pinged']").Value.Trim("\n".ToCharArray()),post_modified = p.XPathSelectElement("column[@name='post_modified']").Value.Trim("\n".ToCharArray()),post_modified_gmt = p.XPathSelectElement("column[@name='post_modified_gmt']").Value.Trim("\n".ToCharArray()),post_content_filtered = p.XPathSelectElement("column[@name='post_content_filtered']").Value.Trim("\n".ToCharArray()),post_parent = p.XPathSelectElement("column[@name='post_parent']").Value.Trim("\n".ToCharArray()),guid = p.XPathSelectElement("column[@name='guid']").Value.Trim("\n".ToCharArray()),menu_order = p.XPathSelectElement("column[@name='menu_order']").Value.Trim("\n".ToCharArray()),post_type = p.XPathSelectElement("column[@name='post_type']").Value.Trim("\n".ToCharArray()),post_mime_type = p.XPathSelectElement("column[@name='post_mime_type']").Value.Trim("\n".ToCharArray()),comment_count = p.XPathSelectElement("column[@name='comment_count']").Value.Trim("\n".ToCharArray()),post_tagcat =string.Join(" ", (from tinqueryTagCat where t.id == p.XPathSelectElement("column[@name='ID']").Value.Trim("\n".ToCharArray()) select t.name).ToList())};Console.WriteLine(System.DateTime.Now +" done with reading the xml... ");// PredicateBuilderif(!string.IsNullOrEmpty(CONFIG_post_scope_start_date))query = query.Where(o => Convert.ToDateTime(o.post_date) >= DateTime.Parse(CONFIG_post_scope_start_date));if(!string.IsNullOrEmpty(CONFIG_post_scope_end_date))query = query.Where(o => Convert.ToDateTime(o.post_date) <= DateTime.Parse(CONFIG_post_scope_end_date));list = query.ToList().OrderBy(o => o.post_date);}}}
DTO类app.configusingSystem;usingSystem.Collections.Generic;usingSystem.Linq;usingSystem.Text;usingSystem.Xml.Serialization;namespaceWordpressExport{[Serializable]publicclasscolumn{publicstringname {get;set; }publicstringtext {get;set; }}[Serializable]publicclassdatabase{[XmlElement(ElementName ="database")]publicList<table> tables {get;set; }}[Serializable]publicclasstable{publicstringname {get;set; }publiccolumn column {get;set; }}[Serializable]publicclasspost{publicstringID {get;set; }publicstringpost_author {get;set; }publicstringpost_date {get;set; }publicstringpost_date_gmt {get;set; }publicstringpost_content {get;set; }publicstringpost_title {get;set; }publicstringpost_excerpt {get;set; }publicstringpost_status {get;set; }publicstringcomment_status {get;set; }publicstringping_status {get;set; }publicstringpost_password {get;set; }publicstringpost_name {get;set; }publicstringto_ping {get;set; }publicstringpinged {get;set; }publicstringpost_modified {get;set; }publicstringpost_modified_gmt {get;set; }publicstringpost_content_filtered {get;set; }publicstringpost_parent {get;set; }publicstringguid {get;set; }publicstringmenu_order {get;set; }publicstringpost_type {get;set; }publicstringpost_mime_type {get;set; }publicstringcomment_count {get;set; }publicstringpost_tagcat {get;set; }}publicenumTaxonomyEnum{category,link_category,post_format,post_tag,series}}<?xmlversion="1.0"?><configuration><startup><supportedRuntimeversion="v4.0"sku=".NETFramework,Version=v4.0"/></startup><appSettings><addkey="smtp_addr"value="smtp.gmail.com"/><addkey="smtp_acct_name"value="@gmail.com"/><addkey="smtp_acct_pwd"value=""/><addkey="evernote_folder"value="@folder..."/><addkey="xml_path"value="xxxxx\wp_posts.xml"/><addkey="notebook_email"value="xxxx@mywiz.cn"/><addkey="post_scope_start_date"value="2013-04-11 22:25:50"/><addkey="post_scope_end_date"value=""/></appSettings></configuration>
三、可改善的地方
- Mail body顶部,我加入了日志详情

- 我没有将日志放进tag,你可以改一下代码使之subject上加入tag (for evernote, 加入#tag1 tag2等,自行查下规则)
- 唯一目前没找到破的是SMTP常随机无响应.. 你如果找到可以破的(比如CATCH到IOExeption重连的),麻烦告诉我
- GMAIL的SMTP有发送次数限制(一天一千左右吧,所以如果有超过的话可能需要更换账号再试,当然你也可以改良下,做个数组,遇到配额错误可以提取下一个SMTP信息)
四、关于为知WIZ
- 个人觉得Wiz是个还有很多提升空间的、自定义性比较强的国产笔记类软件,记得用Wiz的话一定要注意给你不愿意别人看到的条目加个密,不然按Wiz的目录式的HTML文件在电脑中完全没有秘密可言,知道目录的无需登陆就能打加所有日志。

- 没有Wiz账号的,在这顺便递送个邀请码吧,6d485186 (http://www.wiz.cn/i/6d485186),据说用了注册后第一个月有VIP试用,当然你不用也行。
- 比较起Evernote来说,好处就是日志的空间比较大,插件化让它比较有趣味。 比如它可以导出CHM,可以把以往的日志全部转成一个漂亮的CHM。它的搜索体验我觉得比Evernote更好,全文检索的index使得搜索特别快。再多的就不介绍了,敢兴趣的可以自行了解。
愿一路奔跑不退缩,到目前一直从事.Net的B/S,C/S企业应用研发



