  • Converting Markdown to HTML with Python

    Motivation

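    Plenty of editors can convert Markdown to HTML directly, so why write my own? Because once I finish a Markdown document, I want to save it in a note-taking app (such as Youdao), put it on GitHub for version control, and publish it to a blog (such as cnblogs). Doing all of that by hand every time is tedious, so it has to be handed off to a script.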

    Ingredients

    • markdown2 or mistune
    • pygments

    How it works

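    • First, I need a Markdown lexer/parser, and then an HTML converter. Either markdown2 or mistune can handle both.
    • Second, my notes contain a lot of code, so I need syntax highlighting. That means extracting the code blocks from the Markdown, determining which language each one is written in, and then coloring it. This part can be done by pygments.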
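
    To make the second step concrete, here is a minimal standard-library sketch of pulling fenced code blocks and their language tags out of Markdown text. mistune and markdown2 do this with their own parsers; the helper name find_fenced_blocks and the simplified pattern (backtick fences only, starting in column 0) are assumptions made for illustration, not taken from either library.

```python
import re

# Build the fence marker programmatically to keep this example easy to embed.
FENCE = "`" * 3

# Simplified pattern: an opening fence with an optional language tag,
# the code body (matched lazily), and a closing fence on its own line.
FENCE_RE = re.compile(
    r"^" + FENCE + r"[ \t]*([\w+-]*)[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def find_fenced_blocks(md_text):
    """Return (language, code) pairs; language is '' when no tag is given."""
    return [(m.group(1), m.group(2)) for m in FENCE_RE.finditer(md_text)]


sample = (
    "Intro\n\n"
    + FENCE + "python\nprint('hi')\n" + FENCE + "\n\n"
    + FENCE + "\nplain text\n" + FENCE + "\n"
)
blocks = find_fenced_blocks(sample)
```

    Each (language, code) pair could then be handed to pygments: a non-empty language selects a lexer, while an empty one falls back to a plain escaped block, which is the same branching the renderer in the code section performs.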

    Code

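    Using mistune (its source code is well worth studying). You have to bring in pygments yourself to render the code blocks; the official site has a reference example.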

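    import codecs
    import os
    import sys

    import mistune
    from pygments import highlight
    from pygments.lexers import get_lexer_by_name
    from pygments.formatters import HtmlFormatter


    # Written against the mistune 0.x API (mistune.Renderer / mistune.Markdown).
    class HighlightRenderer(mistune.Renderer):
        def block_code(self, code, lang=None):
            # No language tag: emit a plain, escaped <pre><code> block.
            if not lang:
                return '\n<pre><code>%s</code></pre>\n' % \
                    mistune.escape(code)
            lexer = get_lexer_by_name(lang, stripall=True)
            # Wrap the output in class "cnblogs_code" so that it matches the
            # selectors of the stylesheet shown in this post (Pygments'
            # default wrapper class is "highlight").
            formatter = HtmlFormatter(cssclass='cnblogs_code')
            return highlight(code, lexer, formatter)


    def main(argv):
        name = argv[0]

        # Read the Markdown source as UTF-8.
        with codecs.open(name, mode='r', encoding='utf-8') as input_file:
            text = input_file.read()

        renderer = HighlightRenderer()
        markdown = mistune.Markdown(renderer=renderer)
        html_text = markdown(text)

        # foo.md -> foo.html
        html_name = os.path.splitext(name)[0] + '.html'
        with codecs.open(html_name, 'w', encoding='utf-8',
                         errors='xmlcharrefreplace') as output_file:
            output_file.write(html_text)


    if __name__ == "__main__":
        main(sys.argv[1:])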
    

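    The code above still won't produce colored output, because no CSS has been specified. The CSS has to be added to the head of the generated HTML; a variety of stylesheets can be found at http://richleland.github.io/pygments-css/.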

    <style type = "text/css">
    .cnblogs_code  .hll { background-color: #ffffcc }
    .cnblogs_code  .c { color: #60a0b0; font-style: italic } /* Comment */
    .cnblogs_code  .err { border: 1px solid #FF0000 } /* Error */
    .cnblogs_code  .k { color: #007020; font-weight: bold } /* Keyword */
    .cnblogs_code  .o { color: #666666 } /* Operator */
    .cnblogs_code  .cm { color: #60a0b0; font-style: italic } /* Comment.Multiline */
    .cnblogs_code  .cp { color: #007020 } /* Comment.Preproc */
    .cnblogs_code  .c1 { color: #60a0b0; font-style: italic } /* Comment.Single */
    .cnblogs_code  .cs { color: #60a0b0; background-color: #fff0f0 } /* Comment.Special */
    .cnblogs_code  .gd { color: #A00000 } /* Generic.Deleted */
    .cnblogs_code  .ge { font-style: italic } /* Generic.Emph */
    .cnblogs_code  .gr { color: #FF0000 } /* Generic.Error */
    .cnblogs_code  .gh { color: #000080; font-weight: bold } /* Generic.Heading */
    .cnblogs_code  .gi { color: #00A000 } /* Generic.Inserted */
    .cnblogs_code  .go { color: #808080 } /* Generic.Output */
    .cnblogs_code  .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */
    .cnblogs_code  .gs { font-weight: bold } /* Generic.Strong */
    .cnblogs_code  .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
    .cnblogs_code  .gt { color: #0040D0 } /* Generic.Traceback */
    .cnblogs_code  .kc { color: #007020; font-weight: bold } /* Keyword.Constant */
    .cnblogs_code  .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */
    .cnblogs_code  .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */
    .cnblogs_code  .kp { color: #007020 } /* Keyword.Pseudo */
    .cnblogs_code  .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */
    .cnblogs_code  .kt { color: #902000 } /* Keyword.Type */
    .cnblogs_code  .m { color: #40a070 } /* Literal.Number */
    .cnblogs_code  .s { color: #4070a0 } /* Literal.String */
    .cnblogs_code  .na { color: #4070a0 } /* Name.Attribute */
    .cnblogs_code  .nb { color: #007020 } /* Name.Builtin */
    .cnblogs_code  .nc { color: #0e84b5; font-weight: bold } /* Name.Class */
    .cnblogs_code  .no { color: #60add5 } /* Name.Constant */
    .cnblogs_code  .nd { color: #555555; font-weight: bold } /* Name.Decorator */
    .cnblogs_code  .ni { color: #d55537; font-weight: bold } /* Name.Entity */
    .cnblogs_code  .ne { color: #007020 } /* Name.Exception */
    .cnblogs_code  .nf { color: #06287e } /* Name.Function */
    .cnblogs_code  .nl { color: #002070; font-weight: bold } /* Name.Label */
    .cnblogs_code  .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */
    .cnblogs_code  .nt { color: #062873; font-weight: bold } /* Name.Tag */
    .cnblogs_code  .nv { color: #bb60d5 } /* Name.Variable */
    .cnblogs_code  .ow { color: #007020; font-weight: bold } /* Operator.Word */
    .cnblogs_code  .w { color: #bbbbbb } /* Text.Whitespace */
    .cnblogs_code  .mf { color: #40a070 } /* Literal.Number.Float */
    .cnblogs_code  .mh { color: #40a070 } /* Literal.Number.Hex */
    .cnblogs_code  .mi { color: #40a070 } /* Literal.Number.Integer */
    .cnblogs_code  .mo { color: #40a070 } /* Literal.Number.Oct */
    .cnblogs_code  .sb { color: #4070a0 } /* Literal.String.Backtick */
    .cnblogs_code  .sc { color: #4070a0 } /* Literal.String.Char */
    .cnblogs_code  .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */
    .cnblogs_code  .s2 { color: #4070a0 } /* Literal.String.Double */
    .cnblogs_code  .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */
    .cnblogs_code  .sh { color: #4070a0 } /* Literal.String.Heredoc */
    .cnblogs_code  .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */
    .cnblogs_code  .sx { color: #c65d09 } /* Literal.String.Other */
    .cnblogs_code  .sr { color: #235388 } /* Literal.String.Regex */
    .cnblogs_code  .s1 { color: #4070a0 } /* Literal.String.Single */
    .cnblogs_code  .ss { color: #517918 } /* Literal.String.Symbol */
    .cnblogs_code  .bp { color: #007020 } /* Name.Builtin.Pseudo */
    .cnblogs_code  .vc { color: #bb60d5 } /* Name.Variable.Class */
    .cnblogs_code  .vg { color: #bb60d5 } /* Name.Variable.Global */
    .cnblogs_code  .vi { color: #bb60d5 } /* Name.Variable.Instance */
    .cnblogs_code  .il { color: #40a070 } /* Literal.Number.Integer.Long */
    </style>
    

    So the complete code should be:

    import codecs
    import os
    import sys

    import mistune
    from pygments import highlight
    from pygments.lexers import get_lexer_by_name
    from pygments.formatters import HtmlFormatter


    # Written against the mistune 0.x API (mistune.Renderer / mistune.Markdown).
    class HighlightRenderer(mistune.Renderer):
        def block_code(self, code, lang=None):
            # No language tag: emit a plain, escaped <pre><code> block.
            if not lang:
                return '\n<pre><code>%s</code></pre>\n' % \
                    mistune.escape(code)
            lexer = get_lexer_by_name(lang, stripall=True)
            # Wrap the output in class "cnblogs_code" so that it matches the
            # selectors of the stylesheet shown in this post (Pygments'
            # default wrapper class is "highlight").
            formatter = HtmlFormatter(cssclass='cnblogs_code')
            return highlight(code, lexer, formatter)


    def main(argv):
        md_name = argv[0]

        with codecs.open(md_name, mode='r', encoding='utf-8') as mdfile:
            md_text = mdfile.read()
        # friendly.css is looked up in the current working directory.
        with codecs.open("friendly.css", mode='r', encoding='utf-8') as cssfile:
            css_text = cssfile.read()

        renderer = HighlightRenderer()
        markdown = mistune.Markdown(renderer=renderer)
        html_text = markdown(md_text)

        # Prepend the stylesheet so the highlighted spans get their colors.
        html_name = os.path.splitext(md_name)[0] + '.html'
        with codecs.open(html_name, 'w', encoding='utf-8',
                         errors='xmlcharrefreplace') as output_file:
            output_file.write(css_text + html_text)


    if __name__ == "__main__":
        if len(sys.argv) == 2:
            main(sys.argv[1:])
        else:
            print("Error: please specify markdown file path")

    The friendly.css file holds the CSS shown earlier.
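
    The script above writes the stylesheet followed by an HTML fragment, which browsers generally tolerate. If a complete page with the CSS inside the head is preferred, the output could be assembled as in the following sketch. The helper build_page and the page template are hypothetical, and the sketch assumes css_text already contains the <style> element, as the stylesheet shown earlier does.

```python
from html import escape

# Hypothetical page skeleton; the stylesheet goes inside <head>.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
{css}
</head>
<body>
{body}
</body>
</html>
"""


def build_page(title, css_text, body_html):
    """Wrap an HTML fragment and a <style> block in a complete HTML page."""
    return PAGE_TEMPLATE.format(
        title=escape(title),      # the title is plain text, so escape it
        css=css_text.strip(),     # assumed to include the <style> tags
        body=body_html.strip(),   # already HTML, inserted as-is
    )


page = build_page(
    "notes",
    '<style type="text/css">.cnblogs_code .k { color: #007020 }</style>',
    "<p>hello</p>\n",
)
```

    In the script, output_file.write(css_text + html_text) would then become output_file.write(build_page(md_name, css_text, html_text)).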

    The equivalent code using markdown2 is as follows:

    import markdown2
    import codecs
    import sys
    
    
    def main(argv):
        md_name = argv[0]
    
        with codecs.open(md_name, mode='r', encoding='utf-8') as mdfile:
            with codecs.open("friendly.css", mode='r', encoding='utf-8') as cssfile:
                md_text = mdfile.read()
                css_text = cssfile.read()
    
                extras = ['code-friendly', 'fenced-code-blocks', 'footnotes']
                html_text = markdown2.markdown(md_text, extras=extras)
    
                html_name = '%s.html' % (md_name[:-3])
                with codecs.open(html_name, 'w', encoding='utf-8', errors='xmlcharrefreplace') as output_file:
                    output_file.write(css_text + html_text)
    
    if __name__ == "__main__":
        if len(sys.argv) == 2:
            main(sys.argv[1:])
        else:
            print("Error:please specify markdown file path")
  • Original article: https://www.cnblogs.com/WeyneChen/p/6670592.html