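The hard part is recognizing //, /* and */ when they appear inside string literals. I later realized that it is enough to skip over string literals while matching comments and leave them untouched.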
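The regular expression for matching a C++ string literal is "([^\\"]|\\.)*?". It means that a character between the quotes may be neither a bare \ nor a bare ", but an escape sequence of the form \. (a backslash followed by any character) is allowed. This rejects an unterminated string such as "abc\" while still accepting cases such as "abc\"" and "abc\\".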
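As a quick check of the string-literal pattern described above, here is a minimal sketch that tries it on the cases mentioned in the text. The name STRING_LITERAL and the sample list are assumptions made for illustration, and a non-capturing group is used in place of the capturing one, which does not change what is matched.

```python
import re

# A quote, then any run of ordinary characters (neither backslash nor quote)
# or backslash-escaped characters, then the closing quote.
# re.DOTALL lets "\\." also cover a backslash followed by a newline.
STRING_LITERAL = re.compile(r'"(?:[^\\"]|\\.)*?"', re.DOTALL)

samples = [
    r'"abc"',    # plain literal
    r'"abc\""',  # ends with an escaped quote
    r'"abc\\"',  # ends with an escaped backslash
    r'"abc\"',   # unterminated: the final quote is escaped
]

for text in samples:
    matched = STRING_LITERAL.fullmatch(text) is not None
    print(text, "->", "complete literal" if matched else "not a complete literal")
```

Only the last sample should be reported as not a complete literal, because its final quote belongs to the escape sequence and no closing quote remains.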
The code is as follows:
# -*- coding: gbk -*-
import re


def ReplaceComment(matchobj):
    # Keep string literals unchanged; replace comments with an empty string.
    matchstr = matchobj.group(0)
    if matchstr.startswith('"') and matchstr.endswith('"'):
        return matchstr
    else:
        return ''


def RemoveComment(inputfileName, outputfileName):
    with open(inputfileName, "rt") as inputfile:
        codeString = inputfile.read()

    singleLineCommentExp = r'//[^\n]*'
    multiLinecommentExp = r'/\*.*?\*/'
    literalStringExp = r'"([^\\"]|\\.)*?"'
    # . should match newline, for scenarios like a multiline literal string
    patternExp = literalStringExp + '|' + singleLineCommentExp + '|' + multiLinecommentExp
    # Pass flags by keyword so they are not mistaken for the count argument.
    codeString = re.sub(patternExp, ReplaceComment, codeString,
                        flags=re.MULTILINE | re.DOTALL)

    with open(outputfileName, "wt") as outputfile:
        outputfile.write(codeString)
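To try the idea without touching any files, here is a small in-memory sketch of the same approach. The helper strip_comments, the constant PATTERN and the sample C++ snippet are hypothetical names introduced only for this illustration; they are not part of the code above.

```python
import re

# Three alternatives, with string literals first so that comment markers
# inside a literal are consumed as part of the literal.
PATTERN = re.compile(
    r'"(?:[^\\"]|\\.)*?"'   # string literal
    r'|//[^\n]*'            # single-line comment
    r'|/\*.*?\*/',          # multi-line comment
    re.DOTALL,
)


def strip_comments(code):
    """Hypothetical in-memory variant: return code with comments removed."""
    def keep_strings(match):
        text = match.group(0)
        # A match that starts with a quote is a string literal: keep it.
        return text if text.startswith('"') else ''
    return PATTERN.sub(keep_strings, code)


sample = (
    'const char* url = "http://example.com"; // trailing comment\n'
    '/* block\n'
    '   comment */\n'
    'const char* s = "/* not a comment */";\n'
)
print(strip_comments(sample))
```

Both string literals should survive intact, while the trailing comment and the block comment disappear. Note that this sketch, like the approach it mirrors, only knows about double-quoted string literals: a character literal such as '"' or a C++11 raw string literal may still confuse it.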
If you find a bug, corrections are welcome.