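1. Read in the string to analyze
2. Split the text and extract the words
3. Count the words in a dictionary
4. Exclude grammatical (function) words
5. Sort
6. Output the top 20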
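As a rough sketch of these six steps applied to an arbitrary string: the helper `top_words`, the constant `STOP_WORDS`, and the sample sentence are hypothetical names introduced only for illustration, and `collections.Counter` is swapped in here in place of a hand-built dictionary.

```python
from collections import Counter

# Hypothetical stop-word set for illustration; adjust it to the text at hand.
STOP_WORDS = {"the", "a", "it", "to", "on", "and"}


def top_words(text, stop_words=STOP_WORDS, n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    # Steps 1-2: normalize case, turn punctuation into spaces, split on whitespace.
    text = text.lower()
    for ch in ",.?!()":
        text = text.replace(ch, " ")
    words = text.split()
    # Steps 3-4: count the words, skipping the grammatical ones.
    counts = Counter(w for w in words if w not in stop_words)
    # Steps 5-6: most_common sorts by count (descending) and keeps the first n.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The cat and the dog. The dog barks, the cat runs!"
    for word, count in top_words(sample, n=3):
        print(word, count)
```

Because `most_common(n)` simply returns fewer pairs when the text has fewer than `n` distinct words, this version does not need a fixed `range(20)` loop.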
# Write the lyrics to a file so the analysis can read them back.
with open('lyric.txt', 'w') as lyric:
    lyric.write('''your butt is mine I Gonna tell you right Just show your face In broad daylight I'm telling you On how I feel Gonna Hurt Your Mind Don't shoot to kill Shamone Shamone Lay it on me All right I'm giving you On count of three To show your stuff Or let it be I'm telling you Just watch your mouth I know your game What you're about Well they say the sky's the limit And to me that's really true But my friend you have seen nothin' Just wait till I get through Because I'm bad,I'm bad shamone (Bad,bad,really,really bad) You know I'm bad,I'm bad (Bad,bad,really,really bad) You know it You know I'm bad,I'm bad Come on,you know (Bad,bad,really,really bad) And the whole world Has to answer right now Just to tell you once again Who's bad The word is out You're doin' wrong Gonna lock you up Before too long Your lyin' eyes Gonna tell you right So listen up Don't make a fight Your talk is cheap You're not a man Your throwin' stones To hide your hands Well they say the sky's the limit And to me that's really true But my friend you have seen nothin' Just wait till I get through Because I'm bad,I'm bad shamone (Bad,bad,really,really bad) You know I'm bad,I'm bad (Bad,bad,really,really bad) You know it You know I'm bad,I'm bad Come on,you know (Bad,bad,really,really bad) And the whole world Has to answer right now Just to tell you once again Who's bad We could change the world tomorrow This could be a better place If you don't like what I'm sayin' Then won't you slap my face Because I'm bad''')

# 1. Read in the string to analyze.
with open('lyric.txt', 'r') as comment:
    bad = comment.read()

# 2. Lowercase the text, replace punctuation with spaces, and split into words.
bad = bad.lower()
for ch in ",.?!()":
    bad = bad.replace(ch, ' ')
# split() with no argument collapses runs of whitespace and drops empty strings.
words = bad.split()

# 3-4. Count each distinct word, skipping the grammatical words.
# Set difference avoids the KeyError that set.remove() raises for a missing word.
delete = {"the", "a", "it", "to", "on", "and"}
dic = {}
for w in set(words) - delete:
    dic[w] = words.count(w)

# 5. Sort the (word, count) pairs by count, highest first.
lis = list(dic.items())
lis.sort(key=lambda x: x[1], reverse=True)

# 6. Output the top 20 (slicing is safe even with fewer than 20 distinct words).
for item in lis[:20]:
    print(item)
Run: