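1. Read in the string to analyze
2. Split the text and extract the words
3. Count the words with a dictionary
4. Exclude grammatical (function) words
5. Sort
6. Output the TOP 20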
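The six steps above can be sketched compactly with the standard library. This is only an illustrative sketch, not the script used in this post: the helper name `top_words`, its parameters, and the `sample` string are assumptions made for the example, and `collections.Counter` is swapped in for the hand-built dictionary.

```python
import re
from collections import Counter

# Hypothetical helper: the name and parameters are illustrative assumptions.
def top_words(text, stop_words, n=20):
    # Steps 1-2: lowercase the text and extract runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Steps 3-4: count every word that is not in the stop-word set.
    counts = Counter(w for w in words if w not in stop_words)
    # Steps 5-6: most_common sorts by count, descending, and keeps the first n.
    return counts.most_common(n)

# Made-up sample text, used only to exercise the helper.
sample = "I'm bad, I'm bad (really, really bad) and you know it"
print(top_words(sample, {"the", "a", "it", "to", "on", "and"}, n=3))
```

Keeping the apostrophe in the pattern means contractions such as "i'm" and "don't" stay whole, which matches how the script treats them.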
lyric=open('lyric.txt','w')
lyric.write('''your butt is mine
I Gonna tell you right
Just show your face
In broad daylight
I'm telling you
On how I feel
Gonna Hurt Your Mind
Don't shoot to kill
Shamone
Shamone
Lay it on me
All right
I'm giving you
On count of three
To show your stuff
Or let it be
I'm telling you
Just watch your mouth
I know your game
What you're about
Well they say the sky's the limit
And to me that's really true
But my friend you have seen nothin'
Just wait till I get through
Because I'm bad,I'm bad
shamone
(Bad,bad,really,really bad)
You know I'm bad,I'm bad
(Bad,bad,really,really bad)
You know it
You know I'm bad,I'm bad
Come on,you know
(Bad,bad,really,really bad)
And the whole world
Has to answer right now
Just to tell you once again
Who's bad
The word is out
You're doin' wrong
Gonna lock you up
Before too long
Your lyin' eyes
Gonna tell you right
So listen up
Don't make a fight
Your talk is cheap
You're not a man
Your throwin' stones
To hide your hands
Well they say the sky's the limit
And to me that's really true
But my friend you have seen nothin'
Just wait till I get through
Because I'm bad,I'm bad
shamone
(Bad,bad,really,really bad)
You know I'm bad,I'm bad
(Bad,bad,really,really bad)
You know it
You know I'm bad,I'm bad
Come on,you know
(Bad,bad,really,really bad)
And the whole world
Has to answer right now
Just to tell you once again
Who's bad
We could change the world tomorrow
This could be a better place
If you don't like what I'm sayin'
Then won't you slap my face
Because I'm bad''')
lyric.close()
comment=open('lyric.txt','r')
bad=comment.read()
comment.close()
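# Step 2: lowercase, replace punctuation and line breaks with spaces, then split into words
bad=bad.lower()
for i in ",.?!()":
    bad=bad.replace(i,' ')
bad=bad.replace('\n',' ')
words=bad.split()  # split() with no argument also drops empty strings
# Step 4: exclude grammatical (function) words
s=set(words)
delete={"the","a","it","to","on","and"}
for i in delete:
    s.discard(i)  # discard() does not raise if the word is absent
# Step 3: build the count dictionary
dic={}
for i in s:
    dic[i]=words.count(i)
# Step 5: sort by count, descending
lis=list(dic.items())
lis.sort(key=lambda x:x[1],reverse=True)
# Step 6: print the top 20 (slicing is safe if there are fewer than 20 words)
for item in lis[:20]:
    print(item)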
Run:
