  • Web Crawler Course Project

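    import jieba
    import requests
    from bs4 import BeautifulSoup

    # File that collects the lyrics of every song
    LYRICS_PATH = "C:/Users/ZD/PycharmProjects/test/test.txt"


    def songlist(url):
        """Follow each song link on the listing page and save its lyrics."""
        res = requests.get(url)
        res.encoding = 'UTF-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        songname = soup.select('.song')
        for i in songname[1:]:  # skip the first match, as in the original
            links = i.select('a')
            if links:
                songread(links[0].attrs['href'])


    def songread(url):
        """Append the lyric lines of one song page to the lyrics file."""
        res = requests.get(url)
        res.encoding = 'UTF-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        song = soup.select('.lrcItem')
        # Append ('a') so each song adds to the file instead of overwriting it;
        # the with block closes the file, which flushes the data to disk
        with open(LYRICS_PATH, 'a', encoding='utf8') as f:
            for i in song:
                f.write(i.text)


    # Empty the file once, then crawl
    open(LYRICS_PATH, 'w', encoding='utf8').close()
    songlist('http://www.kuwo.cn/geci/a_336/?')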
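    # Reopen the file for reading and load all the lyrics
    with open("C:/Users/ZD/PycharmProjects/test/test.txt", 'r', encoding='utf8') as f:
        lyrics = f.read()

    # jieba.cut returns a generator, so materialize it as a list
    wordList = list(jieba.cut(lyrics))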
    
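    # Count how often each word occurs
    wordDic = {}
    for w in wordList:
        wordDic[w] = wordDic.get(w, 0) + 1

    # Sort by count, most frequent first, and keep at most 60 entries
    sort_word = sorted(wordDic.items(), key=lambda d: d[1], reverse=True)
    top_words = sort_word[:60]
    for item in top_words:
        print(item)

    # Each item is a (word, count) tuple; item[0] is the word itself
    with open("C:/Users/ZD/PycharmProjects/test/test1.txt", 'w', encoding='utf8') as fo:
        for item in top_words:
            fo.write(item[0] + '\n')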
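
    The counting and sorting steps can also be written with collections.Counter from the standard library. This is only a minimal sketch: the short word list stands in for the jieba output, and the names word_list and top_words are made up for illustration.

```python
from collections import Counter

# Hypothetical sample standing in for list(jieba.cut(...))
word_list = ["爱", "你", "爱", "我", "爱", "你"]

# Counter tallies every word in a single pass over the list
counts = Counter(word_list)

# most_common(n) returns up to n (word, count) tuples, highest count first
top_words = counts.most_common(2)

for word, count in top_words:
    print(word, count)
```

    Because most_common returns at most n entries, it does not fail when the text has fewer distinct words than requested.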
    

      

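    I ran into three problems while building this. First, after opening the file with w+ and writing the data, I could not read anything back. Opening the file again before reading fixed it.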
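
    One likely cause of the first problem, offered as an assumption rather than a diagnosis: after a write in w+ mode the file position sits at the end of the file, so read() has nothing left to return. A minimal sketch with a temporary file (the path and the sample text are made up for illustration):

```python
import os
import tempfile

# Hypothetical scratch file, not the path used in the post
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w+", encoding="utf8") as f:
    f.write("hello lyrics")
    # The file position is now at the end, so nothing is left to read
    at_end = f.read()
    # Move back to the start of the file before reading
    f.seek(0)
    from_start = f.read()

print(repr(at_end))
print(repr(from_start))
```

    Reopening the file in 'r' mode, as the post does, works for the same reason: a freshly opened file starts at position 0.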

    Second, after sorting the list I wanted to pull out the str inside each entry. Solution: I asked Liu Dong.
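
    For the second problem, sorting dict.items() gives a list of (word, count) tuples, so the str is the first element of each tuple. A small sketch with made-up sample data (the pairs are illustrative, not taken from the crawl):

```python
# Hypothetical (word, count) pairs, shaped like sorted(wordDic.items(), ...)
sort_word = [("爱", 3), ("你", 2), ("我", 1)]

# Index 0 of each tuple is the word, index 1 is its count
first_word = sort_word[0][0]

# Unpack every pair to collect just the words
words = [word for word, count in sort_word]

print(first_word)
print(words)
```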

    Third, installing wordcloud failed, so I switched to an online word cloud generator instead = =

  • Original post: https://www.cnblogs.com/zd983886992/p/8964241.html