  • Classroom Exercise - Web Scraping (2) - Scraping Code

    Scraping practice code 1:

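    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.cnblogs.com/exesoft/p/13184331.html'
    # Download the page (give up after 30 seconds) and decode it as UTF-8
    r = requests.get(url, timeout=30)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, "html.parser")
    # CSS selector: every <tr> inside the element with class "exesoft-table"
    trs = soup.select('.exesoft-table tr')
    print(type(trs))   # select() returns a list of the matching Tag objects
    print(trs)
    print("------------------------")
    # Peel the table row by row, then cell by cell
    for tr in trs:
        print(tr)
        print("---------------")
        tds = tr.find_all('td')   # rows without <td> cells (e.g. a <th> header row) give []
        print(tds)
        print("---------")
        for td in tds:
            # .string is the cell's text, or None if the cell has more than one child
            print(td.string)
            print("----")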
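    To experiment with the selector logic above without hitting the network, the sketch below runs the same `select('.exesoft-table tr')` / `find_all('td')` loop over an inline HTML string. The markup, the `SAMPLE_HTML` name and the `extract_cells` helper are hypothetical; the only assumption carried over from the code above is a table with the class `exesoft-table`. The last row deliberately nests a tag inside a cell to expose a `.string` pitfall.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (not its actual content).
SAMPLE_HTML = """
<table class="exesoft-table">
  <tr><th>ID</th><th>Name</th><th>Score</th></tr>
  <tr><td>1</td><td>Alice</td><td>90</td></tr>
  <tr><td>2</td><td>Bob <em>(absent)</em></td><td>0</td></tr>
</table>
"""


def extract_cells(html):
    """Return the .string of every <td>, grouped by row (header row skipped)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Same selector as above: every <tr> under the element with class "exesoft-table"
    for tr in soup.select(".exesoft-table tr"):
        tds = tr.find_all("td")
        if not tds:  # the <th> header row has no <td> cells
            continue
        rows.append([td.string for td in tds])
    return rows


if __name__ == "__main__":
    for row in extract_cells(SAMPLE_HTML):
        print(row)
```

    In the second data row the name prints as `None`: BeautifulSoup defines `.string` as None when a tag has more than one child, and that cell holds a text node plus an `<em>` tag. Where cells may contain markup, `td.get_text(" ", strip=True)` joins all of the text fragments instead.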

    Scraping practice code 2: an improved version of the code above

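    import requests
    from bs4 import BeautifulSoup

    allStudents = []   # each entry is one table row: [ID, name, score]

    def getHTMLText(url):
        """Return the page's HTML, or an empty string if the request fails."""
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()   # turn HTTP 4xx/5xx responses into exceptions
            r.encoding = 'utf-8'
            return r.text
        except requests.RequestException:   # network errors, timeouts, bad status codes
            return ""

    def fillStudentsList(soup):
        """Append the cell texts of every data row to allStudents."""
        data = soup.select('.exesoft-table tr')
        for tr in data:
            ltd = tr.find_all('td')
            if len(ltd) == 0:   # skip rows without <td> cells, such as the header row
                continue
            singleStudent = []
            for td in ltd:
                singleStudent.append(td.string)
            allStudents.append(singleStudent)

    def printStudentsList():
        print(allStudents)
        print("{}  {}    {}".format("ID", "Name", "Score"))
        # Print at most the first five rows; slicing avoids an IndexError
        # when the table has fewer than five students (or the download failed)
        for u in allStudents[:5]:
            print("{}   {}    {}".format(u[0], u[1], u[2]))

    def main():
        url = 'https://www.cnblogs.com/exesoft/p/13184331.html'
        html = getHTMLText(url)
        soup = BeautifulSoup(html, "html.parser")
        fillStudentsList(soup)
        printStudentsList()

    if __name__ == "__main__":
        main()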

    Output of the code:

    Summary:

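    Web scraping can be seen as peeling garlic. Whether the end goal is a clove with a little skin left on or a perfectly clean clove is decided by the business requirements. Because scraped data mostly comes back as lists, you can use loops to strip away the outer wrapping (that is, the HTML tags) layer by layer until you reach the data you need.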
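    The garlic analogy maps onto how deep you go with BeautifulSoup. The sketch below uses a hypothetical two-row table (the markup and variable names are illustrative only; just the `exesoft-table` class mirrors the post) and stops at three depths: the table element with all its markup, the row `Tag` objects that still carry their `<td>` wrappers, and finally the bare cell text from `get_text(strip=True)`, reached by looping over the row list and then each row's cell list.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; only the "exesoft-table" class mirrors the post.
HTML = """
<div class="post">
  <table class="exesoft-table">
    <tr><td>1</td><td>Alice</td><td>90</td></tr>
    <tr><td>2</td><td>Bob</td><td>85</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Outermost layer: the whole table, every tag still attached.
table = soup.select_one(".exesoft-table")

# Middle layer: a list of <tr> Tag objects; each still wraps its <td> cells.
rows = table.find_all("tr")

# Innermost layer: loop over rows, then cells, keeping only the stripped text.
values = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows]

print(rows[0])   # a row with its HTML "skin" still on
print(values)    # fully peeled data, one list per row
```

    Which layer to hand on is the business decision the summary describes: keep the `Tag` objects if later steps need attributes or links, or go down to plain strings when only the values matter.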

  • Original post: https://www.cnblogs.com/exesoft/p/13185884.html