  • A Simple Introduction to Web Crawlers: Data Fetching, Data Parsing, Data Display, Data Storage (B)

    Code 1:

    # Three lists holding different element types
    a=[3.45,4.45,5]
    b=[5,4]
    c=["aa",456,True]
    # Nest them inside one list to form a list of lists
    myList=[]
    myList.append(a)
    myList.append(b)
    myList.append(c)
    print(myList)

    Code 2:

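    # coding=utf-8
    # Split the numbers 1..100 into groups of three; the last group holds the remainder.
    # The list is named `numbers` so that the built-in `list` is not shadowed.
    numbers=[]
    for i in range(1,101):
        numbers.append(i)
    
    # print(numbers)
    
    tempList=[]
    newList=[]
    
    for temp in numbers:
        tempList.append(temp)
        # Once three items are collected, store the group and start a new one.
        if len(tempList)==3:
            newList.append(tempList)
            tempList=[]
    
    # Store the leftover items (here [100]); skipped when nothing is left over.
    if tempList:
        newList.append(tempList)
    
    print(newList)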
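
    The grouping loop above can also be written with list slicing. The following is only a minimal sketch: the helper name `chunk` and its `size` parameter are hypothetical and do not appear in the original post.

```python
def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Step through the list in strides of `size` and slice out each group.
    return [items[i:i + size] for i in range(0, len(items), size)]


numbers = list(range(1, 101))
groups = chunk(numbers, 3)
print(groups[0])    # [1, 2, 3]
print(groups[-1])   # [100]
print(len(groups))  # 34
```

    Slicing never produces an empty trailing group, so no special case is needed when the length is an exact multiple of the group size.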

    Code 3:

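    import requests
    from bs4 import BeautifulSoup
    
    allUniv = []  # one list of cell texts per table row
    
    def getHTMLText(url):
        """Download the page and return its text, or "" if the request fails."""
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()
            r.encoding = 'utf-8'
            return r.text
        except requests.RequestException:
            return ""
    
    def fillUnivList(soup):
        """Collect the text of every <td> cell, row by row, into allUniv."""
        data = soup.find_all('tr')
        for tr in data:
            ltd = tr.find_all('td')
            if len(ltd)==0:
                continue  # skip rows without <td> cells (e.g. header rows)
            singleUniv = []
            for td in ltd:
                # td.string is None unless the cell has exactly one child;
                # get_text() always returns a str.
                singleUniv.append(td.get_text(strip=True))
            allUniv.append(singleUniv)
    
    def printUnivList(num):
        """Print the first num rows as centered columns."""
        fmt = "{:^4}{:^10}{:^5}{:^8}{:^10}"
        print(fmt.format("Rank","University","Province","Score","Training scale"))
        # Slicing avoids an IndexError when fewer than num rows were parsed.
        for u in allUniv[:num]:
            if len(u) < 7:
                continue  # skip rows that lack the columns printed below
            print(fmt.format(u[0],u[1],u[2],u[3],u[6]))
    
    def main():
        url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2016.html'
        html = getHTMLText(url)
        soup = BeautifulSoup(html, "html.parser")
        fillUnivList(soup)
        printUnivList(10)
    
    main()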
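
    The title lists data storage as the last step, but the crawler above stops at display. The following is a rough sketch of how rows shaped like `allUniv` might be written to a database with the standard `sqlite3` module. The function name `saveUnivList`, the table name `univ`, its column names, and the sample rows are all assumptions made for illustration; none of them come from the original post.

```python
import sqlite3


def saveUnivList(rows, db_path=":memory:"):
    """Store the first four cells of each row; return the open connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS univ ("
        "ranking TEXT, name TEXT, province TEXT, score TEXT)"
    )
    # Keep only rows that have at least the four columns we store.
    records = [tuple(r[:4]) for r in rows if len(r) >= 4]
    conn.executemany("INSERT INTO univ VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


# Made-up sample rows in the same shape as allUniv (not real ranking data).
sample = [
    ["1", "University A", "Province X", "95.0"],
    ["2", "University B", "Province Y", "90.5"],
    ["incomplete row"],
]
conn = saveUnivList(sample)
for row in conn.execute("SELECT ranking, name, score FROM univ ORDER BY rowid"):
    print(row)
conn.close()
```

    Passing a file path instead of the default ":memory:" would keep the data between runs. In the crawler, `saveUnivList(allUniv, ...)` could be called right after `fillUnivList(soup)`.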

    Homework:

    1. Copy the code above and run it in a Python environment.

    2. Read through the code above until you understand it.

  • Original article: https://www.cnblogs.com/exesoft/p/12988105.html