zoukankan      html  css  js  c++  java
  • 2020寒假 12

    发现一个问题:

    bs4 FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

    解决方法:将"lxml"改成"html.parser"

    soup = BeautifulSoup(content, "lxml")改成

    soup = BeautifulSoup(content, "html.parser")

    今天学习了关于python中beautiful soup的一些内容

    了解到soup.find_all("tr",attrs={"class":"alt"})中的树形结构以及contents的使用

    运行了简单的爬取中国大学排名的代码,结合网页源代码进行python源码分析理解

     1 import requests
     2 from bs4 import BeautifulSoup
     3 
     4 headers = {
     5     "User-Agent": "Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14"
     6 }
     7 response = requests.get("http://www.zuihaodaxue.com/zuihaodaxuepaiming2019.html", headers=headers)
     8 response.encoding = "utg-8"
     9 if response.status_code == 200:
    10     soup = BeautifulSoup(response.text,"html.parser")
    11     trTags = soup.find_all("tr",attrs={"class":"alt"})
    12     for trTag in trTags:
    13         id = trTag.contents[0].string
    14         name = trTag.contents[1].string
    15         addr = trTag.contents[2].string
    16         sco = trTag.contents[3].string
    17         sco1 = trTag.contents[4].string
    18         print(f"{id} {name} {addr} {sco} {sco1}")
    19     #代码来源于网络,比较简单

    代码运行结果:

  • 相关阅读:
    MySQL decimal unsigned 更新负数不报错却为0
    centos 安装jdk
    CentOS7安装docker
    Cron 时间元素
    PHPStorm
    日志习惯
    HTTP幂等性
    navicat for mysql 10.1.7注册码
    localStorage、sessionStorages 使用
    FreePascal
  • 原文地址:https://www.cnblogs.com/lixv2018/p/12288915.html
Copyright © 2011-2022 走看看