  • Python BeautifulSoup: HTML parsing

    * Function prototypes of BeautifulSoup's .find() and .findAll()

    findAll(tag, attributes, recursive, text, limit, keywords)
    find(tag, attributes, recursive, text, keywords)
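Below is a minimal sketch of what these parameters do. It runs against a small inline HTML string made up for illustration. The sketch uses find_all, the current bs4 name for findAll; findAll still works as an older alias. find() behaves like find_all(..., limit=1), except that it returns a single tag, or None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only to show how the parameters behave.
html = """
<div id="outer">
  <p class="note">one</p>
  <div><p class="note">two</p></div>
  <p class="note">three</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# tag + attributes: every <p class="note">, in document order
all_notes = soup.find_all("p", {"class": "note"})
# limit: stop after the first two matches
first_two = soup.find_all("p", {"class": "note"}, limit=2)
# find(): the first match only (a Tag, or None if nothing matches)
first = soup.find("p", {"class": "note"})
# recursive=False: only look at the direct children of the tag it is called on
outer = soup.find("div", {"id": "outer"})
direct = outer.find_all("p", recursive=False)
# keywords: filter on an attribute by name
by_kw = soup.find(id="outer")

print([p.get_text() for p in all_notes])  # ['one', 'two', 'three']
print([p.get_text() for p in first_two])  # ['one', 'two']
print(first.get_text())                   # one
print([p.get_text() for p in direct])     # ['one', 'three']
```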
    

      

    * Get span.green

    bsObj.findAll("span", {"class":"green"})

    #!/usr/local/bin/python
    # -*- coding: UTF-8 -*-
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError
    from bs4 import BeautifulSoup
    
    def getBsObj(url):
        """Fetch url and parse it; return a BeautifulSoup object, or None on failure."""
        try:
            html = urlopen(url, timeout=3)  # give up after 3 seconds
        except (HTTPError, URLError) as e:
            print(e)
            return None
        # "html.parser" is Python's built-in parser, so no extra package is needed
        return BeautifulSoup(html.read(), "html.parser")
    
    bsObj = getBsObj("http://www.pythonscraping.com/pages/warandpeace.html")
    if bsObj is not None:
        # on this page the character names are the <span class="green"> elements
        nameList = bsObj.findAll("span", {"class":"green"})
        for name in nameList:
            print(name.get_text())
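You can try the same extraction without network access by parsing an inline string. The markup below is a hypothetical stand-in and is not the real content of the warandpeace.html page. The sketch uses the class_ keyword, which bs4 accepts as an alternative to the {"class": ...} dictionary. The trailing underscore is needed because class is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup, not the real page.
html = (
    '<p><span class="green">Anna Pavlovna</span> said '
    '<span class="red">"Well, Prince..."</span> to '
    '<span class="green">the prince</span></p>'
)
soup = BeautifulSoup(html, "html.parser")

# class_="green" is equivalent to {"class": "green"}
names = [span.get_text() for span in soup.find_all("span", class_="green")]
print(names)  # ['Anna Pavlovna', 'the prince']
```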
    

      

    * Get h1 through h6

    bsObj.findAll({"h1","h2","h3","h4","h5","h6"})
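Passing a collection of tag names matches any of them. The results come back in document order, not in the order the names were listed. A minimal sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an h2 appears before the h1.
html = "<h2>Intro</h2><p>text</p><h1>Title</h1><h3>Detail</h3>"
soup = BeautifulSoup(html, "html.parser")

# Any of the six heading tags; results follow their position in the page.
headings = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
print([h.name for h in headings])  # ['h2', 'h1', 'h3']
```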
    

      

    // JavaScript: turn a comma-separated string into a list of double-quoted elements
    
    function quote(s) {
        return '"' + s.split(",").join('","') + '"';
    }
    var s = "h1,h2,h3,h4,h5,h6";
    console.log(quote(s)); // "h1","h2","h3","h4","h5","h6"
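In Python itself you don't need the quoting helper, because you can build the tag-name list directly. The sketch below shows the equivalent string manipulation as well as a list comprehension that produces the same names.

```python
s = "h1,h2,h3,h4,h5,h6"
tags = s.split(",")

# Same output as the JavaScript quote() helper: each element wrapped in double quotes
quoted = ",".join(f'"{t}"' for t in tags)
print(quoted)  # "h1","h2","h3","h4","h5","h6"

# Usually simpler: generate the names and pass the list straight to findAll
heading_tags = [f"h{i}" for i in range(1, 7)]
print(heading_tags == tags)  # True
```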
    

      

    * Get span.green and span.red

    bsObj.findAll("span", {"class":{"green", "red"}})
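A collection of class values matches any span whose class list contains at least one of them. This includes a span that carries extra classes. The sketch below uses a list and invented markup; the class_ keyword gives the same result.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the last span has two classes.
html = (
    '<span class="green">a</span><span class="red">b</span>'
    '<span class="blue">c</span><span class="red bold">d</span>'
)
soup = BeautifulSoup(html, "html.parser")

spans = soup.find_all("span", {"class": ["green", "red"]})
texts = [s.get_text() for s in spans]
print(texts)  # ['a', 'b', 'd']

# Same query written with the class_ keyword
same = [s.get_text() for s in soup.find_all("span", class_=["green", "red"])]
print(same == texts)  # True
```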

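    * Count the text nodes on the page whose content is exactly "the prince" (an exact, case-sensitive match, not a substring search)

    # text= matches whole strings; newer bs4 releases name this argument string=
    nameList = bsObj.findAll(text="the prince")
    print(len(nameList))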
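A plain string in the text search must equal the whole text node. To count strings that merely contain a phrase, you can pass a compiled regular expression instead, because bs4 matches it with .search(). The sketch below runs on made-up markup and uses string=, the newer name for text=. Note that the results are strings, not tags; use .parent to reach the enclosing tag.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with three candidate text nodes.
html = "<p>the prince</p><p>Then the prince said</p><span>The Prince</span>"
soup = BeautifulSoup(html, "html.parser")

exact = soup.find_all(string="the prince")                  # whole node must match
contains = soup.find_all(string=re.compile("the prince"))   # substring, case-sensitive
any_case = soup.find_all(string=re.compile("the prince", re.IGNORECASE))
print(len(exact), len(contains), len(any_case))  # 1 2 3
print(exact[0].parent.name)  # p
```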

    * Find #text (the element with id="text")

    allText = bsObj.find(id="text")
    print(allText.get_text())
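get_text() joins all the text inside the element, including text in nested tags. find() returns None when no element matches, so check the result before calling methods on it. A minimal sketch on invented markup:

```python
from bs4 import BeautifulSoup

html = '<div id="text"><span>Hello</span> world</div>'
soup = BeautifulSoup(html, "html.parser")

all_text = soup.find(id="text")
print(all_text.get_text())  # Hello world  (nested <span> text included)

missing = soup.find(id="no-such-id")
if missing is None:
    print("no element with that id")
```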

    * Find div#text

    allText = bsObj.find("div", {"id":"text"})
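Adding the tag name restricts the match to a div. The contrived markup below repeats an id, which is invalid HTML but makes the difference visible. The same lookup can also be written as a CSS selector with select_one(); that method exists in bs4 4.7 and later, which uses the soupsieve package.

```python
from bs4 import BeautifulSoup

# Contrived markup: two elements share id="text" only to show the difference.
html = '<p id="text">not a div</p><div id="text">the div</div>'
soup = BeautifulSoup(html, "html.parser")

print(soup.find(id="text").name)                     # p  (first element with that id)
print(soup.find("div", {"id": "text"}).get_text())   # the div
print(soup.select_one("div#text").get_text())        # the div
```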

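    * Find the first span.red that is a direct child of div#text (the first match for div#text > span.red; unlike CSS :first-child, the span need not be the div's first child)

    red = bsObj.find("div", {"id":"text"}).find("span", {"class":"red"}, recursive=False)
    print(red.get_text())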
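Setting the recursive flag to False limits the search to direct children. As a result, a span.red nested deeper is skipped. The sketch below contrasts this with the CSS selectors; the markup is invented for illustration. A true :first-child match also requires the span to be the div's first child element, which it is not here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a nested span.red, then two direct-child spans.
html = (
    '<div id="text">'
    '<p><span class="red">nested</span></p>'
    '<span class="green">g</span>'
    '<span class="red">direct</span>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"id": "text"})

# recursive=False: skip the span inside <p>, take the first direct-child span.red
red = div.find("span", {"class": "red"}, recursive=False)
print(red.get_text())  # direct

# CSS equivalent of "first direct-child span.red"
print(soup.select_one("div#text > span.red").get_text())  # direct

# :first-child fails here because the div's first child is the <p>
print(soup.select_one("div#text > span.red:first-child"))  # None
```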
    

      

  • Original post: https://www.cnblogs.com/mingzhanghui/p/9424791.html