  • BeautifulSoup Practice

    html1="""
    <!DOCTYPE html>
    <html lang="en" xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta charset="utf-8" />
    <title>我的第一个网页</title>
    <meta name="generator" content="EverEdit" />
    <meta name="author" content="" />
    <meta name="keywords" content="" />
    <meta name="description" content="" />
    </head>
    <body>
    <div class="rows">
    <a href="http://www.baidu.com/" target="_blank">
    <div class="col-xs-12 col-sm-6 col-md-4 col-lg-2 vfsd-div vfsd-div-color1">
    <span class="vfsd_a_title">百度</span>
    </div>
    </a>
    <a href="http://www.google.com/" target="_blank">
    <div class="col-xs-12 col-sm-6 col-md-4 col-lg-2 vfsd-div vfsd-div-color3">
    <span class="vfsd_a_title">Google</span>
    </div>
    </a>
    <a href="http://www.oschina.net/" target="_blank">
    <div class="col-xs-12 col-sm-6 col-md-4 col-lg-2 vfsd-div vfsd-div-color2">
    <span class="vfsd_a_title">Stack Overflow</span>
    </div>
    </a>
    </div>
    <p class="col-xs-12 col-sm-6 col-md-4 col-lg-2 vfsd-div vfsd-div-color2">你好
    <span class="vfsd_a_title">CSDN</span>
    </p>
    <p class="col-xs-12 col-sm-6 col-md-4 col-lg-2 vfsd-div vfsd-div-color2">
    <span class="vfsd_a_title">FaceBook</span>
    </p>
    <p class="nmn" id="nmn1">
    <span class="vfsd_a_title">开源中国</span>
    </p>
    </body>
    </html>
    """

    from bs4 import BeautifulSoup

    # Parse the sample document with the lxml parser (requires the lxml package).
    soup = BeautifulSoup(html1, 'lxml')

    print(soup.title)

    #################### Output:

    <title>我的第一个网页</title>

    print(soup.title.string)

    #################### Output:

    我的第一个网页
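
`.string` only returns text when a tag has exactly one child. The sketch below illustrates this on a trimmed-down fragment modelled on the sample document. It is an illustration, not part of the original exercise: the `snippet` variable is made up for the example, and it assumes the built-in `html.parser` instead of `lxml` so that it runs without extra packages.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the sample document above.
snippet = """
<title>我的第一个网页</title>
<p>你好
<span class="vfsd_a_title">CSDN</span>
</p>
"""

# Assumption: "html.parser" (standard library) instead of "lxml".
soup = BeautifulSoup(snippet, "html.parser")

# <title> has a single text child, so .string returns that text.
title_text = soup.title.string

# <p> holds text plus a <span>, so .string cannot pick one and returns None.
p_string = soup.p.string

# get_text() gathers every nested string; strip=True drops whitespace-only pieces.
p_text = soup.p.get_text(separator=" ", strip=True)

print(title_text)
print(p_string)
print(p_text)
```

When a tag may contain nested markup, `get_text()` is usually the safer choice, since `.string` silently yields `None` in that case.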

    print(soup.head)

    #################### Output:

    <head>
    <meta charset="utf-8"/>
    <title>我的第一个网页</title>
    <meta content="EverEdit" name="generator"/>
    <meta content="" name="author"/>
    <meta content="" name="keywords"/>
    <meta content="" name="description"/>
    </head>
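
Once the `<head>` is available, individual `<meta>` values can be read by attribute. The following is a minimal sketch under stated assumptions: `head_html` is a shortened copy of the sample's head section, and `html.parser` is used instead of `lxml` to keep the example dependency-free. Because `name` is also the first parameter of `find()`, the HTML `name` attribute has to be passed through `attrs`.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened copy of the <head> from the sample document.
head_html = """
<head>
<meta charset="utf-8" />
<title>我的第一个网页</title>
<meta name="generator" content="EverEdit" />
<meta name="author" content="" />
</head>
"""

# Assumption: "html.parser" (standard library) instead of "lxml".
soup = BeautifulSoup(head_html, "html.parser")

# find() returns the first matching tag; attrs is needed for the "name" attribute.
generator = soup.find("meta", attrs={"name": "generator"})
generator_name = generator["content"]

# .get() returns None for a missing attribute instead of raising KeyError.
# soup.meta is the first <meta>, which carries charset but no content.
missing = soup.meta.get("content")

# attrs={"name": True} matches every <meta> that has a name attribute at all.
named_meta = {
    m["name"]: m["content"]
    for m in soup.find_all("meta", attrs={"name": True})
}

print(generator_name)
print(missing)
print(named_meta)
```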

    # soup.div is the first <div> (class "rows"). The newline between tags is a
    # child too, so the "\n" text nodes are printed along with the <a> tags.
    for i, child in enumerate(soup.div.children):
        print(i, child)

    #################### Output:

    0 

    1 <a href="http://www.baidu.com/" target="_blank">
    <div class="col-xs-12 col-sm-6 col-md-4 col-lg-2 vfsd-div vfsd-div-color1">
    <span class="vfsd_a_title">百度</span>
    </div>
    </a>
    2 

    3 <a href="http://www.google.com/" target="_blank">
    <div class="col-xs-12 col-sm-6 col-md-4 col-lg-2 vfsd-div vfsd-div-color3">
    <span class="vfsd_a_title">Google</span>
    </div>
    </a>
    4 

    5 <a href="http://www.oschina.net/" target="_blank">
    <div class="col-xs-12 col-sm-6 col-md-4 col-lg-2 vfsd-div vfsd-div-color2">
    <span class="vfsd_a_title">Stack Overflow</span>
    </div>
    </a>
    6 

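
The children of the `rows` div include the newline text nodes between the `<a>` tags. One way to keep only the element children, and to pull out each link with its label, is sketched below. This is not from the original post: `rows_html` is a shortened stand-in for the sample markup, and `html.parser` is assumed instead of `lxml` so the example runs on a plain Python install.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical, shortened stand-in for the "rows" block of the sample document.
rows_html = """
<div class="rows">
<a href="http://www.baidu.com/" target="_blank">
<div class="vfsd-div"><span class="vfsd_a_title">百度</span></div>
</a>
<a href="http://www.google.com/" target="_blank">
<div class="vfsd-div"><span class="vfsd_a_title">Google</span></div>
</a>
</div>
"""

# Assumption: "html.parser" (standard library) instead of "lxml".
soup = BeautifulSoup(rows_html, "html.parser")

# Keep only element children; the "\n" between tags is a NavigableString, not a Tag.
tag_children = [child for child in soup.div.children if isinstance(child, Tag)]

# recursive=False restricts find_all() to direct children of the div.
direct_links = soup.div.find_all("a", recursive=False)

# Pair each link target with the text of the <span> nested inside it.
links = [(a["href"], a.span.get_text(strip=True)) for a in direct_links]

for href, label in links:
    print(label, href)
```

Filtering with `isinstance(child, Tag)` keeps the `.children` iteration from the exercise, while `find_all(..., recursive=False)` is shorter when only one tag name is of interest.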

  • Original article: https://www.cnblogs.com/herd/p/9570983.html