  • The BeautifulSoup module in Python

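    What is the BeautifulSoup module for?

    Answer: it quickly locates HTML tags and extracts the content inside them. This is usually much easier and more reliable than matching with regular expressions, and the speed should be roughly comparable to extracting with XPath.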
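To make the comparison with regular expressions concrete, here is a minimal sketch. The sample markup and the regex pattern are made up for illustration, and Python's built-in "html.parser" is used so that nothing beyond bs4 needs to be installed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: the same kind of tag written in two styles.
html = """
<a href='http://baidu.com/'>single quotes</a>
<a class="nav" href="http://example.com/">double quotes, extra attribute</a>
"""

# A naive regex that expects href to come first and to use double quotes.
naive_pattern = re.compile(r'<a href="([^"]+)"')
regex_hrefs = naive_pattern.findall(html)

# BeautifulSoup parses the tags, so quoting and attribute order do not matter.
soup = BeautifulSoup(html, "html.parser")
soup_hrefs = [a["href"] for a in soup.find_all("a")]

print(regex_hrefs)  # the naive pattern misses both links
print(soup_hrefs)   # both links are found
```

A more careful regex could handle these two cases, but every new variation in the markup needs another patch, whereas the parser-based lookup stays the same.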

    1. Parsers:

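    • BeautifulSoup(html, "html.parser"): Python's built-in parser, no extra install needed
    • BeautifulSoup(html, "lxml"): lxml's HTML parser, requires the lxml package
    • BeautifulSoup(html, "xml"): lxml's XML parser, also requires the lxml package
    • BeautifulSoup(html, "html5lib"): parses the way a browser does, requires the html5lib package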
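The four parser names above can be tried side by side. The following is a rough sketch, not a recommendation of any one parser: the helper `try_parser` and the deliberately unclosed sample markup are hypothetical, and the code assumes only that bs4 is installed. Parsers that are missing on your machine are reported rather than crashing the script.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical, deliberately broken markup: neither tag is closed.
html = "<p>unclosed paragraph<a href='http://baidu.com/'>link"

def try_parser(markup, parser_name):
    """Return the repaired document as a string, or None if the parser is not installed."""
    try:
        return str(BeautifulSoup(markup, parser_name))
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser library is missing.
        return None

results = {
    name: try_parser(html, name)
    for name in ("html.parser", "lxml", "xml", "html5lib")
}

for name, rendered in results.items():
    print(name, "->", rendered if rendered is not None else "not installed")
```

Each parser may repair the broken markup differently (for example, lxml and html5lib add html and body wrappers), which is worth keeping in mind when switching parsers in an existing script.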

     Suppose you want to extract the href attribute of an a tag:

    from bs4 import BeautifulSoup

    html = "<a href='http://baidu.com/'>this is baidu.com</a>"
    bs = BeautifulSoup(html, "lxml")
    all_href = bs.find_all('a')  # every <a> tag in the document
    for i in all_href:
        print(i['href'])  # attribute lookup works like a dict
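Indexing a tag with `i['href']` raises a KeyError when an a tag has no href attribute. The sketch below shows two ways to guard against that; the second a tag in the sample is invented for illustration, and "html.parser" is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second <a> tag has no href attribute.
html = """
<a href='http://baidu.com/'>this is baidu.com</a>
<a name='top'>anchor without href</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: tag.get() returns None instead of raising KeyError.
all_values = [a.get("href") for a in soup.find_all("a")]

# Option 2: href=True keeps only the tags that actually have the attribute.
only_links = [a["href"] for a in soup.find_all("a", href=True)]

print(all_values)
print(only_links)
```

The second form is usually the more convenient one when the goal is simply to collect links.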

    A complete script that finds every h1 tag in a page:

    #!/usr/bin/env python
    # encoding: utf-8
    # by i3ekr

    from bs4 import BeautifulSoup

    html = """
    <!DOCTYPE html>
    <html>
    <head>
        <title>title test demo</title>
    </head>
    <body>
        <h1>this is h1</h1>
        <h1>this is h1 two</h1>
        <h1>this is h1 three</h1>
        <a href="http://baidu.com">this is a href.</a>
    </body>
    </html>
    """
    bs = BeautifulSoup(html, "lxml")
    # find_all returns a list of every matching tag, markup included
    print(bs.find_all('h1'))
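`find_all` returns whole tags, markup included. If only the text is wanted, something like the following sketch may help. It reuses the sample page above with the built-in "html.parser"; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<!DOCTYPE html>
<html>
<head>
    <title>title test demo</title>
</head>
<body>
    <h1>this is h1</h1>
    <h1>this is h1 two</h1>
    <h1>this is h1 three</h1>
    <a href="http://baidu.com">this is a href.</a>
</body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# get_text() drops the surrounding tag and keeps only its text.
headings = [h1.get_text(strip=True) for h1 in soup.find_all("h1")]

# soup.title is a shortcut for the first <title> tag in the document.
title = soup.title.string

# find() returns only the first match (or None when nothing matches).
first_link = soup.find("a")

print(headings)
print(title)
print(first_link["href"])
```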
  • Original post: https://www.cnblogs.com/nul1/p/8947034.html