  • Web Scraping Study Notes: Beautiful Soup Examples

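     If you still aren't sure how to use Beautiful Soup, practice more and read code written by experienced developers: imitate first, then create your own.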

     import re

     from bs4 import BeautifulSoup

     html_doc = """
     <html><head><title>The Dormouse's story</title></head>
     <body>
     <p class="title"><b>The Dormouse's story</b></p>

     <p class="story">Once upon a time there were three little sisters; and their names were
     <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
     <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
     <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
     and they lived at the bottom of a well.</p>

     <p class="story">...</p>
     """

     # html_doc is already a text string, so from_encoding is not needed here;
     # it only applies when the markup is passed in as bytes.
     soup = BeautifulSoup(html_doc, "html.parser")

     print("Get all links:")
     links = soup.find_all('a')

     for link in links:
         # Tag name, href attribute, and visible text of each <a> element
         print(link.name, link['href'], link.get_text())

     print("Get Lacie's link (exact href match):")
     link_node = soup.find('a', href='http://example.com/lacie')
     print(link_node.name, link_node['href'], link_node.get_text())

     # The regex "ill" matches Tillie's href, not Lacie's
     print("Get the link whose href matches the regex 'ill' (Tillie):")
     link_node = soup.find('a', href=re.compile(r"ill"))
     print(link_node.name, link_node['href'], link_node.get_text())

     # class is a Python keyword, so Beautiful Soup uses class_ instead
     print("Get the paragraph with class 'title':")
     p_node = soup.find('p', class_="title")
     print(p_node.name, p_node.get_text())
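
The listing above indexes the result of `find()` directly, which raises an error when nothing matches, because `find()` returns `None` in that case. Below is a minimal sketch of one way to guard against that and to collect every match with `find_all()`. The helper `describe_link` and the URL `http://example.com/nobody` are hypothetical names introduced only for illustration; they are not part of the original article or of Beautiful Soup.

```python
import re

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def describe_link(node):
    """Hypothetical helper: return (tag name, href, text), or None if node is None."""
    if node is None:
        return None
    # Tag.get() returns None instead of raising if the attribute is absent.
    return (node.name, node.get("href"), node.get_text())


# find() returns None when nothing matches, so the helper guards before indexing.
missing = describe_link(soup.find("a", href="http://example.com/nobody"))

# find_all() accepts the same filters as find() and returns every match.
ill_links = [describe_link(a) for a in soup.find_all("a", href=re.compile(r"ill"))]

# Filtering by CSS class uses class_, as in the listing above.
sisters = [a.get_text() for a in soup.find_all("a", class_="sister")]

print(missing)
print(ill_links)
print(sisters)
```

Returning `None` from the helper keeps the "not found" case explicit, so the caller decides whether to skip the link or report it.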
  • Original article: https://www.cnblogs.com/ryuuku/p/7135921.html