  • spider: scraping web page content (Beautiful Soup)

    http://jingyan.baidu.com/article/afd8f4de6197c834e386e96b.html

    http://cuiqingcai.com/1319.html

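    Installing Beautiful Soup on Windows:

    1. Download the archive: https://www.crummy.com/software/BeautifulSoup/#Download

    2. Extract it into the Python directory.

    3. Navigate to the extracted directory, then run the following commands: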

       python setup.py build

       python setup.py install

    4. Start Python and import the bs4 module; if the import raises no error, the installation succeeded:

       from bs4 import BeautifulSoup
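
    The import check in step 4 can also be scripted. The following is a minimal sketch, assuming Python 3; the helper name is_installed is hypothetical and not part of Beautiful Soup. It looks the module up on the import path first, so a missing package produces a message instead of a traceback.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    import bs4
    # bs4 exposes its release number as bs4.__version__
    print("Beautiful Soup version:", bs4.__version__)
else:
    print("bs4 was not found; repeat the installation steps.")
```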

    Example: scraping a weather forecast with bs4:

    import urllib.request
    from bs4 import BeautifulSoup as bs

    url = 'http://www.weather.com.cn/weather/101010100.shtml'
    req = urllib.request.Request(url)
    res = urllib.request.urlopen(req).read()

    # Name the parser explicitly so the result does not depend on which parsers are installed
    soup = bs(res, 'html.parser')
    # print(soup.prettify())

    # The 7-day forecast sits inside this div; find_all returns a ResultSet (a list), hence the index 0
    divsw = soup.find_all('div', class_='c7d', id='7d')[0]
    divs_date = divsw.find_all('h1')  # find dates
    for h in divs_date:
        print(h.string)

    divs_wea = divsw.find_all('p', class_='wea')  # find weather descriptions
    for p in divs_wea:
        print(p.get('title'))

    divs_tem = divsw.find_all('p', class_='tem')  # find temperatures
    for tem in divs_tem:
        # Either tag may be absent from an entry, so guard against None
        span = tem.find('span')
        i = tem.find('i')
        tem_max = span.string if span else ''
        tem_min = i.string if i else ''
        print(tem_min, '-', tem_max)
    
    
    
            

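    Result: the script prints the forecast dates, then the weather descriptions, then the temperature ranges in the form "min - max", one per line.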
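
    The same find_all lookups can be tried without a network connection. The following is a sketch for Python 3, not the author's code: SAMPLE_HTML is hypothetical markup that only imitates the div, h1 and p structure the script relies on, and parse_forecast is a hypothetical helper. It groups the three lookups into one record per day and stores None when an entry lacks the span or i tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may differ in detail.
SAMPLE_HTML = """
<div id="7d" class="c7d">
  <ul>
    <li>
      <h1>Day 1</h1>
      <p class="wea" title="Sunny">Sunny</p>
      <p class="tem"><i>3℃</i></p>
    </li>
    <li>
      <h1>Day 2</h1>
      <p class="wea" title="Cloudy">Cloudy</p>
      <p class="tem"><span>12</span>/<i>2℃</i></p>
    </li>
  </ul>
</div>
"""


def parse_forecast(html):
    """Return one dict per forecast day found in the container div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="7d", class_="c7d")
    if container is None:
        return []  # page layout changed or wrong page fetched
    rows = []
    for date, wea, tem in zip(container.find_all("h1"),
                              container.find_all("p", class_="wea"),
                              container.find_all("p", class_="tem")):
        span = tem.find("span")  # maximum temperature, may be absent
        i = tem.find("i")        # minimum temperature, may be absent
        rows.append({
            "date": date.get_text(strip=True),
            "weather": wea.get("title"),
            "max": span.get_text(strip=True) if span else None,
            "min": i.get_text(strip=True) if i else None,
        })
    return rows


for row in parse_forecast(SAMPLE_HTML):
    print(row)
```

    Returning a list of dicts keeps each day's date, weather and temperatures together, which is easier to reuse than three separate print loops.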

  • Original article: https://www.cnblogs.com/dreamer-fish/p/5291211.html