  • A bug in Python's SGMLParser?

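    First, some background: I'm using Python 2.7 and have only just started learning Python. Today I wanted to scrape some images to use as wallpapers. When I parsed the <img> tags with SGMLParser, though, the part I wanted was not picked up at all. I tried running the page through lxml to clean it up, and it still couldn't be parsed. Yet when I pulled that section out on its own, its tags parsed without any trouble. I haven't been able to solve this, so I'm posting the problem here and will fall back to matching with a regular expression for now. If anyone more experienced has run into this, please let me know. My email is 781512880@qq.com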
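One sgmllib behaviour that may be worth ruling out is SGML "short tag" handling. This is a hypothesis, not a confirmed diagnosis for this page. In short-tag syntax, `<tag/data/` means `<tag>data</tag>`. Python 2.7's `sgmllib.py` detects this with two module-level patterns, `shorttagopen` and `shorttag`, reproduced below from memory, so compare them with your installed copy. If they match, an XHTML-style `<br/>` opens a short tag, and everything up to the next `/` becomes its text. That can swallow a following `<img ...>` whose attribute value contains `http://`. The sketch runs under Python 3, so it needs no sgmllib. It applies those patterns to a made-up fragment shaped like the `<br/>` + `<img>` lines on the page:

```python
import re

# Patterns as they appear (to the best of my recollection) in Python 2.7's sgmllib.py.
shorttagopen = re.compile('<[a-zA-Z][-.a-zA-Z0-9]*/')
shorttag = re.compile('<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')

# Hypothetical fragment: <br/> immediately followed by an <img> tag.
html = '<br/>\n<img class="zoom" file="http://i4.qlshw.org/x.jpg" alt=""/>'

if shorttagopen.match(html):
    # sgmllib would treat "<br/" as a short tag and take everything
    # up to the next "/" (here, inside "http://") as the tag's text.
    m = shorttag.match(html)
    tag, data = m.group(1, 2)
    print('short tag:', tag)
    print('swallowed as text:', repr(data))
```

If this is what happens on the real page, the `<img>` never reaches `start_img`. Inserting a space before feeding the data, e.g. `data.replace('/>', ' />')`, should stop `shorttagopen` from matching, because that pattern needs the `/` immediately after the tag name.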

    Here is the source site: http://mcyacg.com/m60948/

    <div class="quote"><blockquote><font size="5"><font color="Pink">P站引言:</font></font>正好可以放在手心里的红色果实——“苹果”。红色果实切开后看起来像心形的苹果,在白雪公主、亚当与夏娃等故事中都是作为诱惑的象征登场。艳丽的红色外皮包裹着香甜多汁的清脆果实,或许有着一种不可思议的魅力吧。<br/>
    今天,就为大家送上描绘了“苹果”的插画作品特辑。敬请欣赏这些仿佛能听见咬下新鲜苹果时的清脆声音的插画作品。</blockquote></div><br/>
    <a href="http://pan.baidu.com/s/1mhJ4ti4" target="_blank"><font size="5">下载</font></a><br/>
    <img id="aimg_IPKD8" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i4.qlshw.org/08859a09d120090cfff30152010130c7.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_Sv2k2" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i4.qlshw.org/db3a060b649a422a701dd47982f9cbe5.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_Ob8zD" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i4.qlshw.org/8fd5b9b5f4706b17e71c00939c75f648.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_MWOuz" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i2.qlshw.org/7b5f4a94fff33ea1a7cac45131f2ba41.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_nG9jr" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i3.qlshw.org/4c0a17365342ef700c68c4e4caada0e0.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_J790D" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i2.qlshw.org/a1f2a3486ce679f007abea46782a33b7.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_v6mTz" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i4.qlshw.org/8d37e5d40f34c03180080135e8757bc8.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_wzFQq" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i2.qlshw.org/7feed5d205b6811d5d1366dd495a0760.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_xWlS5" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i2.qlshw.org/413f8d116e31174451032abfa72c1246.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_O8T03" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i4.qlshw.org/f654b77544edfeee9cda4d069e704c90.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_PfGhH" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i2.qlshw.org/e6fb8b6eafd5ef5dea2a13a284ba8309.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_tZEBu" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i3.qlshw.org/46d344876774e7dcb059ef84e2fc70f7.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_PnP6y" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i1.qlshw.org/9c6ab03cffb678a0945dcb0da127ea63.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_H01fi" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i1.qlshw.org/d0bf9d03f427a730b40a29bfebc9697a.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_j1pqX" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i3.qlshw.org/36074139da9039d1d4c0d1042f6b1b8c.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_xaHP0" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i3.qlshw.org/a40f503868cdd657531cbea34adf55e6.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_wE44O" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i1.qlshw.org/0a5d24a51f5c4ad0041d8feec5b5fe9a.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_T50cd" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i3.qlshw.org/b9f666b571221894bd4d922d369fce5d.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_C4o7y" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i4.qlshw.org/114b3eff9da458cfbfb52d08160fd30a.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <img id="aimg_m2c3s" onclick="zoom(this, this.src, 0, 0, 0)" class="zoom" file="http://i3.qlshw.org/ba3265f867650ef35891f6cb09a7a196.jpg" onmouseover="img_onmouseoverfunc(this)" lazyloadthumb="1" border="0" alt=""/><br/>
    <br/>

    This is the part containing the elements I want to extract.
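Each image URL sits in the tag's `file` attribute rather than `src`. The regular-expression fallback mentioned above could therefore look roughly like this. It is a minimal sketch that assumes every wanted `<img>` carries a double-quoted `file="..."` attribute, as in the sample. The `extract_files` name is hypothetical:

```python
import re

# Hypothetical helper: capture the value of file="..." inside each <img ...> tag.
# \b before "file" avoids matching attributes such as profile="...".
IMG_FILE = re.compile(r'<img\b[^>]*?\bfile="([^"]+)"', re.IGNORECASE)


def extract_files(html):
    """Return the file="..." URLs of all <img> tags, in document order."""
    return IMG_FILE.findall(html)


sample = (
    '<img id="aimg_IPKD8" class="zoom" '
    'file="http://i4.qlshw.org/08859a09d120090cfff30152010130c7.jpg" alt=""/><br/>\n'
    '<img id="aimg_Sv2k2" class="zoom" '
    'file="http://i4.qlshw.org/db3a060b649a422a701dd47982f9cbe5.jpg" alt=""/><br/>'
)

for url in extract_files(sample):
    print(url)
```

A regex like this is fragile against single-quoted or unquoted attributes. It is only a stopgap for markup as regular as this sample.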

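    from sgmllib import SGMLParser   # Python 2 only: sgmllib was removed in Python 3

    class LableParse(SGMLParser):
        def reset(self):
            SGMLParser.reset(self)
            self.level = 0          # div nesting depth inside <div class="pct">
            self.flag = False       # True while inside <div class="pct">
            self.picturesrc = []    # collected image URLs

        def start_div(self, attrs):
            if self.flag:
                self.level += 1     # entering a child div: one level deeper
            elif dict(attrs).get('class') == 'pct':
                self.flag = True    # entering the target div itself
                self.level = 1

        def end_div(self):
            if self.flag:
                self.level -= 1     # leaving a div: one level up
                if self.level == 0:
                    self.flag = False   # the target div itself has closed

        def start_img(self, attrs):
            # The div.pct filter is disabled while debugging, so every <img> is printed.
            # if not self.flag:
            #     return
            for k, v in attrs:
                print '{%s : %s}' % (k, v)
                if k == 'file':     # this page keeps the image URL in "file", not "src"
                    self.picturesrc.append(v)

    if __name__ == '__main__':
        lp = LableParse()
        with open('source.txt') as f:
            lp.feed(f.read())
        lp.close()                  # flush any buffered, unprocessed input
        print lp.picturesrc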

    And this is the class I derived from SGMLParser.
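sgmllib exists only in Python 2. A rough equivalent of the class above, using Python 3's standard `html.parser.HTMLParser`, is sketched below. It keeps the same `div.pct` depth tracking and collects the `file` attribute of each `<img>` inside that div. The `ImgFileParser` name is made up for this sketch. The `class="pct"` wrapper comes from the original code; it does not appear in the snippet shown earlier. The `http://example.com/outside.jpg` entry is a placeholder used to show that images outside the div are skipped.

```python
from html.parser import HTMLParser


class ImgFileParser(HTMLParser):
    """Collect file="..." URLs of <img> tags inside <div class="pct">."""

    def __init__(self):
        super().__init__()
        self.level = 0         # div nesting depth inside the target div
        self.flag = False      # True while inside <div class="pct">
        self.picturesrc = []   # collected image URLs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.flag:
                self.level += 1            # nested div inside the target
            elif attrs.get('class') == 'pct':
                self.flag = True           # entering the target div
                self.level = 1
        elif tag == 'img' and self.flag and attrs.get('file'):
            self.picturesrc.append(attrs['file'])

    def handle_endtag(self, tag):
        if tag == 'div' and self.flag:
            self.level -= 1
            if self.level == 0:
                self.flag = False          # the target div has closed


page = (
    '<div class="pct"><div class="quote">intro</div><br/>\n'
    '<img class="zoom" file="http://i4.qlshw.org/08859a09d120090cfff30152010130c7.jpg" alt=""/><br/>\n'
    '<img class="zoom" file="http://i4.qlshw.org/db3a060b649a422a701dd47982f9cbe5.jpg" alt=""/><br/>\n'
    '</div>\n'
    '<img class="zoom" file="http://example.com/outside.jpg" alt=""/>'
)

p = ImgFileParser()
p.feed(page)
p.close()
print(p.picturesrc)
```

`HTMLParser` reports self-closing tags such as `<img .../>` and `<br/>` through `handle_startendtag`. By default, that method calls `handle_starttag` and then `handle_endtag`, so these tags reach `handle_starttag` without any preprocessing.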

  • Original post: https://www.cnblogs.com/liyinggang/p/6106951.html