zoukankan      html  css  js  c++  java
  • BeautifulSoup 去除指定的html样式属性

    只想保留table的"colspan", "rowspan" 两个属性值,其余的属性都去掉

    最开始想到的是正则匹配,发现太费劲

    后面发现BeautifulSoup可以解决

    # bs4 去除特定属性
    def remove_css_tags():
        html_str = '''<table cellspacing="0" cellpadding="0" width="90%" align="center" style="border:1px solid #000000; text-align:center; border-collapse:collapse;"><tbody><tr style="background-color: rgb(217, 217, 217);" class="firstRow"><td width="12" height="1" style="border:1px solid #000000;"><p>产品</p></td><td width="29" colspan="2" height="1" style="border:1px solid #000000;"><p><strong><span style="color: black;">价格类型</span></strong></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><strong><span style="color: black;">价格</span></strong></p></td><td width="16" height="1" style="border:1px solid #000000;"><p><strong><span style="color: black;">涨跌</span></strong></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><strong><span style="color: black;">单位</span></strong></p></td></tr><tr style=";height:35px"><td width="12" rowspan="15" height="35" style="border:1px solid #000000;"><p><strong>液氯</strong></p></td><td width="8" rowspan="8" height="35" style="border:1px solid #000000;"><p><span style="color: black;">厂家</span></p></td><td width="20" height="35" style="border:1px solid #000000;"><p><span style="color: black;">茌平信发</span></p></td><td width="20" height="35" style="border:1px solid #000000;"><p><span style="color: black;">2000</span></p></td><td width="16" valign="top" height="35" style="border:1px solid #000000;"><p><span style="color: red;">+500</span></p></td><td width="21" height="35" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:8px"><td width="20" height="8" style="border:1px solid #000000;"><p><span style="color: black;">东营华泰</span></p></td><td width="20" height="8" style="border:1px solid #000000;"><p><span style="color: black;">2000</span></p></td><td width="16" valign="top" height="8" style="border:1px solid #000000;"><p><span style="color: red;">+500</span></p></td><td width="21" height="8" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:1px"><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">河北冀衡</span></p></td><td width="20" valign="top" height="1" style="border:1px solid #000000;"><p><span style="color: black;">2100</span></p></td><td width="16" valign="top" height="1" style="border:1px solid #000000;"><p><span style="color: red;">+500</span></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:1px"><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">沧州聚隆</span></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">1800</span></p></td><td width="16" valign="top" height="1" style="border:1px solid #000000;"><p><span style="color: red;">+500</span></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:8px"><td width="20" height="8" style="border:1px solid #000000;"><p><span style="color: black;">江苏新浦</span></p></td><td width="20" height="8" style="border:1px solid #000000;"><p><span style="color: black;">3300</span></p></td><td width="16" height="8" style="border:1px solid #000000;"><p><span style="color: black;">0</span></p></td><td width="21" height="8" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:35px"><td width="20" height="35" style="border:1px solid #000000;"><p><span style="color: black;">江苏富强</span></p></td><td width="20" height="35" style="border:1px solid #000000;"><p><span style="color: black;">封盘</span></p></td><td width="16" height="35" style="border:1px solid #000000;"><p><span style="color: rgb(34, 42, 53);">0</span></p></td><td width="21" height="35" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:8px"><td width="20" height="8" style="border:1px solid #000000;"><p><span style="color: black;">内蒙吉兰泰</span></p></td><td width="20" height="8" style="border:1px solid #000000;"><p><span style="color: black;">2800</span></p></td><td width="16" height="8" style="border:1px solid #000000;"><p><span style="color: rgb(34, 42, 53);">0</span></p></td><td width="21" height="8" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:36px"><td width="20" height="36" style="border:1px solid #000000;"><p><span style="color: black;">方大锦化</span></p></td><td width="20" height="36" style="border:1px solid #000000;"><p><span style="color: black;">2000</span></p></td><td width="16" valign="top" height="36" style="border:1px solid #000000;"><p><span style="color: red;">+500</span></p></td><td width="21" height="36" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:1px"><td width="8" rowspan="6" height="1" style="border:1px solid #000000;"><p><span style="color: black;">市   场</span></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">山东地区</span></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">1800-2000</span></p></td><td width="16" valign="top" height="1" style="border:1px solid #000000;"><p><span style="color: red;">+300/500</span></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:1px"><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">河北地区</span></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">1300/2100</span></p></td><td width="16" height="1" style="border:1px solid #000000;"><p><span style="color: red;">+0/500</span></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:1px"><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">江苏地区</span></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">2200-3100</span></p></td><td width="16" height="1" style="border:1px solid #000000;"><p><span style="color: black;">0</span></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:1px"><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">河南地区</span></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">1500</span></p></td><td width="16" valign="top" height="1" style="border:1px solid #000000;"><p><span style="color: black;">0</span></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:1px"><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">辽宁地区</span></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">1800/2000</span></p></td><td width="16" height="1" style="border:1px solid #000000;"><p><span style="color: red;">+500</span></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:1px"><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">内蒙古地区</span></p></td><td width="20" height="1" style="border:1px solid #000000;"><p><span style="color: black;">2800</span></p></td><td width="16" height="1" style="border:1px solid #000000;"><p><span style="color: rgb(34, 42, 53);">0</span></p></td><td width="21" height="1" style="border:1px solid #000000;"><p><span style="color: black;">元/吨</span></p></td></tr><tr style=";height:188px"><td width="8" height="188" style="border:1px solid #000000;"><p><span style="color: black;">市场简述及后市预测</span></p></td><td width="78" colspan="4" height="188" style="border:1px solid #000000;"><p><span style="color: black;">液氯:今日国内液氯市场低位反弹,市场呈上行趋势。华泰装置检修、大地装置降负荷,鲁中东部供应端减少,但鲁西区域企业复产,加之配套下游停车,市场处于博弈阶段,但企业存看涨心态,鉴于此,预计明日不排除有继续上行可能。</span></p></td></tr></tbody></table>'''
        soup = BeautifulSoup(html_str, "html.parser")
        remove_html = remove_attrs(soup, whitelist=["colspan", "rowspan"])
        # for tag in soup():
        #     for attr in tag.attrs:
        #         print(attr)
        #         if attr not in ["colspan", "rowspan"]:
        #             del tag[attr]
    
        # for attribute  in ["colspan", "rowspan"]:
    
        #     del tag[attribute]
        print(remove_html)
    

      

    #去除指定的css属性
    def remove_attrs(soup, whitelist=["colspan", "rowspan"]):
        for tag in soup.findAll(True):
            for attr in [attr for attr in tag.attrs if attr not in whitelist]:
                del tag[attr]
        return soup
    

      

  • 相关阅读:
    [codevs 1243][网络提速(最短路分层思想)
    [codevs 1183][泥泞的道路(二分+spfa)
    [codevs 2488]绿豆蛙的归宿(拓扑排序)
    [codevs 1961]躲避大龙(dfs)
    4、userCF和itemCF对比,冷启动
    query简洁弹出层代码
    css 积累1
    localStorage,sessionStorage
    tr th td
    (转存)面向切面编程(AOP)的理解
  • 原文地址:https://www.cnblogs.com/fly-kaka/p/15494570.html
Copyright © 2011-2022 走看看