Basic usage of BeautifulSoup in Python

#coding:utf-8
import os
from bs4 import BeautifulSoup
# Root folder containing the JSP files
folderPath = "E:/whm/google/src_jsp"

for dirPath,dirNames,fileNames in os.walk(folderPath):
    for fileName in fileNames:
        if fileName.endswith(".jsp"):
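            # Read in binary mode so BeautifulSoup detects the encoding; close the handle before the file is rewritten
            with open(os.path.join(dirPath,fileName),"rb") as f:
                soup=BeautifulSoup(f,"html.parser")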
            # Remove the <header> element if the page has one
            if soup.header is not None:
                soup.header.extract()
            # Attribute filter: find() returns only the first element that matches
            banner = soup.find(attrs={'role':'banner'})
            if banner is not None:
                banner.extract()
            sidebar = soup.find(attrs={'class':"col-xs-3"})
            if sidebar is not None:
                sidebar.extract()
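            # Python 3: open in text mode with an explicit encoding instead of writing encoded bytes
            with open(os.path.join(dirPath,fileName),"w",encoding="utf-8") as file:
                # prettify() returns a pretty-printed HTML string; formatter=None leaves text unescaped on output
                file.write(soup.prettify(formatter=None))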
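
The comment in the script notes that find() only picks the first matching element. If a page may contain several elements with the same attribute, find_all() can be used to remove all of them. The following is a minimal sketch on a made-up HTML snippet (the markup and the variable names are assumptions for illustration, not taken from the original JSP files):

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two elements sharing the same class
html = """
<div class="col-xs-3">first sidebar</div>
<div class="col-xs-9">main content</div>
<div class="col-xs-3">second sidebar</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() would stop at the first match; find_all() returns every match
for tag in soup.find_all(attrs={"class": "col-xs-3"}):
    tag.extract()

# Collect the text of the <div> elements that are left
remaining = [div.get_text() for div in soup.find_all("div")]
print(remaining)
```

After the loop, only the col-xs-9 element is left in the tree.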

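Processing JSP pages this way produces bugs, so do not use BeautifulSoup on script-template pages such as JSP or PHP. Use regular expressions instead. This is the conclusion I reached after half a day of trial and error.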
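
As a rough idea of what the regular-expression approach could look like, here is a minimal sketch that removes the first <header> element from a JSP string without parsing or reserializing the rest of the file. The sample fragment, the HEADER_RE pattern and the strip_header helper are all hypothetical and only for illustration. The pattern assumes the opening tag's attributes contain no ">" character and that <header> elements are not nested.

```python
import re

# Hypothetical JSP fragment used only for illustration
jsp_source = """<%@ page contentType="text/html;charset=UTF-8" %>
<header class="top"><h1><%= title %></h1></header>
<div role="banner">banner</div>
<div class="main"><c:out value="${body}"/></div>
"""

# Non-greedy match from <header ...> to the first </header>;
# DOTALL lets "." span newlines, IGNORECASE tolerates <HEADER>
HEADER_RE = re.compile(r"<header\b[^>]*>.*?</header>\s*", re.DOTALL | re.IGNORECASE)

def strip_header(text):
    """Remove the first <header> element and leave all other text untouched."""
    return HEADER_RE.sub("", text, count=1)

cleaned = strip_header(jsp_source)
print(cleaned)
```

Because the substitution only touches the matched span, JSP directives and tags such as <%@ page %> and <c:out> pass through exactly as written.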

Original article: https://www.cnblogs.com/whm-blog/p/7121895.html