The Road to Python: The KindEditor Editor and the BeautifulSoup Module

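I. The KindEditor editor

The official documentation for this editor is at http://kindeditor.net/doc.php. A concrete usage example follows.

1. Download and configure

After downloading the editor from the official site, put its files in your static files directory and reference the script as shown here. jQuery must be loaded as well, because the initialization code in step 2 uses it to read the CSRF token: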
<script src="/static/kindeditor/kindeditor-all-min.js"></script>
2. Displaying the editor

The editor must be bound to a <textarea> tag, as in the following example:
<textarea name="content" id="editor_id" cols="30" rows="10"></textarea>
<script>
    // Rich text editor
    KindEditor.ready(function (K) {
        window.editor = K.create('#editor_id', {
            width: "99%",
            height: "600px",
            resizeType: 1,  // height is resizable, width is not
            uploadJson: "/upload_img/",  // URL of the server-side function that saves uploaded images and other files and returns their address
            extraFileUploadParams: {  // extra parameters sent along with each upload
                csrfmiddlewaretoken: $("[name='csrfmiddlewaretoken']").val()
            },
            filePostName: "article_img"  // form field name used for the uploaded file
        });
    });
</script>
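3. Configuring uploads of images and other files

Displaying the editor as in step 2 is not enough to make image uploads work. The upload parameters shown above must be set, and the server needs a function to handle the uploaded file. An example of that function:
import json
import os

from django.http import HttpResponse

from s7_cnblog import settings


def upload_img(request):
    # Handles files uploaded from the editor
    print(request.FILES)
    fileObj = request.FILES.get("article_img")
    path = os.path.join(settings.MEDIA_ROOT, "article_img", fileObj.name)
    with open(path, "wb") as f:
        for line in fileObj:
            f.write(line)
    # Once the image is saved, return JSON in this shape so the editor can display it
    res = {
        "error": 0,
        "url": "/media/article_img/" + fileObj.name
    }
    return HttpResponse(json.dumps(res))
Note that the text produced by the editor (that is, the value of the textarea tag) is a string of HTML tags!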
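The upload view above only covers the success case. As a minimal sketch, the response body could be built by a small helper that also covers failures. The helper name build_upload_response is hypothetical, and the failure shape with a "message" field is an assumption based on KindEditor's bundled demo upload handlers, so check it against your KindEditor version:

```python
import json


def build_upload_response(url=None, message=None):
    """Build the JSON body returned to the editor after an upload.

    Hypothetical helper, not part of KindEditor or Django.
    """
    if url is not None:
        # Success: same shape as in the upload_img view
        payload = {"error": 0, "url": url}
    else:
        # Failure: assumed shape, error flag plus a human-readable message
        payload = {"error": 1, "message": message or "upload failed"}
    return json.dumps(payload)


ok_body = build_upload_response(url="/media/article_img/cat.png")
fail_body = build_upload_response(message="no file received")
print(ok_body)
print(fail_body)
```

In a Django view the returned string would be wrapped in HttpResponse, exactly as upload_img does.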

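II. Basic use of the beautifulsoup module

As noted above, the content submitted by the editor is a string of HTML tags, and the content we fetch with a crawler is likewise a string of page markup. To pull the useful text or content out of those tags we can use the beautifulsoup module. It is a third-party Python library (not part of the standard library), and its main use is web scraping.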
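To make this concrete, here is a small sketch of processing editor content on the server: it removes script tags and extracts the plain text. The content string is made up for illustration, and dropping script tags is only one possible cleanup step, not a complete sanitizer:

```python
from bs4 import BeautifulSoup

# Hypothetical article body as it might arrive from the editor's <textarea>
content = '<h3>Title</h3><p>Hello <b>world</b>!</p><script>alert(1)</script>'

soup = BeautifulSoup(content, "html.parser")

# Remove every <script> tag from the tree
for script in soup.find_all("script"):
    script.decompose()

plain_text = soup.get_text()   # text only, all tags removed
cleaned_html = str(soup)       # markup without the script tags

print(plain_text)
print(cleaned_html)
```

The plain text is handy for things like an article summary, while the cleaned markup is what would be stored and rendered.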

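1. Installing beautifulsoup
pip install beautifulsoup4
2. Usage

Passing a document to the BeautifulSoup constructor returns a document object. You can pass either a string or a file handle, as in the following example:
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("index.html"))
soup = BeautifulSoup("<html>data</html>")
No parser is specified above, so Beautiful Soup picks the best parser that is installed (recent versions also print a warning advising you to name one explicitly). If you do specify a parser, Beautiful Soup uses that one. In the following example we specify html.parser, the HTML parser that ships with Python's standard library.
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
Other commonly used parsers, such as lxml and html5lib, are third-party packages and must be installed before use. Only html.parser comes with Python.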
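Because the third-party parsers may or may not be installed, one option is to try the preferred parser and fall back to the built-in one. This is only a sketch: make_soup is a hypothetical helper, and it relies on Beautiful Soup raising FeatureNotFound when the requested parser is missing:

```python
from bs4 import BeautifulSoup, FeatureNotFound

markup = "<p>Hello<b>world"


def make_soup(markup, preferred="lxml"):
    """Parse with the preferred parser, falling back to html.parser."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # The preferred parser is not installed on this machine
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(markup)
print(soup.b.string)
```

Different parsers can repair broken markup differently, so for reproducible results it is usually better to pick one parser and name it everywhere.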

3. Node objects
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" id="info"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
Unless stated otherwise, all of the following examples use the soup document object created above.

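Beautiful Soup turns a complex HTML document into a tree in which every node is a Python object. All of these objects fall into four kinds: Tag, NavigableString, BeautifulSoup and Comment, described below.

(i) Tag objects

Put simply, a Tag is one of the tags in the HTML. A Tag object corresponds to a tag in the original XML or HTML document.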

1. Getting tag objects

Writing soup followed by a tag name returns that tag. Note that this finds only the first matching tag in the whole document. For example:
# Dot access by tag name returns the first matching tag object
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.head)  # Result: <head><title>The Dormouse's story</title></head>
print(soup.head.title)  # Result: <title>The Dormouse's story</title>
The find_all() method returns every tag with a given name, as a list:
print(soup.find_all("a"))
'''
Result:
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
'''
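Since find_all() returns a list of tag objects, a typical next step is to loop over it and pull something out of each tag. A short sketch using a trimmed copy of the sample document:

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

links = soup.find_all("a")
hrefs = [a["href"] for a in links]   # the href attribute of every link
names = [a.string for a in links]    # the text inside every link

print(len(links))
print(hrefs)
print(names)
```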
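2. Getting tag attributes

tag[attribute name] returns the value of that attribute; the class attribute is returned as a list. tag.attrs returns all of the tag's attributes as key/value pairs (a dictionary). For example:
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.p  # Tag object: <p class="title" id="info"><b>The Dormouse's story</b></p>
print(tag.name)  # Tag name: p
print(tag["class"])  # class may hold several values, so it is shown as a list: ['title']
print(tag.attrs)  # All attributes as key/value pairs: {'class': ['title'], 'id': 'info'}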
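One detail worth knowing: tag[name] raises KeyError when the attribute is missing, while tag.get(name) returns None, just like a dictionary. A small sketch on a made-up tag:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="title big" id="info"><b>Story</b></p>', "html.parser")
tag = soup.p

print(tag["class"])        # multi-valued attribute, returned as a list
print(tag["id"])           # single-valued attribute, returned as a string
print(tag.get("style"))    # missing attribute: None instead of KeyError
print(tag.has_attr("id"))  # membership test
```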
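3. Modifying tag attributes

Tag attributes can be added, changed and deleted, using the same operations as a dictionary:
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.p
print(tag)  # Tag object: <p class="title" id="info"><b>The Dormouse's story</b></p>
tag["class"] = "test"
tag["id"] = 1
print(tag)  # An existing attribute is overwritten, a missing one is added: <p class="test" id="1"><b>The Dormouse's story</b></p>
Deleting attributes:
del tag["class"]
del tag["id"]
print(tag)  # Result: <p><b>The Dormouse's story</b></p>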
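(ii) NavigableString (strings)

A string object is the text inside a tag object. The tag.string attribute is an easy way to get it. In the example below, the first <p> of the document also contains the text "this is test!" before the <b> tag, as the printed output shows:
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.p
print(tag)  # <p class="title" id="info">this is test!<b>The Dormouse's story</b></p>
print(tag.string)  # None
print(tag.b.string)  # The Dormouse's story
print(tag.text)  # this is test!The Dormouse's story
As the example shows, tag.string returns a string only when the tag has exactly one child, either a string or a single tag that itself contains one string. If the tag has several children, as <p> does here, tag.string is None. The text attribute makes up for this: it returns all the text inside the tag, including the text of nested tags.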
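The difference between .string and the text-collecting helpers can be sketched like this. get_text() is the method behind .text, and .strings yields each piece of text separately:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>this is test!<b>The Dormouse's story</b></p>", "html.parser")
p = soup.p

print(p.string)                      # None, because <p> has two children
print(p.b.string)                    # the single string inside <b>
print(p.get_text())                  # all text joined together
print(p.get_text(separator=" | "))   # the same, with a separator between pieces
print(list(p.strings))               # each string as its own item
```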

The replace_with method replaces the string inside a tag object:
tag.b.string.replace_with("hello world")
print(tag)  # <p class="title" id="info">this is test!<b>hello world</b></p>
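(iii) Comment objects

When the text inside a tag is an HTML comment, tag.string still returns the commented-out text, and the printed value does not reveal that it was a comment. Printing its type, however, shows that it is not an ordinary string but a Comment. For example:
html_doc = '<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>'
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.a.string)  #  Elsie
print(type(soup.a.string))  # <class 'bs4.element.Comment'>
To avoid trouble from mixing up ordinary strings and comments returned by tag.string, check the type first:
from bs4.element import Comment

if isinstance(soup.a.string, Comment):
    print(soup.a.string)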
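The same type check can be used to find every comment in a document, or to skip comments when collecting link text. A sketch on a made-up snippet, passing a function as the string filter of find_all():

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

html = '<div><!-- hidden note --><a id="link1"><!-- Elsie --></a><a id="link2">Lacie</a></div>'
soup = BeautifulSoup(html, "html.parser")

# Collect every comment node in the document
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
comment_texts = [c.strip() for c in comments]
print(comment_texts)

# Keep only link texts that are real strings, not comments
visible = [a.string for a in soup.find_all("a") if not isinstance(a.string, Comment)]
print(visible)
```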
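(iv) BeautifulSoup objects

The BeautifulSoup object is the soup object created in all the examples above. It represents the whole document. Most of the time you can treat it as a Tag object; it is a special kind of Tag. We will not go into more detail here.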
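A quick way to see that the soup object behaves like a special Tag is to inspect it directly. A sketch on a one-line document:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<p id='x'>hi</p>", "html.parser")

print(isinstance(soup, Tag))   # the soup object is itself a Tag
print(soup.name)               # its special name
print(soup.find("p")["id"])    # Tag methods such as find() work on it
```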

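III. Traversing the document tree with beautifulsoup

1. Child nodes

A Tag may contain several strings or other Tags; these are all child nodes of that Tag. Beautiful Soup provides many attributes for working with and iterating over child nodes.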

(1) .contents: a tag's .contents attribute returns its child nodes as a list. For example:
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.p
print(tag)  # <p class="title" id="info">this is test!<b>The Dormouse's story</b></p>
print(tag.contents)  # ['this is test!', <b>The Dormouse's story</b>]
Strings have no .contents attribute, because a string has no child nodes:
print(tag.contents[1].contents)  # ["The Dormouse's story"]
print(tag.contents[0].contents)  # raises an error
(2) .children: unlike .contents, .children returns an iterator rather than a list, so you loop over it to get the child nodes:
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.p
print(tag)  # <p class="title" id="info">this is test!<b>The Dormouse's story</b></p>
print(tag.children)  # <list_iterator object at 0x000001410F2305C0>
for child in tag.children:
    print(child)
'''
this is test!
<b>The Dormouse's story</b>
'''
(3) .descendants

.contents and .children, as in the examples above, contain only a tag's direct children. .descendants contains all of the tag's descendants: children, grandchildren and so on. The result is again an iterator (a generator), so you loop over it to get every descendant node:
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.p
print(tag)  # <p class="title" id="info">this is test!<b>The Dormouse's story</b></p>
print(tag.descendants)  # <generator object descendants at 0x000001AA382130F8>
for descendant in tag.descendants:
    print(descendant)
'''
this is test!
<b>The Dormouse's story</b>
The Dormouse's story
'''
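To compare direct children with all descendants side by side, here is a sketch on a small made-up fragment. The label helper is hypothetical and only makes the output easier to read:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<div><p>one<b>two</b></p><span>three</span></div>", "html.parser")
div = soup.div


def label(node):
    """Show tags as <name> and strings as their text."""
    return f"<{node.name}>" if isinstance(node, Tag) else str(node)


children = [label(n) for n in div.children]        # direct children only
descendants = [label(n) for n in div.descendants]  # every node below div, in document order

print(children)
print(descendants)
```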
2. Parent nodes

.parent returns the direct parent node.

.parents walks up through all ancestor nodes. The result is a generator, so you loop over it.
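A short sketch of both attributes on a made-up document. The last ancestor is always the BeautifulSoup object itself, whose name is [document]:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><p><b>bold</b></p></body></html>", "html.parser")
b = soup.b

parent_name = b.parent.name                 # the direct parent
ancestors = [p.name for p in b.parents]     # every ancestor, innermost first

print(parent_name)
print(ancestors)
print(soup.parent)                          # the document object has no parent
```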

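3. Sibling nodes

Sibling nodes are nodes at the same level as the current node. .next_sibling returns the next sibling and .previous_sibling returns the previous one; if there is no such node, the result is None. Note: in real documents a tag's .next_sibling and .previous_sibling are often strings or whitespace, because the whitespace and line breaks between tags are nodes too.

The .next_siblings and .previous_siblings attributes iterate over all the siblings after or before the current node.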
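The whitespace pitfall is easy to reproduce. In this sketch the first link is followed by a line break, so .next_sibling returns that line break, while find_next_sibling() skips ahead to the next tag:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '<p><a id="a1">one</a>\n<a id="a2">two</a><a id="a3">three</a></p>'
soup = BeautifulSoup(html, "html.parser")
first = soup.find("a", id="a1")

print(repr(first.next_sibling))              # the line break between the tags
print(first.find_next_sibling("a")["id"])    # the next sibling that is an <a> tag
print(first.previous_sibling)                # None, nothing comes before it

# Iterate over all following siblings, keeping only tags
following_ids = [s["id"] for s in first.next_siblings if isinstance(s, Tag)]
print(following_ids)
```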

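4. Next and previous nodes

Unlike .next_sibling and .previous_sibling, these attributes are not limited to siblings; they apply to all nodes regardless of level. .next_element is the node parsed immediately after the current one, and .previous_element is the node parsed immediately before it. The .next_elements and .previous_elements iterators move forward or backward through the parsed content, as if the document were being parsed again.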
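The difference from the sibling attributes shows up as soon as a tag contains text. In this sketch the next sibling of <b> is <i>, but its next element is the string inside <b> itself:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>one</b><i>two</i></p>", "html.parser")
b = soup.b

print(b.next_sibling)            # the <i> tag, same level as <b>
print(b.next_element)            # the string inside <b>, parsed right after its start tag
print(b.previous_element.name)   # the <p> tag, parsed right before <b>

# Everything parsed after <b>, in order
following = [str(e) for e in b.next_elements]
print(following)
```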

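IV. Searching the document tree with beautifulsoup

1. find_all( name , attrs , recursive , string , **kwargs )

The find_all() method searches all descendant tags of the current tag and checks whether each one matches the filter conditions:

(1) The name parameter

(2) Keyword parameters

(3) The text parameter (named string in current versions)

(4) The limit parameter

(5) The recursive parameter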
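The five parameters listed above can be sketched in one place. This is an illustration on a trimmed copy of the sample document, not a complete reference; note that class is a Python keyword, so the keyword argument is spelled class_:

```python
import re
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# name: a tag name, a list of names, or a compiled regular expression
by_name = [t.name for t in soup.find_all(["title", "b"])]
by_regex = [t.name for t in soup.find_all(re.compile("^b"))]

# keyword arguments filter on attribute values
by_id = soup.find_all(id="link2")
by_class = soup.find_all("a", class_="sister")

# string (text in older versions) matches text nodes instead of tags
by_string = soup.find_all(string="Elsie")

# limit stops searching after the given number of matches
first_two = [a["id"] for a in soup.find_all("a", limit=2)]

# recursive=False looks at direct children only; <title> sits inside <head>
direct_titles = soup.html.find_all("title", recursive=False)

print(by_name, by_regex, first_two, direct_titles)
```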

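2. find( name , attrs , recursive , string , **kwargs )

find_all() returns every tag in the document that matches, but sometimes we want only one result. If the document has a single <body> tag, for instance, find_all() is a poor fit for finding it: rather than calling find_all() with limit=1, call find() directly.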
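The practical difference is the return type: find_all() always gives a list, while find() gives the tag itself, or None when nothing matches. A small sketch:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><p>hi</p></body></html>", "html.parser")

as_list = soup.find_all("body", limit=1)   # a list holding one tag
as_tag = soup.find("body")                 # the tag itself

print(as_list)
print(as_tag)

# When nothing matches, find_all() returns an empty list and find() returns None
missing_list = soup.find_all("table")
missing_tag = soup.find("table")
print(missing_list, missing_tag)
```

Because find() can return None, it is worth checking the result before accessing attributes on it.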

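V. CSS selectors in beautifulsoup

When writing CSS, a tag name is written bare, a class name is prefixed with a dot, and an id is prefixed with #. We can filter elements the same way with the soup.select() method, which returns a list.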
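A few selector forms on a trimmed copy of the sample document, as a sketch. The child and attribute selectors and select_one() go slightly beyond the three forms named above, and they assume a reasonably recent Beautiful Soup, where selectors are handled by the soupsieve package installed with it:

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

by_tag = soup.select("title")                 # by tag name
by_class = soup.select(".sister")             # by class name
by_id = soup.select("#link1")                 # by id
children = soup.select("p.story > a")         # direct <a> children of <p class="story">
by_attr = soup.select('a[href$="tillie"]')    # href ending in "tillie"
first = soup.select_one(".sister")            # first match only, or None

print(by_tag[0].string)
print([a["id"] for a in by_class])
print(by_id[0].string, by_attr[0]["id"], first["id"])
```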

Original article: https://www.cnblogs.com/seven-007/p/8135897.html