Winter Break Study Report 10

Today I continued working on web crawlers.

I ran into a few problems and only solved them after a lot of searching.

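The values returned by response.xpath(...).extract() contain newlines, tabs and extra spaces.
How can they be removed? One option is the XPath function normalize-space().
For example:
response.xpath('//div[@class=""]/text()').extract()
With normalize-space():
response.xpath('normalize-space(//div[@class=""]/text())').extract()
Note that normalize-space() converts a node-set to a string using only its first node, so this form returns the cleaned text of the first matching text node.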
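Outside the XPath expression, you can also remove newlines, tabs and spaces from the extracted string with str.replace():
```
name = name.replace('\n', '').replace('\t', '').replace(' ', '')
```
or one character at a time:
```
name = name.replace('\n', '')
name = name.replace('\t', '')
name = name.replace(' ', '')
```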
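
The chained replace(' ', '') calls above also delete the spaces between words. As a minimal sketch of a gentler cleanup, the helper below collapses whitespace runs and trims each string, which is roughly what normalize-space() does for a single string. The name clean_text and the sample data are assumptions made for illustration; they do not come from the original program.

```python
def clean_text(values):
    """Collapse whitespace runs and trim each extracted string."""
    cleaned = []
    for value in values:
        # str.split() with no arguments splits on any run of whitespace
        # (spaces, \t, \r, \n), so re-joining with " " normalizes it.
        text = " ".join(value.split())
        if text:  # drop strings that contained only whitespace
            cleaned.append(text)
    return cleaned


# Hypothetical values, shaped like the list that .extract() returns.
raw = ["\r\n\t  Scrapy   notes \n", "\n\t", "second\titem"]
print(clean_text(raw))  # ['Scrapy notes', 'second item']
```

Unlike the XPath version, this runs on every extracted string, not only the first one.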
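Scraping a <div> that contains <p> tags with Scrapy
I wanted to extract all of the text inside a div, but the div contains p tags.
Using
response.xpath('//div[@class=""]/text()').extract()
directly skips the text inside the <p> elements, because text() only selects text nodes that are direct children of the <div>.
Use the descendant axis instead:
response.xpath('//div[@class=""]/descendant::text()').extract()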
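
To make the difference between direct text nodes and descendant text nodes concrete, here is a small sketch that uses the standard library's xml.etree.ElementTree in place of Scrapy's selectors. ElementTree is a swapped-in tool, not what the spider uses, and the sample markup is an assumption; only the node relationships are the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a <div> with its own text and a nested <p>.
html = '<div class="content">intro <p>inside p</p> outro</div>'
div = ET.fromstring(html)

# Rough equivalent of //div/text(): only text nodes that are
# direct children of <div> (its leading text and each child's tail).
direct = [t for t in [div.text] + [child.tail for child in div] if t]

# Rough equivalent of //div/descendant::text(): every text node below <div>.
all_text = list(div.itertext())

print(direct)    # ['intro ', ' outro']
print(all_text)  # ['intro ', 'inside p', ' outro']
```

The text of the nested <p> appears only in the second list, which matches the behavior seen with the two XPath expressions.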

Passing arguments from parse to another callback in Scrapy
def parse(self, response):
    yield scrapy.Request(url, callback=self.next, meta={'rname': '2'})

def next(self, response):
    print(response.meta['rname'])
After that, I also optimized the previous program a little.

Original post: https://www.cnblogs.com/baimafeima/p/12292978.html
