
Python page scraping returns "请打开浏览器的javascript，然后刷新浏览器" ("Please enable JavaScript in your browser, then refresh")

I used to fetch the page data in Python like this:

import json
import urllib2  # Python 2 only; Python 3 uses urllib.request

url = "http://xxxxxx"
res_text = json.loads(urllib2.urlopen(urllib2.Request(url)).read())
print(res_text)

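Later, the same call started returning "Please enable JavaScript in your browser, then refresh" (请打开浏览器的javascript，然后刷新浏览器). On inspection, the site had started requiring a cookie. Once the cookie was added to the request, the response was normal again: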

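import json
import urllib2  # Python 2 only; Python 3 uses urllib.request

url = "http://xxxxxx"
# User-Agent and Cookie values copied from a real browser session
headers = {'User-Agent': "xxx", 'Cookie': "xxx"}
res_text = json.loads(urllib2.urlopen(urllib2.Request(url, headers=headers)).read())
print(res_text)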
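The same request can be sketched for Python 3, where `urllib2` no longer exists. This is only a rough sketch under a few assumptions: the helper names `parse_json_body` and `fetch_json` are made up for illustration, the response is assumed to be UTF-8, and the block is recognized simply by looking for the notice text quoted above. Checking for that notice before parsing gives a clearer error than a bare JSON decode failure when the cookie has expired.

```python
import json
import urllib.request

# Start of the notice the site sends back instead of JSON (taken from the message above).
JS_BLOCK_MARKER = "请打开浏览器的javascript"


def parse_json_body(body):
    """Parse a response body as JSON, failing clearly if the JavaScript notice came back."""
    if JS_BLOCK_MARKER in body:
        raise RuntimeError(
            "got the 'enable JavaScript' page; the cookie is probably missing or expired"
        )
    return json.loads(body)


def fetch_json(url, headers):
    """Send a GET request with the given headers and parse the JSON reply."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")  # assumption: the server replies in UTF-8
    return parse_json_body(body)
```

With the `url` and `headers` values from the snippet above, the call would be `fetch_json(url, headers)`.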

However, the cookie changes frequently, so I switched approaches and used Selenium WebDriver to fetch the page data:

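import json
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('xxxx')
time.sleep(7)  # give the page's JavaScript time to run and render the data
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which newer Selenium 4 releases removed
res = driver.find_element(By.XPATH, 'xxxxxx')
s = json.loads(res.text)
driver.quit()  # close the browser and stop chromedriver

print(s, type(s))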
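The fixed 7-second sleep can be too short on a slow page and wasteful on a fast one. One possible alternative is to poll until the element's text actually parses as JSON. The helper below is a minimal sketch using only the standard library. `wait_for_json` and its parameters are hypothetical names, not part of Selenium, which also ships its own `WebDriverWait` for this kind of waiting.

```python
import json
import time


def wait_for_json(get_text, timeout=10.0, interval=0.5):
    """Call get_text() repeatedly until its result parses as JSON or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return json.loads(get_text())
        except Exception:  # element not found yet, or its text is not valid JSON yet
            if time.monotonic() >= deadline:
                raise  # give up and surface the last error
            time.sleep(interval)
```

In the WebDriver script this could stand in for the sleep, lookup, and parse steps, for example `s = wait_for_json(lambda: driver.find_element(By.XPATH, 'xxxxxx').text)`.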

Note: the method above uses Google Chrome, so Chrome and the matching chromedriver must be installed beforehand. Other browsers can be used as well.
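Since the frequently changing cookie was the original obstacle, the two approaches might also be combined: let the browser obtain a fresh cookie, then reuse it in the plain HTTP request. The sketch below only covers the formatting step, and `cookie_header` is a hypothetical helper. It assumes the input looks like the list of dicts returned by Selenium's `driver.get_cookies()`, each with `name` and `value` keys. Whether this site accepts cookies replayed outside the browser is untested.

```python
def cookie_header(cookies):
    """Join cookie dicts with 'name' and 'value' keys into a Cookie header value."""
    return "; ".join("{}={}".format(c["name"], c["value"]) for c in cookies)
```

For example, `headers['Cookie'] = cookie_header(driver.get_cookies())` would fill in the Cookie header from a live browser session.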


Original article: https://www.cnblogs.com/whycai/p/12838501.html
