
Learning Python+Selenium -- Handling Pagination

Scenario

When testing a web application, we often run into paginated results. This post walks through a pagination scenario.

Code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
@author: Jeff LEE
@file: 翻页.py
@time: 2018-09-26 11:14
@desc:
'''
import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
# Add an implicit wait: element lookups retry for up to 10 seconds
driver.implicitly_wait(10)

driver.get('https://www.baidu.com/')
driver.find_element(By.ID, 'kw').send_keys('uniquefu')

driver.find_element(By.ID, 'su').click()

page = driver.find_element(By.ID, 'page')
# Find all pagination links (not reused below: links are located again after each page load)
pages = page.find_elements(By.TAG_NAME, 'a')
time.sleep(5)

js = 'document.documentElement.scrollTop=10000'  # scroll to the bottom of the page
total = 0  # number of pages
is_next_page = True  # whether a next page exists
page_num = 0   # number of the page to click

# Page forward
while page_num < 10:  # the loop could also be driven by is_next_page
    driver.execute_script(js)
    page_num = page_num + 1    # move the page number to the next page
    total = page_num   # record the number of pages
    value = str(page_num)
    try:
        # Locate the link for the requested page.
        # contains() is a substring match, so 'pn=1' also matches 'pn=10'.
        xpath = "//div[@id='page']/a[contains(@href,'pn=%s')]" % value
        print(xpath)
        one_page = driver.find_element(By.XPATH, xpath)
        one_page.click()
        time.sleep(1)
        driver.execute_script(js)
        time.sleep(1)

    except WebDriverException:
        # Lookup or click failed: treat it as "no further page"
        print('no next page')
        is_next_page = False
        total = total - 1
        break

# Page backward
while total >= 0:

    driver.execute_script(js)

    try:
        total = total - 1
        value = str(total)
        xpath = "//div[@id='page']/a[contains(@href,'pn=%s')]" % value
        print(xpath)
        one_page = driver.find_element(By.XPATH, xpath)
        one_page.click()
        time.sleep(1)
        driver.execute_script(js)
        time.sleep(1)

    except WebDriverException:
        # Lookup or click failed: treat it as "no previous page"
        print('no pre page')
        break

time.sleep(3)
driver.quit()

Problem encountered:

selenium.common.exceptions.StaleElementReferenceException: Message: u'Element not found in the cache - perhaps the page has changed since it was looked up' ; Stacktrace:

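In other words, the element cannot be found in the cache, most likely because the page changed after the element was located. This shows that once the current page navigates, the cached elements belonging to that page are cleared as well. After every navigation, the pagination link for the next page therefore has to be located again before it is clicked.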
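
The fix described above can be sketched as a small helper that looks the link up again on every call instead of reusing elements found before a navigation. This is only a minimal sketch, not the author's code: `build_page_xpath` and `click_page_link` are hypothetical names, and the locator strategy is passed as the plain string "xpath", which is the value of Selenium's `By.XPATH` constant.

```python
def build_page_xpath(pn_value):
    """Build the XPath for a pagination link whose href contains 'pn=<value>'."""
    return "//div[@id='page']/a[contains(@href,'pn=%s')]" % pn_value


def click_page_link(driver, pn_value):
    """Locate the pagination link on the currently loaded page, then click it.

    The element is looked up at call time, so a reference obtained
    before an earlier navigation is never reused.
    """
    xpath = build_page_xpath(pn_value)
    # "xpath" is the string value of selenium's By.XPATH constant.
    link = driver.find_element("xpath", xpath)
    link.click()
    return xpath
```

With a real `driver`, calling `click_page_link(driver, 1)` and then `click_page_link(driver, 2)` performs two separate lookups, one against each freshly loaded page.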

Note:

Blog-style pagination does not need this much work, because the page links do not change after the page is turned.
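
One possible reading of this note, offered as an assumption rather than the author's method: when the page URLs are stable, the `href` values can be read once as plain strings and each page opened directly, so no element reference is carried across a navigation. In the sketch below, `collect_page_urls` and `visit_pages` are hypothetical names and the CSS selector is a placeholder that depends on the site being tested. The string "css selector" is the value of Selenium's `By.CSS_SELECTOR` constant.

```python
def collect_page_urls(driver, css_selector):
    """Read the href of every pagination link once, as plain strings."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    links = driver.find_elements("css selector", css_selector)
    urls = []
    for link in links:
        href = link.get_attribute("href")
        # Skip links without an href and drop duplicates, keeping order.
        if href and href not in urls:
            urls.append(href)
    return urls


def visit_pages(driver, urls):
    """Open each collected URL directly; no element reference is reused."""
    for url in urls:
        driver.get(url)
```

Strings cannot go stale the way element references can, which is why the URLs are collected before the first navigation.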


Original article: https://www.cnblogs.com/uniquefu/p/9707149.html