day03
-
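Introduction to Selenium
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, exactly as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), [Mozilla Firefox](https://baike.baidu.com/item/Mozilla%20Firefox/3504923), Safari, Google Chrome, and Opera.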
-
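Driver compatibility table

| chromedriver version | Supported Chrome versions |
| --- | --- |
| v2.46 | v71-73 |
| v2.45 | v70-72 |
| v2.44 | v69-71 |
| v2.43 | v69-71 |
| v2.42 | v68-70 |
| v2.41 | v67-69 |
| v2.40 | v66-68 |
| v2.39 | v66-68 |
| v2.38 | v65-67 |
| v2.37 | v64-66 |
| v2.36 | v63-65 |
| v2.35 | v62-64 |
| v2.34 | v61-63 |
| v2.33 | v60-62 |
| v2.32 | v59-61 |
| v2.31 | v58-60 |
| v2.30 | v58-60 |
| v2.29 | v56-58 |
| v2.28 | v55-57 |
| v2.27 | v54-56 |
| v2.26 | v53-55 |
| v2.25 | v53-55 |
| v2.24 | v52-54 |
| v2.23 | v51-53 |
| v2.22 | v49-52 |
| v2.21 | v46-50 |
| v2.20 | v43-48 |
| v2.19 | v43-47 |
| v2.18 | v43-46 |
| v2.17 | v42-43 |
| v2.16 | v42-45 |
| v2.15 | v40-43 |
| v2.14 | v39-42 |
| v2.13 | v38-41 |
| v2.12 | v36-40 |
| v2.11 | v36-40 |
| v2.10 | v33-36 |
| v2.9 | v31-34 |
| v2.8 | v30-33 |
| v2.7 | v30-33 |
| v2.6 | v29-32 |
| v2.5 | v29-32 |
| v2.4 | v29-32 |

-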
Install the Chrome driver (chromedriver): pick the release that matches your installed Chrome version.
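Finding the matching release can also be done in code. The following is a minimal sketch, assuming only a handful of rows copied from the compatibility table; the names `CHROMEDRIVER_SUPPORT` and `compatible_drivers` are hypothetical and not part of Selenium or chromedriver.

```python
# Illustrative subset of the compatibility table: chromedriver version
# mapped to the (lowest, highest) Chrome major version it supports.
CHROMEDRIVER_SUPPORT = {
    "2.46": (71, 73),
    "2.45": (70, 72),
    "2.44": (69, 71),
    "2.43": (69, 71),
    "2.42": (68, 70),
    "2.41": (67, 69),
    "2.40": (66, 68),
}


def version_key(version):
    """Turn "2.46" into (2, 46) so versions sort numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def compatible_drivers(chrome_major):
    """Return the chromedriver versions that cover chrome_major, newest first."""
    matches = [
        version
        for version, (low, high) in CHROMEDRIVER_SUPPORT.items()
        if low <= chrome_major <= high
    ]
    return sorted(matches, key=version_key, reverse=True)


if __name__ == "__main__":
    # Chrome 70 is covered by several releases; the first entry is the newest.
    print(compatible_drivers(70))
```

The Chrome major version is the first number shown on the `chrome://version` page.
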
-
Locating elements

| Locator strategy | WebDriver API |
| --- | --- |
| id | find_element_by_id() |
| name | find_element_by_name() |
| class name | find_element_by_class_name() |
| tag name | find_element_by_tag_name() |
| link text | find_element_by_link_text() |
| partial link text | find_element_by_partial_link_text() |
| xpath | find_element_by_xpath() |
| css selector | find_element_by_css_selector() |

-
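Element operations

| Method | Description |
| --- | --- |
| clear | Clear the element's content, for example the text in an input box |
| send_keys | Simulate keyboard input |
| click | Click the element |
| submit | Submit the form |
| back | Go back one page in the browser history (called on the driver) |
| forward | Go forward one page in the browser history (called on the driver) |
| maximize_window | Maximize the browser window (called on the driver) |

-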
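Warm-up

    # Search Baidu for "老男孩"
    from selenium import webdriver

    # Open the browser
    b = webdriver.Chrome()
    # Load the Baidu home page
    b.get('https://www.baidu.com')
    # Locate Baidu's search input box (id: kw)
    ele = b.find_element_by_id('kw')
    # Clear whatever is in the input box
    ele.clear()
    # Type the search term
    ele.send_keys('老男孩')
    # Locate the search button (id: su)
    su = b.find_element_by_id('su')
    # Click the button
    su.click()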
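The `find_element_by_*` helpers used in these notes belong to the Selenium 3 API; they were deprecated in Selenium 4.0 and removed in 4.3. The generic `find_element(by, value)` call exists in both, and it accepts the strategy names from the locator table as plain strings. Below is a minimal sketch of a wrapper around that call. The names `STRATEGIES`, `locate`, and `FakeDriver` are hypothetical; `FakeDriver` only stands in for a real browser so the sketch runs anywhere.

```python
# Strategy strings accepted by WebDriver's generic find_element(by, value).
# They equal the constants on selenium.webdriver.common.by.By.
STRATEGIES = {
    "id", "name", "class name", "tag name",
    "link text", "partial link text", "xpath", "css selector",
}


def locate(driver, strategy, value):
    """Hypothetical helper: find one element through the generic call."""
    if strategy not in STRATEGIES:
        raise ValueError("unknown locator strategy: %r" % strategy)
    return driver.find_element(strategy, value)


class FakeDriver:
    """Stand-in for a real WebDriver so the sketch runs without a browser."""

    def find_element(self, by, value):
        return "<element %s=%s>" % (by, value)


if __name__ == "__main__":
    # With a real browser: locate(webdriver.Chrome(), "id", "kw")
    print(locate(FakeDriver(), "id", "kw"))
```

With a real driver, `locate(b, "id", "kw")` would take the place of `b.find_element_by_id('kw')`.
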
-
Scraping JD.com

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys  # keyboard key constants
    import time


    def get_goods(driver):
        """Print every product on the current results page, then move to the next page."""
        try:
            goods = driver.find_elements_by_class_name('gl-item')
            for good in goods:
                detail_url = good.find_element_by_tag_name('a').get_attribute('href')
                p_name = good.find_element_by_css_selector('.p-name em').text.replace(' ', '')
                price = good.find_element_by_css_selector('.p-price i').text
                p_commit = good.find_element_by_css_selector('.p-commit a').text
                msg = '''
                Product : %s
                Link    : %s
                Price   : %s
                Reviews : %s
                ''' % (p_name, detail_url, price, p_commit)
                print(msg, end=' ')

            # Click the "下一页" (next page) link, then process the page that loads
            button = driver.find_element_by_partial_link_text('下一页')
            button.click()
            time.sleep(1)
            get_goods(driver)
        except Exception:
            # Any failure, including a missing "next page" link on the last page, ends the crawl
            pass


    def spider(url, keyword):
        driver = webdriver.Chrome()
        driver.get(url)
        driver.implicitly_wait(3)  # implicit wait: poll up to 3 seconds for elements to appear
        try:
            # Type the keyword into the search box (id: key) and press Enter
            input_tag = driver.find_element_by_id('key')
            input_tag.send_keys(keyword)
            input_tag.send_keys(Keys.ENTER)
            get_goods(driver)
        finally:
            driver.quit()  # close the browser and end the driver session


    if __name__ == '__main__':
        spider('https://www.jd.com/', keyword='华为P30')
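`get_goods` calls itself once per result page, so a long result list keeps deepening the call stack, and the broad `except` hides the reason the crawl stopped. The same flow can be written as a loop with a page cap. This is a sketch under assumptions: `crawl_pages`, `scrape_page`, `go_next`, `max_pages`, and `FakePager` are hypothetical names, and `FakePager` replaces the browser so the example runs on its own.

```python
def crawl_pages(scrape_page, go_next, max_pages=100):
    """Collect items page by page until go_next() reports no further page.

    scrape_page() returns the items on the current page.
    go_next() returns True if it moved to another page, False otherwise.
    """
    items = []
    for _ in range(max_pages):
        items.extend(scrape_page())
        if not go_next():
            break
    return items


class FakePager:
    """Stand-in for the browser: a fixed list of result pages."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def scrape_page(self):
        return list(self.pages[self.index])

    def go_next(self):
        if self.index + 1 >= len(self.pages):
            return False
        self.index += 1
        return True


if __name__ == "__main__":
    pager = FakePager([["a", "b"], ["c"], ["d", "e"]])
    print(crawl_pages(pager.scrape_page, pager.go_next))
```

In a real crawl, `go_next` would wrap the lookup of the '下一页' link and return `False` when that lookup raises `NoSuchElementException`.
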
-
Scraping QQ Zone (Qzone) feed posts

    import time
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get('https://i.qq.com/')
    # The login form lives inside an iframe
    driver.switch_to.frame('login_frame')
    # Switch to the account/password login form
    driver.find_element_by_id("switcher_plogin").click()
    user = driver.find_element_by_id('u')
    user.send_keys('')  # QQ number
    pwd = driver.find_element_by_id('p')
    pwd.send_keys('xxxxxxx')  # password
    submit = driver.find_element_by_id('login_button')
    submit.click()
    time.sleep(2)
    for i in range(50):
        # Scroll to the current bottom of the page; a fixed offset such as
        # scrollTo(0, 500) would stop moving after the first iteration
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(5)  # give the feed time to load more entries
    # Each feed entry carries the class "f-single"
    li_list = driver.find_elements_by_class_name("f-single")
    print(len(li_list))
    for li in li_list:
        print(li.text)
        print('*' * 60)
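The script scrolls a fixed 50 times, which is too many for a short feed and may be too few for a long one. One common alternative is to keep scrolling until the page height stops growing. The sketch below assumes the feed grows `document.body.scrollHeight` as it loads; `scroll_to_bottom`, `pause`, `max_rounds`, and `FakeDriver` are hypothetical names, and `FakeDriver` simulates a page that grows a few times so the loop can run without a browser.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll down until the page height stops growing; return the rounds used."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded entries time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_no  # nothing new was loaded, so stop
        last_height = new_height
    return max_rounds


class FakeDriver:
    """Stand-in whose page grows by 1000 px per scroll until it hits a limit."""

    def __init__(self, height=1000, limit=3000):
        self.height = height
        self.limit = limit

    def execute_script(self, script):
        if script.startswith("return"):
            return self.height
        # A scroll request: simulate more content being loaded.
        self.height = min(self.height + 1000, self.limit)
        return None


if __name__ == "__main__":
    print(scroll_to_bottom(FakeDriver(), pause=0))
```

The `max_rounds` cap keeps the loop from running forever on a feed that never stops loading.
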