
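Using Selenium with Python

Selenium is a testing tool for web applications. It drives a real browser directly, just as a user would, which also makes it a common choice for web scraping.

Official documentation: https://selenium-python.readthedocs.io/index.html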

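1 Installation

# Install selenium
pip install selenium

# Install the Firefox driver (geckodriver) or the Chrome driver (chromedriver)
# Check your Chrome version at chrome://version/
# Download the driver that matches that version from http://chromedriver.storage.googleapis.com/index.html
# On Windows, download the win32 build
# Place chromedriver.exe inside the installed selenium package
from selenium import webdriver
chrome_driver = r"C:\Python36\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe"
# executable_path is the Selenium 3 way to pass the driver location;
# Selenium 4 takes it through a Service object instead
driver = webdriver.Chrome(executable_path=chrome_driver)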
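The "download the driver that matches your version" step can be checked mechanically. The following is only a minimal sketch: it assumes that a driver is compatible when its major, minor, and build numbers equal the browser's, which follows ChromeDriver's published version-selection guidance. The helper names and the version strings in the usage note are hypothetical.

```python
def build_prefix(version):
    """Return the 'major.minor.build' part of a dotted version string."""
    parts = version.strip().split(".")
    return ".".join(parts[:3])


def driver_matches_browser(chrome_version, driver_version):
    """True when chromedriver and Chrome share major, minor and build numbers.

    Assumption: the fourth (patch) component may differ between the two.
    """
    return build_prefix(chrome_version) == build_prefix(driver_version)
```

For example, `driver_matches_browser("83.0.4103.61", "83.0.4103.39")` is true, while a driver from the 81.x series would be rejected for the same browser.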

2 A simple example: scraping housing listings with the Firefox driver

#coding: utf-8
from selenium import webdriver
def forcitiurl(xq_url):
    driver = webdriver.Firefox()  # create the browser object (geckodriver must be on PATH)
    driver.get(xq_url)  # load the target URL
    tmp = driver.find_element_by_xpath("/html/body/div[2]/div[3]/div[1]").text  # text of the element matched by the XPath
    print(tmp)
    driver.quit()  # close the browser

forcitiurl("https://anqing.anjuke.com/community/view/861940")

3 A simple example: scraping housing listings with chromedriver

3.1 Running with a visible browser window

from selenium import webdriver

def forcitiurl(xq_url):
    chrome_driver = r"C:\Python36\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe"
    driver = webdriver.Chrome(executable_path=chrome_driver)
    driver.get(xq_url)  # load the target URL
    tmp = driver.find_element_by_xpath("/html/body/div[2]/div[3]/div[1]").text  # text of the element matched by the XPath
    print(tmp)
    driver.quit()  # close the browser

forcitiurl("https://anqing.anjuke.com/community/view/861940")

3.2 Running headless

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def forcitiurl(xq_url):
    chrome_opt = Options()  # create the options object
    chrome_opt.add_argument('--headless')  # run without a visible window
    chrome_opt.add_argument('--disable-gpu')  # commonly paired with headless mode
    chrome_opt.add_argument('--window-size=1366,768')  # set the window size
    chrome_driver = r"C:\Python36\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe"
    # options= replaces the deprecated chrome_options= keyword
    driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_opt)
    driver.get(xq_url)  # load the target URL
    tmp = driver.find_element_by_xpath("/html/body/div[2]/div[3]/div[1]").text  # text of the element matched by the XPath
    print(tmp)
    driver.quit()  # close the browser

forcitiurl("https://anqing.anjuke.com/community/view/861940")
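In the examples above, an exception raised before `driver.quit()` would leave the browser process running. One way to guard against that is a `try`/`finally` block, sketched below. The names `headless_chrome_args` and `fetch_text` are hypothetical, and the sketch assumes chromedriver can be found without an explicit path (for instance because it is on `PATH`).

```python
def headless_chrome_args(width=1366, height=768):
    """Return the command-line switches used in the headless example."""
    return [
        "--headless",
        "--disable-gpu",
        f"--window-size={width},{height}",
    ]


def fetch_text(url, xpath):
    """Open url in headless Chrome and return the text of the element at xpath."""
    # Imported lazily so the helper above is usable without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    opts = Options()
    for arg in headless_chrome_args():
        opts.add_argument(arg)

    # Assumption: chromedriver is discoverable, so no path is passed here.
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        return driver.find_element(By.XPATH, xpath).text
    finally:
        driver.quit()  # runs even if get() or find_element() raises
```

Calling `fetch_text("https://anqing.anjuke.com/community/view/861940", "/html/body/div[2]/div[3]/div[1]")` would then mirror the example above while always releasing the browser.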

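4 Driver operations

4.1 Common operations

driver.get(url): load the given URL in the current window
driver.close(): close the current browser window
driver.quit(): close all windows and end the browser session
driver.refresh(): refresh the current page
driver.title: get the title of the current page
driver.page_source: get the rendered source of the current page
driver.current_url: get the URL of the current page
driver.window_handles: get the handles of all windows in the current session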
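The difference between `close()` and `quit()` matters once a session has several windows. As a rough sketch, the hypothetical helper below uses `window_handles` together with `driver.switch_to.window()` to close every window except one; it takes the driver as a parameter and makes no assumption about which browser is in use.

```python
def close_other_windows(driver, keep_handle):
    """Close every window except keep_handle, then switch back to it."""
    # Copy the list first, because closing windows changes window_handles.
    for handle in list(driver.window_handles):
        if handle != keep_handle:
            driver.switch_to.window(handle)  # make this window the current one
            driver.close()                   # close() affects only the current window
    driver.switch_to.window(keep_handle)
```

A typical call would be `close_other_windows(driver, driver.current_window_handle)`, which keeps the window the script is currently working in.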

4.2 Finding elements

Method	Purpose
find_element_by_xpath()	Find by `XPath`
find_element_by_class_name()	Find by `class attribute`
find_element_by_css_selector()	Find by `CSS selector`
find_element_by_id()	Find by `id`
find_element_by_link_text()	Find by `link text`
find_element_by_name()	Find by `name attribute`
find_element_by_partial_link_text()	Find by `partial match on the link text`
find_element_by_tag_name()	Find by `tag name`
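The `find_element_by_*` helpers in the table belong to the Selenium 3 API and were removed in Selenium 4.3, where the general `find_element(by, value)` call is used instead. The sketch below maps each helper suffix to its locator strategy string. These strings equal the constants on `selenium.webdriver.common.by.By` (for example `By.XPATH == "xpath"`); the `LOCATORS` table and the `find` helper themselves are hypothetical names.

```python
# Helper suffix -> locator strategy string (same values as the By constants).
LOCATORS = {
    "xpath": "xpath",
    "class_name": "class name",
    "css_selector": "css selector",
    "id": "id",
    "link_text": "link text",
    "name": "name",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
}


def find(driver, by_suffix, value):
    """Equivalent of driver.find_element_by_<by_suffix>(value)."""
    return driver.find_element(LOCATORS[by_suffix], value)
```

With this in place, `find(driver, "xpath", "/html/body/div[2]/div[3]/div[1]")` stands in for the `find_element_by_xpath` call used in the earlier examples.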

4.3 Working with cookies

add_cookie(cookie_dict): adds a cookie to the current session.
- cookie_dict: a dictionary that must contain the keys "name" and "value"; the optional keys are "path", "domain", "secure", and "expiry".
- driver.add_cookie({'name': 'foo', 'value': 'bar'})
- driver.add_cookie({'name': 'foo', 'value': 'bar', 'path': '/'})
- driver.add_cookie({'name': 'foo', 'value': 'bar', 'path': '/', 'secure': True})

get_cookie(name): returns the single cookie with the given name, or None if it does not exist.

get_cookies(): returns all cookies as a list of dictionaries.

delete_all_cookies(): deletes all cookies.

delete_cookie(name): deletes the cookie with the given name.
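A common use of these methods is carrying a logged-in session from one run to the next. The sketch below stores the result of `get_cookies()` in a JSON file and feeds it back through `add_cookie()`. The function names and the file path are hypothetical, and the driver is passed in as a parameter.

```python
import json


def save_cookies(driver, path):
    """Write all cookies of the current session to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path):
    """Add the cookies stored in a JSON file to the current session.

    Returns the number of cookies that were added.
    """
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

Note that a browser only accepts cookies for the domain of the page it currently has open, so `driver.get()` should load a page on the target site before `load_cookies()` is called.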


Original article: https://www.cnblogs.com/aaronthon/p/12736391.html
