  • Web scraping with selenium

    Downloading chromedriver

    Chrome driver download URLs: http://chromedriver.storage.googleapis.com/index.html
    http://npm.taobao.org/mirrors/chromedriver/
    
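    The downloaded driver must match the version of the installed browser. The version mapping table at http://blog.csdn.net/huilan_same/article/details/51896672 shows which driver release goes with which browser release.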
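
    As a rough illustration of the version-matching rule above, the sketch below compares the major version numbers of a browser and a driver. It assumes a chromedriver release that follows Chrome's own version numbering (the older 2.x drivers need the mapping table instead). The function names and the sample version strings are hypothetical.

```python
def major_version(version: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version.strip().split('.')[0])


def versions_match(browser_version: str, driver_version: str) -> bool:
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Sample version strings, made up for illustration
print(versions_match('114.0.5735.199', '114.0.5735.90'))
print(versions_match('114.0.5735.199', '2.46'))
```

    A mismatch here only signals that the pairing should be checked against the mapping table; it does not prove the driver will fail.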

    Scraping with a visible browser window

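    from time import sleep

    from selenium import webdriver

    # Written against the Selenium 3 API; Selenium 4 replaces
    # find_element_by_id(...) with find_element(By.ID, ...)
    bro = webdriver.Chrome(executable_path=r'D:爬虫存储chromedriver.exe')
    bro.get(url='https://www.baidu.com/')
    sleep(2)
    # Type the query into the search box, then click the search button
    bro.find_element_by_id('kw').send_keys('python')
    sleep(1)
    bro.find_element_by_id('su').click()
    sleep(2)  # wait for the results to load
    # Save the rendered page
    with open('baidu.html', 'w', encoding='utf8') as f:
        f.write(bro.page_source)
    bro.quit()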
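
    The fixed sleep calls in the script above wait the full interval even when the page is ready sooner, and may be too short on a slow connection. A minimal sketch of a polling alternative follows; wait_until is a hypothetical helper, not part of Selenium, which ships its own WebDriverWait in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or timeout elapses.

    Returns the truthy value; raises TimeoutError otherwise.
    Exceptions raised by condition() are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # e.g. the element is not in the page yet
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(interval)


# Possible use with the browser object from the script above:
# search_box = wait_until(lambda: bro.find_element_by_id('kw'))
# search_box.send_keys('python')
```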

    Scraping without a visible browser window (headless)

    from time import sleep

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    # Run Chrome without opening a window
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')

    url = 'https://movie.douban.com/typerank?type_name=%E6%83%8A%E6%82%9A&type=19&interval_id=100:90&action='
    bro = webdriver.Chrome(executable_path=r'D:爬虫存储chromedriver.exe', chrome_options=chrome_options)
    bro.get(url)
    # Scroll to the bottom repeatedly so the page loads more entries
    bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')
    for i in range(2):
        sleep(1)
        bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')
    sleep(5)
    with open('douban.html', 'w', encoding='utf8') as f:
        f.write(bro.page_source)
    bro.quit()
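
    The script above scrolls a fixed number of times. When the number of lazily loaded batches is unknown, one option is to keep scrolling until the page height stops growing. The sketch below assumes the page signals "no more content" by leaving document.body.scrollHeight unchanged; scroll_to_bottom and its parameters are hypothetical names.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until document.body.scrollHeight stops increasing.

    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script('return document.body.scrollHeight')
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
        rounds += 1
        time.sleep(pause)  # give the page time to load more entries
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return rounds


# Possible use with the browser object from the script above:
# scroll_to_bottom(bro, pause=1.0)
```

    The max_rounds cap keeps the loop from running forever on pages that load content without end.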

    Capturing a screenshot of the browser and setting the window size

    from time import sleep

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    
    url = 'https://www.baidu.com'  # get() needs a full URL including the scheme
    
    bro = webdriver.Chrome(executable_path=r'D:爬虫存储chromedriver.exe', chrome_options=chrome_options)
    
    bro.set_window_size(7680, 4320)
    bro.get(url)
    sleep(30)
    data = bro.get_screenshot_as_png()
    
    with open('1.png', 'wb') as f:
        f.write(data)
    
    bro.quit()

    Elements inside an iframe cannot be found by id or any other selector until the driver switches into that frame. The fix:

    bro.switch_to.frame('login_frame')  # switch_to_frame() is the deprecated spelling
    bro.find_element_by_id('switcher_plogin').click()
    bro.find_element_by_id('u').send_keys('1132300949')
    bro.find_element_by_id('login_button').click()
    page_text = bro.page_source
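
    After switching into a frame, elements of the top-level page are out of reach until the driver switches back. A minimal sketch of a context manager that handles both directions is shown below; inside_frame is a hypothetical helper, built only on the driver's switch_to.frame and switch_to.default_content calls.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into a frame for the with-block, then back to the top-level page."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if a lookup inside the frame raises
        driver.switch_to.default_content()


# Possible use with the browser object from the snippet above:
# with inside_frame(bro, 'login_frame'):
#     bro.find_element_by_id('switcher_plogin').click()
```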
  • Original article: https://www.cnblogs.com/NachoLau/p/10453871.html