  • Fetching pages rendered by dynamic JavaScript with Selenium

    Building a crawler with Selenium on a WebKit-based browser:

    http://www.cnblogs.com/luxiaojun/p/6144748.html

    https://www.cnblogs.com/chenqingyang/p/3772673.html

    Headless Chrome has now replaced PhantomJS:

    https://zhuanlan.zhihu.com/p/27100187

    # Requires Selenium 3.x: webdriver.PhantomJS was removed in Selenium 4
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    import time

    # Copy PhantomJS's default capabilities and override the userAgent
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0")

    # Start PhantomJS (raw string so the backslashes in the Windows path are kept)
    obj = webdriver.PhantomJS(executable_path=r'C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\Scripts\phantomjs.exe', desired_capabilities=dcap)
    obj.get('http://chart.icaile.com/sd11x5.php')  # open the URL

    # time.sleep(10)  # uncomment if the page needs more time to render
    pageSource = obj.page_source  # HTML after the page's JavaScript has run
    print(pageSource)

    obj.quit()
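
The script above depends on `webdriver.PhantomJS`, which Selenium 4 no longer provides. Below is a rough sketch of the same fetch with headless Chrome. It assumes Selenium 4, a local Chrome install, and a driver that Selenium can locate on its own. The helper names `chrome_arguments` and `fetch_page_source` and the `wait_for_id` parameter are made up for illustration and are not part of the original post.

```python
USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) "
              "Gecko/20100101 Firefox/25.0")


def chrome_arguments(user_agent=USER_AGENT):
    """Command-line switches for a headless Chrome with a custom userAgent."""
    return ["--headless", f"--user-agent={user_agent}"]


def fetch_page_source(url, wait_for_id=None, timeout=10):
    """Load `url` in headless Chrome and return the rendered HTML.

    If `wait_for_id` is given, block until an element with that id exists
    (or `timeout` seconds pass) instead of sleeping for a fixed time.
    """
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for argument in chrome_arguments():
        options.add_argument(argument)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        if wait_for_id is not None:
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.ID, wait_for_id))
            )
        return driver.page_source
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

Calling `fetch_page_source('http://chart.icaile.com/sd11x5.php', wait_for_id='fixedtable')` would then return the rendered HTML; the explicit wait stands in for the commented-out `time.sleep(10)`.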
    

      

    Once the page content has been fetched, it can be parsed with BeautifulSoup:

    https://cuiqingcai.com/1319.html
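
As a minimal sketch of that parsing step, the function below pulls the cell text out of the element with id `fixedtable`. It assumes the `beautifulsoup4` package is installed and that the element contains ordinary `tr`/`th`/`td` markup; the function name `table_rows` and the sample HTML are invented for illustration and do not come from the real page.

```python
from bs4 import BeautifulSoup


def table_rows(html, element_id="fixedtable"):
    """Return the text of each row under the element with `element_id`."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=element_id)
    if container is None:
        return []

    rows = []
    for tr in container.find_all("tr"):
        # Collect header and data cells alike, trimming surrounding whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Made-up sample standing in for obj.page_source
sample_html = """
<table id="fixedtable">
  <tr><th>Issue</th><th>Numbers</th></tr>
  <tr><td>001</td><td>01 02 03</td></tr>
</table>
"""

for row in table_rows(sample_html):
    print(row)
```

In the scripts on this page, the string passed in would be `obj.page_source`.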

    Getting the table's text directly:

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Default PhantomJS capabilities; the userAgent override is disabled here
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    #dcap["phantomjs.page.settings.userAgent"] = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0")

    # Start PhantomJS (raw string so the backslashes in the Windows path are kept)
    obj = webdriver.PhantomJS(executable_path=r'C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\Scripts\phantomjs.exe', desired_capabilities=dcap)
    obj.get('http://chart.icaile.com/sd11x5.php')  # open the URL

    # .text returns the element's visible text with the HTML tags stripped
    text = obj.find_element_by_id("fixedtable").text

    print(text)

    obj.quit()
    

      

    # Print every absolute link found in the rendered page
    import re
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Copy PhantomJS's default capabilities and override the userAgent
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0")

    # Start PhantomJS (raw string so the backslashes in the Windows path are kept)
    obj = webdriver.PhantomJS(executable_path=r'C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\Scripts\phantomjs.exe', desired_capabilities=dcap)
    obj.get('http://chart.icaile.com/sd11x5.php')  # open the URL

    page = obj.page_source

    # Non-greedy match of each href value; re.S lets '.' span newlines
    url_context = re.findall(r'href="(.*?)"', page, re.S)
    for url in url_context:
        if 'http' in url:  # keep only links that contain a full URL
            print(url)

    obj.quit()
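
The regex above only matches double-quoted `href` values and skips relative links. As an alternative sketch, not the original author's method, the same extraction can be done with the standard library's `html.parser`, resolving relative links against the page URL. The names `LinkCollector` and `extract_links` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the http(s) links found in `html`, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.links if u.startswith(("http://", "https://"))]


# Made-up sample standing in for obj.page_source
sample_html = (
    '<a href="/a.html">relative</a>'
    "<link href='https://example.com/s.css'>"
    '<a href="javascript:void(0)">script</a>'
)

for link in extract_links(sample_html, "http://chart.icaile.com/sd11x5.php"):
    print(link)
```

Unlike the `'http' in url` check, this version also keeps relative links by turning them into absolute URLs, and it drops non-HTTP schemes such as `javascript:`.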
    

      

  • Original article: https://www.cnblogs.com/coolyylu/p/8277439.html