  • 51job_selenium test 2


    # -*- coding: utf-8 -*-
    """
    Spyder Editor
    
    This is a temporary script file.
    """
    
    import requests,bs4,openpyxl,time,selenium
    from openpyxl.utils import get_column_letter,column_index_from_string
    from selenium import webdriver
    excelName="51job.xlsx"
    sheetName="Sheet1"
    wb1=openpyxl.load_workbook(excelName)
    sheet=wb1.get_sheet_by_name(sheetName)
    start=1
    
    charset="gb2312"
    site="http://jobs.51job.com/all/co198308.html"
    browser=webdriver.Firefox()
    browser.get(site)
    linkElem=browser.find_element_by_link_text("下一页")  # the "next page" link
    linkElem.click()
    #elem = browser.find_element_by_class_name('el')
    #returns the text value of the element
    #elem.text
    #elems = browser.find_elements_by_class_name('el')
    elem=browser.find_elements_by_class_name('el')
    div1=elem[0].text
    div2=elem[1].text
    
    
    
    #crawl the corresponding data from each page
    def Craw(site):
         
        res=requests.get(site)
        res.encoding = charset
        soup1=bs4.BeautifulSoup(res.text,"lxml")
        div=soup1.select('.el')
        len_div=len(div)
        for i in range(len_div):
            #print ("i:",i)
            content=div[i].getText()
            content_list=content.split('\n')
             
            name=content_list[1]
            #print ("name:",name)
            education=content_list[2]
            #print ("education:",education)
            position=content_list[3]
            #print ("position:",position)
            salary=content_list[4]
            #print ("salary:",salary)
            date=content_list[5]
            #print ("date:",date)
        
            sheet['A'+str(i+2)].value=name
            sheet['B'+str(i+2)].value=education
            sheet['C'+str(i+2)].value=position
            sheet['D'+str(i+2)].value=salary
            sheet['E'+str(i+2)].value=date
    
    ''' 
    Craw(site)       
    wb1.save(excelName)
        '''
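
    The commented-out Craw(site) / wb1.save(excelName) call above only fills a single page, and each call writes rows starting at row 2 again. As a rough sketch (not part of the original script), the Selenium pagination and the BeautifulSoup parsing could be combined so rows from every page are appended before saving; the column layout and the "下一页" link text come from the script above, everything else here is an assumption:

    import time
    import bs4, openpyxl
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    browser = webdriver.Firefox()
    browser.get("http://jobs.51job.com/all/co198308.html")

    wb = openpyxl.load_workbook("51job.xlsx")
    sheet = wb["Sheet1"]
    row = 2                          # row 1 is assumed to hold the headers

    while True:
        soup = bs4.BeautifulSoup(browser.page_source, "lxml")
        for el in soup.select(".el"):
            fields = el.getText().split('\n')
            if len(fields) < 6:
                continue             # skip header / incomplete rows
            # same column order as Craw(): name, education, position, salary, date
            for col, value in zip("ABCDE", fields[1:6]):
                sheet[col + str(row)].value = value
            row += 1
        try:
            browser.find_element_by_link_text("下一页").click()   # "next page"
        except NoSuchElementException:
            break                    # no more pages
        time.sleep(2)                # crude wait for the next page to load

    wb.save("51job.xlsx")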
    

      

    Finding Elements on the Page

    WebDriver objects have quite a few methods for finding elements on a page. They are divided into the find_element_* and find_elements_* methods. The find_element_* methods return a single WebElement object, representing the first element on the page that matches your query. The find_elements_* methods return a list of WebElement objects for every matching element on the page.

    Table 11-3 shows several examples of find_element_* and find_elements_* methods being called on a WebDriver object that’s stored in the variable browser.

    Table 11-3. Selenium’s WebDriver Methods for Finding Elements

     

    Method name / WebElement object or list returned

    browser.find_element_by_class_name(name)
    browser.find_elements_by_class_name(name)
        Elements that use the CSS class name

    browser.find_element_by_css_selector(selector)
    browser.find_elements_by_css_selector(selector)
        Elements that match the CSS selector

    browser.find_element_by_id(id)
    browser.find_elements_by_id(id)
        Elements with a matching id attribute value

    browser.find_element_by_link_text(text)
    browser.find_elements_by_link_text(text)
        <a> elements that completely match the text provided

    browser.find_element_by_partial_link_text(text)
    browser.find_elements_by_partial_link_text(text)
        <a> elements that contain the text provided

    browser.find_element_by_name(name)
    browser.find_elements_by_name(name)
        Elements with a matching name attribute value

    browser.find_element_by_tag_name(name)
    browser.find_elements_by_tag_name(name)
        Elements with a matching tag name (case insensitive; an <a> element is matched by 'a' and 'A')

    Except for the *_by_tag_name() methods, the arguments to all the methods are case sensitive. If no elements exist on the page that match what the method is looking for, the selenium module raises a NoSuchElementException. If you do not want this exception to crash your program, add try and except statements to your code.
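
    For example, the lookup for the "下一页" link used earlier in this post could be wrapped like this (a minimal sketch; it assumes a browser object has already been created as above):

    from selenium.common.exceptions import NoSuchElementException

    try:
        link = browser.find_element_by_link_text("下一页")   # the "next page" link
        link.click()
    except NoSuchElementException:
        print("No such link on this page; nothing to click.")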

    Once you have the WebElement object, you can find out more about it by reading the attributes or calling the methods in Table 11-4.

    Table 11-4. WebElement Attributes and Methods

    Attribute or method / Description

    tag_name
        The tag name, such as 'a' for an <a> element

    get_attribute(name)
        The value for the element's name attribute

    text
        The text within the element, such as 'hello' in <span>hello</span>

    clear()
        For text field or text area elements, clears the text typed into it

    is_displayed()
        Returns True if the element is visible; otherwise returns False

    is_enabled()
        For input elements, returns True if the element is enabled; otherwise returns False

    is_selected()
        For checkbox or radio button elements, returns True if the element is selected; otherwise returns False

    location
        A dictionary with keys 'x' and 'y' for the position of the element in the page
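
    As a quick illustration (a sketch only; the actual values depend on the live 51job page), the first '.el' element grabbed earlier with browser.find_elements_by_class_name('el') can be inspected through these attributes:

    first = elem[0]                      # elem was collected earlier in the script
    print(first.tag_name)                # the element's tag name, e.g. 'div' or 'p'
    print(first.text)                    # the visible text inside the row
    print(first.get_attribute("class"))  # should contain 'el'
    print(first.is_displayed())          # True if the element is currently visible
    print(first.location)                # {'x': ..., 'y': ...} position on the page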

    Table 11-5. Commonly Used Variables in the selenium.webdriver.common.keys Module

    Attributes / Meanings

    Keys.DOWN, Keys.UP, Keys.LEFT, Keys.RIGHT
        The keyboard arrow keys

    Keys.ENTER, Keys.RETURN
        The ENTER and RETURN keys

    Keys.HOME, Keys.END, Keys.PAGE_DOWN, Keys.PAGE_UP
        The HOME, END, PAGEDOWN, and PAGEUP keys

    Keys.ESCAPE, Keys.BACK_SPACE, Keys.DELETE
        The ESC, BACKSPACE, and DELETE keys

    Keys.F1, Keys.F2, ..., Keys.F12
        The F1 to F12 keys at the top of the keyboard

    Keys.TAB
        The TAB key
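
    These constants are passed to an element's send_keys() method. A small sketch (the target element and the keys chosen here are only for illustration, not from the original post):

    from selenium.webdriver.common.keys import Keys

    html_elem = browser.find_element_by_tag_name("html")   # the whole page
    html_elem.send_keys(Keys.END)    # scroll to the bottom of the page
    html_elem.send_keys(Keys.HOME)   # scroll back to the top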

  • Original article: https://www.cnblogs.com/webRobot/p/5302439.html