  • 51job_selenium test 2


    # -*- coding: utf-8 -*-
    """
    Crawl a company's job listings from 51job and write them into an Excel sheet.
    """

    import requests
    import bs4
    import openpyxl
    from selenium import webdriver

    excelName = "51job.xlsx"
    sheetName = "Sheet1"
    wb1 = openpyxl.load_workbook(excelName)
    sheet = wb1[sheetName]   # get_sheet_by_name() is deprecated in current openpyxl

    charset = "gb2312"
    site = "http://jobs.51job.com/all/co198308.html"

    # Open the listing page in Firefox and click the "下一页" (next page) link
    browser = webdriver.Firefox()
    browser.get(site)
    linkElem = browser.find_element_by_link_text("下一页")
    linkElem.click()

    # Every job row on the page carries the CSS class "el";
    # .text returns the visible text of an element
    elems = browser.find_elements_by_class_name('el')
    div1 = elems[0].text
    div2 = elems[1].text


    # Crawl one listing page and write its rows into the worksheet
    def Craw(site):
        res = requests.get(site)
        res.encoding = charset
        soup1 = bs4.BeautifulSoup(res.text, "lxml")
        div = soup1.select('.el')
        for i in range(len(div)):
            content = div[i].getText()
            content_list = content.split('\n')

            name = content_list[1]
            education = content_list[2]
            position = content_list[3]
            salary = content_list[4]
            date = content_list[5]

            sheet['A' + str(i + 2)].value = name
            sheet['B' + str(i + 2)].value = education
            sheet['C' + str(i + 2)].value = position
            sheet['D' + str(i + 2)].value = salary
            sheet['E' + str(i + 2)].value = date

    '''
    Craw(site)
    wb1.save(excelName)
    '''
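
    The script above fetches the listing page twice: once with Selenium for navigation and once with requests inside Craw(). A minimal sketch of how the two could be combined, parsing the page the browser has already loaded and walking through the pages by clicking 下一页, might look like the following. The parse_page() helper and the running row counter are hypothetical additions for illustration, not part of the original script; the browser, sheet, wb1, and excelName names are reused from above.

    import time
    from selenium.common.exceptions import NoSuchElementException

    def parse_page(html, sheet, start_row):
        """Parse one listing page already loaded in the browser and
        write its rows into the worksheet starting at start_row."""
        soup = bs4.BeautifulSoup(html, "lxml")
        rows = soup.select('.el')
        for offset, row in enumerate(rows):
            fields = row.getText().split('\n')
            for col, value in zip('ABCDE', fields[1:6]):
                sheet[col + str(start_row + offset)].value = value
        return start_row + len(rows)

    row = 2
    while True:
        row = parse_page(browser.page_source, sheet, row)
        try:
            browser.find_element_by_link_text("下一页").click()  # next page
        except NoSuchElementException:
            break  # no "next page" link left, stop
        time.sleep(2)  # crude wait for the next page to load
    wb1.save(excelName)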
    

      

    Finding Elements on the Page

    WebDriver objects have quite a few methods for finding elements on a page. They are divided into the find_element_* and find_elements_* methods. The find_element_* methods return a single WebElement object, representing the first element on the page that matches your query. The find_elements_* methods return a list of WebElement objects, one for every matching element on the page.

    Table 11-3 shows several examples of find_element_* and find_elements_* methods being called on a WebDriver object that’s stored in the variable browser.

    Table 11-3. Selenium’s WebDriver Methods for Finding Elements (method name, followed by the WebElement object/list returned)

    browser.find_element_by_class_name(name)
    browser.find_elements_by_class_name(name)
        Elements that use the CSS class name

    browser.find_element_by_css_selector(selector)
    browser.find_elements_by_css_selector(selector)
        Elements that match the CSS selector

    browser.find_element_by_id(id)
    browser.find_elements_by_id(id)
        Elements with a matching id attribute value

    browser.find_element_by_link_text(text)
    browser.find_elements_by_link_text(text)
        <a> elements that completely match the text provided

    browser.find_element_by_partial_link_text(text)
    browser.find_elements_by_partial_link_text(text)
        <a> elements that contain the text provided

    browser.find_element_by_name(name)
    browser.find_elements_by_name(name)
        Elements with a matching name attribute value

    browser.find_element_by_tag_name(name)
    browser.find_elements_by_tag_name(name)
        Elements with a matching tag name (case insensitive; an <a> element is matched by 'a' and 'A')

    Except for the *_by_tag_name() methods, the arguments to all the methods are case sensitive. If no elements exist on the page that match what the method is looking for, the selenium module raises a NoSuchElementException. If you do not want this exception to crash your program, add try and except statements to your code.
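
    As an illustration, a minimal sketch of locating the job rows on the page opened above and guarding against a missing element might look like this (the class name 'el' comes from the 51job markup used earlier; the rest is standard selenium usage):

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    browser = webdriver.Firefox()
    browser.get("http://jobs.51job.com/all/co198308.html")
    try:
        first_row = browser.find_element_by_class_name('el')   # first matching element
        all_rows = browser.find_elements_by_class_name('el')   # list of all matches
        print(len(all_rows), "rows found; first row text:", first_row.text)
    except NoSuchElementException:
        print("No element with class 'el' was found on the page.")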

    Once you have the WebElement object, you can find out more about it by reading the attributes or calling the methods in Table 11-4.

    Table 11-4. WebElement Attributes and Methods (attribute or method, followed by its description)

    tag_name
        The tag name, such as 'a' for an <a> element

    get_attribute(name)
        The value for the element’s name attribute

    text
        The text within the element, such as 'hello' in <span>hello</span>

    clear()
        For text field or text area elements, clears the text typed into it

    is_displayed()
        Returns True if the element is visible; otherwise returns False

    is_enabled()
        For input elements, returns True if the element is enabled; otherwise returns False

    is_selected()
        For checkbox or radio button elements, returns True if the element is selected; otherwise returns False

    location
        A dictionary with keys 'x' and 'y' for the position of the element in the page
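
    For example, once one of the job rows has been located (reusing the browser and the 'el' class from the script above), its attributes and methods can be read like this (a minimal sketch):

    elem = browser.find_element_by_class_name('el')
    print(elem.tag_name)                 # tag of the element, e.g. 'div' or 'p'
    print(elem.text)                     # visible text inside the element
    print(elem.get_attribute('class'))   # value of the element's class attribute
    print(elem.is_displayed())           # True if the element is visible
    print(elem.location)                 # {'x': ..., 'y': ...} position on the page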

    Table 11-5. Commonly Used Variables in the selenium.webdriver.common.keys Module (attributes, followed by their meanings)

    Keys.DOWN, Keys.UP, Keys.LEFT, Keys.RIGHT
        The keyboard arrow keys

    Keys.ENTER, Keys.RETURN
        The ENTER and RETURN keys

    Keys.HOME, Keys.END, Keys.PAGE_DOWN, Keys.PAGE_UP
        The HOME, END, PAGEDOWN, and PAGEUP keys

    Keys.ESCAPE, Keys.BACK_SPACE, Keys.DELETE
        The ESC, BACKSPACE, and DELETE keys

    Keys.F1, Keys.F2, ..., Keys.F12
        The F1 to F12 keys at the top of the keyboard

    Keys.TAB
        The TAB key
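
    A short sketch of sending some of these keys to the page opened earlier; the <html> element is a common target for page-wide scrolling keys:

    from selenium.webdriver.common.keys import Keys

    htmlElem = browser.find_element_by_tag_name('html')
    htmlElem.send_keys(Keys.END)        # scroll to the bottom of the page
    htmlElem.send_keys(Keys.HOME)       # scroll back to the top
    htmlElem.send_keys(Keys.PAGE_DOWN)  # scroll down one screen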
