  • Python Web Scraping: JD.com Products

    #!/usr/bin/env python
    # coding: utf-8
    
    
    from selenium import webdriver
    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By  # locator strategies, e.g. By.ID, By.CSS_SELECTOR
    from selenium.webdriver.common.keys import Keys  # keyboard key constants
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait  # wait for specific elements to load
    import time
    
    
    def get_goods(driver):
        try:
            # The script treats every element with class "gl-item" as one search result.
            goods = driver.find_elements(By.CLASS_NAME, 'gl-item')
    
            for good in goods:
                detail_url = good.find_element(By.TAG_NAME, 'a').get_attribute('href')
    
                # Strip newlines so a wrapped product name prints on one line.
                p_name = good.find_element(By.CSS_SELECTOR, '.p-name em').text.replace('\n', '')
                price = good.find_element(By.CSS_SELECTOR, '.p-price i').text
                p_commit = good.find_element(By.CSS_SELECTOR, '.p-commit a').text
    
                msg = '''
                Product  : %s
                Link     : %s
                Price    : %s
                Comments : %s
                ''' % (p_name, detail_url, price, p_commit)
    
                print(msg, end='\n\n')
    
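            # '下一页' ("next page") is the link text the script looks for on the results page.
            button = driver.find_element(By.PARTIAL_LINK_TEXT, '下一页')
            button.click()
            time.sleep(1)  # give the next page a moment to load
            get_goods(driver)  # recurse to scrape the next page
        except Exception:
            # Any failure, including a missing next-page link, ends the crawl.
            pass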
    
    
    def spider(url, keyword):
        driver = webdriver.Chrome()
        driver.get(url)
        driver.implicitly_wait(3)  # use an implicit wait
        try:
            input_tag = driver.find_element(By.ID, 'key')  # the search box
            input_tag.send_keys(keyword)
            input_tag.send_keys(Keys.ENTER)
            get_goods(driver)
        finally:
            driver.close()
    
    
    if __name__ == '__main__':
        spider('https://www.jd.com/', keyword='华为P30')  # search keyword: Huawei P30
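
`get_goods` above moves to the next page by calling itself, and its broad `except Exception` is also what ends the crawl once no next-page link is found. As a rough sketch of the same control flow written as a loop, the generator below takes two callables. The names `crawl_pages`, `scrape_page`, `go_to_next_page` and `max_pages` are hypothetical, introduced here for illustration only; they are not part of the original script or of Selenium. The demo at the bottom uses in-memory stand-in pages rather than a real browser.

```python
from typing import Callable, Iterable, Iterator, Optional


def crawl_pages(
    scrape_page: Callable[[], Iterable[dict]],
    go_to_next_page: Callable[[], bool],
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield records page by page until there is no next page.

    scrape_page     -- returns the records found on the current page
    go_to_next_page -- tries to advance; returns False when no next page exists
    max_pages       -- optional cap on the number of pages visited
    """
    visited = 0
    while max_pages is None or visited < max_pages:
        yield from scrape_page()
        visited += 1
        # Stop at the cap without advancing, or when no next page exists.
        if visited == max_pages or not go_to_next_page():
            break


if __name__ == "__main__":
    # Stand-in data: two "pages" of records held in memory (an assumption
    # made so the sketch runs without a browser).
    fake_pages = [[{"name": "A"}, {"name": "B"}], [{"name": "C"}]]
    position = {"index": 0}

    def scrape_page():
        return fake_pages[position["index"]]

    def go_to_next_page():
        if position["index"] + 1 < len(fake_pages):
            position["index"] += 1
            return True
        return False

    for record in crawl_pages(scrape_page, go_to_next_page):
        print(record)
```

With Selenium, `scrape_page` could wrap the `gl-item` loop and return dicts instead of printing, and `go_to_next_page` could click the '下一页' link and return `False` when `selenium.common.exceptions.NoSuchElementException` is raised. Only the expected "no more pages" case would then stop the loop, other errors would stay visible, and the call stack would not grow with the number of pages.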
    
  • Original article: https://www.cnblogs.com/nickchen121/p/10825876.html