  • Scraping real-estate listings with PhantomJS

        Goal: scrape the listing information from https://sf.taobao.com/item_list.htm

        

        

        The page is served over HTTPS, so start PhantomJS with relaxed SSL settings, using either

        driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any'])

        or

        driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])

        To skip image downloads as well, add '--load-images=false' to service_args:

        cur_driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any', '--load-images=false'])
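
The flag combinations above can also be assembled from a few switches. The following is only a sketch: `build_service_args` is a hypothetical helper, not part of Selenium or PhantomJS. It emits only the three command-line flags already used in this post.

```python
def build_service_args(ssl_protocol="any", ignore_ssl_errors=False, load_images=True):
    """Return a list of PhantomJS command-line flags (hypothetical helper)."""
    args = []
    if ssl_protocol:
        # e.g. --ssl-protocol=any lets PhantomJS negotiate any SSL/TLS version
        args.append("--ssl-protocol=" + ssl_protocol)
    if ignore_ssl_errors:
        # skip certificate errors on HTTPS pages
        args.append("--ignore-ssl-errors=true")
    if not load_images:
        # do not download images, which usually shortens page loads
        args.append("--load-images=false")
    return args


if __name__ == "__main__":
    print(build_service_args(load_images=False))
    # prints ['--ssl-protocol=any', '--load-images=false']
```

The returned list can be passed straight through, for example `webdriver.PhantomJS(service_args=build_service_args(load_images=False))`.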
     

      Scraping code

    # coding=utf-8
    # Written for Python 3 with a Selenium release that still ships the
    # PhantomJS driver (it was removed in Selenium 4).
    # The original also imported the author's local helper modules
    # (IniFile, LogFile, mongoDB) and several unused standard modules;
    # this snippet never uses them, so they are left out.
    from selenium import webdriver
    from pyquery import PyQuery as pq
    
    class taobao(object):
        def __init__(self):
            # '--ssl-protocol=any' lets PhantomJS negotiate any SSL/TLS version for the HTTPS page
            self.driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any'])
            self.driver.set_page_load_timeout(10)
            self.driver.maximize_window()
            self.url = 'https://sf.taobao.com/item_list.htm'
    
        def scrapy_date(self):
            try:
                self.driver.get(self.url)
    
                # Read the DOM after scripts have run rather than the raw response body
                selenium_html = self.driver.execute_script("return document.documentElement.outerHTML")
                doc = pq(selenium_html)
                # Select the list items carrying the 'pai-status-doing' status class
                Elements = doc('ul[class="sf-pai-item-list"]').find('li[class="pai-item pai-status-doing"]')
                for element in Elements.items():
                    priceinfo = element('div[class="info-section"]').find('p').text().strip()
                    # The trailing space in "header-section " is kept from the original selector,
                    # because [class="..."] matches the whole attribute string exactly
                    title = element('div[class="header-section "]').find('p').text().strip()
                    print(title)
                    print(priceinfo)
                    print('-' * 80)
    
            except Exception as e:
                print(e)
            finally:
                # Shut down the PhantomJS process even if scraping failed
                self.driver.quit()
    
    
    obj = taobao()
    obj.scrapy_date()
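
The selection logic in `scrapy_date` can be tried without a browser by running it against a static snippet. The sketch below is not the code above: it swaps pyquery for the standard-library `html.parser` so it runs with no third-party packages. `SAMPLE_HTML`, `AuctionItemParser`, and `extract_items` are hypothetical names, and the sample markup is made up to mirror the class names used in the selectors, so the real page may differ. Unlike the exact attribute match above, it matches classes token by token, and it assumes the section `<div>` elements are not nested.

```python
from html.parser import HTMLParser

# Made-up sample markup that mirrors the class names used in the selectors above.
SAMPLE_HTML = """
<ul class="sf-pai-item-list">
  <li class="pai-item pai-status-doing">
    <div class="header-section "><p>Sample flat A</p></div>
    <div class="info-section"><p>Current price</p><p>1,000,000</p></div>
  </li>
  <li class="pai-item pai-status-done">
    <div class="header-section "><p>Sample flat B</p></div>
    <div class="info-section"><p>Final price</p><p>2,000,000</p></div>
  </li>
</ul>
"""


class AuctionItemParser(HTMLParser):
    """Collect (title, price_info) pairs from <li> items with status 'doing'."""

    def __init__(self):
        super().__init__()
        self.items = []       # finished (title, price_info) pairs
        self._item = None     # text buckets for the <li> being read, else None
        self._section = None  # "title" or "price" while inside a section <div>
        self._in_p = False    # True while inside a <p> of a section

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "pai-item" in classes and "pai-status-doing" in classes:
            self._item = {"title": [], "price": []}
        elif self._item is not None and tag == "div":
            if "header-section" in classes:
                self._section = "title"
            elif "info-section" in classes:
                self._section = "price"
        elif self._item is not None and tag == "p" and self._section:
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False
        elif tag == "div":
            self._section = None
        elif tag == "li" and self._item is not None:
            # Join the paragraph texts of each section with single spaces
            self.items.append((" ".join(self._item["title"]),
                               " ".join(self._item["price"])))
            self._item = None

    def handle_data(self, data):
        text = data.strip()
        if self._item is not None and self._in_p and self._section and text:
            self._item[self._section].append(text)


def extract_items(html):
    parser = AuctionItemParser()
    parser.feed(html)
    return parser.items


if __name__ == "__main__":
    for title, priceinfo in extract_items(SAMPLE_HTML):
        print(title)
        print(priceinfo)
```

Keeping the parsing step separate from the page fetch in this way makes it possible to check the selectors against saved HTML before starting PhantomJS.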

       Scraping results

  • Original article: https://www.cnblogs.com/shaosks/p/8574838.html