  • Querying HTML with PyQuery

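    The following code demonstrates parsing an HTML file with pyquery, including setting the encoding and running queries inside sub-blocks of the page: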

    from pyquery import PyQuery as pq
    import os
    from lxml.html import HTMLParser, fromstring
    
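    def getHouseInfoFromPage(page):
        # HouseinfoItem and isNumber are not defined in this snippet; they are
        # assumed to come from elsewhere in the author's project.
        houseInfo = HouseinfoItem()
        UTF8_PARSER = HTMLParser(encoding='utf-8')  # set the encoding pyquery parses with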
        with open(page, encoding='utf-8') as filehandler:
            file_contents = filehandler.read()
        doc = pq(fromstring(file_contents, parser = UTF8_PARSER))
    
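        # Get the contact-info div; later queries on contactCard are scoped to it
        contactCard = doc('.right-border')
        houseInfo.houseType = contactCard('.col-right-tit div.fl').text()
        houseInfo.personName = contactCard('.person-name').text()
        houseInfo.companyName = contactCard('p.company-name').text()
        # Pages without a contact name are skipped (the function returns None)
        if houseInfo.personName == '':
            return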
    
        houseInfo.price = doc('.basic-info-price').text()
        if isNumber(houseInfo.price):
            houseInfo.price = float(houseInfo.price)
    
        # Get the basic-info div
        basicInfo = doc('.basic-info')
        houseInfo.addr = basicInfo('li.with-area a:last').text()
        houseInfo.district = basicInfo('li.with-area a:eq(1)').text()
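        # Text of the list item containing "㎡" (layout and area);
        # the area is the part after the last '-'
        huXing = basicInfo('li:contains("㎡")').text()
        houseInfo.area = huXing.split('-')[-1]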
        
        houseInfo.allocation = basicInfo('.peizhi p').text()
        houseInfo.link = os.path.basename(page)
        houseInfo.summary = doc('.summary-cont').text()
    
        phoneEle = doc('.talk-btn')
        houseInfo.phone = phoneEle.attr['data-phone']
        houseInfo.houseId = houseInfo.link.split('.')[0]
        return houseInfo
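
    The function above relies on HouseinfoItem and isNumber, neither of which is defined in the post. To run it, something like the following could stand in for them. This is only a minimal sketch: the field list is inferred from the attributes the function assigns, and the isNumber logic (anything float() accepts counts as a number) is an assumption, not the author's actual implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class HouseinfoItem:
    """Hypothetical stand-in: a plain container for the scraped fields."""
    houseType: str = ''
    personName: str = ''
    companyName: str = ''
    price: Union[str, float] = ''   # text from the page, or float once converted
    addr: str = ''
    district: str = ''
    area: str = ''
    allocation: str = ''
    link: str = ''
    summary: str = ''
    phone: str = ''
    houseId: str = ''


def isNumber(text):
    """Hypothetical helper: True if float() can parse the value."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


# Mirrors the price handling in getHouseInfoFromPage
item = HouseinfoItem()
item.price = '4200'
if isNumber(item.price):
    item.price = float(item.price)
```

    With these two definitions placed above getHouseInfoFromPage, the price stays as the original text whenever it is not numeric and becomes a float otherwise.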
  • Original article: https://www.cnblogs.com/silverbullet11/p/python_pyquery.html