  • Introduction to Basic PyQuery Operations

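    PyQuery gives Python a jQuery-like way of working with HTML: you can query an HTML document using jQuery syntax.
    This article uses the Baidu home page as an example to introduce some basic PyQuery operations.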
    

    Initializing PyQuery

    from pyquery import PyQuery as pq
    
    doc = pq(url='http://www.baidu.com')
    print(type(doc))
    

    <class 'pyquery.pyquery.PyQuery'>

    # Get the parent element of the navigation links (id='u1')
    products = doc('#u1')
    
    print(type(products))
    

    <class 'pyquery.pyquery.PyQuery'>

    link_index_first = products('a:first')
    link_index_last = products('a:last')
    link_index_custom = products('a:eq(2)')
    
    print(type(link_index_first))
    

    <class 'pyquery.pyquery.PyQuery'>

    The text() method of a PyQuery object returns the text of the selected element:

    print(link_index_first.text())
    print(link_index_last.text())
    print(link_index_custom.text())
    

    糯米
    更多产品
    hao123

    The attr() method returns an attribute of the element:

    print(link_index_first.attr('name'))
    

    tj_trnuomi

    Next, iterate over all the navigation links. Note that each link here is an lxml.html.HtmlElement, not a PyQuery object.

    # Iterate over all navigation links and print each link's name attribute and displayed text
    links = products('a')
    for link in links:
        id_name = link.get('name')
        text = link.text
        print('Name: {0: <15}	Text: {1: <15}'.format(id_name, text))
    

    Name: tj_trnuomi Text: 糯米
    Name: tj_trnews Text: 新闻
    Name: tj_trhao123 Text: hao123
    Name: tj_trmap Text: 地图
    Name: tj_trvideo Text: 视频
    Name: tj_trtieba Text: 贴吧
    Name: tj_login Text: 登录
    Name: tj_settingicon Text: 设置
    Name: tj_briicon Text: 更多产品

    PyQuery can also be initialized in two other ways:

    • Parse a string directly
    from lxml import etree

    d = pq("<html></html>")
    d = pq(etree.fromstring("<html></html>"))
    
    • Read from a file
    d = pq(filename=path_to_html_file)
    

    When a file needs an explicit encoding, the following approach can be used:

    from lxml.html import HTMLParser, fromstring
    UTF8_PARSER = HTMLParser(encoding='utf-8')
    # page is the path to the HTML file to load
    with open(page, encoding='utf-8') as filehandler:
        file_contents = filehandler.read()
    doc = pq(fromstring(file_contents, parser=UTF8_PARSER))
    
  • Original article: https://www.cnblogs.com/silverbullet11/p/PyQuery.html