  • pyquery study notes

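    I had long heard how powerful pyquery is, so I wrote a simple test program to try it out.

    The idea is to pick a dynamic web page, load it with PhantomJS first, and then parse the rendered HTML with pyquery.

    1. I picked a stock page with a table holding a large amount of stock data. The goal of the test is to scrape the data in that table.

    The link is: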

    http://quote.eastmoney.com/center/BKList.html#notion_0_0?sortRule=0

    2. Load the page with PhantomJS.

    from selenium import webdriver

    all_url = "http://quote.eastmoney.com/center/BKList.html#notion_0_0?sortRule=0"
    a_driver = webdriver.PhantomJS()  # needs the PhantomJS binary on PATH
    a_driver.get(all_url)
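PhantomJS is no longer maintained, and recent Selenium releases no longer ship a PhantomJS driver, so the loading step above may not run on a current install. Below is a minimal sketch of the same step using headless Chrome instead, which is a swapped-in technique and not what this post tested. It assumes Chrome and a matching driver are installed. The helper name `fetch_rendered_html`, the selector `"table tr td"` and the 10-second timeout are hypothetical choices for illustration.

```python
def fetch_rendered_html(url, wait_selector="table tr td", timeout=10):
    """Return the page source once an element matching wait_selector exists."""
    # Imported inside the function so the sketch loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted at least one table cell
        # (the selector is an assumption about the page structure).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

The returned string can be passed to `pq(...)` in the same way as `a_driver.page_source`.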

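    3. Parse with pyquery. Some bloggers recommend normalizing the page with lxml's etree before passing it to pq, but in my test that raised the error below, so it is better to skip that step. etree.fromstring is a strict XML parser, while the page is HTML with unclosed tags such as <link>, which explains the tag-mismatch error.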

    Traceback (most recent call last):
      File "C:/Users/Administrator/PycharmProjects/p3test/pyquery_test.py", line 25, in <module>
        doc = pq(etree.fromstring(a_driver.page_source))
      File "lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src\lxml\lxml.etree.c:77616)
      File "parser.pxi", line 1819, in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:116413)
      File "parser.pxi", line 1700, in lxml.etree._parseDoc (src\lxml\lxml.etree.c:114959)
      File "parser.pxi", line 1040, in lxml.etree._BaseParser._parseUnicodeDoc (src\lxml\lxml.etree.c:109084)
      File "parser.pxi", line 573, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:103323)
      File "parser.pxi", line 683, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:104977)
      File "parser.pxi", line 613, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:103886)
    lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: link line 11 and head, line 17, column 4656
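The mismatch between `link` and `head` in the traceback is typical of feeding HTML to a strict XML parser. The same behaviour can be reproduced with the standard library alone, as in the sketch below. It uses `xml.etree.ElementTree` and `html.parser` as stand-ins for lxml, and the `PAGE` fragment is a made-up example, not the real page. With lxml itself, the lenient counterpart would be an HTML parser such as `lxml.html.fromstring`.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: <link> is a void element in HTML and is never closed.
PAGE = (
    '<html><head><link rel="stylesheet" href="a.css"></head>'
    "<body><table><tr><td>1</td></tr></table></body></html>"
)


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    """Return the start tags found by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parses_as_xml(PAGE))  # the XML parser rejects the unclosed <link>
print(html_tags(PAGE))      # the HTML parser still walks the whole document
```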

    The full code is as follows.

    from pyquery import PyQuery as pq
    from selenium import webdriver

    all_url = "http://quote.eastmoney.com/center/BKList.html#notion_0_0?sortRule=0"

    # Load the page in PhantomJS so its JavaScript runs and fills in the table.
    a_driver = webdriver.PhantomJS()
    a_driver.get(all_url)
    # doc = pq(etree.fromstring(a_driver.page_source))  # raises XMLSyntaxError
    doc = pq(a_driver.page_source)
    a_driver.quit()  # release the browser process once the HTML is captured
    print("doc=", doc("tr"))

    # Only rows with at least 12 cells are data rows; print their first six cells.
    for row in doc("tr").items():
        cells = row("td")
        if len(cells) >= 12:
            print("-".join(cells.eq(i).text() for i in range(6)))

    The output is as follows:

    1-共享经济-行情 股吧 资金流-1164.83-47.98-4.30%
    2-PM2.5-行情 股吧 资金流-2804.73-81.69-3.00%
    3-钛白粉-行情 股吧 资金流-1139.54-32.39-2.93%
    4-美丽中国-行情 股吧 资金流-2991.04-57.94-1.98%
    5-京津冀-行情 股吧 资金流-2866.82-48.62-1.73%
    6-北京冬奥-行情 股吧 资金流-1168.58-18.85-1.64%
    7-OLED-行情 股吧 资金流-1394.67-22.55-1.64%
    8-民营医院-行情 股吧 资金流-1670.81-24.72-1.50%
    9-虚拟现实-行情 股吧 资金流-951.37-12.92-1.38%
    10-次新股-行情 股吧 资金流-38011.00-499.03-1.33%
    11-节能环保-行情 股吧 资金流-13137.08-151.37-1.17%
    12-3D玻璃-行情 股吧 资金流-1067.91-11.35-1.07%
    13-病毒防治-行情 股吧 资金流-1915.40-19.41-1.02%
    14-蓝宝石-行情 股吧 资金流-1981.26-19.67-1.00%
    15-体育产业-行情 股吧 资金流-1978.79-19.61-1.00%
    16-二胎概念-行情 股吧 资金流-1799.18-17.44-0.98%
    17-食品安全-行情 股吧 资金流-3180.84-30.23-0.96%
    18-海绵城市-行情 股吧 资金流-1155.37-10.92-0.95%
    19-ST概念-行情 股吧 资金流-26773.39-227.49-0.86%
    20-创业成份-行情 股吧 资金流-2104.93-17.20-0.82%
    21-合同能源-行情 股吧 资金流-1545.17-12.25-0.80%
    22-网红直播-行情 股吧 资金流-855.33-6.28-0.74%
    23-AB股-行情 股吧 资金流-10146.54-74.23-0.74%
    24-雄安新区-行情 股吧 资金流-1425.79-9.86-0.70%
    25-车联网-行情 股吧 资金流-1014.41-7.06-0.70%
    26-滨海新区-行情 股吧 资金流-11399.25-72.70-0.64%
    27-国产芯片-行情 股吧 资金流-992.71-6.22-0.63%
    28-王亚伟系-行情 股吧 资金流-2568.52-14.95-0.59%
    29-石墨烯-行情 股吧 资金流-2490.11-13.95-0.56%
    30-燃料电池-行情 股吧 资金流-2121.03-11.85-0.56%
    31-金融机具-行情 股吧 资金流-1768.72-9.12-0.52%
    32-人工智能-行情 股吧 资金流-970.58-4.95-0.51%
    33-独家药品-行情 股吧 资金流-1684.69-8.34-0.50%
    34-免疫治疗-行情 股吧 资金流-1309.56-6.32-0.49%
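The rows above are printed as plain strings. For any further analysis it may be easier to turn the cell texts into typed values first. The following is a minimal sketch of that step. The `SectorRow` class, the `parse_cells` helper and the field names are hypothetical, and the meaning of the three numeric columns (index value, change, percent change) is inferred from the output rather than confirmed by the page.

```python
from dataclasses import dataclass


@dataclass
class SectorRow:
    rank: int
    name: str
    value: float       # assumed: latest index value
    change: float      # assumed: absolute change
    change_pct: float  # assumed: percent change, without the "%" sign


def parse_cells(cells):
    """Build a SectorRow from the six cell texts of one table row."""
    rank, name, _links, value, change, pct = cells  # third cell holds link labels
    return SectorRow(
        rank=int(rank),
        name=name,
        value=float(value),
        change=float(change),
        change_pct=float(pct.rstrip("%")),
    )


# Cell texts taken from the first output row above.
row = parse_cells(["1", "共享经济", "行情 股吧 资金流", "1164.83", "47.98", "4.30%"])
print(row)
```

Working from the list of cell texts, rather than splitting the printed line on "-", avoids ambiguity when a change value is negative and contains its own "-".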
    
    
    Corrections are welcome if anything here is wrong.
  • Original post: https://www.cnblogs.com/qggg/p/6702606.html