  • Python in practice: scraping Ximalaya album information

    import json
    import urllib.request

    from lxml import etree

    # AJAX endpoint; the JSON it returns carries the album markup in its 'html' field
    url = 'http://www.ximalaya.com/dq/8.ajax'
    headers = {
        # Browser-like User-Agent header sent with the request
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
    }
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as response:
        jsonobj = json.loads(response.read().decode('utf-8'))

    # Parse the HTML fragment and select one node per album
    html = jsonobj['html']
    xml = etree.HTML(html)
    nodeList = xml.xpath('//div[@class="discoverAlbum_item"]')
    for node in nodeList:
        img = node.xpath('.//img/@src')      # cover image URL
        title = node.xpath('.//img/@alt')    # album title
        href = node.xpath('./a/@href')       # link to the album page
        if img and title and href:
            # One album per line, fields separated by tabs
            print(img[0], title[0], href[0], sep='\t')

    The script uses XPath to parse the HTML fragment returned in the JSON response.
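
As an offline illustration of the XPath step, the sketch below runs the same three lookups (cover image, title, link) against a small hand-written fragment. The fragment, its URLs, and the helper name `extract_albums` are hypothetical and are not taken from the real site. To run without a network connection or extra packages, it swaps lxml for the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the 'html' field of the JSON response.
SAMPLE_HTML = """<div>
  <div class="discoverAlbum_item">
    <a href="/album/1"><img src="http://example.com/a.jpg" alt="Album A" /></a>
  </div>
  <div class="discoverAlbum_item">
    <a href="/album/2"><img src="http://example.com/b.jpg" alt="Album B" /></a>
  </div>
  <div class="other">
    <a href="/ignored"><img src="http://example.com/x.jpg" alt="Ignored" /></a>
  </div>
</div>"""


def extract_albums(markup):
    """Return a list of (img, title, href) tuples, one per album node."""
    root = ET.fromstring(markup)
    albums = []
    # ElementTree supports attribute predicates such as [@class='...']
    for node in root.findall(".//div[@class='discoverAlbum_item']"):
        img = node.find('.//img')   # first <img> anywhere below the node
        link = node.find('./a')     # <a> that is a direct child of the node
        if img is None or link is None:
            continue                # skip incomplete entries
        albums.append((img.get('src'), img.get('alt'), link.get('href')))
    return albums


for img, title, href in extract_albums(SAMPLE_HTML):
    # One album per line, fields separated by tabs
    print(img, title, href, sep='\t')
```

ElementTree cannot select attribute values directly with expressions like `.//img/@src`, so the sketch reads them with `get()`. lxml accepts the full expressions used in the script above.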

    Only with aspiration comes motivation!

    A salute to every software engineer!

    by wujf

    mail: 921252375@qq.com

  • Original article: https://www.cnblogs.com/wujf/p/8056314.html