  • Sogou Music crawler: downloading songs with Python

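    import os
    import re

    import requests

    # Kugou chart page to scrape (rank list 8888)
    RANK_URL = 'http://www.kugou.com/yy/rank/home/1-8888.html?from=homepage'
    SAVE_DIR = r'd:\mp3'  # folder the .mp3 files are written to

    session = requests.Session()
    html = session.get(RANK_URL).text

    # Each chart entry is an <a> tag: capture its detail-page URL and its title
    song_pattern = r'<a href="(.+?)" data-active="playDwn" data-index="\d+" class="pc_temp_songname" title="(.+?)" hidefocus="true">.+?</a>'
    songs = re.findall(song_pattern, html)

    os.makedirs(SAVE_DIR, exist_ok=True)
    for song_url, mp3name in songs:
        page = session.get(song_url).text
        # The detail page embeds a JSON array like [{"hash":"...", ..., "album_id":123}]
        info = re.search(r'\[\{"hash":"(.+?)".+"album_id":(\d*)\}\]', page)
        if not info:
            continue
        song_hash, album_id = info.group(1), info.group(2)
        # The trailing _= value is a fixed timestamp copied from the original request
        url = ('http://www.kugou.com/yy/index.php?r=play/getdata'
               '&hash=%s&album_id=%s&_=1508983920130' % (song_hash, album_id))
        print(url)
        data = session.get(url).json()
        if data["status"] != 1:
            continue
        mp3url = data["data"]["play_url"]
        if not mp3url:
            continue  # skip songs without a playable URL
        # Replace characters that Windows does not allow in file names
        filename = re.sub(r'[\\/:*?"<>|]', '_', mp3name) + '.mp3'
        r = session.get(mp3url, stream=True)
        with open(os.path.join(SAVE_DIR, filename), 'wb') as f:
            # Stream the audio to disk in 512-byte chunks
            for chunk in r.iter_content(chunk_size=512):
                if chunk:
                    f.write(chunk)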
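    The whole scraper hinges on two regular expressions, so it is worth checking them offline before sending any requests. The sketch below runs both patterns, written with the `\d`, `\[` and `\{` escapes they need, against small hand-written snippets. The snippets (`SAMPLE_RANK_HTML`, `SAMPLE_SONG_PAGE`) and the helpers `parse_chart` / `parse_song_info` are hypothetical: they only mimic the markup the patterns expect, not necessarily what Kugou serves today.

```python
import re

# The chart-entry and song-info patterns the scraper relies on
SONG_PATTERN = (r'<a href="(.+?)" data-active="playDwn" data-index="\d+" '
                r'class="pc_temp_songname" title="(.+?)" hidefocus="true">.+?</a>')
INFO_PATTERN = r'\[\{"hash":"(.+?)".+"album_id":(\d*)\}\]'

# Hypothetical markup shaped like what the patterns expect (not real site data)
SAMPLE_RANK_HTML = (
    '<a href="http://example.com/song/1" data-active="playDwn" data-index="0" '
    'class="pc_temp_songname" title="Artist A - Song One" hidefocus="true">Song One</a>\n'
    '<a href="http://example.com/song/2" data-active="playDwn" data-index="1" '
    'class="pc_temp_songname" title="Artist B - Song Two" hidefocus="true">Song Two</a>'
)
SAMPLE_SONG_PAGE = 'var data = [{"hash":"ABC123","timelength":"240000","album_id":4567}],'


def parse_chart(html):
    """Return (detail_url, title) pairs found on a chart page."""
    return re.findall(SONG_PATTERN, html)


def parse_song_info(page):
    """Return (hash, album_id) from a song page, or None if the pattern is absent."""
    m = re.search(INFO_PATTERN, page)
    return (m.group(1), m.group(2)) if m else None


if __name__ == '__main__':
    for url, title in parse_chart(SAMPLE_RANK_HTML):
        print(title, '->', url)
    print(parse_song_info(SAMPLE_SONG_PAGE))
```

    Keeping the parsing in small functions like these lets you re-test the patterns against a freshly saved copy of the page whenever the site markup changes, without downloading anything.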
  • Original article: https://www.cnblogs.com/chif/p/9231433.html