  • 302 redirects and downloading songs

    Redirects:
     
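    import requests
     
    def redirect(url):
        # requests follows redirects by default; pass allow_redirects=False to inspect the 3xx response itself
        r = requests.get(url, params={'chrome': 'utf-8', 'q': '666'})
        print(r.url, r.status_code, r.history)   # final URL, final status, intermediate responses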
     
    redirect('http://www.so.com/s')
    redirect('http://www.haosou.com/s')
    *********************Divider*********************
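    A use case for redirects: request a URL that answers with status code 302, and read the real download URL (the one ending in a file extension) from the address you are redirected to.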
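To watch the mechanism without touching a real site, here is a minimal sketch using only the standard library. It starts a throwaway local server that answers `/anti` with a 302 pointing at `/files/song.aac`, then reads the final URL with urllib's `geturl()`. The handler class, the paths and the fake body are all made up for illustration; with requests the same information is available as `r.url`.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /anti... answers 302, anything else 200."""

    def do_GET(self):
        if self.path.startswith('/anti'):
            # The 302 response only carries a Location header, no payload
            self.send_response(302)
            self.send_header('Location', '/files/song.aac')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'fake audio bytes'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def resolve_final_url(path='/anti?rid=MUSIC_1'):
    """Request `path` on the local demo server and return (final URL, status)."""
    server = HTTPServer(('127.0.0.1', 0), RedirectDemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        opener = build_opener(ProxyHandler({}))  # ignore any system proxy settings
        url = f'http://127.0.0.1:{server.server_address[1]}{path}'
        with opener.open(url, timeout=5) as resp:
            # urllib followed the 302 for us; geturl() is where we ended up
            return resp.geturl(), resp.status
    finally:
        server.shutdown()
        server.server_close()


final_url, status = resolve_final_url()
print(final_url, status)
```

The URL that was requested ends in `/anti?rid=MUSIC_1`, but the URL that comes back ends in `.aac`, which is exactly the piece the download examples rely on.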
     
    Example: downloading from Kuwo Music:
     
    import requests,os
     
    def downLoadKuwo(url):
        r = requests.get(url).text
        # Build the name from the page <title>, dropping the '-酷我音乐' site suffix.
        # Joined with '-' because ':' is not allowed in Windows file names.
        musicName = '-'.join(r.split('title>')[1].split('-酷我音乐')[0].split('-')[:2][::-1])
        musicID=url.split('?')[0].split('/')[-1]
        # With urllib2 the final URL after several redirects comes from r.geturl();
        # requests follows redirects by default, so redirectResult.url is already the final URL.
        redirectResult=requests.get('http://antiserver.kuwo.cn/anti.s?format=aac|mp3'
            f'&rid=MUSIC_{musicID}&type=convert_url&response=res',stream=True)
        musicName='E:/music/'+musicName+'.'+redirectResult.url.split('.')[-1]
        if not os.path.isdir('E:/music'):os.makedirs('E:/music')
        if not os.path.isfile(musicName):
            with open(musicName,'wb') as f:
                # stream=True plus iter_content() writes 1 MiB at a time, so the whole file never sits in memory
                for chunk in redirectResult.iter_content(1024*1024):   # instead of f.write(redirectResult.content)
                    f.write(chunk)
     
    downLoadKuwo('http://www.kuwo.cn/yinyue/97881')   #http://bd.kuwo.cn/yinyue/7746750
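Taking the extension with `redirectResult.url.split('.')[-1]` assumes the final URL stops right after the extension. If a server appends a query string, that split returns the wrong piece. A possible alternative is to parse the URL first. This is only a sketch: `extension_from_url`, its `default` parameter and the example.com URLs are hypothetical and not part of the original script.

```python
import posixpath
from urllib.parse import urlsplit


def extension_from_url(final_url, default='bin'):
    """Return the extension of the URL's path, ignoring query string and fragment."""
    path = urlsplit(final_url).path        # e.g. '/resource/a1/song.aac'
    ext = posixpath.splitext(path)[1]      # e.g. '.aac', or '' when there is none
    return ext.lstrip('.') or default


url = 'http://example.com/resource/a1/song.aac?sign=abc.def'
print(url.split('.')[-1])        # naive split picks up the tail of the query string
print(extension_from_url(url))   # parsed version looks at the path only
```

For URLs that end directly in the extension, both approaches give the same result.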
    ****************************************Divider****************************************
    Example: downloading a video from a certain video site owned by α:
     
    import re,requests
    from fake_useragent import UserAgent
    from urllib.parse import parse_qs
     
    def 下载某网(url):
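        # A random browser User-Agent makes the request look like an ordinary page view
        res=requests.get(url,headers={'User-Agent':UserAgent().random}).text
        videoName=re.findall('title":"(.*?)"',res)[0]+'.mp4'
        # The page source escapes '&' as the literal text \u0026; undo that before parsing
        vUrl=re.findall('url_encoded_fmt_stream_map":"(.*?)"',res)[0].replace('\\u0026','&')
        vUrl=parse_qs(vUrl)['url'][0].replace(',',',')
        # stream=True plus iter_content() writes the video in 1 MiB chunks instead of loading it whole
        result=requests.get(vUrl,stream=True)
        with open(videoName,'wb') as f:
            for chunk in result.iter_content(1024 * 1024):
                f.write(chunk)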
     
    url='https://www.某网.com/watch?v=5yAU52qfYuU'
    下载某网(url)
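The two string steps in the middle of that function (un-escaping `\u0026`, then `parse_qs`) can be tried offline. The sketch below builds a shortened, made-up stand-in for the escaped string; the field names other than `url`, the example.com address and the helper name `first_stream_url` are assumptions for illustration, not what the real page returns.

```python
from urllib.parse import parse_qs, quote

# Hypothetical direct link, percent-encoded the way a query-string value would be
direct = 'https://video.example.com/play?id=42&fmt=mp4'
raw = 'itag=22\\u0026url=' + quote(direct, safe='') + '\\u0026quality=hd720'


def first_stream_url(stream_map):
    """Turn literal \\u0026 back into '&', then read the 'url' field."""
    unescaped = stream_map.replace('\\u0026', '&')
    fields = parse_qs(unescaped)   # maps each key to a list of decoded values
    return fields['url'][0]


print(raw)
print(first_stream_url(raw))
```

`parse_qs` also percent-decodes the values, which is why the `url` field comes back as a usable address without a separate unquoting step.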
    ****************************************Divider****************************************
    Example: downloading audio from Ximalaya FM:
     
    import requests,re,os,json
    from fake_useragent import UserAgent
     
    h={'User-Agent': UserAgent().random}
    noName=r'[\\/:*?"<>|]'    # characters that Windows does not allow in file names
     
    def getAnchors():
        anchors = []
        for x in range(1,3):    # collect the anchors listed on the first two pages
            res=requests.get(f'http://www.ximalaya.com/dq/{x}/',headers=h).text
            anchors.extend(re.findall('href="(.+?)" hashlink title="(.+?)" class="discoverAlbum_title',res))
        return anchors
     
    def getAlbums():
        for anchors in getAnchors():
            path=f'E:/xmly/{anchors[1]}'
            if not os.path.isdir(path): os.makedirs(path)
            os.chdir(path)
            res = requests.get(anchors[0], headers=h).text    # only the first page of each anchor's audio albums
            rule=r'(\d+?)" track_title="(.+?)" track_.+?([0-9-]+?)<.+?title="(\d+?)次'
            musicsDetails=re.findall(rule,res,re.S)
            with open(f'{anchors[1]}.txt','w',encoding='utf8') as f:
                for x in musicsDetails:
                    f.write(json.dumps(x,ensure_ascii=False)+' ')
            for x in musicsDetails[:4]: # there are many tracks, so download only the first 4 per page
                js=f'http://www.ximalaya.com/tracks/{x[0]}.json'
                musicUrl=requests.get(js,headers=h).json()['play_path_32']
                fileName=re.sub(noName,' ',x[1]).strip()+'.m4a'
                with open(fileName,'wb') as music:
                    music.write(requests.get(musicUrl,headers=h).content)
     
    if __name__ == '__main__':
        getAlbums()
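The file-name cleanup in getAlbums() can be checked on its own. A small sketch, assuming the `noName` pattern is meant to cover the characters Windows rejects in file names (backslash included); `safeFileName` and the sample titles are hypothetical.

```python
import re

# Same idea as noName in the script: characters Windows forbids in file names
noName = r'[\\/:*?"<>|]'


def safeFileName(title, ext='.m4a'):
    """Replace each forbidden character with a space and append the extension."""
    return re.sub(noName, ' ', title).strip() + ext


print(safeFileName('a/b:c'))
print(safeFileName('What is a "302"?'))
```

Each forbidden character becomes one space, so titles with several in a row end up with runs of spaces; collapsing them would be an optional extra step.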
  • Original post: https://www.cnblogs.com/scrooge/p/7787267.html