  • Java crawler: extracting the real video address from a Tencent Video playback page

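    I wanted to grab some postgraduate entrance exam prep videos that were shared through a WeChat official account,

    so I spent about a day building this crawler. (Strictly speaking it is not a full crawler, more of an address resolver that can process URLs in batches; call it half a crawler.)

    Without further ado, on to the main topic.

    (This post is aimed at readers who already know some Java. If you don't, cache the video with the official client and convert the format instead.)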

    What you need:

    1. A computer with an Internet connection and a Java environment

    2. Patience

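    Backend API endpoints to call:

    http://vv.video.qq.com/getinfo (this one alone is enough for low-definition video)

    http://vv.video.qq.com/getkey (needed for high-definition video)

    How it works (this covers low-definition video only, to get the basic flow working first; I will add high definition later when I have time):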

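    Step 1:

    Get the Tencent Video page URL of the video you want to download (this is easy, so I won't go into detail).

    This post uses https://v.qq.com/x/page/f08302y6rof.html as the example.

    Step 2:

    Get the video's vid.

    Here the vid is f08302y6rof, the long string in the URL above between page/ and .html.

    Step 3:

    Substitute your vid for the value of the vids parameter in the URL below (the vid was highlighted in red in the original post). This step calls the backend API, which returns a JSON message.

    http://vv.video.qq.com/getinfo?vids=f08302y6rof&platform=101001&charge=0&otype=json&defn=shd

    Then open that URL.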
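
    Steps 2 and 3 can be sketched in Java roughly as follows. The class name GetInfoUrl and the regular expression are illustrative assumptions, not part of the original program; the regex assumes page URLs of the form https://v.qq.com/x/page/<vid>.html, and the query parameters are copied unchanged from the example URL above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class GetInfoUrl {
    // Assumption: the vid is the alphanumeric segment between "/page/" and ".html".
    private static final Pattern VID = Pattern.compile("/page/([A-Za-z0-9]+)\\.html");

    // Extracts the vid from a playback page URL, or fails loudly if none is found.
    static String extractVid(String pageUrl) {
        Matcher m = VID.matcher(pageUrl);
        if (!m.find()) {
            throw new IllegalArgumentException("No vid found in: " + pageUrl);
        }
        return m.group(1);
    }

    // Builds the getinfo URL using the same parameters as the article's example.
    static String buildApiUrl(String vid) {
        return "http://vv.video.qq.com/getinfo?vids=" + vid
                + "&platform=101001&charge=0&otype=json&defn=shd";
    }

    public static void main(String[] args) {
        String vid = extractVid("https://v.qq.com/x/page/f08302y6rof.html");
        System.out.println(vid);
        System.out.println(buildApiUrl(vid));
    }
}
```

    Failing with an exception on an unexpected URL shape is a design choice here; it avoids silently sending a malformed request to the API.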

    Step 4:

    In the JSON message returned to the browser, find fn and fvkey.

    The message I got back looks like this:

    QZOutputJson={"dltype":1,"exem":0,"fl":{"cnt":2,"fi":[{"id":100701,"name":"msd","lmt":0,"sb":1,"cname":"标清;(270P)","br":29,"profile":2,"drm":0,"video":1,"audio":1,"fs":101567331,"super":0,"hdr10enh":0,"sl":1},{"id":2,"name":"mp4","lmt":0,"sb":1,"cname":"高清;(480P)","br":34,"profile":1,"drm":0,"video":1,"audio":1,"fs":130427092,"super":0,"hdr10enh":0,"sl":0}]},"hs":0,"ip":"111.79.225.65","ls":0,"preview":3383,"s":"o","sfl":{"cnt":0},"tm":1556431150,"vl":{"cnt":1,"vi":[{"br":29,"ch":0,"cl":{"fc":0,"keyid":"f08302y6rof.100701"},"ct":21600,"drm":0,"dsb":0,"fmd5":"74e3040ce70af50716abead16c9fba50","fn":"f08302y6rof.m701.mp4","fs":101567331,"fst":5,"fvkey":"D351DB69FA6EC791CB6DE47266F80B21BFFFAA3616A7B42975903ED5EA68589C0E2454137002A84799CF43B4FD972B415259C1F23C21CD34F2C34BC64F6D7D16F21BF3BF94F22B09491FC9D8C96CFFA3B3177345807F34EFDDAF94449E72FC3B8C55751EE9EADC5F","head":0,"hevc":0,"iflag":0,"level":0,"lnk":"f08302y6rof","logo":1,"mst":8,"pl":null,"share":1,"sp":0,"st":2,"tail":0,"td":"3383.47","ti":"2020考研数学寒假计划(第一次课)","tie":0,"type":3,"ul":{"ui":[{"url":"http://ugcws.video.gtimg.com/uwMROfz0r5zAoaQXGdGnC2dfhzlOR5XW60pRw41PvMP8tDlH/","vt":106,"dtc":0,"dt":2},{"url":"http://ugcydzd.qq.com/uwMROfz0r5zAoaQXGdGlC2dfhznfaJdqBNmJ_NLSRfZb0kLT/","vt":146,"dtc":0,"dt":2},{"url":"http://ugcsjy.qq.com/uwMROfz0r5zAoaQXGdGlK2dfhzmm-mdByiC0fycrmmUBpCVq/","vt":176,"dtc":0,"dt":2},{"url":"http://video.dispatch.tc.qq.com/uwMROfz0r5zAoaQXGdGlLGdfhzn3bYHHUWfJ-3lk8pLFnjzb/","vt":0,"dtc":0,"dt":2}]},"vh":480,"vid":"f08302y6rof","videotype":0,"vr":0,"vst":2,"vw":270,"wh":0.5625,"wl":{"wi":[{"id":19,"x":14,"y":14,"w":84,"h":27,"a":100,"md5":"dcc9dc5c478c4100ea2817c5e6020f26","url":"http://puui.qpic.cn/vcolumn_pic/0/logo_qing_xi_color_336_108.png/0","surl":"https://puui.qpic.cn/vcolumn_pic/0/logo_qing_xi_color_336_108.png/0"}]},"uptime":1548118095,"fvideo":0,"fvpint":0,"swhdcp":0}]}};

    (The response is a single line. I parsed it directly in Java, since searching it by eye is tiring.)
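
    As a rough sketch of that parsing step, the two fields can be pulled out with a regular expression instead of index arithmetic. The class name GetInfoFields and the shortened sample body are assumptions for illustration (the fvkey in it is a placeholder, not a real key). The pattern does not handle escaped quotes inside values; a proper JSON library, applied after stripping the QZOutputJson= prefix and the trailing semicolon, would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class GetInfoFields {
    // Returns the string value of the first "key":"value" pair, or null if the key is absent.
    static String field(String body, String key) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\":\"([^\"]*)\"");
        Matcher m = p.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real response; the fvkey value is a placeholder.
        String body = "QZOutputJson={\"vl\":{\"vi\":[{\"fn\":\"f08302y6rof.m701.mp4\","
                + "\"fvkey\":\"ABC123\"}]}};";
        System.out.println(field(body, "fn"));
        System.out.println(field(body, "fvkey"));
    }
}
```

    Returning null for a missing key lets the caller decide how to handle a response that does not contain the expected fields, for example when the request failed.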

    Step 5:

    Use the fn and fvkey you obtained to build the video download URL.

    The URL built here is:

    http://ugcws.video.gtimg.com/f08302y6rof.m701.mp4?vkey=2E657DF01414A1F95E0B3CF7F187CEB84B3E439F5D0BA2D7F052967654DEFDE53292F0BE8BCD373FA0F269BA6BE5CC1AD5CC4AEE269AB0B1C72261815608260190B1D14D9B1820B0394DAB0C8DA1D8561F3B3455FBE5BA27D618C81A0A233256DDDAB6429E3A05FF

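    Put your fn in place of the file name right after the host (the short part highlighted in red in the original post),

    and your fvkey in place of the value after vkey= (the long highlighted part).

    That gives the complete download URL, which you can hand to a download manager such as Xunlei (Thunder).

    Done.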
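
    Step 5 amounts to simple string concatenation. A minimal sketch, assuming the host from the example URL above works for your video (the class name DownloadUrl is hypothetical):

```java
// Hypothetical helper for illustration only.
class DownloadUrl {
    // Host taken from the article's example URL; other videos may need a different one.
    static final String HOST = "http://ugcws.video.gtimg.com/";

    // Joins host, file name (fn) and key (fvkey) into the download URL.
    static String build(String fn, String fvkey) {
        return HOST + fn + "?vkey=" + fvkey;
    }

    public static void main(String[] args) {
        // "ABC123" is a placeholder; use the fvkey from your own response.
        System.out.println(build("f08302y6rof.m701.mp4", "ABC123"));
    }
}
```

    Note that the vkey in the example URL above is not the same as the fvkey in the sample response from step 4; the two presumably came from separate requests, so use the value from your own response.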

    The source code is below (it runs on my own machine; if you spot mistakes or sloppy style, please point them out):

    package catchVedio;
    
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.UnsupportedEncodingException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;
    
    /**
     * Fetches the JSON returned by the video API
     * @author Administrator
     *
     */
    public class CatchVedio {
    //    Socket client = new Scoket();
        private URL url;
        private HttpURLConnection urlConnection;
        private int responseCode;
        private BufferedReader reader;
        private BufferedWriter writer;
        
        
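        public static void main(String[] args) {
            CatchVedio cv = new CatchVedio();
            try {
                
                String[] VedioURL = cv.get_VedioURL();// API URLs, one per page URL in the input file
                for(String temp:VedioURL) {// temp is the getinfo API URL for one video
                    cv.toDownloadURL(cv.analyse(cv.get_Json(temp)));// append the download URL to the output file
                }    
            } catch (IOException e) {
                // TODO auto-generated catch block
                e.printStackTrace();
            }finally {
                try {
                    // either stream may still be null if an earlier step failed
                    if (cv.reader != null) cv.reader.close();
                    if (cv.writer != null) cv.writer.close();
                } catch (IOException e) {
                    // TODO auto-generated catch block
                    e.printStackTrace();
                }
            }
        
        }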
        
        void toDownloadURL(String real_url) throws IOException {// append one download URL to the output file
            if (this.writer == null) {// open the file once, in append mode, and reuse the stream
                this.writer = new BufferedWriter(new FileWriter("D:/worm/downloadURL.txt",true));
            }
    //        this.writer.append(real_url);
            this.writer.write(real_url+"\r\n");
            this.writer.flush();
        }
        
        String analyse(String json) {// parse the JSON and return the complete download URL
            int fvkey_index = json.indexOf("\"fvkey\":\"")+9;// 9 = length of "fvkey":"
            int endIndex = json.indexOf("\"",fvkey_index);
            String fvkey = json.substring(fvkey_index,endIndex);// the fvkey value
    //        System.out.println(fvkey);
            
            int fn_index = json.indexOf("\"fn\":\"")+6;// 6 = length of "fn":"
            int fn_end = json.indexOf("\"",fn_index);
            String fn = json.substring(fn_index,fn_end);// the video file name
    //        System.out.println(fn);
            
            String head = "http://ugcws.video.gtimg.com/";
            
            StringBuffer real_url = new StringBuffer();
            real_url.append(head);// host prefix
            real_url.append(fn+"?");// file name
            real_url.append("vkey="+fvkey);// access key
            /* URL assembled */
    //        System.out.println(real_url.toString());
            return real_url.toString();
            
        }
        
        String get_Json(String url) throws UnsupportedEncodingException, IOException {
            String line = "";
            StringBuffer sb = new StringBuffer();
            this.url = new URL(url);
            this.urlConnection = (HttpURLConnection)this.url.openConnection();
            this.responseCode = this.urlConnection.getResponseCode();
            if (this.responseCode == 200) {
                this.reader = new BufferedReader(new InputStreamReader(this.urlConnection.getInputStream(), "UTF-8"));
                while ((line = this.reader.readLine()) != null) {
                    sb.append(line);// the response is a single line
                }
                return sb.toString();
            }
            return "";
        }
        
        String[] get_VedioURL() throws IOException {
    //    void get_VedioURL() throws IOException {
            File file = new File("D:/worm/vedioURL.txt");
            String line = "";
            this.reader = new BufferedReader(new FileReader(file));
            String[] t = new String[0];
            List<String> container = new ArrayList<String>();
            while(null!=(line = this.reader.readLine())) {
                if(line.equals("")) {
                    continue;
                }
                line = this.change(line);// convert the page URL into the API URL
                container.add(line);// collect it
            }
            return container.toArray(t);
        }
        /**
         * Converts a playback page URL into the backend API URL by extracting the vid.
         * API URL format:
         * http://vv.video.qq.com/getinfo?vids=x0164ytbgov&platform=101001&charge=0&otype=json&defn=shd
         * Page URL examples:
         * https://v.qq.com/x/page/f08302y6rof.html
         * https://v.qq.com/x/page/y083158hphd.html
         * https://v.qq.com/x/page/c08503oe58c.html
         * @param str the playback page URL
         * @return the getinfo API URL for that video
         */
        String change(String str) {
            String head = "http://vv.video.qq.com/getinfo?vids=";
            String tail = "&platform=101001&charge=0&otype=json&defn=shd";
            String vid = str.substring(str.indexOf("page/")+5,str.indexOf(".html"));
            return head+vid+tail;
        }
    }

    Both the input and the output of my program are plain text files.
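
    For reference, the same file handling can be sketched with java.nio.file.Files, which opens and closes the files for you. The class name UrlFileIo and the temporary files in main are assumptions for illustration; the listing above uses fixed paths under D:/worm instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class UrlFileIo {
    // Reads the input file and returns its lines, trimmed, skipping blank ones.
    static List<String> readNonBlankLines(Path input) throws IOException {
        List<String> result = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Appends the given lines to the output file, creating it if necessary.
    static void appendLines(Path output, List<String> lines) throws IOException {
        Files.write(output, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real input and output files.
        Path in = Files.createTempFile("vedioURL", ".txt");
        Path out = Files.createTempFile("downloadURL", ".txt");
        Files.write(in, Arrays.asList("https://v.qq.com/x/page/f08302y6rof.html", ""),
                StandardCharsets.UTF_8);
        List<String> pages = readNonBlankLines(in);
        appendLines(out, pages);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```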

    I hope this helps.

    That's all.

  • Original article: https://www.cnblogs.com/lavender-pansy/p/10783654.html