  • Reverse-engineering JavaScript to scrape the 企** website

    1. Reverse-engineering case study 1

    • Tools
    Node.js, PyCharm
    
    • Target site
    https://www.qimingpian.com/finosda/project/pinvestment
    
    • Content to scrape

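    • Press F12 to open the developer tools and refresh the page. Under the XHR and Doc filters of the Network panel there are three requests:
    pinvestment, productListVip, industryFieldVip
    

    • The Response of pinvestment is just a large amount of JavaScript with no page data in it.

    • The responses of productListVip and industryFieldVip both contain a field named "encrypt_data".

    • Its value looks like a Base64 string. Since the site went to the trouble of encrypting this field, it presumably does not want it scraped, so it is reasonable to assume that "encrypt_data" holds the target data.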
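
As a quick check of that assumption, the field can be pulled out of a saved response body and tested for Base64 validity. The sketch below is illustrative only: the sample body and the helper name `extract_encrypt_data` are hypothetical, and the real responses carry other fields that are not shown here.

```python
import base64
import binascii
import json


def extract_encrypt_data(response_text: str) -> str:
    """Return the encrypt_data field if it is syntactically valid Base64."""
    payload = json.loads(response_text)
    value = payload["encrypt_data"]
    try:
        # validate=True rejects characters outside the Base64 alphabet
        base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError("encrypt_data is not valid Base64") from exc
    return value


# Hypothetical response body; the real value is much longer.
sample_body = '{"status": 0, "encrypt_data": "eHVqdW5rYWk="}'
print(extract_encrypt_data(sample_body))
```

Passing this check only shows that the value is Base64 on the outside; the decoded bytes are still ciphertext until the site's own decryption routine has been applied.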

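    • In the Sources tab of the developer tools, locate the page's JS folder. The right side of the panel is the breakpoint debugging sidebar.

    • You could set breakpoints in the JS files and step through them one by one, but the debugging sidebar also has an XHR/fetch Breakpoints section that pauses execution wherever an XHR request is sent. The two requests carrying the encrypted field are both XHR requests, so this feature fits exactly. Click the + sign and enter the request name:

    • Refresh the page and step through the code. When something looks suspicious, hover the mouse over it to inspect its value: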

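    • Debugging tips:

      1. Click the curly braces ({}) in the bottom-left corner to pretty-print minified JS.
      2. Use the Console to run JS code while paused, for example to call a function that looks suspicious and see what it returns.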
      
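    • Select Object(u,a)(e.encrypt_data), right-click and choose Evaluate selected text in Console. The Console then prints:

    • That is exactly the result we are after.

    • function o(t) is the decryption function we need. It calls s with six arguments, of which only a.a.decode(t) varies (the other five are hard-coded), and then converts the result to a JSON object with JSON.parse. So what is a.a.decode(t)?

    • Stepping into it jumps to a function in another JS file. Create a new JS file and copy out all the code that function o depends on.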
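    // Code segment 1 (original):
    function o(t) {
                return JSON.parse(s("5e5062e82f15fe4ca9d24bc5", a.a.decode(t), 0, 0, "012345677890123", 1))
            }
    // Rewritten:
    function o(t) {
                return Buffer.from(s("5e5062e82f15fe4ca9d24bc5", my_decode(t), 0, 0, "012345677890123", 1)).toString("base64")
            }
    // The result is Base64-encoded. Buffer is a Node.js API (Buffer.from replaces the deprecated new Buffer).
    // my_decode is the new name for code segment 2, which was an object method and needs a standalone name here.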
    //····················································
    // Code segment 2 (original):
    decode: function(t) {
                            var e = (t = String(t).replace(f, "")).length;
                            e % 4 == 0 && (e = (t = t.replace(/==?$/, "")).length),
                            (e % 4 == 1 || /[^+a-zA-Z0-9/]/.test(t)) && l("Invalid character: the string to be decoded is not correctly encoded.");
                            for (var n, r, i = 0, o = "", a = -1; ++a < e; )
                                r = c.indexOf(t.charAt(a)),
                                n = i % 4 ? 64 * n + r : r,
                                i++ % 4 && (o += String.fromCharCode(255 & n >> (-2 * i & 6)));
                            return o
                        }
    // Rewritten:
    function my_decode(t) {
                            // c and f are constants captured from the page while debugging
                            var c = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
                            var f = /[\t\n\f\r ]/g;
                            var e = (t = String(t).replace(f, "")).length;
                            e % 4 == 0 && (e = (t = t.replace(/==?$/, "")).length);
                            // the original called the page's error helper l(), which is not defined in this file
                            if (e % 4 == 1 || /[^+a-zA-Z0-9/]/.test(t))
                                throw new Error("Invalid character: the string to be decoded is not correctly encoded.");
                            for (var n, r, i = 0, o = "", a = -1; ++a < e; )
                                r = c.indexOf(t.charAt(a)),
                                n = i % 4 ? 64 * n + r : r,
                                i++ % 4 && (o += String.fromCharCode(255 & n >> (-2 * i & 6)));
                            return o
                        }
    // c and f are fixed values; they can be inspected by selecting them in the browser debugger (see the captures below)
    
    //···················································· no changes needed from here on
    function s(t, e, i, n, a, s) {
                var o, r, c, l, u, d, h, p, f, v, m, g, b, y, _ = new Array(16843776,0,65536,16843780,16842756,66564,4,65536,1024,16843776,16843780,1024,16778244,16842756,16777216,4,1028,16778240,16778240,66560,66560,16842752,16842752,16778244,65540,16777220,16777220,65540,0,1028,66564,16777216,65536,16843780,4,16842752,16843776,16777216,16777216,1024,16842756,65536,66560,16777220,1024,4,16778244,66564,16843780,65540,16842752,16778244,16777220,1028,66564,16843776,1028,16778240,16778240,0,65540,66560,0,16842756), C = new Array(-2146402272,-2147450880,32768,1081376,1048576,32,-2146435040,-2147450848,-2147483616,-2146402272,-2146402304,-2147483648,-2147450880,1048576,32,-2146435040,1081344,1048608,-2147450848,0,-2147483648,32768,1081376,-2146435072,1048608,-2147483616,0,1081344,32800,-2146402304,-2146435072,32800,0,1081376,-2146435040,1048576,-2147450848,-2146435072,-2146402304,32768,-2146435072,-2147450880,32,-2146402272,1081376,32,32768,-2147483648,32800,-2146402304,1048576,-2147483616,1048608,-2147450848,-2147483616,1048608,1081344,0,-2147450880,32800,-2147483648,-2146435040,-2146402272,1081344), w = new Array(520,134349312,0,134348808,134218240,0,131592,134218240,131080,134217736,134217736,131072,134349320,131080,134348800,520,134217728,8,134349312,512,131584,134348800,134348808,131592,134218248,131584,131072,134218248,8,134349320,512,134217728,134349312,134217728,131080,520,131072,134349312,134218240,0,512,131080,134349320,134218240,134217736,512,0,134348808,134218248,131072,134217728,134349320,8,131592,131584,134217736,134348800,134218248,520,134348800,131592,8,134348808,131584), x = new 
Array(8396801,8321,8321,128,8396928,8388737,8388609,8193,0,8396800,8396800,8396929,129,0,8388736,8388609,1,8192,8388608,8396801,128,8388608,8193,8320,8388737,1,8320,8388736,8192,8396928,8396929,129,8388736,8388609,8396800,8396929,129,0,0,8396800,8320,8388736,8388737,1,8396801,8321,8321,128,8396929,129,1,8192,8388609,8193,8396928,8388737,8193,8320,8388608,8396801,128,8388608,8192,8396928), k = new Array(256,34078976,34078720,1107296512,524288,256,1073741824,34078720,1074266368,524288,33554688,1074266368,1107296512,1107820544,524544,1073741824,33554432,1074266112,1074266112,0,1073742080,1107820800,1107820800,33554688,1107820544,1073742080,0,1107296256,34078976,33554432,1107296256,524544,524288,1107296512,256,33554432,1073741824,34078720,1107296512,1074266368,33554688,1073741824,1107820544,34078976,1074266368,256,33554432,1107820544,1107820800,524544,1107296256,1107820800,34078720,0,1074266112,1107296256,524544,33554688,1073742080,524288,0,1074266112,34078976,1073742080), T = new Array(536870928,541065216,16384,541081616,541065216,16,541081616,4194304,536887296,4210704,4194304,536870928,4194320,536887296,536870912,16400,0,4194320,536887312,16384,4210688,536887312,16,541065232,541065232,0,4210704,541081600,16400,4210688,541081600,536870912,536887296,16,541065232,4210688,541081616,4194304,16400,536870928,4194304,536887296,536870912,16400,536870928,541081616,4210688,541065216,4210704,541081600,0,541065232,16,16384,541065216,4210704,16384,4194320,536887312,0,541081600,536870912,4194320,536887312), A = new 
Array(2097152,69206018,67110914,0,2048,67110914,2099202,69208064,69208066,2097152,0,67108866,2,67108864,69206018,2050,67110912,2099202,2097154,67110912,67108866,69206016,69208064,2097154,69206016,2048,2050,69208066,2099200,2,67108864,2099200,67108864,2099200,2097152,67110914,67110914,69206018,69206018,2,2097154,67108864,67110912,2097152,69208064,2050,2099202,69208064,2050,67108866,69208066,69206016,2099200,0,2,69208066,0,2099202,69206016,2048,67108866,67110912,2048,2097154), L = new Array(268439616,4096,262144,268701760,268435456,268439616,64,268435456,262208,268697600,268701760,266240,268701696,266304,4096,64,268697600,268435520,268439552,4160,266240,262208,268697664,268701696,4160,0,0,268697664,268435520,268439552,266304,262144,266304,262144,268701696,4096,64,268697664,4096,266304,268439552,64,268435520,268697600,268697664,268435456,262144,268439616,0,268701760,262208,268435520,268697600,268439552,268439616,0,268701760,266240,266240,4160,4160,262208,268435456,268701696), S = function(t) {
                    for (var e, i, n, a = new Array(0,4,536870912,536870916,65536,65540,536936448,536936452,512,516,536871424,536871428,66048,66052,536936960,536936964), s = new Array(0,1,1048576,1048577,67108864,67108865,68157440,68157441,256,257,1048832,1048833,67109120,67109121,68157696,68157697), o = new Array(0,8,2048,2056,16777216,16777224,16779264,16779272,0,8,2048,2056,16777216,16777224,16779264,16779272), r = new Array(0,2097152,134217728,136314880,8192,2105344,134225920,136323072,131072,2228224,134348800,136445952,139264,2236416,134356992,136454144), c = new Array(0,262144,16,262160,0,262144,16,262160,4096,266240,4112,266256,4096,266240,4112,266256), l = new Array(0,1024,32,1056,0,1024,32,1056,33554432,33555456,33554464,33555488,33554432,33555456,33554464,33555488), u = new Array(0,268435456,524288,268959744,2,268435458,524290,268959746,0,268435456,524288,268959744,2,268435458,524290,268959746), d = new Array(0,65536,2048,67584,536870912,536936448,536872960,536938496,131072,196608,133120,198656,537001984,537067520,537004032,537069568), h = new Array(0,262144,0,262144,2,262146,2,262146,33554432,33816576,33554432,33816576,33554434,33816578,33554434,33816578), p = new Array(0,268435456,8,268435464,0,268435456,8,268435464,1024,268436480,1032,268436488,1024,268436480,1032,268436488), f = new Array(0,32,0,32,1048576,1048608,1048576,1048608,8192,8224,8192,8224,1056768,1056800,1056768,1056800), v = new Array(0,16777216,512,16777728,2097152,18874368,2097664,18874880,67108864,83886080,67109376,83886592,69206016,85983232,69206528,85983744), m = new Array(0,4096,134217728,134221824,524288,528384,134742016,134746112,16,4112,134217744,134221840,524304,528400,134742032,134746128), g = new Array(0,4,256,260,0,4,256,260,1,5,257,261,1,5,257,261), b = t.length > 8 ? 3 : 1, y = new Array(32 * b), _ = new Array(0,0,1,1,1,1,1,1,0,1,1,1,1,1,1,0), C = 0, w = 0, x = 0; x < b; x++) {
                        var k = t.charCodeAt(C++) << 24 | t.charCodeAt(C++) << 16 | t.charCodeAt(C++) << 8 | t.charCodeAt(C++)
                          , T = t.charCodeAt(C++) << 24 | t.charCodeAt(C++) << 16 | t.charCodeAt(C++) << 8 | t.charCodeAt(C++);
                        k ^= (n = 252645135 & (k >>> 4 ^ T)) << 4,
                        k ^= n = 65535 & ((T ^= n) >>> -16 ^ k),
                        k ^= (n = 858993459 & (k >>> 2 ^ (T ^= n << -16))) << 2,
                        k ^= n = 65535 & ((T ^= n) >>> -16 ^ k),
                        k ^= (n = 1431655765 & (k >>> 1 ^ (T ^= n << -16))) << 1,
                        k ^= n = 16711935 & ((T ^= n) >>> 8 ^ k),
                        n = (k ^= (n = 1431655765 & (k >>> 1 ^ (T ^= n << 8))) << 1) << 8 | (T ^= n) >>> 20 & 240,
                        k = T << 24 | T << 8 & 16711680 | T >>> 8 & 65280 | T >>> 24 & 240,
                        T = n;
                        for (var A = 0; A < _.length; A++)
                            _[A] ? (k = k << 2 | k >>> 26,
                            T = T << 2 | T >>> 26) : (k = k << 1 | k >>> 27,
                            T = T << 1 | T >>> 27),
                            T &= -15,
                            e = a[(k &= -15) >>> 28] | s[k >>> 24 & 15] | o[k >>> 20 & 15] | r[k >>> 16 & 15] | c[k >>> 12 & 15] | l[k >>> 8 & 15] | u[k >>> 4 & 15],
                            i = d[T >>> 28] | h[T >>> 24 & 15] | p[T >>> 20 & 15] | f[T >>> 16 & 15] | v[T >>> 12 & 15] | m[T >>> 8 & 15] | g[T >>> 4 & 15],
                            n = 65535 & (i >>> 16 ^ e),
                            y[w++] = e ^ n,
                            y[w++] = i ^ n << 16
                    }
                    return y
                }(t), z = 0, B = e.length, I = 0, j = 32 == S.length ? 3 : 9;
                p = 3 == j ? i ? new Array(0,32,2) : new Array(30,-2,-2) : i ? new Array(0,32,2,62,30,-2,64,96,2) : new Array(94,62,-2,32,64,2,30,-2,-2),
                2 == s ? e += "        " : 1 == s ? i && (c = 8 - B % 8,
                e += String.fromCharCode(c, c, c, c, c, c, c, c),
                8 === c && (B += 8)) : s || (e += "");
                var F = ""
                  , $ = "";
                for (1 == n && (f = a.charCodeAt(z++) << 24 | a.charCodeAt(z++) << 16 | a.charCodeAt(z++) << 8 | a.charCodeAt(z++),
                m = a.charCodeAt(z++) << 24 | a.charCodeAt(z++) << 16 | a.charCodeAt(z++) << 8 | a.charCodeAt(z++),
                z = 0); z < B; ) {
                    for (d = e.charCodeAt(z++) << 24 | e.charCodeAt(z++) << 16 | e.charCodeAt(z++) << 8 | e.charCodeAt(z++),
                    h = e.charCodeAt(z++) << 24 | e.charCodeAt(z++) << 16 | e.charCodeAt(z++) << 8 | e.charCodeAt(z++),
                    1 == n && (i ? (d ^= f,
                    h ^= m) : (v = f,
                    g = m,
                    f = d,
                    m = h)),
                    d ^= (c = 252645135 & (d >>> 4 ^ h)) << 4,
                    d ^= (c = 65535 & (d >>> 16 ^ (h ^= c))) << 16,
                    d ^= c = 858993459 & ((h ^= c) >>> 2 ^ d),
                    d ^= c = 16711935 & ((h ^= c << 2) >>> 8 ^ d),
                    d = (d ^= (c = 1431655765 & (d >>> 1 ^ (h ^= c << 8))) << 1) << 1 | d >>> 31,
                    h = (h ^= c) << 1 | h >>> 31,
                    r = 0; r < j; r += 3) {
                        for (b = p[r + 1],
                        y = p[r + 2],
                        o = p[r]; o != b; o += y)
                            l = h ^ S[o],
                            u = (h >>> 4 | h << 28) ^ S[o + 1],
                            c = d,
                            d = h,
                            h = c ^ (C[l >>> 24 & 63] | x[l >>> 16 & 63] | T[l >>> 8 & 63] | L[63 & l] | _[u >>> 24 & 63] | w[u >>> 16 & 63] | k[u >>> 8 & 63] | A[63 & u]);
                        c = d,
                        d = h,
                        h = c
                    }
                    h = h >>> 1 | h << 31,
                    h ^= c = 1431655765 & ((d = d >>> 1 | d << 31) >>> 1 ^ h),
                    h ^= (c = 16711935 & (h >>> 8 ^ (d ^= c << 1))) << 8,
                    h ^= (c = 858993459 & (h >>> 2 ^ (d ^= c))) << 2,
                    h ^= c = 65535 & ((d ^= c) >>> 16 ^ h),
                    h ^= c = 252645135 & ((d ^= c << 16) >>> 4 ^ h),
                    d ^= c << 4,
                    1 == n && (i ? (f = d,
                    m = h) : (d ^= v,
                    h ^= g)),
                    $ += String.fromCharCode(d >>> 24, d >>> 16 & 255, d >>> 8 & 255, 255 & d, h >>> 24, h >>> 16 & 255, h >>> 8 & 255, 255 & h),
                    512 == (I += 8) && (F += $,
                    $ = "",
                    I = 0)
                }
                if (F = (F += $).replace(/\0*$/g, ""),
                !i) {
                    if (1 === s) {
                        var N = 0;
                        (B = F.length) && (N = F.charCodeAt(B - 1)),
                        N <= 8 && (F = F.substring(0, B - N))
                    }
                    F = decodeURIComponent(escape(F))
                }
                return F
            }
    //·················································
    // New entry point: encrypt_data is the encrypt_data value returned by https://vipapi.qimingpian.com/DataList/productListVip.
    function res(encrypt_data) {
        // run o; its return value is the scraped result (Base64-encoded)
        var decrypt_data = o(encrypt_data);

        return decrypt_data
    }
    
    
    • Inspected value of c:

    • Inspected value of f:
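
With those two constants, my_decode is an ordinary Base64 decoder: c is the standard alphabet and f strips whitespace. A Python transcription of the same loop, as a minimal sketch (the name `my_decode_py` is made up for this example), can be checked against the standard library:

```python
import base64
import re

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
SPACE = re.compile(r"[\t\n\f\r ]")


def my_decode_py(text: str) -> bytes:
    """Transcription of the JS decode loop; returns bytes instead of a binary string."""
    text = SPACE.sub("", str(text))
    if len(text) % 4 == 0:
        # drop one or two trailing '=' padding characters
        text = re.sub(r"==?$", "", text)
    if len(text) % 4 == 1 or re.search(r"[^+a-zA-Z0-9/]", text):
        raise ValueError("Invalid character: the string to be decoded is not correctly encoded.")
    out = bytearray()
    acc = 0
    for i, ch in enumerate(text):
        value = ALPHABET.index(ch)
        # accumulate 6 bits per character, restarting every 4 characters
        acc = acc * 64 + value if i % 4 else value
        if i % 4:
            # the JS expression uses the already incremented counter: -2 * (i + 1) & 6
            shift = (-2 * (i + 1)) & 6
            out.append((acc >> shift) & 0xFF)
    return bytes(out)


sample = "eHVqdW5rYWk="
print(my_decode_py(sample))
print(my_decode_py(sample) == base64.b64decode(sample))
```

The JS version returns a string with one character per byte because the downstream s function reads its input with charCodeAt; in Python, bytes is the natural equivalent.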

    • Write the Python code:

      import execjs
      import base64
      import json
      
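      def decrypt(encrypt_data):
          # load the JS file written above
          with open('test2.js', encoding='utf-8') as js_file:
              ctx = execjs.compile(js_file.read())
          # res returns the wanted data as Base64; decode it back to bytes
          return base64.b64decode(ctx.call('res', encrypt_data))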
      
      
      if __name__ == '__main__':
          encrypt_data = ''  # paste the encrypt_data value from the API response here
          decrypt_data = decrypt(encrypt_data)
      
          json_data = json.loads(decrypt_data)
          print(json_data)
      
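      • Final result:

      • This is indeed the data we need, so all that remains is calling the decryption function from Python. One thing to watch: returning an object directly to Python raises an error, so JSON.parse is removed here and the JSON string from before parsing is returned instead.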

      // Decryption function
      function my_decrypt(t) {
          return s("5e5062e82f15fe4ca9d24bc5", my_decode(t), 0, 0, "012345677890123", 1)
      }
      
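      • To guard against characters with special encodings in that string, convert it to Base64 before returning (Buffer.from replaces the deprecated new Buffer):
      function my_decrypt(t) {
          return Buffer.from(s("5e5062e82f15fe4ca9d24bc5", my_decode(t), 0, 0, "012345677890123", 1)).toString("base64")
      }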
      
      • Then decode it in Python with b64decode from the base64 library.
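
The Python half of that round trip can be tried without Node.js. In this sketch the Base64 text is produced in Python purely to stand in for what my_decrypt would return; the record contents and the helper name `unwrap` are invented for illustration.

```python
import base64
import json


def unwrap(b64_text: str):
    """Decode the Base64 text returned by the JS side and parse the JSON inside."""
    raw = base64.b64decode(b64_text)
    return json.loads(raw.decode("utf-8"))


# Stand-in for the JS return value: a JSON string containing non-ASCII text,
# UTF-8 encoded and then Base64 encoded.
record = {"product": "示例", "round": "A"}
wire_text = base64.b64encode(
    json.dumps(record, ensure_ascii=False).encode("utf-8")
).decode("ascii")

print(wire_text.isascii())
print(unwrap(wire_text) == record)
```

Because the text crossing the JS/Python boundary is pure ASCII, it is not affected by how the two runtimes exchange non-ASCII characters.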

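    Node.js Buffer

    Before TypedArray was introduced, the JavaScript language had no mechanism for reading or manipulating binary data; strings were the only option.
    
    However, handling TCP streams or file streams requires binary data. Node.js therefore defines a Buffer class, which creates a buffer dedicated to holding binary data.
    
    In Node.js, Buffer is a core library shipped with the Node runtime. It gives Node.js a way to store raw data and work with binary data; whenever data moved by an I/O operation has to be handled in Node.js, Buffer is likely to be involved. The raw data is stored in instances of the Buffer class. A Buffer is similar to an array of integers, but it corresponds to a block of raw memory outside the V8 heap.
    
    • Buffer and character encodings
    Buffer instances are commonly used to represent sequences of encoded characters such as UTF-8, UCS2, Base64, or hexadecimal data. By specifying an explicit character encoding, you can convert between Buffer instances and ordinary JavaScript strings.
    
    • Example: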

      > const buf = Buffer.from('xujunkai','ascii')
      undefined
      > buf.toString("hex")
      '78756a756e6b6169'
      > buf.toString("base64")
      'eHVqdW5rYWk='
      >    
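
For comparison, the same conversions on the Python side, which is where the Base64 text ends up being decoded in this walkthrough. This is a small sketch using only the standard library:

```python
import base64

buf = "xujunkai".encode("ascii")

hex_text = buf.hex()
b64_text = base64.b64encode(buf).decode("ascii")

print(hex_text)                    # 78756a756e6b6169
print(b64_text)                    # eHVqdW5rYWk=
print(base64.b64decode(b64_text))  # b'xujunkai'
```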
      
  • Original article: https://www.cnblogs.com/xujunkai/p/12319017.html