  • Alibaba Try (阿里试用): my girlfriend made me sort it for her

    [Image: sorting Alibaba Try]

    Sorry: I had somehow git-ignored the config file earlier. It's fixed now. Sorry!

    Background

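    This is honestly an embarrassment to straight guys everywhere. Yes: last night I was happily doing contract work (a China Mobile mini program; I freelance, OK?). A little after 11 p.m., my girlfriend suddenly had a brainwave: "Can you sort this thing for me? I want to grab some freebies. Is that technically possible?"
    I glanced at it, rather helplessly. Alibaba Try? What on earth... oh, oh, oh, this thing. Just scrape it with a crawler. (I'm a front-end developer...)
    I replied: "No problem, a crawler will do it."
    She: "Wow, how long will it take?"
    Me: "I'm busy right now... 1-2 hours?"
    She: "Forget your work, hurry up and make it for me!"
    I looked at her face and, ashamed, minimized WeChat DevTools (微信开发者工具)...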

    The page

    [Screenshot: Alibaba Try]

    If you think this is an ad, you're giving me far too much credit.

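    Getting the crawler going

    Node.js crawlers: search for one and there is ready-made code everywhere, so I won't go through it line by line. Here is a snippet from Jianshu (简书), by 埃米莉Emily: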

    const express = require('express');
    const superagent = require('superagent');
    const cheerio = require('cheerio');
    // express is a function: calling it with no arguments returns an Express application, which we assign to `app`.
    const app = express();
    
    app.get('/', (req, res, next) => {
      superagent.get('https://www.v2ex.com/')
        .end((err, sres) => {
          // Standard error handling: hand the error to Express
          if (err) {
            return next(err);
          }
          // sres.text holds the page's HTML. Passing it to cheerio.load gives
          // an object with a jQuery-like interface, conventionally named `$`;
          // from here on it's all jQuery-style code.
          const $ = cheerio.load(sres.text);
          const items = [];
          $('.item_title a').each((idx, element) => {
            const $element = $(element);
            items.push({
              title: $element.text(),
              href: $element.attr('href')
            });
          });
    
          res.send(items);
        });
    });
    
    app.listen(3000, () => {
      console.log('app is listening at port 3000');
    });
    

    Anyone who uses Node.js surely knows Express. Think of superagent as a library for making outbound HTTP requests from Node, and cheerio as, well, jQuery for Node.

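    First crawl

    Swap the request URL above for https://try.taobao.com/, inspect the page's markup, and find the selector you need:

    [Screenshot: page markup]

    Replace the selector .item_title a above with .tb-try-wd-item-info > .detail, and off we go: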

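    ...I'd rather not show the result, because there were only six items while the page actually shows 10. After searching for ages, I found two problems:

    [Screenshot: recommended items]

    [Screenshot: data loaded by a POST request]

    As shown above: first, the 6 items I scraped are the recommendations, dammit, not the list below;
    second, the list below is fetched separately by a POST request after the page loads, so it never appears in the HTML the crawler downloads. That's some framework's handiwork: only the recommendations are rendered on the server, while the list is loaded on the client.

    So plain crawling won't work; time to change strategy.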

    Simulating the POST

    OK, since it's a POST, this is easy: dig out the URL and parameters, then replicate the request with superagent:

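    const superagent = require('superagent')
    
    // Fetch one page of the trial list; resolves with the parsed `result` field
    const fetchPage = payload => new Promise((resolve, reject) => {
      superagent
        .post(
          `https://try.taobao.com/api3/call?what=show&page=${payload.page}&pageSize&api=x%2Fsearch`
        )
        .set('content-type', 'application/x-www-form-urlencoded; charset=UTF-8')
        .end((err, sres) => {
          // Standard error handling
          if (err) {
            return reject(err)
          }
          const result = JSON.parse(sres.text).result // the returned data tree
          resolve(result)
        })
    })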
    

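    The content-type comes from here:

    [Screenshot: the request's Content-Type]

    Heh, you guessed it: it failed, like this:

    [Screenshot: failure response]

    Thinking about it, that was inevitable: of course they won't let just anyone call it. So what now? Study it properly? Nope. Old hand that I am, I went in all guns blazing. It's just headers like Content-Type, right?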

    const superagent = require('superagent')
    
    const fetchPage = payload => new Promise((resolve, reject) => {
      superagent
        .post(
          `https://try.taobao.com/api3/call?what=show&page=${payload.page}&pageSize&api=x%2Fsearch`
        )
        .set(
          'user-agent',
          'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
        )
        .set('accept', 'application/json, text/javascript, */*; q=0.01')
        .set('accept-encoding', 'gzip, deflate, br')
        .set(
          'accept-language',
          'zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7,zh-TW;q=0.6,da;q=0.5'
        )
        // .set('content-length', '8')
        .set('content-type', 'application/x-www-form-urlencoded; charset=UTF-8')
        // Replace with the cookie from your own logged-in browser session
        .set(
          'cookie',
          'your cookie'
        )
        .set('origin', 'https://try.taobao.com')
        .set('referer', 'https://try.taobao.com')
        // Session-specific token; use the one from your own browser request
        .set('x-csrf-token', 'f0b8e7443eb7e')
        .set('x-requested-with', 'XMLHttpRequest')
        .end((err, sres) => {
          // Standard error handling
          if (err) {
            return reject(err)
          }
          const result = JSON.parse(sres.text).result
          resolve(result)
        })
    })
    

    All of this is based on the following:

    [Screenshot: request headers]

    It's just headers, just an Origin, just a User-Agent. Did you really think HTTPS would stop me?
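    The headers above can also be collected into one object, since superagent's .set() also accepts an object of header fields. This is a hypothetical sketch: buildTryHeaders and its cookie and csrfToken parameters are names I made up, and the values are the ones from the request above. I leave out content-length (see the note below) and, as my own assumption, accept-encoding, so the HTTP client only advertises encodings it can actually decode. It just builds the object, so it runs without superagent installed.

```javascript
// Hypothetical helper: gathers the browser-like headers from the request above
// into one object. `cookie` and `csrfToken` must come from your own logged-in session.
function buildTryHeaders({ cookie, csrfToken }) {
  return {
    'user-agent':
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    accept: 'application/json, text/javascript, */*; q=0.01',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7,zh-TW;q=0.6,da;q=0.5',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    cookie,
    origin: 'https://try.taobao.com',
    referer: 'https://try.taobao.com',
    'x-csrf-token': csrfToken,
    'x-requested-with': 'XMLHttpRequest'
    // No content-length: let the HTTP client compute it from the actual body.
    // No accept-encoding: an assumption, so the client negotiates what it can decode.
  };
}

const headers = buildTryHeaders({ cookie: 'your cookie', csrfToken: 'your token' });
console.log(Object.keys(headers));
// With superagent installed, it would be passed in one call:
//   superagent.post(url).set(headers).end(...)
```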

    Note the .set('content-length', '8') line above: I don't know what the server does with it, but adding it makes the request time out...
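    A likely explanation for the timeout, which I haven't verified against this server: Content-Length: 8 promises an 8-byte body, but this request sends no body at all, so the server keeps waiting for bytes that never arrive. If you do send a form body, let the client compute the length, or compute it from the exact bytes. A minimal sketch; formBody is a hypothetical helper name:

```javascript
// Hypothetical helper: encodes form fields and reports the byte length that a
// correct Content-Length header would carry for that body.
function formBody(fields) {
  const body = new URLSearchParams(fields).toString();
  // Content-Length counts bytes, not characters, so measure the UTF-8 encoding.
  return { body, contentLength: Buffer.byteLength(body, 'utf8') };
}

console.log(formBody({ page: 1 }));   // { body: 'page=1', contentLength: 6 }
console.log(formBody({ page: 219 })); // { body: 'page=219', contentLength: 8 }
```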

    And with that, it confessed:

    {
        "pages": {
            "paging": {
                "n": 2182,
                "page": 1,
                "pages": 219
            },
            "items": [
                {
                    "shopUserId": "2450112357",
                    "title": "凯度高端款嵌入式蒸烤箱",
                    "status": 1,
                    "totalNum": 1,
                    "requestNum": 15530,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "casdon凯度旗舰店",
                    "showId": "2561626",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34530215",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1ycS2eMDqK1RjSZSyXXaxEVXa.jpg",
                    "shopItemId": "559771706359",
                    "price": 13850
                },
                {
                    "shopUserId": "3189770892",
                    "title": "皇家美素佳儿老包装2段400g",
                    "status": 1,
                    "totalNum": 50,
                    "requestNum": 2079,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "皇家美素佳儿旗舰店",
                    "showId": "2551240",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34396042",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1YrSZaVYqK1RjSZLeXXbXppXa.jpg",
                    "shopItemId": "547114874458",
                    "price": 189
                },
                {
                    "shopUserId": "1077716829",
                    "title": "关注店铺优先审水密码幻彩隔离",
                    "status": 1,
                    "totalNum": 10,
                    "requestNum": 6907,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "水密码旗舰店",
                    "showId": "2568391",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34784086",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB16_4ChmzqK1RjSZPxXXc4tVXa.jpg",
                    "shopItemId": "559005882880",
                    "price": 599
                },
                {
                    "shopUserId": "725786863",
                    "title": "精品皮草派克大衣",
                    "status": 1,
                    "totalNum": 1,
                    "requestNum": 11793,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "美瑞蓓特",
                    "showId": "2557886",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34574078",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1zVLMdCrqK1RjSZK9XXXyypXa.jpg",
                    "shopItemId": "577418950477",
                    "price": 5980
                },
                {
                    "shopUserId": "3000840351",
                    "title": "保友智能新品Pofit电脑椅",
                    "status": 1,
                    "totalNum": 1,
                    "requestNum": 12895,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "保友办公家具旗舰店",
                    "showId": "2557100",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34528042",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1bYZEg6TpK1RjSZKPXXa3UpXa.png",
                    "shopItemId": "577598687971",
                    "price": 5408
                },
                {
                    "shopUserId": "791732485",
                    "title": "TEK手持吸尘器A8",
                    "status": 1,
                    "totalNum": 1,
                    "requestNum": 17195,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "泰怡凯旗舰店",
                    "showId": "2552265",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34444014",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1D6bWbhTpK1RjSZFGXXcHqFXa.jpg",
                    "shopItemId": "547653053965",
                    "price": 5199
                },
                {
                    "shopUserId": "3229583972",
                    "title": "椰富海南冷炸椰子油食用油1L",
                    "status": 1,
                    "totalNum": 20,
                    "requestNum": 4451,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "椰富食品专营店",
                    "showId": "2561698",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34532250",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1VjLSePDpK1RjSZFrXXa78VXa.jpg",
                    "shopItemId": "578653506446",
                    "price": 256
                },
                {
                    "shopUserId": "855223948",
                    "title": "卡西欧立式家用电钢琴PX770",
                    "status": 1,
                    "totalNum": 1,
                    "requestNum": 16762,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "世纪音缘乐器专营店",
                    "showId": "2551326",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34420041",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1CC6aa9zqK1RjSZFpXXakSXXa.jpg",
                    "shopItemId": "562405126383",
                    "price": 4838
                },
                {
                    "shopUserId": "4065939832",
                    "title": "关注宝贝送轻奢沙发床",
                    "status": 1,
                    "totalNum": 1,
                    "requestNum": 17436,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "贝兮旗舰店",
                    "showId": "2559904",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34532170",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1AzxYegHqK1RjSZFPXXcwapXa.jpg",
                    "shopItemId": "577798067313",
                    "price": 4399
                },
                {
                    "shopUserId": "807974445",
                    "title": "森海塞尔CX6蓝牙耳机",
                    "status": 1,
                    "totalNum": 4,
                    "requestNum": 22557,
                    "acceptNum": 0,
                    "reportNum": 0,
                    "isApplied": false,
                    "shopName": "sennheiser旗舰店",
                    "showId": "2559701",
                    "startTime": 1539619200000,
                    "endTime": 1540220400000,
                    "id": "34532161",
                    "type": 1,
                    "pic": "//img.alicdn.com/bao/uploaded/TB1HET6d7voK1RjSZFwXXciCFXa.jpg",
                    "shopItemId": "564408956766",
                    "price": 999
                }
            ]
        }
    }
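    startTime and endTime in the response look like Unix timestamps in milliseconds: 1540220400000 is 2018-10-22 15:00 UTC, i.e. 23:00 in Beijing. Here is a small sketch for turning them into Beijing calendar dates, for example to pick out trials that end on a given day. beijingDate and endsOn are hypothetical names, and the code relies on China's fixed UTC+8 offset with no daylight saving.

```javascript
const BEIJING_OFFSET_MS = 8 * 60 * 60 * 1000; // UTC+8, no daylight saving

// Hypothetical helper: millisecond timestamp -> 'YYYY-MM-DD' in Beijing time.
function beijingDate(ms) {
  // Shift the instant by +8h, then read its UTC calendar date.
  return new Date(ms + BEIJING_OFFSET_MS).toISOString().slice(0, 10);
}

// Hypothetical helper: does this trial item end on the given Beijing date?
function endsOn(item, ymd) {
  return beijingDate(item.endTime) === ymd;
}

// Timestamps copied from the response above
const sample = { title: 'TEK手持吸尘器A8', startTime: 1539619200000, endTime: 1540220400000 };
console.log(beijingDate(sample.startTime)); // 2018-10-16
console.log(beijingDate(sample.endTime));   // 2018-10-22
console.log(endsOn(sample, '2018-10-22'));  // true
```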
    

    Sharp-eyed readers will have noticed that I never sent a form body, yet still got the data I needed: page is passed in the query string...
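    Since page travels in the query string, the request URL can be built with the standard URL and URLSearchParams classes instead of string interpolation. A sketch; tryApiUrl is a hypothetical name, and the parameters simply mirror the URL used above. One difference: the empty pageSize is serialized as pageSize= rather than a bare pageSize, which I assume the server treats the same way.

```javascript
// Hypothetical helper: builds the api3/call URL used above for a given page.
function tryApiUrl(page) {
  const url = new URL('https://try.taobao.com/api3/call');
  url.searchParams.set('what', 'show');
  url.searchParams.set('page', String(page));
  url.searchParams.set('pageSize', ''); // left empty, as in the original request
  url.searchParams.set('api', 'x/search'); // serialized as x%2Fsearch
  return url.toString();
}

console.log(tryApiUrl(2));
// https://try.taobao.com/api3/call?what=show&page=2&pageSize=&api=x%2Fsearch
```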

    The display part

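    With the data in hand, the rest is simple: this one endpoint is really all the remaining features need. And yes, remember, I'm a front-end developer.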
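    The page below talks to two endpoints on my own server: GET /total, whose JSON must carry a pages field, and POST /table, which receives page=N as a form body and must return an array of items. The server side isn't shown in this post, so the following is only a guess at the data shaping it needs, written as plain functions over the result object shown above. toTotal, toTable and pageFromBody are hypothetical names, and the HTTP wiring (Express or otherwise) is left out.

```javascript
// Hypothetical: shape the API `result` (structure as in the JSON above) for GET /total.
function toTotal(result) {
  return { pages: result.pages.paging.pages };
}

// Hypothetical: shape the API `result` for POST /table, keeping only the
// fields the front-end page reads.
function toTable(result) {
  return result.pages.items.map(({ id, title, totalNum, requestNum }) => ({
    id, title, totalNum, requestNum
  }));
}

// Hypothetical: read the page number from a form body such as 'page=219';
// returns null when it is missing or not a positive integer.
function pageFromBody(rawBody) {
  const page = Number(new URLSearchParams(rawBody).get('page'));
  return Number.isInteger(page) && page > 0 ? page : null;
}

// Trimmed sample based on the response above
const sampleResult = {
  pages: {
    paging: { n: 2182, page: 1, pages: 219 },
    items: [{ id: '34530215', title: '凯度高端款嵌入式蒸烤箱', totalNum: 1, requestNum: 15530, price: 13850 }]
  }
};
console.log(toTotal(sampleResult)); // { pages: 219 }
console.log(toTable(sampleResult));
console.log(pageFromBody('page=219')); // 219
```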

    <!DOCTYPE html>
    <html lang="en">
    
    <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <meta http-equiv="X-UA-Compatible" content="ie=edge">
      <title>tb try</title>
      <style>
        .warning {
          color: red;
        }
    
        button {
          width: 100px;
          height: 44px;
          margin-right: 44px;
        }
    
        table {
          border: 1px solid #d8d8d8;
          border-collapse: collapse;
        }
    
        tr {
          border-bottom: 1px solid #d8d8d8;
          cursor: pointer;
        }
    
        tr:last-child {
          border: 0;
        }
      </style>
    </head>
    
    <body>
      <button onclick="postPage()">Next page</button>
      <span id="currentPage"></span>
      <table>
        <thead>
          <tr>
            <th>No. (descending)</th>
            <th>Rate</th>
            <th>Name</th>
          </tr>
        </thead>
        <tbody id="results"></tbody>
      </table>
    
      <script>
        let currentPage = 0 // current page number
        let allItems = [] // all items fetched so far
        let currentTime = 0 // time of the last request, used for throttling
        const xhr = new XMLHttpRequest()
        const loopInterval = 2 // throttle interval, in seconds
        const results = document.querySelector('#results')
        const currentPageText = document.querySelector('#currentPage')
        // Success rate (%) = places offered / applications * 100
        const getRate = item => item.totalNum / item.requestNum * 100
        const reFullTBody = arr => {
          let innerHtml = ''
          arr.forEach((item, i) => {
            let tr = `
              <tr onclick="window.open('https://try.taobao.com/item.htm?id=${item.id}')">
                <td>${i + 1}</td>
                <td>${item.rate.toFixed(3) + '%'}</td>
                <td>${item.title}</td>
              </tr>
              `
            // Show items with a rate above 5% in red
            if (item.rate > 5) tr = tr.replace('<tr', '<tr class="warning"')
            innerHtml += tr
          })
          currentPageText.innerText = `Current page: ${currentPage}`
          results.innerHTML = innerHtml
        }
    
        const postPage = () => {
          // Ignore clicks that arrive within the throttle interval
          const newTime = Date.now()
          const shouldSkip = newTime - currentTime < loopInterval * 1000
          if (shouldSkip) {
            alert(`Please don't click more than once within ${loopInterval} seconds.`)
            return
          }
          currentTime = newTime
          xhr.onreadystatechange = function () {
            if (this.readyState === 4 && this.status === 200) {
              const res = JSON.parse(this.response)
              if (res.length < 1) {
                alert('Everything ending today has been filtered.')
                return
              }
              // Compute rates before sorting; otherwise new items compare as NaN
              res.forEach(item => { item.rate = getRate(item) })
              allItems = [...allItems, ...res]
              allItems.sort((a, b) => b.rate - a.rate)
              reFullTBody(allItems)
              currentPage--
            }
          }
          xhr.open('post', '/table')
          xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
          // Send the request, with the page number in the form body
          xhr.send('page=' + currentPage)
        }
    
        xhr.onreadystatechange = function() {
          if(this.readyState === 4 && this.status === 200) {
            currentPage = JSON.parse(this.response).pages
            postPage()
          }
        }
        xhr.open('get', '/total')
        xhr.send()
      </script>
    </body>
    
    </html>
    

    It looks like this:

    [Screenshot: the finished page]

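    See how user-friendly I am: click a row to jump to the item, rates above 5% are shown in red, it tells you which page you're on, and it even warns you if you click too fast...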
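    Two of those behaviours, ranking by rate with a 5% highlight and refusing clicks that come too fast, can be pulled out of the page into plain functions that are easy to test. A sketch; rankByRate and createThrottle are hypothetical names. The rate formula is the one the page uses (totalNum / requestNum * 100); treating a requestNum of 0 as a rate of 0 is my own choice, and the sample items are made up.

```javascript
// Rate (%) = places offered / applications * 100, as computed in the page above.
// Returns a new array sorted by rate, highest first, with a `warning` flag above the threshold.
function rankByRate(items, threshold = 5) {
  return items
    .map(item => {
      const rate = item.requestNum > 0 ? (item.totalNum / item.requestNum) * 100 : 0;
      return { ...item, rate, warning: rate > threshold };
    })
    .sort((a, b) => b.rate - a.rate);
}

// Returns a function answering "may I fire now?", allowing at most one call per
// `intervalMs`. `now` is injectable so the logic can be tested without waiting.
function createThrottle(intervalMs, now = Date.now) {
  let last = -Infinity;
  return () => {
    const t = now();
    if (t - last < intervalMs) return false;
    last = t;
    return true;
  };
}

const ranked = rankByRate([
  { title: 'A', totalNum: 1, requestNum: 15530 },
  { title: 'B', totalNum: 50, requestNum: 800 }
]);
console.log(ranked.map(i => `${i.title} ${i.rate.toFixed(3)}% ${i.warning ? 'red' : ''}`));

const allowClick = createThrottle(2000); // 2 s, like loopInterval in the page
console.log(allowClick(), allowClick()); // second call comes too soon
```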

    It's that handy. If you like it, go try it now!


    Live demo: click here to try it

    Github: Spider


    If you find it useful, don't be stingy with your stars.

  • Original post: https://www.cnblogs.com/ZweiZhao/p/9798008.html