  • Crawler notes: the teambition login captcha

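    I. Background

    I had too many things to do and my plans were a mess, so I went looking for a tool to sort them out. I remembered teambition, which I had used a long time ago, and decided to take another look. On the login page I ran into a rather interesting captcha: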

    image

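    This one is fun. It looks like an imitation of the 12306 captcha. The 12306 one is beyond me (even as a real human I need several attempts to get it right...), but surely this simplified version can be broken, so I got very interested in cracking it.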


    II. Analysis

    Open the developer tools and inspect the element first:

    image

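    The first surprise is that the prompt "地球" (earth) is rendered on the page as plain text, which is a bit embarrassing. That in itself does not matter much, and it would not matter if the prompt were an image either. What matters is that each response should look different (the larger the variation the safer; otherwise a statistics-based approach breaks it easily). If responses do not vary, the prompt becomes a cheap identifier for an attacker. Next comes the display of the images: each image URL contains a uid and an index. Where do those two variables come from? Click the refresh button and watch the network requests, and a few show up: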

    image

    Combining this uid with the array index of each entry in values gives the URL of the icon shown on the page. So far nothing looks wrong.
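
    As a minimal sketch of that assembly: the endpoint below is the one used by the requests in the code later in this post, while the class name CaptchaImageUrls and the method build are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code.
final class CaptchaImageUrls {

    // Image endpoint as it appears in the requests made later in this post.
    private static final String IMAGE_ENDPOINT = "https://auth_services.teambition.com/captcha/image";

    // One URL per entry of the "values" array: the array index becomes the "index" parameter.
    static List<String> build(String uid, int valueCount) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < valueCount; i++) {
            urls.add(IMAGE_ENDPOINT + "?uid=" + uid + "&lang=zh&index=" + i);
        }
        return urls;
    }

    public static void main(String[] args) {
        // uid taken from the first sample response shown in this post
        build("27419090-0b57-11ea-8fd8-e31d1dab49d9", 5).forEach(System.out::println);
    }
}
```

    Note that the entries of values themselves never appear in the URL; only their positions do.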


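    Then comes the question of how to break it. The icons have a clean, distinctive look and were probably drawn by hand. If so, their number should be limited, a few hundred at most, so labelling them would work. But a few hundred is still a lot to label, and plain manual labelling is too ordinary to be worth writing about.

    There is a way that needs no manual labelling. In the response above, the image for "地球" is guaranteed to be among the entries of the values array. So I only need to call this endpoint many times and group the responses by imageName. The "地球" group, for example, then holds many values arrays, and each of them contains one image that really is the earth. Which one? The image that shows up in every one of them, i.e. the intersection of all those image sets.

    The pipeline is: group by imageName --> intersect the images within each group --> obtain the image feature for each imageName. That mapping is the model. To solve a captcha, look up the feature for the imageName in the model, then find which image among the freshly returned values has a matching feature. That gives recognition from imageName to image.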
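
    The group-and-intersect idea can be sketched on in-memory data before any network code is involved. Everything here is illustrative: the class IntersectionSketch, the Sample type, and the one-letter "features" are stand-ins for real image features, and the ASCII names "earth" and "lock" stand in for the Chinese prompts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "group by imageName, then intersect within each group".
final class IntersectionSketch {

    // One captcha response reduced to what matters here: the prompt and the image features.
    static final class Sample {
        final String imageName;
        final Set<String> features;

        Sample(String imageName, String... features) {
            this.imageName = imageName;
            this.features = new HashSet<>(Arrays.asList(features));
        }
    }

    // Returns imageName -> feature for every name whose intersection narrowed to exactly one feature.
    static Map<String, String> derive(List<Sample> samples) {
        Map<String, Set<String>> intersections = new LinkedHashMap<>();
        for (Sample s : samples) {
            Set<String> running = intersections.get(s.imageName);
            if (running == null) {
                // first response for this name: start from its full feature set
                intersections.put(s.imageName, new HashSet<>(s.features));
            } else {
                // later responses: keep only the features seen every time
                running.retainAll(s.features);
            }
        }
        Map<String, String> model = new LinkedHashMap<>();
        intersections.forEach((name, set) -> {
            if (set.size() == 1) {
                model.put(name, set.iterator().next());
            }
        });
        return model;
    }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample("earth", "a", "b", "c"));
        samples.add(new Sample("earth", "c", "d", "e"));
        samples.add(new Sample("lock", "x", "y", "z")); // seen only once: still ambiguous
        System.out.println(derive(samples)); // prints {earth=c}
    }
}
```

    A name that has been seen only once, or whose decoys happen to repeat, keeps more than one candidate and is left out of the model until more responses are collected.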


    III. Implementation

    First request the captcha endpoint to collect a batch of images:

    package cc11001100.misc.crawler.captcha.teambition;
    
    import cc11001100.misc.crawler.utils.HttpUtil;
    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.io.IOUtils;
    import org.jsoup.Connection;
    
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    
    /**
     *
     * Downloads a batch of raw captcha images for analysis.
     *
     * https://account.teambition.com/login
     *
     * @author CC11001100
     */
    public class TeambitionCaptchaCrawler {
    
    	public static void handleSingleCaptcha(String captchaResponseJsonStr) throws IOException {
    		JSONObject responseJson = JSON.parseObject(captchaResponseJsonStr);
    		String uid = responseJson.getString("uid");
    		JSONArray values = responseJson.getJSONArray("values");
    		for (int i = 0; i < values.size(); i++) {
    			String url = "https://auth_services.teambition.com/captcha/image?uid=" + uid + "&lang=zh&index=" + i;
    			byte[] captchaBytes = HttpUtil.request(url, null, Connection.Response::bodyAsBytes);
    			String outputLocation = "data/captcha/teambition/captcha-imgs/" + values.getString(i) + ".png";
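    			// close the stream after writing so file handles are not leaked across 10000 iterations
    			try (FileOutputStream out = new FileOutputStream(outputLocation)) {
    				IOUtils.write(captchaBytes, out);
    			}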
    		}
    	}
    
    	public static void downloadRawData() throws IOException {
    		for (int i = 0; i < 10000; i++) {
    			String url = "https://auth_services.teambition.com/captcha/setup?num=5&lang=zh&_=" + System.currentTimeMillis();
    			String responseBody = HttpUtil.request(url, connection -> {
    				connection.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36");
    			}, Connection.Response::body);
    			FileUtils.writeStringToFile(new File("data/captcha/teambition/raw-captcha.jsonl"), responseBody + "\n", "UTF-8", true);
    			handleSingleCaptcha(responseBody);
    		}
    	}
    
    	public static void main(String[] args) throws IOException {
    
    		downloadRawData();
    
    	}
    
    }
    

    This yields a batch of raw captcha images:

    image

    raw-captcha.jsonl:

    {"values":["f38d8ea6e916762a57be2108c7c0b29c027650f3","cc6fc6dd8ef8f17b7fa99f44dfaa65df221af4f4","801d66a2673d0ba30ba9d02412d2321e1ba1de94","66cecc1d439a74d10a6a2d628f2a358fa90df1df","5d8029335a0330a1ff9f0c5766f3502dbba1ad1f"],"imageName":"飞机","uid":"27419090-0b57-11ea-8fd8-e31d1dab49d9"}
    {"values":["f54416fb674ae827c35c004e35057bdbc14fc0fe","ca1cc93bdc4236889e154b7542bf5675c7ffedc0","368a7a84132450e43f5020802b397b071dfe7840","0a449565c02a16adbe17b799c30947c8c904ad73","29d6a3b0ff1c42af3a9bb47fd759872a5e8f5931"],"imageName":"锁","uid":"27852940-0b57-11ea-a598-c15471c1be2e"}
    {"values":["84ba343b153e45c4c9aae9b260cfefa297587eda","fffe2e988c105b50c902d6372340a306abda0ce5","04ab1f94a0a73fdbf37d6aa40beb9878ef737c8f","ec2b8e8612ced4cd43f656bce3050dc3ef58c656","e61b104c2e7fbc1c6ab1dfa1b2a23f2c6fde1680"],"imageName":"相机","uid":"27bab830-0b57-11ea-8fd8-e31d1dab49d9"}
    {"values":["295da38531279e1938aa4a87f354f27f62feb159","b3ed0b06f53373ba6d8ffdccea65e807d26b53e6","f6baa4dcea4f4d845f4d13301a39dbbe9bbe9fe9","61ce83c8c500d9cf96b79e34e9147993a0c6b359","93244a67523c11554f3ce8b49950d8fcbdbbf8fd"],"imageName":"锁","uid":"27ecebc0-0b57-11ea-8fd8-e31d1dab49d9"}
    {"values":["b2c5fa8ec472914cbdaff3d790fc0eb0c8a45adf","5ebd5d74430628a34fb189e9efc919c12afa069e","369adc4406bd49e7876e760b3e13dcc2637daa87","d664b3e2334880096f88fd50349e7d5b9e4e0fcd","cef15533490a94c27eeb8ae8dd97efb47ba717ce"],"imageName":"相机","uid":"282587f0-0b57-11ea-8fd8-e31d1dab49d9"}
    ...

    Next, build a map from imageName to image feature out of the captcha images just downloaded:

    package cc11001100.misc.crawler.captcha.teambition;
    
    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import lombok.extern.slf4j.Slf4j;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.lang3.StringUtils;
    
    import javax.imageio.ImageIO;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;
    
    /**
     * @author CC11001100
     */
    @Slf4j
    public class DerivationLabel {
    
        // Find the images that serve as answers and label each with its imageName
        private static void derivationLabel() throws IOException {
            Map<String, Integer> imageNameToHashCodeMap = new HashMap<>();
            FileUtils.readLines(new File("data/captcha/teambition/raw-captcha.jsonl"), "UTF-8").stream()
                    .filter(StringUtils::isNotBlank)
                    .collect(Collectors.groupingBy(line -> JSON.parseObject(line).getString("imageName")))
                    .forEach((imageName, lineList) -> {
                        Set<Integer> interceptingSet = new HashSet<>();
                        for (String line : lineList) {
                            JSONArray values = JSON.parseObject(line).getJSONArray("values");
                            Set<Integer> currentSet = new HashSet<>();
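                            // Some downloads were interrupted on purpose to watch the effect, so a values group may be missing images; such incomplete groups are simply skipped
                            boolean hasError = false;
                            for (int i = 0; i < values.size(); i++) {
                                String f = "data/captcha/teambition/captcha-imgs/" + values.getString(i) + ".png";
                                try {
                                    // ImageIO.read(File) opens and closes the file itself, so no stream is leaked
                                    currentSet.add(ImageUtil.hash(ImageIO.read(new File(f))));
                                } catch (Exception e) {
                                    log.error("Exception, path=" + f, e);
                                    hasError = true;
                                    break;
                                }
                            }
                            if (hasError) {
                                // skip only this response; a return here would drop the whole imageName group
                                continue;
                            }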
                            if (interceptingSet.isEmpty()) {
                                interceptingSet.addAll(currentSet);
                            } else {
                                interceptingSet.retainAll(currentSet);
                                if (interceptingSet.isEmpty()) {
                                    log.info("not enough data, imageName={}", imageName);
                                    break;
                                }
                            }
                        }
                        if (interceptingSet.size() != 1) {
                            log.info("imageName={}, derivation failed", imageName);
                        } else {
                            log.info("imageName={}, set={}", imageName, interceptingSet);
                            imageNameToHashCodeMap.put(imageName, interceptingSet.iterator().next());
                        }
                    });
            imageNameToHashCodeMap.forEach((k, v) -> System.out.printf("map.put(\"%s\", %d);\n", k, v));
        }
    
        public static void main(String[] args) throws IOException {
            derivationLabel();
        }
    
    }
    

    The image feature here is simply a hash value, computed by the following utility class:

    package cc11001100.misc.crawler.captcha.teambition;
    
    import lombok.extern.slf4j.Slf4j;
    
    import java.awt.image.BufferedImage;
    
    /**
     * @author CC11001100
     */
    @Slf4j
    public class ImageUtil {
    
        public static int hash(BufferedImage image) {
            StringBuilder msg = new StringBuilder();
            for (int i = 0; i < image.getWidth(); i++) {
                for (int j = 0; j < image.getHeight(); j++) {
                    msg.append(image.getRGB(i, j)).append("|");
                }
            }
            return msg.toString().hashCode();
        }
    
    }
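
    To see what this kind of feature can and cannot do, here is a small self-contained sketch on synthetic images. The class PixelHashDemo and the helper solid are made up for illustration; hash follows the same idea as ImageUtil.hash above.

```java
import java.awt.image.BufferedImage;

// Hypothetical demo: an exact pixel hash matches only pixel-identical images.
final class PixelHashDemo {

    // Same idea as ImageUtil.hash: concatenate every pixel's RGB value and hash the resulting string.
    static int hash(BufferedImage image) {
        StringBuilder sb = new StringBuilder();
        for (int x = 0; x < image.getWidth(); x++) {
            for (int y = 0; y < image.getHeight(); y++) {
                sb.append(image.getRGB(x, y)).append('|');
            }
        }
        return sb.toString().hashCode();
    }

    // Builds a small synthetic image filled with a single colour.
    static BufferedImage solid(int width, int height, int rgb) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                image.setRGB(x, y, rgb);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage a = solid(4, 4, 0x336699);
        BufferedImage b = solid(4, 4, 0x336699);
        BufferedImage c = solid(4, 4, 0x336699);
        c.setRGB(0, 0, 0x33669A); // change a single pixel by one step of blue

        System.out.println(hash(a) == hash(b)); // true: identical pixels
        System.out.println(hash(a) == hash(c)); // false: one pixel differs
    }
}
```

    The approach therefore relies on the server returning pixel-identical images for the same icon. If the images carried any per-request noise, a more tolerant feature would presumably be needed in place of an exact hash.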
    

    The generated map:

    map.put("音符", 182834422);
    map.put("锁", 825168351);
    map.put("机器人", -1714422141);
    map.put("汽车", -769011042);
    map.put("钥匙", 975258806);
    map.put("树叶", -179264444);
    map.put("信封", -702966573);
    map.put("相机", 663652535);
    map.put("文件夹", -1425863546);
    map.put("云朵", 2106631124);
    map.put("飞机", 1640044711);
    map.put("T恤", -258338857);
    map.put("眼睛", 1647675580);
    map.put("树", -2063289315);
    map.put("放大镜", -1715725768);
    map.put("闹钟", 1335715652);
    map.put("回形针", 1654053339);
    map.put("地球", -1592219546);
    map.put("脚印", -1438760947);
    map.put("标签", 761482882);
    map.put("剪刀", 1998833602);
    map.put("灯泡", 418507311);
    map.put("伞", -2104015908);
    map.put("图表", -824773152);
    map.put("气球", 1423728112);
    map.put("太阳眼镜", 1204904862);
    map.put("椅子", 193112560);
    map.put("打印机", -939522792);
    map.put("旗帜", 834329993);
    map.put("猫", 1911236121);
    map.put("女人", 2047088238);
    map.put("男人", 664214693);
    map.put("卡车", -1453025175);
    map.put("电脑", -1970735883);
    map.put("裤子", -337658120);
    map.put("铅笔", 1993614559);
    map.put("房子", -1299209990);

    Then comes the recognition part. It only prints the answer and does not submit it, because submitting would actually send out the SMS:

    package cc11001100.misc.crawler.captcha.teambition;
    
    import cc11001100.misc.crawler.utils.HttpUtil;
    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import lombok.Builder;
    import lombok.Data;
    import lombok.extern.slf4j.Slf4j;
    import org.jsoup.Connection;
    
    import javax.imageio.ImageIO;
    import java.awt.image.BufferedImage;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    
    /**
     * @author CC11001100
     */
    @Slf4j
    public class TeambitionCaptchaCracker {
    
        private static final Map<String, Integer> map = new HashMap<>();
    
        static {
            map.put("音符", 182834422);
            map.put("锁", 825168351);
            map.put("机器人", -1714422141);
            map.put("汽车", -769011042);
            map.put("钥匙", 975258806);
            map.put("树叶", -179264444);
            map.put("信封", -702966573);
            map.put("相机", 663652535);
            map.put("文件夹", -1425863546);
            map.put("云朵", 2106631124);
            map.put("飞机", 1640044711);
            map.put("T恤", -258338857);
            map.put("眼睛", 1647675580);
            map.put("树", -2063289315);
            map.put("放大镜", -1715725768);
            map.put("闹钟", 1335715652);
            map.put("回形针", 1654053339);
            map.put("地球", -1592219546);
            map.put("脚印", -1438760947);
            map.put("标签", 761482882);
            map.put("剪刀", 1998833602);
            map.put("灯泡", 418507311);
            map.put("伞", -2104015908);
            map.put("图表", -824773152);
            map.put("气球", 1423728112);
            map.put("太阳眼镜", 1204904862);
            map.put("椅子", 193112560);
            map.put("打印机", -939522792);
            map.put("旗帜", 834329993);
            map.put("猫", 1911236121);
            map.put("女人", 2047088238);
            map.put("男人", 664214693);
            map.put("卡车", -1453025175);
            map.put("电脑", -1970735883);
            map.put("裤子", -337658120);
            map.put("铅笔", 1993614559);
            map.put("房子", -1299209990);
        }
    
        public static Answer getAnswer(JSONObject responseJsonObject) throws IOException {
            String imageName = responseJsonObject.getString("imageName");
            Integer targetHashcode = map.get(imageName);
            if (targetHashcode == null) {
                return null;
            }
            JSONArray values = responseJsonObject.getJSONArray("values");
            String uid = responseJsonObject.getString("uid");
            for (int i = 0; i < values.size(); i++) {
                String url = "https://auth_services.teambition.com/captcha/image?uid=" + uid + "&lang=zh&index=" + i;
                byte[] imageBytes = HttpUtil.request(url, null, Connection.Response::bodyAsBytes);
                if (imageBytes == null) {
                    log.info("image download failed, imageName={}, uid={}, index={}", imageName, uid, i);
                    continue;
                }
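                BufferedImage image = ImageIO.read(new ByteArrayInputStream(imageBytes));
                if (image == null) {
                    // ImageIO.read returns null when the bytes cannot be decoded as an image
                    log.info("image decode failed, imageName={}, uid={}, index={}", imageName, uid, i);
                    continue;
                }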
                int currentImageHashcode = ImageUtil.hash(image);
                if (currentImageHashcode == targetHashcode) {
                    return Answer.builder().imageName(imageName).imageUrl(url).index(i).build();
                }
            }
            return null;
        }
    
        public static void test() throws IOException {
            String url = "https://auth_services.teambition.com/captcha/setup?num=5&lang=zh&_=" + System.currentTimeMillis();
            String responseJsonStr = HttpUtil.request(url, connection -> {
                connection.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36");
            }, Connection.Response::body);
            JSONObject responseJsonObject = JSON.parseObject(responseJsonStr);
            Answer answer = getAnswer(responseJsonObject);
            if (answer == null) {
                log.info("answer not found, responseJsonStr={}", responseJsonStr);
            } else {
                System.out.println(JSON.toJSONString(answer, true));
            }
        }
    
        @Data
        @Builder
        public static class Answer {
            private String imageName;
            private String imageUrl;
            private int index;
        }
    
        public static void main(String[] args) throws IOException {
            test();
        }
    
    }
    

    Output:

    {
    	"imageName":"气球",
    	"imageUrl":"https://auth_services.teambition.com/captcha/image?uid=814b9d80-0b5a-11ea-8fd8-e31d1dab49d9&lang=zh&index=3",
    	"index":3
    }

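    Open the image to check (use the link from your own console; captcha images expire, so the link here will stop working before long). It is indeed a balloon. Several more attempts were also all correct, so the crack is complete.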
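
    The matching step itself can also be tried without touching the network. The following is a rough sketch, assuming the image hashes have already been computed; the class AnswerLookupSketch and the method findIndex are made up for illustration, and the ASCII key "balloon" stands in for the Chinese prompt that the real endpoint returns. The hash values are taken from the generated map above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical offline version of the lookup: model + candidate hashes -> answer index.
final class AnswerLookupSketch {

    // Returns the index of the candidate whose hash equals the model's hash for imageName,
    // or -1 when the name is not in the model or no candidate matches.
    static int findIndex(Map<String, Integer> model, String imageName, List<Integer> candidateHashes) {
        Integer target = model.get(imageName);
        if (target == null) {
            return -1;
        }
        for (int i = 0; i < candidateHashes.size(); i++) {
            // equals, not ==: both sides are boxed Integers
            if (target.equals(candidateHashes.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, Integer> model = new HashMap<>();
        model.put("balloon", 1423728112); // stand-in key; hash value from the generated map

        // Hashes of the five images of one imagined response, in index order.
        List<Integer> candidates = Arrays.asList(182834422, 825168351, 663652535, 1423728112, 1640044711);

        System.out.println(findIndex(model, "balloon", candidates)); // prints 3
        System.out.println(findIndex(model, "unknown", candidates)); // prints -1
    }
}
```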


    References:

    1. https://account.teambition.com/login



  • Original post: https://www.cnblogs.com/cc11001100/p/11901976.html