  • How do you bypass Selenium WebDriver detection in headless Chrome?

    Environment:
    1. selenium-java

    <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.4.0</version>
    </dependency>

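    1. Problem description:
    When WebDriver drives headless Chrome and the target site recognizes the browser as WebDriver-controlled, the crawler can no longer collect data. How can the browser check be bypassed so that collection can continue?

    2. How is a browser identified as WebDriver-controlled?
    a. In the Chrome console, evaluate window.navigator.webdriver. It is true when the browser is driven by WebDriver and undefined otherwise.
    b. In Java code, if the arguments used to initialize the WebDriver include any one of enable-automation, headless, or remote-debugging-pipe, AutomationControlledEnabled is set to true, and the webdriver() method in navigator.h then returns true. The options below exclude enable-automation but still pass --headless, so the flag remains set: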

    ChromeOptions options = new ChromeOptions();
    String[] a = { "enable-automation" };
    options.setExperimentalOption("excludeSwitches", a);
    options.addArguments("--headless");


    c. The value of window.navigator.webdriver in the browser comes from the webdriver() method in navigator.h: when AutomationControlledEnabled is true, webdriver is true.
    See the Chromium source: https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/third_party/blink/renderer/core/frame/navigator.h


    bool webdriver() const {
    return RuntimeEnabledFeatures::AutomationControlledEnabled();
    }

    d. When is AutomationControlledEnabled set to true?
    See the Chromium source: https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/content/child/runtime_features.cc
    Whenever the launch arguments include EnableAutomation, Headless, or RemoteDebuggingPipe, the AutomationControlled feature is enabled:
    {wrf::EnableAutomationControlled, switches::kEnableAutomation, true},
    {wrf::EnableAutomationControlled, switches::kHeadless, true},
    {wrf::EnableAutomationControlled, switches::kRemoteDebuggingPipe, true},
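
The table above boils down to one rule: if any of the three switches is on the command line, the flag is on. The following is a minimal plain-Java sketch of that rule, not Chromium code. The class and method names (AutomationFlagSketch, switchName, isAutomationControlled) are hypothetical, and the switch strings are assumed to be the command-line spellings listed in 2.b.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AutomationFlagSketch {

    // Assumed command-line spellings of kEnableAutomation, kHeadless, kRemoteDebuggingPipe.
    static final Set<String> TRIGGER_SWITCHES = new HashSet<>(
            Arrays.asList("enable-automation", "headless", "remote-debugging-pipe"));

    // Strips a leading "--" and any "=value" suffix, e.g. "--window-size=1920,1080" -> "window-size".
    static String switchName(String arg) {
        String name = arg.startsWith("--") ? arg.substring(2) : arg;
        int eq = name.indexOf('=');
        return eq >= 0 ? name.substring(0, eq) : name;
    }

    // True if at least one launch argument is one of the trigger switches.
    static boolean isAutomationControlled(List<String> launchArgs) {
        for (String arg : launchArgs) {
            if (TRIGGER_SWITCHES.contains(switchName(arg))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // enable-automation is absent, but --headless alone still sets the flag.
        System.out.println(isAutomationControlled(
                Arrays.asList("--headless", "--window-size=1920,1080")));
        System.out.println(isAutomationControlled(
                Arrays.asList("--window-size=1920,1080")));
    }
}
```

Under this model, excluding enable-automation alone does not clear navigator.webdriver as long as --headless is still passed, which is why a further workaround is needed.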

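    3. How can the browser's WebDriver detection be bypassed?
    a. Option 1: change webdriver() in navigator.h to return false and build your own Chromium. This solves the problem at the root.
    b. Option 2: execute a CDP command that redefines navigator.webdriver as undefined. However, selenium-java 3.4.0 does not provide an executeCdpCommand method, so you need a custom ChromiumDriver that adds it: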

    ChromiumDriver driver = new ChromiumDriver(chromeCaps);
    HashMap<String, Object> cdpCmd = new HashMap<String, Object>();
    cdpCmd.put("source", "Object.defineProperty(navigator, 'webdriver', {get: () => undefined }); ");
    driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", cdpCmd);


    JS command: Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
    References: https://www.cnblogs.com/scholarscholar/p/14364822.html
    https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-addScriptToEvaluateOnNewDocument
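
To make the shape of the request concrete, here is a plain-Java sketch that assembles the data the driver sends. It only builds maps and does not talk to a browser. The "cmd" and "params" keys mirror the executeCdpCommand implementation shown in section 4; the class and method names (CdpPayloadSketch, hideWebdriverParams, cdpExecuteBody) are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class CdpPayloadSketch {

    // Script that Chrome evaluates before any page script runs on each new document.
    static final String HIDE_WEBDRIVER_SCRIPT =
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});";

    // Parameters for Page.addScriptToEvaluateOnNewDocument: a single "source" entry.
    static Map<String, Object> hideWebdriverParams() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("source", HIDE_WEBDRIVER_SCRIPT);
        return params;
    }

    // Body handed to the driver's CDP endpoint: the command name plus its parameters.
    static Map<String, Object> cdpExecuteBody(String commandName, Map<String, Object> params) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("cmd", commandName);
        body.put("params", params);
        return Collections.unmodifiableMap(body);
    }

    public static void main(String[] args) {
        System.out.println(
                cdpExecuteBody("Page.addScriptToEvaluateOnNewDocument", hideWebdriverParams()));
    }
}
```

Because the script is registered with Page.addScriptToEvaluateOnNewDocument rather than run once, it is applied again on every navigation.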

    c. Option 3: upgrade selenium-java to a beta release. selenium-java 4.0.0-beta supports executeCdpCommand, but upgrading to 4.0.0 causes many dependency errors that have to be resolved.

    <!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
    <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.0.0-beta-4</version>
    </dependency>


    4. selenium-java 3.4.0 does not support executeCdpCommand, so define a custom ChromiumDriver that adds it

    <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.4.0</version>
    </dependency>

    package com.xxx.selenium;
    
    import java.util.Map;
    import org.openqa.selenium.Capabilities;
    import org.openqa.selenium.chrome.ChromeDriverService;
    import org.openqa.selenium.chrome.ChromeOptions;
    import org.openqa.selenium.remote.CommandExecutor;
    import org.openqa.selenium.remote.RemoteWebDriver;
    import com.google.common.collect.ImmutableMap;
    
    /**
     * RemoteWebDriver subclass for selenium-java 3.4.0 that adds executeCdpCommand.
     */
    public class ChromiumDriver extends RemoteWebDriver {
    
        public ChromiumDriver(Capabilities capabilities) {
            // "goog" is the vendor keyword used to build the CDP endpoint path.
            this(new ChromiumDriverCommandExecutor("goog", ChromeDriverService.createDefaultService()), capabilities, ChromeOptions.CAPABILITY);
        }
    
        // capabilityKey is accepted but currently unused.
        protected ChromiumDriver(CommandExecutor commandExecutor, Capabilities capabilities, String capabilityKey) {
            super(commandExecutor, capabilities);
        }
    
        /**
         * Launches Chrome app specified by id.
         *
         * @param id Chrome app id.
         */
        public void launchApp(String id) {
            execute(ChromiumDriverCommand.LAUNCH_APP, ImmutableMap.of("id", id));
        }
    
        /**
         * Execute a Chrome Devtools Protocol command and get returned result. The
         * command and command args should follow
         * <a href="https://chromedevtools.github.io/devtools-protocol/">chrome devtools
         * protocol domains/commands</a>.
         */
        public Map<String, Object> executeCdpCommand(String commandName, Map<String, Object> parameters) {
    
            @SuppressWarnings("unchecked")
            Map<String, Object> toReturn = (Map<String, Object>) getExecuteMethod().execute(ChromiumDriverCommand.EXECUTE_CDP_COMMAND,
                    ImmutableMap.of("cmd", commandName, "params", parameters));
    
            return ImmutableMap.copyOf(toReturn);
        }
    
        @Override
        public void quit() {
            super.quit();
        }
    }
    
    
    
    package com.xxx.selenium;
    
    /**
     * Constants for the ChromiumDriver specific command IDs.
     */
    final class ChromiumDriverCommand {
      private ChromiumDriverCommand() {}
    
      static final String LAUNCH_APP = "launchApp";
      static final String GET_NETWORK_CONDITIONS = "getNetworkConditions";
      static final String SET_NETWORK_CONDITIONS = "setNetworkConditions";
      static final String DELETE_NETWORK_CONDITIONS = "deleteNetworkConditions";
      static final String EXECUTE_CDP_COMMAND = "executeCdpCommand";
    
      // Cast Media Router APIs
      static final String GET_CAST_SINKS = "getCastSinks";
      static final String SET_CAST_SINK_TO_USE = "selectCastSink";
      static final String START_CAST_TAB_MIRRORING = "startCastTabMirroring";
      static final String GET_CAST_ISSUE_MESSAGE = "getCastIssueMessage";  
      static final String STOP_CASTING = "stopCasting";
    
      static final String SET_PERMISSION = "setPermission";
    }
    
    
    
    
    package com.xxx.selenium;
    
    import static java.util.Collections.unmodifiableMap;
    import java.util.HashMap;
    import java.util.Map;
    
    import org.openqa.selenium.remote.CommandInfo;
    import org.openqa.selenium.remote.http.HttpMethod;
    import org.openqa.selenium.remote.service.DriverCommandExecutor;
    import org.openqa.selenium.remote.service.DriverService;
    
    /**
     * {@link DriverCommandExecutor} that understands ChromiumDriver specific commands.
     *
     * @see <a href="https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/client/command_executor.py">List of ChromeWebdriver commands</a>
     */
    public class ChromiumDriverCommandExecutor extends DriverCommandExecutor {
    
      private static Map<String, CommandInfo> buildChromiumCommandMappings(String vendorKeyword) {
        String sessionPrefix = "/session/:sessionId/";
        String chromiumPrefix = sessionPrefix + "chromium";
        String vendorPrefix = sessionPrefix + vendorKeyword;
    
        HashMap<String, CommandInfo> mappings = new HashMap<>();
    
        mappings.put(ChromiumDriverCommand.LAUNCH_APP,
          new CommandInfo(chromiumPrefix + "/launch_app", HttpMethod.POST));
    
        String networkConditions = chromiumPrefix + "/network_conditions";
        mappings.put(ChromiumDriverCommand.GET_NETWORK_CONDITIONS,
          new CommandInfo(networkConditions, HttpMethod.GET));
        mappings.put(ChromiumDriverCommand.SET_NETWORK_CONDITIONS,
          new CommandInfo(networkConditions, HttpMethod.POST));
        mappings.put(ChromiumDriverCommand.DELETE_NETWORK_CONDITIONS,
          new CommandInfo(networkConditions, HttpMethod.DELETE));
    
        mappings.put( ChromiumDriverCommand.EXECUTE_CDP_COMMAND,
          new CommandInfo(vendorPrefix + "/cdp/execute", HttpMethod.POST));
    
        // Cast / Media Router APIs
        String cast = vendorPrefix + "/cast";
        mappings.put(ChromiumDriverCommand.GET_CAST_SINKS,
          new CommandInfo(cast + "/get_sinks", HttpMethod.GET));
        mappings.put(ChromiumDriverCommand.SET_CAST_SINK_TO_USE,
          new CommandInfo(cast + "/set_sink_to_use", HttpMethod.POST));
        mappings.put(ChromiumDriverCommand.START_CAST_TAB_MIRRORING,
          new CommandInfo(cast + "/start_tab_mirroring", HttpMethod.POST));
        mappings.put(ChromiumDriverCommand.GET_CAST_ISSUE_MESSAGE,
          new CommandInfo(cast + "/get_issue_message", HttpMethod.GET));
        mappings.put(ChromiumDriverCommand.STOP_CASTING,
          new CommandInfo(cast + "/stop_casting", HttpMethod.POST));
    
        // sessionPrefix already ends with "/", so no extra slash is needed here.
        mappings.put(ChromiumDriverCommand.SET_PERMISSION,
          new CommandInfo(sessionPrefix + "permissions", HttpMethod.POST));
    
        return unmodifiableMap(mappings);
      }
    
      public ChromiumDriverCommandExecutor(String vendorPrefix, DriverService service) {
        super(service, buildChromiumCommandMappings(vendorPrefix));
      }
    }
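
The mapping for EXECUTE_CDP_COMMAND is the one this article relies on. As a rough illustration of how the vendor keyword and the session ID end up in the request path, here is a plain-Java sketch. Selenium performs the :sessionId substitution internally; the class and method names (CommandPathSketch, vendorCdpTemplate, resolve) and the session ID "abc123" are hypothetical.

```java
public class CommandPathSketch {

    // Same concatenation as buildChromiumCommandMappings: session prefix + vendor keyword + endpoint.
    static String vendorCdpTemplate(String vendorKeyword) {
        String sessionPrefix = "/session/:sessionId/";
        return sessionPrefix + vendorKeyword + "/cdp/execute";
    }

    // Substitutes a concrete session ID for the :sessionId placeholder.
    static String resolve(String template, String sessionId) {
        return template.replace(":sessionId", sessionId);
    }

    public static void main(String[] args) {
        String template = vendorCdpTemplate("goog");
        System.out.println(template);
        System.out.println(resolve(template, "abc123"));
    }
}
```

With the "goog" keyword passed by the ChromiumDriver constructor, the CDP command is sent as a POST to the resolved path.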
    
    
    
    
    package com.xxx.selenium;
    
    import java.util.HashMap;
    
    import org.openqa.selenium.chrome.ChromeOptions;
    import org.openqa.selenium.remote.DesiredCapabilities;
    
    
    public class DriverUtil {
        
        /**
         * Returns a ChromiumDriver that can execute CDP commands, which makes it possible to bypass WebDriver detection.
         * 1.https://intoli.com/blog/not-possible-to-block-chrome-headless/
         * 2.https://intoli.com/blog/making-chrome-headless-undetectable/
         * 3.https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/third_party/blink/renderer/core/frame/navigator.h
         * @return a headless ChromiumDriver with navigator.webdriver overridden
         */
        public ChromiumDriver getChromiumDriver() {
            // Path to the chromedriver binary (placeholder; the author keeps it under the project directory).
            // The driver is what launches the local Chrome browser.
            String driverFilePath = "path/to/chromedriver";
            if (driverFilePath != null && !driverFilePath.isEmpty()) {
                System.setProperty("webdriver.chrome.driver", driverFilePath);
            }
    
            // Initial Chrome configuration
            HashMap<String, Object> prefs = new HashMap<String, Object>();
            ChromeOptions options = new ChromeOptions();
            options.setExperimentalOption("prefs", prefs);
            String[] a = { "enable-automation" };
            options.setExperimentalOption("excludeSwitches", a);
            options.addArguments("--headless");
            options.addArguments("window-size=1920,1080");
    
            String ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36";
            options.addArguments(String.format("--user-agent=%s", ua));
            DesiredCapabilities chromeCaps = DesiredCapabilities.chrome();
            chromeCaps.setCapability(ChromeOptions.CAPABILITY, options);
    
            // Run a CDP command so that navigator.webdriver evaluates to undefined on every new document
            ChromiumDriver driver = new ChromiumDriver(chromeCaps);
            HashMap<String, Object> cdpCmd = new HashMap<String, Object>();
            cdpCmd.put("source", "Object.defineProperty(navigator, 'webdriver', {get: () => undefined }); ");
            driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", cdpCmd);
    
            return driver;
        }
    }
  • Original article: https://www.cnblogs.com/cdchencw/p/14991851.html