  • Scraping Baidu search results with selenium-java, a web automation testing tool: a worked example


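    This approach makes it easy to scrape Baidu's results for a search keyword.
    It is useful for collecting long-tail keywords: start from popular keywords and harvest more related content.
    Scrapers that fetch this kind of content from Google or Baidu are easily blocked; because this approach drives a real browser, it is far less likely to be.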
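
    To make the "start from popular keywords" idea concrete, here is a minimal sketch that turns a list of seed keywords into search URLs, each of which could then be passed to webDriver.get(...). The class BaiduSearchUrls and its methods are hypothetical names, not part of the original article, and the "wd" query parameter is an assumption about Baidu's URL format that should be checked against the live site.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original article.
final class BaiduSearchUrls {

    // Assumption: Baidu reads the query from the "wd" parameter of /s.
    private static final String SEARCH_PREFIX = "https://www.baidu.com/s?wd=";

    private BaiduSearchUrls() {}

    /** Builds the result-page URL for one keyword, form-encoding it as UTF-8. */
    static String build(String keyword) {
        try {
            return SEARCH_PREFIX + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is supported by every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Builds one URL per seed keyword, preserving the input order. */
    static List<String> buildAll(List<String> keywords) {
        List<String> urls = new ArrayList<>();
        for (String keyword : keywords) {
            urls.add(build(keyword));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : buildAll(Arrays.asList("java", "java tutorial"))) {
            System.out.println(url);
        }
    }
}
```

    Encoding the keyword matters for Chinese queries and for keywords that contain spaces; URLEncoder writes a space as "+".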


    1. Create a new Maven project and add the selenium-java dependency

    <!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
            <dependency>
                <groupId>org.seleniumhq.selenium</groupId>
                <artifactId>selenium-java</artifactId>
                <version>3.8.1</version>
            </dependency>

    2. Write the code (automation runs very fast, so each step is followed by a short pause to make the effect visible)

    package com.testselenium;
    
    import java.util.concurrent.TimeUnit;
    
    import org.openqa.selenium.By;
    import org.openqa.selenium.JavascriptExecutor;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
        
    public class AutoTest {
        
        public static void main(String[] args) throws Exception {
    //        ChromeDriver downloads: https://chromedriver.storage.googleapis.com/index.html
    //        Latest stable release at the time of writing: https://chromedriver.storage.googleapis.com/index.html?path=2.40/
            System.setProperty("webdriver.chrome.driver", "D://selenium/chromedriver.exe");
            WebDriver webDriver = new ChromeDriver();
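    //        geckodriver (Firefox) downloads: https://github.com/mozilla/geckodriver/releases
    //        To use Firefox instead, also import org.openqa.selenium.firefox.FirefoxDriver:
    //        System.setProperty("webdriver.gecko.driver", "D://selenium/geckodriver.exe");
    //        WebDriver webDriver = new FirefoxDriver();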
            
    //        webDriver.manage().window().maximize();    
    //        webDriver.manage().deleteAllCookies();
            // Staying in sync with the browser is essential: let element lookups wait up to 10 seconds for the page to load
            webDriver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
            
            //Open the target URL
            webDriver.get("https://www.baidu.com");
            
            Thread.sleep(1000);
            /*
    //      webDriver.findElement(By.xpath("/html/body/div/div[1]/a")).click();
    //      webDriver.findElement(By.cssSelector("html body div#app div.loginPage form.el-form.fromBox button.el-button.loginBtn")).click();
            webDriver.findElement(By.cssSelector(".head_wrapper > div#u1 > a:nth-child(1)")).click();
            Thread.sleep(1000);
            webDriver.findElements(By.className("a3")).forEach(x -> {
                System.out.println(x.getText());
            });
            */
            //Type the keyword and run the search
            webDriver.findElement(By.cssSelector("input#kw")).sendKeys("java");
            webDriver.findElement(By.cssSelector("input#su")).click();
            Thread.sleep(1000);
            webDriver.findElements(By.className("t")).forEach(x -> {
                System.out.println(x.getText());
            });
            
            //Pause for 5 seconds (quit() stays commented out because the script continues below)
            Thread.sleep(5000);
    //        webDriver.quit();
            
            //Go to my blog
            Thread.sleep(3000);
            webDriver.get("https://www.cnblogs.com/zdz807");
            
            Thread.sleep(1000);
            //Click the link whose text contains 下一页 ("next page")
            webDriver.findElement(By.partialLinkText("下一页")).click();
            
            Thread.sleep(1000);
            //Scroll to the bottom of the page
            //((JavascriptExecutor) webDriver).executeScript("window.scrollTo(0, document.body.scrollHeight)");
            //Scroll by an offset relative to the current position
            ((JavascriptExecutor) webDriver).executeScript("window.scrollBy(0, 700)");  
            Thread.sleep(1000);
            //Scroll to an absolute position, here 1600 pixels down from the top
            ((JavascriptExecutor) webDriver).executeScript("window.scrollTo(0, 1600)");  
            Thread.sleep(1000);
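            //Scroll to the given element, aligning its bottom edge with the bottom of the window.
            //The selector must match an element on the page currently loaded, otherwise findElement throws NoSuchElementException.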
            ((JavascriptExecutor) webDriver).executeScript("arguments[0].scrollIntoView(false);", webDriver.findElement(By.cssSelector("#ftCon")));
            
            //Pause for 5 seconds, then close the browser
            Thread.sleep(5000);
            webDriver.quit();
            
        }
    }
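
    The fixed Thread.sleep pauses above are there to make each step visible, but for synchronization a polling wait is usually the better fit, because it returns as soon as the condition holds. Selenium ships its own explicit-wait support (WebDriverWait) for this purpose. The library-free sketch below only illustrates the underlying idea; PollingWait and until are hypothetical names, not Selenium API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper that illustrates the idea behind an explicit wait.
final class PollingWait {

    private PollingWait() {}

    /**
     * Checks the condition every intervalMillis until it returns true
     * or timeoutMillis has elapsed.
     *
     * @return true if the condition became true in time, false on timeout
     *         or if the waiting thread was interrupted
     */
    static boolean until(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 300 ms after start.
        boolean ready = until(() -> System.currentTimeMillis() - start >= 300, 5000, 50);
        System.out.println("ready=" + ready);
    }
}
```

    In the scraper, the condition would be something like "the result list is no longer empty", so the script waits only as long as the page actually needs.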

     

    Starting ChromeDriver 2.40.565498 (ea082db3280dd6843ebfb08a625e3eb905c4f5ab) on port 38505
    Only local connections are allowed.
    Jul 27, 2018 7:42:47 PM org.openqa.selenium.remote.ProtocolHandshake createSession
    INFO: Detected dialect: OSS
    java.com: Java 与您官网
    Java_百度百科
    Java SE Development Kit 8 - Downloads
    Java 教程 | 菜鸟教程
    java吧_百度贴吧
    Oracle Technology Network for Java Developers | Oracle ...
    Java - ImportNew
    Java 运算符 | 菜鸟教程
    ImportNew - 专注Java & Android 技术分享
    Java SE - Downloads | Oracle Technology Network | Oracle
    深圳java学习难吗_java培训多久能学会?
    java 菜鸟也能学的Java 4个月挑战月薪上万
    java-中国数万程序员的选择-官方首页
    java深圳菜鸟也能学的java 4个月挑战月薪上万
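
    Before the printed titles are reused as keyword candidates, they usually need a little cleanup. The sketch below trims whitespace, drops empty entries, and removes duplicates while keeping the original order. TitleCleaner is a hypothetical helper, and the input is assumed to be the strings returned by getText() in the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical post-processing helper, not part of the original article.
final class TitleCleaner {

    private TitleCleaner() {}

    /** Trims each title, skips null or blank ones, and removes duplicates in order. */
    static List<String> clean(List<String> rawTitles) {
        // LinkedHashSet removes duplicates but keeps first-seen order.
        Set<String> seen = new LinkedHashSet<>();
        for (String raw : rawTitles) {
            if (raw == null) {
                continue;
            }
            String title = raw.trim();
            if (!title.isEmpty()) {
                seen.add(title);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(" Java_百度百科 ", "", "Java - ImportNew", "Java_百度百科");
        for (String title : clean(raw)) {
            System.out.println(title);
        }
    }
}
```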

  • Original article: https://www.cnblogs.com/zdz8207/p/selenium-java-baidu.html