  • [Selenium+Java] How to Find All/Broken links using Selenium Webdriver

    Original URL: https://www.guru99.com/find-broken-links-selenium-webdriver.html

    What are Broken Links?

    Broken links are links or URLs that are not reachable. They may be down or not functioning because of a server error.

    A valid URL returns an HTTP status code in the 2xx range. HTTP status codes are grouped into classes that serve different purposes. For an invalid request, the status is in the 4xx or 5xx range.

    The 4xx class mainly indicates client-side errors, and the 5xx class mainly indicates server-side errors.
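
    The status classes described above can be sketched as a small lookup. This is an illustrative helper, not part of the original tutorial: the class name StatusClasses and the methods statusClass and isBroken are hypothetical names chosen for this example. The isBroken rule mirrors the 4xx/5xx rule used later in the tutorial.

```java
public class StatusClasses {

    // Maps an HTTP status code to the name of its class (its first digit).
    public static String statusClass(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    // Treats 4xx and 5xx responses as broken (assumption: same rule as the tutorial).
    public static boolean isBroken(int code) {
        return code >= 400;
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + statusClass(code)
                    + ", broken: " + isBroken(code));
        }
    }
}
```

    Note that under this rule a 3xx redirect counts as a valid link.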

    We usually cannot tell whether a link works until we click it and check the result.

    Why should you check Broken links?

    You should always make sure that there are no broken links on the site, because users should not land on an error page.

    The error typically happens when the server's rules are not updated correctly, or when the requested resource does not exist on the server.

    Checking links manually is tedious, because each web page may have a large number of links and the process has to be repeated for every page.

    An automation script using Selenium is a better fit for this task.

    How to check the Broken Links and images

    To check for broken links, follow these steps:

    1. Collect all the links in the web page based on the <a> tag.
    2. Send an HTTP request for each link and read the HTTP response code.
    3. Decide whether the link is valid or broken based on the HTTP response code.
    4. Repeat this for all the links captured.

    Code to Find the Broken links on a webpage

    Below is the WebDriver code that implements these steps:

    package automationPractice;
    
    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.Iterator;
    import java.util.List;
    
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;
    
    public class BrokenLinks {
        
        private static WebDriver driver = null;
    
        public static void main(String[] args) {
            // Open the home page, collect its links, and check each one.
            
            String homePage = "http://www.zlti.com";
            String url = "";
            HttpURLConnection huc = null;
            int respCode = 200;
            
            driver = new ChromeDriver();
            
            driver.manage().window().maximize();
            
            driver.get(homePage);
            
            List<WebElement> links = driver.findElements(By.tagName("a"));
            
            Iterator<WebElement> it = links.iterator();
            
            while(it.hasNext()){
                
                url = it.next().getAttribute("href");
                
                System.out.println(url);
            
                if(url == null || url.isEmpty()){
                    System.out.println("URL is either not configured for anchor tag or it is empty");
                    continue;
                }
                
                if(!url.startsWith(homePage)){
                    System.out.println("URL belongs to another domain, skipping it.");
                    continue;
                }
                
                try {
                    huc = (HttpURLConnection)(new URL(url).openConnection());
                    
                    huc.setRequestMethod("HEAD");
                    
                    huc.connect();
                    
                    respCode = huc.getResponseCode();
                    
                    if(respCode >= 400){
                        System.out.println(url+" is a broken link");
                    }
                    else{
                        System.out.println(url+" is a valid link");
                    }
                        
                } catch (MalformedURLException e) {
                    // The href could not be parsed as a URL.
                    e.printStackTrace();
                } catch (IOException e) {
                    // The connection could not be opened or the response could not be read.
                    e.printStackTrace();
                }
            }
            
            driver.quit();
    
        }
    }
    

    Explaining the code

    Step 1: Import Packages

    Import the following class in addition to the default imports:

    import java.net.HttpURLConnection;

    Using the methods of this class, we can send HTTP requests and read the HTTP response code from the response.

    Step 2: Collect all links in web page

    Identify all links in the web page and store them in a List.

    List<WebElement> links = driver.findElements(By.tagName("a"));

    Obtain an Iterator to traverse the List.

    Iterator<WebElement> it = links.iterator();
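
    A page often links to the same URL more than once, so it can help to reduce the collected href values to a unique list before sending any requests. The sketch below is an optional refinement, not part of the original tutorial. It works on plain strings, which stand in for the values that getAttribute("href") would return, and the class name UniqueLinks and method uniqueHrefs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueLinks {

    // Returns the non-empty hrefs in first-seen order, without duplicates.
    public static List<String> uniqueHrefs(List<String> hrefs) {
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) { // a for-each loop replaces the explicit Iterator
            if (href != null && !href.isEmpty()) {
                seen.add(href);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Hypothetical href values, including a null, an empty string, and a duplicate.
        List<String> hrefs = Arrays.asList(
                "http://www.zlti.com/about", null, "",
                "http://www.zlti.com/about", "http://www.zlti.com/contact");
        System.out.println(uniqueHrefs(hrefs));
    }
}
```

    A LinkedHashSet keeps the links in page order, so the report stays easy to compare against the page.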

    Step 3: Identifying and Validating URL

    In this part, we check whether the URL is empty or null, and whether it belongs to a third-party domain.

    Get the href of the anchor tag and store it in the url variable.

    url = it.next().getAttribute("href");

    Check whether the URL is null or empty, and skip the remaining steps if it is.

    if(url == null || url.isEmpty()){
        System.out.println("URL is either not configured for anchor tag or it is empty");
        continue;
    }

    Check whether the URL belongs to the main domain or to a third party. Skip the remaining steps if it belongs to a third-party domain.

    if(!url.startsWith(homePage)){
        System.out.println("URL belongs to another domain, skipping it.");
        continue;
    }
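
    The startsWith check is a plain string comparison, so it skips links that use a different scheme (https instead of http) and accepts any URL that merely begins with the same characters. One possible alternative, shown here as a sketch and not as the tutorial's method, is to compare host names with java.net.URI. The class name SameSite and method sameHost are hypothetical, and the second sample URL is made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SameSite {

    // Returns true when both URLs parse and share the same host,
    // ignoring letter case and the scheme (http vs. https).
    public static boolean sameHost(String homePage, String url) {
        try {
            String homeHost = new URI(homePage).getHost();
            String linkHost = new URI(url).getHost();
            // getHost() is null for URIs without a host, such as mailto: links.
            return homeHost != null && homeHost.equalsIgnoreCase(linkHost);
        } catch (URISyntaxException e) {
            return false; // unparseable URLs are treated as not belonging to the site
        }
    }

    public static void main(String[] args) {
        String homePage = "http://www.zlti.com";
        System.out.println(sameHost(homePage, "https://www.zlti.com/about"));
        System.out.println(sameHost(homePage, "http://www.zlti.com.evil.example/x"));
        System.out.println(sameHost(homePage, "mailto:someone@example.com"));
    }
}
```

    This variant treats subdomains as different hosts, which may or may not be what a given site needs.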

    Step 4: Send http request

    The HttpURLConnection class has methods to send an HTTP request and capture the HTTP response code. So the output of the openConnection() method (a URLConnection) is cast to HttpURLConnection.

    huc = (HttpURLConnection)(new URL(url).openConnection());

    We can set the request method to "HEAD" instead of "GET", so that only the headers are returned and not the document body.

    huc.setRequestMethod("HEAD");

    Invoking the connect() method opens the actual connection to the URL. The request itself is sent when the response is first read, for example by getResponseCode().

    huc.connect();
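
    The request steps above have no timeout, so a single unresponsive server can stall the whole run. The following sketch wraps the same HEAD request in a helper that sets timeouts and releases the connection. It is an assumption-laden variant, not the tutorial's code: the class name HeadCheck, the method headStatus, and the choice to return -1 on failure are all hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class HeadCheck {

    // Sends a HEAD request and returns the HTTP status code, or -1 if the URL
    // is malformed, is not an HTTP(S) URL, or the connection fails or times out.
    public static int headStatus(String url, int timeoutMillis) {
        HttpURLConnection huc = null;
        try {
            huc = (HttpURLConnection) new URL(url).openConnection();
            huc.setRequestMethod("HEAD");
            huc.setConnectTimeout(timeoutMillis); // limit for opening the connection
            huc.setReadTimeout(timeoutMillis);    // limit for waiting on the response
            return huc.getResponseCode();         // connects implicitly if needed
        } catch (IOException | ClassCastException e) {
            // MalformedURLException is an IOException; ClassCastException covers
            // URLs whose connection type is not HttpURLConnection.
            return -1;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Pass a URL as the first argument; the default is deliberately malformed
        // so that the example runs without network access.
        String url = args.length > 0 ? args[0] : "htp://bad-scheme";
        System.out.println(url + " -> " + headStatus(url, 5000));
    }
}
```

    Some servers reject HEAD requests, so a caller may want to retry with GET before reporting a link as broken.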

    Step 5: Validating Links

    Using the getResponseCode() method, we can get the response code for the request.

    respCode = huc.getResponseCode();

    Based on the response code, we decide whether the link is valid or broken.

    if(respCode >= 400){
            System.out.println(url+" is a broken link");
    }
    else{
            System.out.println(url+" is a valid link");
    }
    

    Thus, we can obtain all the links from a web page and print whether each link is valid or broken.

    Hope this tutorial helps you check for broken links using Selenium.

    Troubleshooting

    In an isolated case, the first link accessed by the code could be the "Home" link. In that case, the driver.navigate().back() action shows a blank page, because opening the browser was the first action. The driver cannot find the remaining links on a blank page, so it throws an exception and the rest of the code does not execute. This can be handled with an if condition.

     

  • Original article: https://www.cnblogs.com/alicegu2009/p/9098793.html