  • [Node.js] Web Scraping with Pagination and Advanced Selectors

    When web scraping, you'll often want more than one page of data. Xray supports pagination by finding the "next" or "more" link on each page and following it to each new page until it can no longer find that link. This lesson demonstrates how to paginate, and how to use more advanced selectors when the pagination link is difficult to target.
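The pagination loop described above can be sketched without any network access. This is only an illustration of the idea, not Xray's implementation: the `pages` object, `fetchPage`, and `scrapeAll` are hypothetical names standing in for a real site, an HTTP request, and the scraper.

```javascript
// Hypothetical in-memory "site": each page has items and an optional next link.
const pages = {
  '/news?p=1': { items: ['a', 'b'], next: '/news?p=2' },
  '/news?p=2': { items: ['c'], next: '/news?p=3' },
  '/news?p=3': { items: ['d', 'e'], next: '/news?p=4' },
  '/news?p=4': { items: ['f'], next: null }
};

// Stand-in for "request the page and parse it" (an assumption for this sketch).
function fetchPage(url) {
  return pages[url];
}

// Follow the next link until it is missing or the page limit is reached.
function scrapeAll(startUrl, limit) {
  const results = [];
  let url = startUrl;
  let visited = 0;
  while (url && visited < limit) {
    const page = fetchPage(url);
    if (!page) break;            // unknown URL: stop instead of throwing
    results.push(...page.items); // collect this page's data
    visited += 1;
    url = page.next;             // null on the final page ends the loop
  }
  return results;
}

console.log(scrapeAll('/news?p=1', 3));        // stops after three pages
console.log(scrapeAll('/news?p=1', Infinity)); // stops when no next link exists
```

The two stop conditions mirror the two ways a paginated scrape ends: an explicit page limit, or a page on which the next link cannot be found.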

    /**
     * Created by Answer1215 on 8/22/2015.
     */
    var Xray = require('x-ray');
    var xray = new Xray();
    
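    // Scrape every story row (.athing) on the Hacker News front page.
    xray('https://news.ycombinator.com/', '.athing', [{
        rank: '.rank',
        title: 'td:nth-child(3) a',
        link: 'td:nth-child(3) a@href'
    }])
        // Follow the rel="nofollow" anchor that is the last child of its parent (the "More" link).
        .paginate('a[rel="nofollow"]:last-child@href')
        // Stop after three pages.
        .limit(3)
        .write('./results2.json');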
    
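    ///////////////////////////////
    //  Test: check which elements the pagination selector matches.
    //  Run each snippet on its own; they all write to ./results2.json.
    ///////////////////////////////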
    
    xray('https://news.ycombinator.com/', 'a[rel="nofollow"]', [{
        show: ''
    }]).write('./results2.json');
    /**
     * [
     {
       "show": "Segment is hiring security engineers to help secure our container fleet"
     },
     {
       "show": "Modafinil for cognitive neuroenhancement: a systematic review"
     },
     {
       "show": "Ports and Power in the Indian Ocean"
     },
     {
       "show": "Natural and Artificial Intelligence (1988) [pdf]"
     },
     {
       "show": "Proofing Spirits with a Homemade Electrobalance"
     },
     {
       "show": "Seth Nickell on Replacing the Aging Init Procedure on Linux (2003)"
     },
     {
       "show": "More"
     }
     ]
     * */
    
    xray('https://news.ycombinator.com/', 'a[rel="nofollow"]:last-child', [{
        show: ''
    }]).write('./results2.json');
    /*
    * [
     {
     "show": "More"
     }
     ]
    * */
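The difference between the two test selectors comes down to `:last-child`, which keeps only elements that are the last child of their parent. Here is a minimal sketch of that filtering on made-up data: the `cells` structure and the `select` helper are hypothetical, and the markup is a simplified assumption rather than the real Hacker News page.

```javascript
// Hypothetical, simplified markup: two table cells, each with child elements.
const cells = [
  { children: [
    { tag: 'a', attrs: { rel: 'nofollow' }, text: 'Job posting' },
    { tag: 'span', attrs: {}, text: '(example.com)' }
  ] },
  { children: [
    { tag: 'a', attrs: { rel: 'nofollow' }, text: 'More' }
  ] }
];

// Mimics a[rel="nofollow"], optionally narrowed by :last-child.
function select(cells, lastChildOnly) {
  const matches = [];
  for (const cell of cells) {
    cell.children.forEach((el, i) => {
      const isNofollowLink = el.tag === 'a' && el.attrs.rel === 'nofollow';
      const isLastChild = i === cell.children.length - 1;
      if (isNofollowLink && (!lastChildOnly || isLastChild)) {
        matches.push(el.text);
      }
    });
  }
  return matches;
}

console.log(select(cells, false)); // every rel="nofollow" link
console.log(select(cells, true));  // only the link that is a last child
```

In this made-up markup the first link is followed by a sibling `span`, so it is not a last child and is filtered out, leaving only the link that stands alone in its cell.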
  • Original article: https://www.cnblogs.com/Answer1215/p/4750566.html