puppeteer发布应该有一段时间了,这两天正好基于该工具写了一些自动化解决方案,在这里抛砖引给大家介绍一下。
官方描述:
Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.
简单来说,puppeteer的特点如下
- 是node的库
- 基于DevTools Protocol协议
- 默认是无界面模式运行
安装
npm i puppeteer
# or "yarn add puppeteer"
基本使用方式
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({path: 'example.png'});
await browser.close();
})();
上面代码的作用是打开一个页面,然后给这个页面截图,最后关闭浏览器。
想象空间
- 可以做一些界面的自动化工作
- 可以做爬虫
- 可以在服务器上稳定运行,方便容器化
更多例子
将页面保存成pdf的例子
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle2'});
await page.pdf({path: 'hn.pdf', format: 'A4'});
await browser.close();
})();
在页面上下文执行js的例子
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
// Get the "viewport" of the page, as reported by the page.
const dimensions = await page.evaluate(() => {
return {
document.documentElement.clientWidth,
height: document.documentElement.clientHeight,
deviceScaleFactor: window.devicePixelRatio
};
});
console.log('Dimensions:', dimensions);
await browser.close();
})();
在亚马逊搜索商品的例子
/**
* @name Amazon search
*
* @desc Looks for a "nyan cat pullover" on amazon.com, goes two page two clicks the third one.
*/
const puppeteer = require('puppeteer')
const screenshot = 'amazon_nyan_cat_pullover.png'
try {
(async () => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.setViewport({ 1280, height: 800 })
await page.goto('https://www.amazon.com')
await page.type('#twotabsearchtextbox', 'nyan cat pullover')
await page.click('input.nav-input')
await page.waitForSelector('#resultsCol')
await page.screenshot({path: 'amazon_nyan_cat_pullovers_list.png'})
await page.click('#pagnNextString')
await page.waitForSelector('#resultsCol')
const pullovers = await page.$$('a.a-link-normal.a-text-normal')
await pullovers[2].click()
await page.waitForSelector('#ppd')
await page.screenshot({path: screenshot})
await browser.close()
console.log('See screenshot: ' + screenshot)
})()
} catch (err) {
console.error(err)
}
登陆github的例子
/**
* @name Github
*
* @desc Logs into Github. Provide your username and password as environment variables when running the script, i.e:
* `GITHUB_USER=myuser GITHUB_PWD=mypassword node github.js`
*
*/
const puppeteer = require('puppeteer')
const screenshot = 'github.png';
(async () => {
const browser = await puppeteer.launch({headless: true})
const page = await browser.newPage()
await page.goto('https://github.com/login')
await page.type('#login_field', process.env.GITHUB_USER)
await page.type('#password', process.env.GITHUB_PWD)
await page.click('[name="commit"]')
await page.waitForNavigation()
await page.screenshot({ path: screenshot })
browser.close()
console.log('See screenshot: ' + screenshot)
})()
常见问题
谁在维护puppeteer?
Chrome DevTools 团队
Puppeteer可以替换selenium/webdriver吗?
不可以。这2个工具的目的是不一样的。
selenium的目的是一套脚本运行在不同浏览器上,可以做兼容性测试;
puppeteer专注于Chromium的功能测试。