  • Fixing the slow Chromium download when calling req.html.render() with the requests_html module

    1. Step one: start with the following code:

    from requests_html import HTMLSession

    url="https://www.baidu.com/"

    headers={
    "Host": "www.baidu.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"

    }

    session=HTMLSession()
    req=session.get(url,headers=headers)
    req.encoding="utf-8"

    req.html.render()
    result=req.html.find("a.mnav",first=True)
    print(req.status_code)
    print(result.text)
    print(result.attrs.get('href'))

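    2. Because this is the first call to render(), Chromium has to be installed first. The download was painfully slow: after several minutes of waiting it had only reached 2%.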

    3. The fix is as follows:

    3.1 Download Chromium manually

    https://npm.taobao.org/mirrors/chromium-browser-snapshots/Win_x64/650583/

    After the download finishes, unzip the archive.
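    The unzip step can also be scripted. The following is a minimal sketch using only the standard library; the function name unzip_chromium and the archive name chrome-win.zip in the usage comment are assumptions for illustration, not something the original post specifies.

```python
import zipfile
from pathlib import Path


def unzip_chromium(archive: Path, dest: Path) -> Path:
    """Extract a downloaded Chromium snapshot archive into dest and return dest."""
    # Create the destination folder (and any missing parents) if needed.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


# Hypothetical usage (adjust both paths to your own download location):
# unzip_chromium(Path("chrome-win.zip"), Path("chromium-unzipped"))
```

    The archive usually contains a single top-level folder; its contents are what gets copied in the later step.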

    3.2 Which path does requests_html actually use to launch Chromium?

    3.2.1 Go to the Lib\site-packages\pyppeteer directory under the Python installation directory

    On the author's machine this is: C:\Users\Ray\AppData\Local\Programs\Python\Python37\Lib\site-packages\pyppeteer

    3.2.2 Open the file chromium_downloader.py

    Find this code:

    chromiumExecutable = {
    'linux': DOWNLOADS_FOLDER / REVISION / 'chrome-linux' / 'chrome',
    'mac': (DOWNLOADS_FOLDER / REVISION / 'chrome-mac' / 'Chromium.app' /
    'Contents' / 'MacOS' / 'Chromium'),
    'win32': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
    'win64': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
    }

    As the code above shows, on win64 (the author's Windows 10 system is 64-bit) the Chromium path is:

    DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',

    So what exactly are DOWNLOADS_FOLDER and REVISION?

    Looking further up in the same file, you will find:

    DOWNLOADS_FOLDER = Path(__pyppeteer_home__) / 'local-chromium'

    REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', __chromium_revision__)

    You can print both values with the following code:

    import os
    from pathlib import Path

    from pyppeteer import __chromium_revision__, __pyppeteer_home__

    DOWNLOADS_FOLDER = Path(__pyppeteer_home__) / 'local-chromium'

    REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', __chromium_revision__)

    print(DOWNLOADS_FOLDER)

    print(REVISION)

    Run the script to see the values of both variables.

    From this, the Chromium path on the author's machine is: C:\Users\Ray\AppData\Local\pyppeteer\pyppeteer\local-chromium\575458\chrome-win32\chrome.exe
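    To make the composition of that path explicit, here is a small sketch that mirrors the chromiumExecutable dictionary quoted earlier. It does not import pyppeteer; the function name expected_chromium_path and its parameters are hypothetical, and the home directory and revision must be supplied by you (for example, the values printed in the previous step).

```python
from pathlib import Path


def expected_chromium_path(pyppeteer_home: str, revision: str, platform: str) -> Path:
    """Compose the executable path the same way the quoted dictionary does."""
    base = Path(pyppeteer_home) / "local-chromium" / revision
    executables = {
        "linux": base / "chrome-linux" / "chrome",
        "mac": base / "chrome-mac" / "Chromium.app" / "Contents" / "MacOS" / "Chromium",
        # Both Windows keys point at the chrome-win32 folder, as in the quoted source.
        "win32": base / "chrome-win32" / "chrome.exe",
        "win64": base / "chrome-win32" / "chrome.exe",
    }
    return executables[platform]


# Hypothetical usage with the values from the author's machine:
# expected_chromium_path(r"C:\Users\Ray\AppData\Local\pyppeteer\pyppeteer", "575458", "win64")
```

    Checking whether the returned path exists is a quick way to tell if a download would still be triggered.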

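    Create these folders yourself, all the way down to chrome-win32, and copy the files from the unzipped Chromium download into that directory, so that chrome.exe sits directly inside chrome-win32.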
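    The copy step can be sketched in Python as well. This is an illustrative helper, not part of pyppeteer or requests_html: the name install_chromium is hypothetical, and it assumes unzipped_dir is the folder that directly contains chrome.exe. It refuses to overwrite an existing target so that a working installation is never clobbered.

```python
import shutil
from pathlib import Path


def install_chromium(unzipped_dir: Path, target_dir: Path) -> Path:
    """Copy the unzipped Chromium files into target_dir and return the chrome.exe path."""
    if target_dir.exists():
        raise FileExistsError(f"{target_dir} already exists; remove it first")
    # copytree creates target_dir together with any missing parent folders.
    shutil.copytree(unzipped_dir, target_dir)
    exe = target_dir / "chrome.exe"
    if not exe.is_file():
        raise FileNotFoundError(f"chrome.exe not found in {target_dir}")
    return exe


# Hypothetical usage: target_dir is the chrome-win32 folder from the path above.
# install_chromium(Path("chromium-unzipped") / "chrome-win", Path(r"...\local-chromium\575458\chrome-win32"))
```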

    4. Run the code from step 1 again; it now prints the expected output.

    Inspiration for this fix: https://github.com/GoogleChrome/puppeteer/issues/1597

  • Original article: https://www.cnblogs.com/xiaoaiyiwan/p/10776493.html