  • Fixing the slow Chromium download when calling req.html.render() with the requests_html module

    1. Step one: start with the following code:

    from requests_html import HTMLSession

    url = "https://www.baidu.com/"

    headers = {
        "Host": "www.baidu.com",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
    }

    session = HTMLSession()
    req = session.get(url, headers=headers)
    req.encoding = "utf-8"

    # The first call to render() triggers the Chromium download.
    req.html.render()
    # With first=True, find() returns a single element (or None if nothing matches).
    result = req.html.find("a.mnav", first=True)
    print(req.status_code)
    print(result.text)
    print(result.attrs.get('href'))

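    2. Because this is the first call to render(), Chromium has to be installed first. Unfortunately the download is very slow: after several minutes of waiting it had only reached 2%.

    3. The fix is as follows:

    3.1 Download Chromium manually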

    https://npm.taobao.org/mirrors/chromium-browser-snapshots/Win_x64/650583/

    Once the download finishes, unzip the archive.
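
    The unpacking step can also be scripted. The following is a minimal sketch using only the standard library; the function name unpack_chromium and both paths in the usage comment are hypothetical placeholders, not names taken from pyppeteer or from the mirror.

```python
import zipfile
from pathlib import Path


def unpack_chromium(archive_path, dest_dir):
    """Extract a downloaded Chromium zip archive into dest_dir.

    Returns the sorted names of the top-level entries that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        # Zip member names always use forward slashes, whatever the OS.
        top_level = {name.split("/")[0] for name in archive.namelist()}
    return sorted(top_level)


# Hypothetical usage; adjust both paths to your own download location:
# unpack_chromium(r"C:\Downloads\chromium.zip", r"C:\Downloads\chromium")
```

    The returned list shows which folder the archive unpacked to, which is the folder whose contents are needed in the later copy step.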

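    3.2 Where exactly does requests_html look for the Chromium executable?

    3.2.1 Go to the Lib\site-packages\pyppeteer directory under your Python installation directory

    On my machine the directory is: C:\Users\Ray\AppData\Local\Programs\Python\Python37\Lib\site-packages\pyppeteer

    3.2.2 Open the chromium_downloader.py file

    Find this code: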

    chromiumExecutable = {
    'linux': DOWNLOADS_FOLDER / REVISION / 'chrome-linux' / 'chrome',
    'mac': (DOWNLOADS_FOLDER / REVISION / 'chrome-mac' / 'Chromium.app' /
    'Contents' / 'MacOS' / 'Chromium'),
    'win32': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
    'win64': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
    }

    As the code above shows, the Chromium path for win64 (my Windows 10 system is 64-bit) is:

    DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
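
    To see which key of chromiumExecutable would apply on a given machine, a sketch like the one below may help. It is only an approximation written for this article, not code copied from pyppeteer: the function platform_key and its parameters are hypothetical, and the 64-bit test checks the Python build rather than the operating system.

```python
import sys


def platform_key(platform=None, maxsize=None):
    """Map a sys.platform value to one of the chromiumExecutable keys."""
    platform = sys.platform if platform is None else platform
    maxsize = sys.maxsize if maxsize is None else maxsize
    if platform.startswith("linux"):
        return "linux"
    if platform.startswith("darwin"):
        return "mac"
    if platform.startswith("win"):
        # sys.maxsize exceeds 2**32 only on a 64-bit Python build.
        return "win64" if maxsize > 2**32 else "win32"
    raise OSError("Unsupported platform: " + platform)
```

    Calling platform_key() with no arguments uses the values of the running interpreter. Note that both Windows keys point to the same chrome-win32 folder in the dictionary above.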

    So what exactly are DOWNLOADS_FOLDER and REVISION?

    Looking further up in the same file, you will find the following code:

    DOWNLOADS_FOLDER = Path(__pyppeteer_home__) / 'local-chromium'

    REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', __chromium_revision__)
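
    The REVISION line is an ordinary "environment variable with a fallback" lookup. A minimal sketch of that pattern follows; resolve_revision and its environ parameter are hypothetical names introduced only for illustration, since pyppeteer performs the lookup inline as shown above.

```python
import os


def resolve_revision(default_revision, environ=None):
    """Return the PYPPETEER_CHROMIUM_REVISION override if set, else the default."""
    env = os.environ if environ is None else environ
    return env.get("PYPPETEER_CHROMIUM_REVISION", default_revision)


# With no override the default is used; when the variable is set, it wins.
print(resolve_revision("575458", environ={}))
print(resolve_revision("575458", environ={"PYPPETEER_CHROMIUM_REVISION": "650583"}))
```

    Because the assignment sits at module level, the variable would presumably have to be set before pyppeteer is imported for an override to take effect.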

    You can print both values with the print function. The code is:

    import os
    from pathlib import Path

    from pyppeteer import __chromium_revision__, __pyppeteer_home__

    DOWNLOADS_FOLDER = Path(__pyppeteer_home__) / 'local-chromium'
    REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', __chromium_revision__)

    print(DOWNLOADS_FOLDER)
    print(REVISION)

    Run the .py file to see the values of the two variables.

    From the output, the Chromium path is: C:\Users\Ray\AppData\Local\pyppeteer\pyppeteer\local-chromium\575458\chrome-win32\chrome.exe
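
    The path above is simply the pieces from chromium_downloader.py joined together. The sketch below reconstructs it with PureWindowsPath so that it prints the same backslash-separated path on any operating system. The function name is hypothetical, and the home directory passed in is inferred from the path quoted above; on your machine, use the values printed in the previous step.

```python
from pathlib import PureWindowsPath


def expected_win_chromium_path(pyppeteer_home, revision):
    """Join the pieces the way the 'win64' entry of chromiumExecutable does."""
    downloads_folder = PureWindowsPath(pyppeteer_home) / "local-chromium"
    return downloads_folder / revision / "chrome-win32" / "chrome.exe"


path = expected_win_chromium_path(
    r"C:\Users\Ray\AppData\Local\pyppeteer\pyppeteer", "575458"
)
print(path)
```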

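    So create the folders yourself, all the way down to the chrome-win32 folder, and copy the Chromium files downloaded above into that directory. The revision folder must be named after the REVISION value printed earlier (575458 here), even though the build downloaded in step 3.1 is 650583, since that is the path where chrome.exe is looked up.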
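
    Creating the folders and copying the files can be done by hand or with a few lines of Python. This is a minimal sketch under stated assumptions: install_chromium is a hypothetical helper, unpacked_dir is the folder that directly contains chrome.exe after unzipping, and downloads_folder and revision are the two values printed in step 3.2.2.

```python
import shutil
from pathlib import Path


def install_chromium(unpacked_dir, downloads_folder, revision):
    """Copy an unpacked Chromium folder to <downloads_folder>/<revision>/chrome-win32.

    Returns the path where chrome.exe is expected afterwards.
    """
    target = Path(downloads_folder) / str(revision) / "chrome-win32"
    # Create the intermediate folders; copytree creates the final one itself
    # and raises FileExistsError if it is already there.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(str(unpacked_dir), str(target))
    return target / "chrome.exe"
```

    The helper deliberately refuses to overwrite an existing chrome-win32 folder, so a partially downloaded copy should be deleted first.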

    4. Run the code from step one again; this time it prints the expected output.

    The idea for this fix came from: https://github.com/GoogleChrome/puppeteer/issues/1597

  • Original article: https://www.cnblogs.com/xiaoaiyiwan/p/10776493.html