
Fixing Images That Fail to Load in Selenium Screenshots (Lazy Loading)

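Requirement:

Take a screenshot of a web page and convert it to PDF.

Problem:

After taking a screenshot with Selenium, the images on the page have not loaded.

As shown in the figure below: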

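Cause:

The site uses lazy loading: page elements are loaded dynamically only when the browser's vertical scrollbar reaches the corresponding position.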

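What is image lazy loading?

Image lazy loading is a web page optimization technique. Images are network resources, and like any other static resource each image request consumes bandwidth. Loading every image on a page at once therefore significantly increases the first-screen load time.

To address this, the front end and back end cooperate so that an image is loaded only when it enters the browser's current viewport, which reduces the number of image requests on first render. This technique is called "image lazy loading".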

Solution:

Simulate a user dragging the scrollbar so that the page content gets loaded.

Code that simulates scrolling:

        js_height = "return document.body.clientHeight"
        driver.get(link)
        k = 1
        height = driver.execute_script(js_height)
        while True:
            if k * 500 < height:
                js_move = "window.scrollTo(0,{})".format(k * 500)
                print(js_move)
                driver.execute_script(js_move)
                time.sleep(0.2)
                height = driver.execute_script(js_height)
                k += 1
            else:
                break
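
The loop above can be wrapped in a reusable helper. This is a minimal sketch, not the original author's code: the function name `scroll_through_page`, its parameters, and the `max_steps` guard (to avoid looping forever on infinitely scrolling pages) are assumptions introduced here. It relies only on the driver's `execute_script` method.

```python
import time


def scroll_through_page(driver, step=500, pause=0.2, max_steps=200):
    """Scroll down in fixed steps so lazy-loaded content gets requested.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the last scroll offset that was requested (0 if none).
    """
    js_height = "return document.body.clientHeight"
    height = driver.execute_script(js_height)
    offset = 0
    for k in range(1, max_steps + 1):
        if k * step >= height:
            break
        offset = k * step
        driver.execute_script("window.scrollTo(0, {})".format(offset))
        # Give the page a moment to request the newly visible images
        time.sleep(pause)
        # Re-read the height: lazy loading may have made the page taller
        height = driver.execute_script(js_height)
    return offset
```

With a real driver this would be called as `scroll_through_page(driver)` right after `driver.get(link)`. The step size and pause mirror the values used in the post and may need tuning for slower pages.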

Full code:

#!/usr/bin/python3
# -*- coding:utf-8 -*-
"""
@author: lms
@file: screenshot.py
@time: 2020/10/10 13:02
@desc: 
"""

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from PIL import Image


def screenshot_and_convert_to_pdf(link):
    path = './'

    # Headless mode is required; otherwise the screenshot is limited to
    # the height of your screen instead of covering the full page
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    # `options=` replaces the deprecated `chrome_options=` keyword
    driver = webdriver.Chrome(options=chrome_options)
    try:
        driver.implicitly_wait(20)
        driver.get(link)

        # Simulate a user scrolling to trigger lazy-loaded images
        # (the page was already loaded by driver.get(link) above)
        js_height = "return document.body.clientHeight"
        k = 1
        height = driver.execute_script(js_height)
        while True:
            if k * 500 < height:
                js_move = "window.scrollTo(0,{})".format(k * 500)
                print(js_move)
                driver.execute_script(js_move)
                time.sleep(0.2)
                height = driver.execute_script(js_height)
                k += 1
            else:
                break

        time.sleep(1)
        # Key step for a full-page capture: read the page width and height via JavaScript
        width = driver.execute_script("return document.documentElement.scrollWidth")
        height = driver.execute_script("return document.documentElement.scrollHeight")
        print(width, height)
        # Resize the browser window to the dimensions just obtained
        driver.set_window_size(width, height)
        time.sleep(1)

        # `path` already ends with a slash, so no extra separator is needed
        png_path = path + '{}.png'.format('123456')
        # pdf_url = SERVER_URL + '/static/global_tech_map/{}.pdf'.format(.pic_num)
        # Take the screenshot
        driver.save_screenshot(png_path)
        # Convert the PNG to PDF
        image1 = Image.open(png_path)
        im1 = image1.convert('RGB')
        pdf_path = png_path.replace('.png', '.pdf')
        im1.save(pdf_path)

    except Exception as e:
        print(e)
    finally:
        # Always shut down the browser, even if an error occurred above
        driver.quit()


if __name__ == '__main__':
    screenshot_and_convert_to_pdf('https://mp.weixin.qq.com/s/nJRnGpPVeJ1kdMIOwiPNpg')
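
The fixed `time.sleep(1)` before the screenshot may be too short on slow connections. One possible alternative, sketched here as an assumption and not part of the original code, is to poll the page until every `<img>` element reports that it has loaded. The names `IMAGES_LOADED_JS` and `wait_for_images` are hypothetical. The check uses the standard DOM properties `complete` and `naturalWidth`, and only the driver's `execute_script` method.

```python
import time

# True when every <img> has finished loading and has real pixel data.
IMAGES_LOADED_JS = (
    "return Array.from(document.images)"
    ".every(function (img) { return img.complete && img.naturalWidth > 0; });"
)


def wait_for_images(driver, timeout=10.0, poll=0.2):
    """Poll until all images report as loaded or `timeout` seconds pass.

    Returns True if all images loaded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(IMAGES_LOADED_JS):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A call such as `wait_for_images(driver)` would go after the scrolling loop and before `save_screenshot`. Note that a broken image (one whose request failed) never satisfies the check, so the function returns `False` after the timeout in that case. CSS background images are not covered either.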

Screenshot after the fix:

Thanks for reading~


Original article: https://www.cnblogs.com/liangmingshen/p/13794812.html
