  • HtmlAgilityPack StackOverflowException workaround

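I recently tried HtmlAgilityPack for parsing HTML, and during testing the program threw a StackOverflowException. As MSDN documents, starting with .NET Framework 2.0 a StackOverflowException cannot be caught by a try-catch block, and by default the process is terminated.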

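Investigating the cause, I found that when an HTML document's structure is very complex (deeply nested), HtmlAgilityPack's recursion goes very deep and eventually overflows the stack, which produces the StackOverflowException. After some googling, I found the following solution: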


    using System;
    using System.Runtime.InteropServices;

    // Windows only (P/Invoke into kernel32's VirtualQuery).
    // Requires compiling with unsafe code enabled (/unsafe, AllowUnsafeBlocks).
    public static class StackChecker
    {
        // We are conservative here. We assume that the platform needs a whole 16 pages to
        // respond to stack overflow (using an x86/x64 page-size, not IA64). That's 64KB,
        // which means that for very small stacks (e.g. 128KB) we'll fail a lot of stack checks
        // incorrectly.
        private const long STACK_RESERVED_SPACE = 4096 * 16;

        public static unsafe bool HasSufficientStack(long bytes)
        {
            var stackInfo = new MEMORY_BASIC_INFORMATION();
            // We subtract one page for our request. VirtualQuery rounds UP to the next page.
            // Unfortunately, the stack grows down. If we're on the first page (last page in the
            // VirtualAlloc), we'll be moved to the next page, which is off the stack! Note this
            // doesn't work right for IA64 due to bigger pages.
            // Pointer arithmetic on byte* keeps the full address in 64-bit processes
            // (a (uint) cast would truncate it).
            IntPtr currentAddr = new IntPtr((byte*)&stackInfo - 4096);
            // Query for the current stack allocation information.
            VirtualQuery(currentAddr, ref stackInfo,
                new UIntPtr((uint)sizeof(MEMORY_BASIC_INFORMATION)));
            // If the current address minus the base (remember: the stack grows downward in the
            // address space) is greater than the number of bytes requested plus the reserved
            // space at the end, the request has succeeded.
            return (currentAddr.ToInt64() - stackInfo.AllocationBase.ToInt64()) >
                (bytes + STACK_RESERVED_SPACE);
        }

        [DllImport("kernel32.dll")]
        private static extern UIntPtr VirtualQuery(
            IntPtr lpAddress,
            ref MEMORY_BASIC_INFORMATION lpBuffer,
            UIntPtr dwLength);

        // Pointer-sized fields are IntPtr so the layout matches both 32-bit and 64-bit Windows.
        [StructLayout(LayoutKind.Sequential)]
        private struct MEMORY_BASIC_INFORMATION
        {
            internal IntPtr BaseAddress;
            internal IntPtr AllocationBase;
            internal uint AllocationProtect;
            internal IntPtr RegionSize;
            internal uint State;
            internal uint Protect;
            internal uint Type;
        }
    }

Then add the following check at the points where the recursion goes deepest, such as HtmlNode.WriteTo(TextWriter outText) and HtmlNode.WriteTo(XmlWriter writer):

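    // Bail out with a catchable exception before the recursion can overflow the stack.
    if (!StackChecker.HasSufficientStack(4 * 1024))
        throw new InsufficientExecutionStackException("The document is too complex to parse");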
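The same guard pattern can also be sketched without P/Invoke. This swaps in a different technique from the one in the post: .NET's built-in RuntimeHelpers.EnsureSufficientExecutionStack() (available in .NET Framework 4.0 and later and in .NET Core), which throws a catchable InsufficientExecutionStackException when too little stack remains. TagNode, GuardedWriter and TryWrite are hypothetical names for illustration only; this is not HtmlAgilityPack's own code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;

// Hypothetical tree node used only for this sketch (not HtmlAgilityPack's HtmlNode).
public sealed class TagNode
{
    public string Name { get; }
    public List<TagNode> Children { get; } = new List<TagNode>();
    public TagNode(string name) { Name = name; }
}

public static class GuardedWriter
{
    // Recursive serializer with the stack check at the top of every call,
    // mirroring where the post places the StackChecker test inside WriteTo.
    private static void WriteTo(TagNode node, TextWriter outText)
    {
        // Throws InsufficientExecutionStackException (an ordinary, catchable
        // exception) when the remaining stack is too small to continue safely.
        RuntimeHelpers.EnsureSufficientExecutionStack();

        outText.Write("<" + node.Name + ">");
        foreach (TagNode child in node.Children)
        {
            WriteTo(child, outText);
        }
        outText.Write("</" + node.Name + ">");
    }

    // Returns false instead of terminating the process when the tree is too deep.
    public static bool TryWrite(TagNode root, out string html)
    {
        var writer = new StringWriter();
        try
        {
            WriteTo(root, writer);
            html = writer.ToString();
            return true;
        }
        catch (InsufficientExecutionStackException)
        {
            html = null;
            return false;
        }
    }
}
```

Catching the exception at the top-level entry point lets the caller reject an overly deep document while the process keeps running, which is the same goal the StackChecker guard serves.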


  • Original post: https://www.cnblogs.com/xiaoqi/p/2209657.html