  • HtmlAgilityPack library: a fix for StackOverflowException

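    I recently tried HtmlAgilityPack for parsing HTML, and during the trial the program threw a StackOverflowException. According to MSDN, starting with .NET Framework 2.0 a StackOverflowException object cannot be caught by a try-catch block, and by default the process is terminated.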

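    Looking into the cause, I found that when an HTML document is very deeply nested, HtmlAgilityPack recurses once per nesting level, and the call stack eventually runs out, which raises the StackOverflowException. After some searching on Google I found the following fix.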

    First, add a new class to the library:

    using System;
    using System.Runtime.InteropServices;

    // Requires compiling with /unsafe. Windows only (calls kernel32!VirtualQuery).
    public class StackChecker
    {
        public unsafe static bool HasSufficientStack(long bytes)
        {
            var stackInfo = new MEMORY_BASIC_INFORMATION();

            // We subtract one page for our request. VirtualQuery rounds UP to the next page.
            // Unfortunately, the stack grows down. If we're on the first page (last page in the
            // VirtualAlloc), we'll be moved to the next page, which is off the stack! Note this
            // doesn't work right for IA64 due to bigger pages.
            // Pointer arithmetic is used instead of a (uint) cast so that the address is not
            // truncated in a 64-bit process.
            IntPtr currentAddr = new IntPtr((byte*)&stackInfo - 4096);

            // Query for the current stack allocation information.
            VirtualQuery(currentAddr, ref stackInfo, new IntPtr(sizeof(MEMORY_BASIC_INFORMATION)));

            // If the current address minus the base (remember: the stack grows downward in the
            // address space) is greater than the number of bytes requested plus the reserved
            // space at the end, the request has succeeded.
            return (currentAddr.ToInt64() - stackInfo.AllocationBase.ToInt64()) >
                (bytes + STACK_RESERVED_SPACE);
        }

        // We are conservative here. We assume that the platform needs a whole 16 pages to
        // respond to stack overflow (using an x86/x64 page-size, not IA64). That's 64KB,
        // which means that for very small stacks (e.g. 128KB) we'll fail a lot of stack checks
        // incorrectly.
        private const long STACK_RESERVED_SPACE = 4096 * 16;

        // The return value and dwLength are SIZE_T in the Win32 API, hence IntPtr.
        [DllImport("kernel32.dll")]
        private static extern IntPtr VirtualQuery(
            IntPtr lpAddress,
            ref MEMORY_BASIC_INFORMATION lpBuffer,
            IntPtr dwLength);

        // Pointer-sized fields are declared as IntPtr so the layout matches on x86 and x64.
        [StructLayout(LayoutKind.Sequential)]
        private struct MEMORY_BASIC_INFORMATION
        {
            internal IntPtr BaseAddress;
            internal IntPtr AllocationBase;
            internal uint AllocationProtect;
            internal IntPtr RegionSize;
            internal uint State;
            internal uint Protect;
            internal uint Type;
        }
    }

    Then, add the following code at the places where the recursion runs deep (such as HtmlNode.WriteTo(TextWriter outText) and HtmlNode.WriteTo(XmlWriter writer)):

    if (!StackChecker.HasSufficientStack(4 * 1024))
        throw new Exception("The document is too complex to parse");
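
    To show where such a guard sits, here is a minimal, self-contained sketch. It is not HtmlAgilityPack code: DemoNode, Demo, BuildChain and TryWrite are hypothetical names made up for illustration. So that it runs without the StackChecker class, it swaps in the framework's own RuntimeHelpers.EnsureSufficientExecutionStack() (available in .NET Framework 4.0 and later), which throws a catchable InsufficientExecutionStackException. The placement is the same either way: the check is the first statement of the recursive method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for HtmlNode; NOT the HtmlAgilityPack type.
public class DemoNode
{
    public string Name = "div";
    public List<DemoNode> Children = new List<DemoNode>();

    public void WriteTo(TextWriter outText)
    {
        // Guard first, before descending any further. Unlike a real stack
        // overflow, the exception thrown here can be caught by the caller.
        RuntimeHelpers.EnsureSufficientExecutionStack();

        outText.Write("<" + Name + ">");
        foreach (DemoNode child in Children)
            child.WriteTo(outText); // one stack frame per nesting level
        outText.Write("</" + Name + ">");
    }
}

public static class Demo
{
    // Builds a chain of `depth` nested nodes iteratively (no recursion here).
    public static DemoNode BuildChain(int depth)
    {
        var root = new DemoNode();
        DemoNode current = root;
        for (int i = 1; i < depth; i++)
        {
            var child = new DemoNode();
            current.Children.Add(child);
            current = child;
        }
        return root;
    }

    // Returns false instead of letting the process die when the tree is too deep.
    public static bool TryWrite(DemoNode root, TextWriter outText)
    {
        try
        {
            root.WriteTo(outText);
            return true;
        }
        catch (InsufficientExecutionStackException)
        {
            return false;
        }
    }
}
```

    With the guard in place, the caller can report an overly deep document as an ordinary error, for example by checking the result of Demo.TryWrite(Demo.BuildChain(depth), writer), rather than having the whole process terminated.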

    OK, that's all there is to it!

  • Original article: https://www.cnblogs.com/xiaoqi/p/2209657.html