首先,什么是Out Of Memory?就是内存溢出,简称OOM(下边我就用这个简称了啊!)。说白了,就是程序想用内存的时候,OS没有那么多内存可以分配了,然后就抱OOM错误了。
首先介绍一下我这个项目的情况,基于exchange 2007+sp1+hmc+web service call,通过一个winform的模拟测试程序,单线程添加信息:循环25万次,每循环一次创建一个公司,开通邮件域名,并创建5个帐号,每个帐号都开通邮件服务。现在循环到100次左右的时候,w3wp.exe的内存占用为private bytes:100M,virtual bytes:300M。听兄弟讲,他们做过类似的测试,当循环到400-500个的时候,会出现OOM的问题。
既然是OOM,我们当然要介绍一个超级cool的工具:debugdiag!(这个工具以后再介绍,因为要贴N多图,实在痛苦……)通过debugdiag抓memory leak的dump(239M),发现有如下问题:mscorwks泄漏了35M左右的内存。
现在转到windbg中来,我们首先看命令!eeheap,它一共有两个参数:
0:000> !help eeheap
-------------------------------------------------------------------------------
!EEHeap [-gc] [-loader]
首先看一下-gc的参数,!eeheap -gc。这个命令表明,我们程序占用托管堆的大小:
0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x01ff3f30
generation 1 starts at 0x01f5df5c
generation 2 starts at 0x019c1000
ephemeral segment allocation context: (0x020bcbf8, 0x020bd46c)
segment begin allocated size
000ead50 7a72c43c 7a74d318 0x00020edc(134876)
000e62a0 790d55a0 790f4b50 0x0001f5b0(128432)
019c0000 019c1000 020bd46c 0x006fc46c(7324780)
Large object heap starts at 0x029c1000
segment begin allocated size
029c0000 029c1000 02cfbbd8 0x0033abd8(3386328)
Total Size 0xa774d0(10974416)
------------------------------
GC Heap Size 0xa774d0(10974416)
dump大小为239M,托管堆大小为11M不到,差别很大的!剩下的内存在哪里?现在来看一个新的命令,!address,运行后,会有一坨又一坨的输出,我们关心的是后面的summary,如下:
-------------------- Usage SUMMARY --------------------------
TotSize ( KB) Pct(Tots) Pct(Busy) Usage
89b8000 ( 141024) : 06.72% 43.69% : RegionUsageIsVAD
6c4b4000 ( 1774288) : 84.61% 00.00% : RegionUsageFree
61d4000 ( 100176) : 04.78% 31.03% : RegionUsageImage
7fc000 ( 8176) : 00.39% 02.53% : RegionUsageStack
20000 ( 128) : 00.01% 00.04% : RegionUsageTeb
4791000 ( 73284) : 03.49% 22.70% : RegionUsageHeap
0 ( 0) : 00.00% 00.00% : RegionUsagePageHeap
1000 ( 4) : 00.00% 00.00% : RegionUsagePeb
1000 ( 4) : 00.00% 00.00% : RegionUsageProcessParametrs
1000 ( 4) : 00.00% 00.00% : RegionUsageEnvironmentBlock
Tot: 7fff0000 (2097088 KB) Busy: 13b3c000 (322800 KB)
-------------------- Type SUMMARY --------------------------
TotSize ( KB) Pct(Tots) Usage
6c4b4000 ( 1774288) : 84.61% : <free>
66c3000 ( 105228) : 05.02% : MEM_IMAGE
30a8000 ( 49824) : 02.38% : MEM_MAPPED
a3d1000 ( 167748) : 08.00% : MEM_PRIVATE
-------------------- State SUMMARY --------------------------
TotSize ( KB) Pct(Tots) Usage
ee99000 ( 244324) : 11.65% : MEM_COMMIT
6c4b4000 ( 1774288) : 84.61% : MEM_FREE
4ca3000 ( 78476) : 03.74% : MEM_RESERVE
上面的信息比较有意思,RegionUsageImage代表的是dlls占用的内存,一共是100M;RegionUsageheap,代表的是NT heaps,一共是73M;MEM_COMMIT和MEM_RESERVE加起来,是virtual memory,他俩的合计是322M。我们还少看了什么?!eeheap还有一个参数,是-loader,运行一下后,会有N长的结果,我们看一部分。
!eeheap –loader,
Domain 2: fbd98
LowFrequencyHeap: 03ba0000(2000:2000) 03d80000(10000:10000) 03ee0000(10000:10000) 040b0000(10000:10000) 04ea0000(10000:10000) 04f70000(10000:f000) 05190000(10000:10000) 很黄很暴力。。。 1edc0000(10000:10000) 1ee00000(10000:3000) Size: 0x1ce2000(30285824)bytes.
Wasted: 0x13000(77824)bytes.
HighFrequencyHeap: 03ba2000(8000:8000) 03e10000(10000:f000) 04aa0000(10000:10000) 050c0000(10000:f000) 05180000(10000:10000) 05430000(10000:10000) 056d0000(10000:10000) 很黄很暴力 1eba0000(10000:10000) 1ee30000(10000:3000) Size: 0x7a9000(8032256)bytes.
Wasted: 0x2000(8192)bytes.
StubHeap: 03baa000(2000:1000) Size: 0x1000(4096)bytes.
Virtual Call Stub Heap:
IndcellHeap: Size: 0x0(0)bytes.
LookupHeap: Size: 0x0(0)bytes.
ResolveHeap: Size: 0x0(0)bytes.
DispatchHeap: Size: 0x0(0)bytes.
CacheEntryHeap: 03bc3000(3000:1000) Size: 0x1000(4096)bytes.
Total size: 0x248c000(38322176)bytes
一共占用了38M内存。继续看下面的内容,我居然发现了9998个module!!!
Module Thunk heaps:
Module 699aa000: Size: 0x0(0)bytes.
Module 648e8000: Size: 0x0(0)bytes.
Module 01963a34: Size: 0x0(0)bytes.
Module 66366000: Size: 0x0(0)bytes.
=============很黄很暴力======================
Module 1ee30010: Size: 0x0(0)bytes.
Module 1ee30634: Size: 0x0(0)bytes.
Module 1ee30c44: Size: 0x0(0)bytes.
Module 1ee3124c: Size: 0x0(0)bytes.
Total size: 0x0(0)bytes
--------------------------------------
Total LoaderHeap size: 0x2a3f000(44298240)bytes
=======================================
问题基本出来了,居然在内存里面有将近1万个module!随便dump出来一个,看看。这里又有一个命令:!dumpmodule
随便dump出来一个看,类似如下信息:
0:000> !dumpmodule 1ee30010
Name: qrt-x6cw, Version=4.0.460.0, Culture=neutral, PublicKeyToken=null
Attributes: PEFile
Assembly: 1cf04e28
LoaderHeap: 00000000
TypeDefToMethodTableMap: 1edcc454
TypeRefToMethodTableMap: 1edcc46c
MethodDefToDescMap: 1edcc4c4
FieldDefToDescMap: 1edcc530
MemberRefToDescMap: 1edcc54c
FileReferencesMap: 1edcc634
AssemblyReferencesMap: 1edcc638
MetaData start address: 1ed20848 (4256 bytes)
这里可能看不到啥,那么我们加一个参数-mt来看看
0:000> !dumpmodule -mt 1ee30010
Name: qrt-x6cw, Version=4.0.460.0, Culture=neutral, PublicKeyToken=null
Attributes: PEFile
Assembly: 1cf04e28
LoaderHeap: 00000000
TypeDefToMethodTableMap: 1edcc454
TypeRefToMethodTableMap: 1edcc46c
MethodDefToDescMap: 1edcc4c4
FieldDefToDescMap: 1edcc530
MemberRefToDescMap: 1edcc54c
FileReferencesMap: 1edcc634
AssemblyReferencesMap: 1edcc638
MetaData start address: 1ed20848 (4256 bytes)
Types defined in this module
MT TypeDef Name
------------------------------------------------------------------------------
1ee3059c 0x02000003 Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderCreateUserResponseData
1ee3046c 0x02000006 Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializerContract
Types referenced in this module
MT TypeRef Name
------------------------------------------------------------------------------
69e58eac 0x01000002 System.Xml.Serialization.XmlSerializationReader
69e59b84 0x01000004 System.Xml.Serialization.XmlSerializerImplementation
03e10b4c 0x01000005 Microsoft.Provisioning.WebServices.HostedActiveDirectory.CreateUserResponseData
69e32be0 0x01000006 System.Xml.XmlReader
790fd8cc 0x01000007 System.Collections.Hashtable
790f8a94 0x0100000e System.Object
69e35c38 0x01000013 System.Xml.XmlQualifiedName
79103c18 0x01000014 System.Boolean
69e3a5b0 0x01000015 System.Xml.XmlNameTable
出现了System.Xml.Serialization,大家熟悉吗?我们转过头来看debugdiag分析的call stack。
Call stack sample 1
Address 0x0c790000
Allocation Time 00:05:04 since tracking started
Allocation Size Bytes
Function Source Destination
mscorjit!norls_allocator::nraAllocNewPage+52
mscorjit!norls_allocator::nraAlloc+19 mscorjit!norls_allocator::nraAllocNewPage
mscorjit!jitNativeCode+84 mscorjit!norls_allocator::nraAlloc
mscorjit!CILJit::compileMethod+3d mscorjit!jitNativeCode
0x79E9C9E0
mscorjit!Compiler::impExpandInline+1aa
mscorjit!Compiler::fgMorphTree+40
mscorjit!Compiler::fgMorphStmts+63 mscorjit!Compiler::fgMorphTree
mscorjit!Compiler::fgMorphBlocks+79 mscorjit!Compiler::fgMorphStmts
mscorjit!Compiler::fgMorph+60 mscorjit!Compiler::fgMorphBlocks
mscorjit!Compiler::compCompile+5f mscorjit!Compiler::fgMorph
mscorjit!Compiler::compCompile+2d8 mscorjit!Compiler::compCompile
mscorjit!jitNativeCode+b8 mscorjit!Compiler::compCompile
mscorjit!CILJit::compileMethod+3d mscorjit!jitNativeCode
0x79EED3FF
System.Xml.Serialization.TempAssembly.InvokeReader(System.Xml.Serialization.XmlMapping, System.Xml.XmlReader, System.Xml.Serialization.XmlDeserializationEvents, System.String)
System.Xml.Serialization.TempAssembly.InvokeReader(System.Xml.Serialization.XmlMapping, System.Xml.XmlReader, System.Xml.Serialization.XmlDeserializationEvents, System.String)
System.Xml.Serialization.XmlSerializer.Deserialize(System.Xml.XmlReader, System.String, System.Xml.Serialization.XmlDeserializationEvents) System.Xml.Serialization.TempAssembly.InvokeReader(System.Xml.Serialization.XmlMapping, System.Xml.XmlReader, System.Xml.Serialization.XmlDeserializationEvents, System.String)
System.Xml.Serialization.XmlSerializer.Deserialize(System.Xml.XmlReader, System.String) System.Xml.Serialization.XmlSerializer.Deserialize(System.Xml.XmlReader, System.String, System.Xml.Serialization.XmlDeserializationEvents)
System.Xml.Serialization.XmlSerializer.Deserialize(System.IO.Stream) System.Xml.Serialization.XmlSerializer.Deserialize(System.Xml.XmlReader, System.String)
Microsoft.Provisioning.Sdk.Xml.Serialization.ProvisioningObjectFactory.Convert[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.__Canon) System.Xml.Serialization.XmlSerializer.Deserialize(System.IO.Stream)
Microsoft.Provisioning.WebServices.ServiceBase.Submit[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.__Canon, System.String, System.String, Boolean) Microsoft.Provisioning.Sdk.Xml.Serialization.ProvisioningObjectFactory.Convert[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.__Canon)
Microsoft.Provisioning.WebServices.HostedActiveDirectory.Service.CreateOrganization(Microsoft.Provisioning.WebServices.HostedActiveDirectory.CreateOrganizationRequest, Boolean)
0x79E7BE1B
0x7A11E56E
System.Web.Services.Protocols.LogicalMethodInfo.Invoke(System.Object, System.Object[])
System.Web.Services.Protocols.WebServiceHandler.Invoke() System.Web.Services.Protocols.LogicalMethodInfo.Invoke(System.Object, System.Object[])
0x79F35FC8
System.Threading._TimerCallback.TimerCallback_Context(System.Object)
System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
webengine!HashtableIUnknown::AddCallback+a
webengine!HttpCompletion::ProcessRequestInManagedCode+1a3
webengine!HttpCompletion::ProcessRequestInManagedCode+1a3
webengine!HttpCompletion::ProcessCompletion+3e webengine!HttpCompletion::ProcessRequestInManagedCode
webengine!CorThreadPoolWorkitemCallback+18
0x79F70305
0x79F8BC87
kernel32!BaseThreadStart+34
看到这里,基本差不多。偶认为是exchange2007内部的代码问题。此话怎讲?从头说,在System.Xml.Serialization下面,有一个by design的“bug”。我们用reflector看XmlSerializer的构造代码:
2 if (this.tempAssembly == null)
3 {
4 lock (cache)
5 {
6 this.tempAssembly = cache[defaultNamespace, type];
7 if (this.tempAssembly == null)
8 {
9 XmlSerializerImplementation implementation;
10 Assembly assembly = TempAssembly.LoadGeneratedAssembly(type, defaultNamespace, out implementation);
11 if (assembly == null)
12 {
13 this.mapping = new XmlReflectionImporter(defaultNamespace).ImportTypeMapping(type, null, defaultNamespace);
14 this.tempAssembly = GenerateTempAssembly(this.mapping, type, defaultNamespace);
15 }
16 else
17 {
18 this.mapping = XmlReflectionImporter.GetTopLevelMapping(type, defaultNamespace);
19 this.tempAssembly = new TempAssembly(new XmlMapping[] { this.mapping }, assembly, implementation);
20 }
21 }
22 cache.Add(defaultNamespace, type, this.tempAssembly);
23 }
24
so,当我们发现w3wp.exe仅仅3、4百兆的时候,就报OOM了,就是这个原因。
关于这个“bug”,可以看msdn这个kb:http://support.microsoft.com/kb/886385/en-us
至于,是不是这个原因,我也需要等gtec的人确认。现在的工作是,先把这个心得写上。如果有错误,以后会修正。