Problem Scenario
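When an application runs in Azure App Service, CPU or memory usage sometimes climbs very high even though there are no obvious 5XX errors or exception logs. In other cases exceptions are logged, but they don't point to the specific code that is at fault. An effective way to troubleshoot these situations is to capture a DUMP file while the problem is happening. From the dump you can determine whether threads are deadlocked, see which requests were involved, and trace the problem to the relevant custom code classes. This makes the next stage of the investigation much easier.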
Capturing DUMP Files
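1) Use the Kudu tool built into App Service to get a dump file
- Sign in to the app's Kudu site. The entry URL is: https://<yoursitename>.scm.chinacloudsites.cn/
- Open Process Explorer and find the process you want to capture, such as w3wp.exe.
- Right-click it and choose [Full Dump]. When the capture finishes, the dump file is downloaded to your local machine automatically.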
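2) For .NET/.NET Core applications, use the procdump command to capture dumps of a specific process
- Sign in to the Kudu site. The entry URL is: https://<yoursitename>.scm.chinacloudsites.cn/
- In Process Explorer, note the ID of the process you want to capture, for example 10524.
- Go back to the Debug Console page and run the following procdump command, replacing 10524 with your own PID. It writes three full-memory dumps (-ma), one every 5 seconds (-s 5, -n 3), to the current directory.
D:\home\LogFiles> D:\devtools\sysinternals\procdump -accepteula -ma 10524 -s 5 -n 3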
Sample output:
Kudu Remote Execution Console
Type 'exit' then hit 'enter' to get a new CMD process.
Type 'cls' to clear the console

Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

D:\home>D:\devtools\sysinternals\procdump -accepteula -ma 7040 -s 5 -n 3

ProcDump v9.0 - Sysinternals process dump utility
Copyright (C) 2009-2017 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process: dotnet.exe (7040)
Process image: D:\Program Files (x86)\dotnet\dotnet.exe
CPU threshold: n/a
Performance counter: n/a
Commit threshold: n/a
Threshold seconds: 5
Hung window check: Disabled
Log debug strings: Disabled
Exception monitor: Disabled
Exception filter: [Includes] * [Excludes]
Terminate monitor: Disabled
Cloning type: Disabled
Concurrent limit: n/a
Avoid outage: n/a
Number of dumps: 3
Dump folder: D:\home\
Dump filename/mask: PROCESSNAME_YYMMDD_HHMMSS
Queue to WER: Disabled
Kill after dump: Disabled

Press Ctrl-C to end monitoring without terminating the process.

[13:47:38] Timed:
[13:47:38] Dump 1 initiated: D:\home\dotnet.exe_200827_134738.dmp
[13:47:47] Dump 1 writing: Estimated dump file size is 199 MB.
[13:47:54] Dump 1 complete: 199 MB written in 16.6 seconds
[13:48:00] Timed:
[13:48:00] Dump 2 initiated: D:\home\dotnet.exe_200827_134800.dmp
[13:48:08] Dump 2 writing: Estimated dump file size is 199 MB.
[13:48:13] Dump 2 complete: 199 MB written in 12.8 seconds
[13:48:19] Timed:
[13:48:19] Dump 3 initiated: D:\home\dotnet.exe_200827_134819.dmp
[13:48:26] Dump 3 writing: Estimated dump file size is 200 MB.
[13:48:30] Dump 3 complete: 200 MB written in 11.2 seconds
[13:48:31] Dump count reached.

D:\home>
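When you run several captures, it can help to pull the dump paths and sizes out of the console output before downloading the files. The sketch below is a hypothetical helper (`parse_procdump_log` is not part of ProcDump) that reads the "Dump N initiated" and "Dump N complete" progress lines in the format shown in the sample output above. It assumes dump paths contain no spaces.

```python
import re

# Progress-line formats as printed by ProcDump v9.0 in the sample session.
INITIATED = re.compile(r"Dump (\d+) initiated: (\S+\.dmp)")
COMPLETE = re.compile(r"Dump (\d+) complete: (\d+) MB written in ([\d.]+) seconds")


def parse_procdump_log(text: str) -> list:
    """Return one record per dump: index, path, size in MB, and write time in seconds."""
    dumps = {}
    for m in INITIATED.finditer(text):
        idx = int(m.group(1))
        dumps.setdefault(idx, {"index": idx})["path"] = m.group(2)
    for m in COMPLETE.finditer(text):
        idx = int(m.group(1))
        rec = dumps.setdefault(idx, {"index": idx})
        rec["size_mb"] = int(m.group(2))
        rec["seconds"] = float(m.group(3))
    # Sort by dump index so records appear in capture order.
    return [dumps[i] for i in sorted(dumps)]
```

Because it uses `finditer` rather than line-by-line matching, the helper also works on console text that was pasted as a single run-on line.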
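3) For Java sites, the dump must be captured with a JDK version that includes jstack or jmap. If the app's runtime environment does not include them, first change the Java container version, for example to zulu8.17.0.3-jdk8.0.102-win_x64.
- Sign in to the Kudu site. The entry URL is: https://<yoursitename>.scm.chinacloudsites.cn/
- In Process Explorer, note the ID of the process you want to capture, for example 5252.
- Go back to the Debug Console page. Use jstack to save a thread dump of process 5252 to threaddump1.txt, and jmap to save its heap summary to a separate file (heapinfo1.txt here). Because > overwrites its target, the two commands must not write to the same file.
"D:\Program Files\Java\zulu8.17.0.3-jdk8.0.102-win_x64\bin\jstack" -F 5252 > D:/home/site/threaddump1.txt
"D:\Program Files\Java\zulu8.17.0.3-jdk8.0.102-win_x64\bin\jmap" -F -J-d64 -heap 5252 > D:/home/site/heapinfo1.txt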
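To get a first impression of a Java thread dump, you can tally thread states and check whether jstack reported a deadlock. A minimal sketch follows. The helper names are hypothetical, and the line formats it recognizes are assumptions about JDK 8 output that you should check against your own file: `java.lang.Thread.State: BLOCKED` from regular jstack, `Thread 5252: (state = BLOCKED)` from `jstack -F`, and a deadlock summary line beginning with "Found".

```python
import re
from collections import Counter

# Regular jstack prints "java.lang.Thread.State: BLOCKED (on object monitor)";
# forced mode (-F) prints "Thread 5252: (state = BLOCKED)". Both are assumed formats.
STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)|\(state = (\w+)\)")


def thread_state_counts(dump: str) -> Counter:
    """Count threads per state; many BLOCKED threads often point to lock contention."""
    counts = Counter()
    for m in STATE.finditer(dump):
        counts[m.group(1) or m.group(2)] += 1
    return counts


def reports_deadlock(dump: str) -> bool:
    """True if jstack printed a summary such as 'Found one Java-level deadlock:'.

    Lines like 'No deadlocks found.' do not start with 'Found' and are ignored.
    """
    return any(line.strip().startswith("Found") and "deadlock" in line
               for line in dump.splitlines())
```

Usage: `thread_state_counts(open("threaddump1.txt", encoding="utf-8", errors="replace").read())`. Comparing the counts across two or three dumps taken a few seconds apart shows whether the same threads stay blocked.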
Analyzing DUMP Files
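Start with DebugDiag for a first-pass analysis. It ships with built-in analysis rules, so you don't need to write any WinDbg commands. When the analysis finishes, it generates a report that shows the state of each thread, any exceptions, and their stack traces.
For deeper analysis, consider WinDbg. It involves a complex command set and requires considerably more expertise in dump analysis.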
DebugDiag download: https://www.microsoft.com/en-us/download/details.aspx?id=58210
WinDbg download: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools
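Once WinDbg is installed, its console counterpart cdb.exe can run a fixed set of SOS commands against a .NET dump without an interactive session, which is convenient when comparing several procdump captures. The sketch below makes these assumptions: the default Debugging Tools install path (adjust `CDB`), the hypothetical helper names `sos_commands`, `cdb_args`, and `run_analysis`, and loading SOS with `.loadby sos clr` (.NET Framework) or `.loadby sos coreclr`. For .NET Core 3.0 and later, SOS is normally installed separately with the dotnet-sos tool, so the load line may need to change. `!syncblk` lists the threads that own monitor locks, a common starting point for deadlock analysis.

```python
import subprocess

# Assumption: default cdb.exe location from the Windows SDK Debugging Tools.
CDB = r"C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\cdb.exe"


def sos_commands(core: bool) -> list:
    """First-pass SOS commands: managed threads, lock owners, every managed stack, then quit."""
    runtime = "coreclr" if core else "clr"
    return [f".loadby sos {runtime}", "!threads", "!syncblk", "~*e !clrstack", "q"]


def cdb_args(dump_path: str, core: bool = False, cdb: str = CDB) -> list:
    """Open the dump (-z) and run the ';'-separated command string (-c) non-interactively."""
    return [cdb, "-z", dump_path, "-c", ";".join(sos_commands(core))]


def run_analysis(dump_path: str, log_path: str, core: bool = False) -> None:
    """Write the debugger's combined output to a text file for later reading."""
    with open(log_path, "w", encoding="utf-8", errors="replace") as log:
        subprocess.run(cdb_args(dump_path, core), stdout=log,
                       stderr=subprocess.STDOUT, check=False)
```

For example, `run_analysis(r"D:\dumps\dotnet.exe_200827_134738.dmp", "analysis1.txt", core=True)` writes the output for one of the sample dumps to analysis1.txt. The command list is kept separate from the process call so it can be adjusted without touching the rest of the script.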