1,问题现象
某次测试发现,程序失去响应。由于程序集成了EurekaLog组件,弹出了错误框。查看其给出的Call Stack信息,发现没有发生线程死锁(DeadLock=0;),问题在于 Wait Chain=找不到指定的程序。
Call Stack Information: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |Methods |Details|Stack |Address |Module |Offset |Unit |Class |Procedure/Method |Line | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |*Exception Thread: ID=14008; Parent=0; Priority=0 | |Class=; Name=MAIN | |DeadLock=0; Wait Chain=找不到指定的程序。 | |Comment= | |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |7FFFFFFE|03 |00000000|7C9583AC|ntdll.dll |000283AC|ntdll | |KiFastSystemCallRet | | |00000020|03 |0013EF6C|7C957AC7|ntdll.dll |00027AC7|ntdll | |ZwWaitForSingleObject | | |00000020|03 |0013EFAC|71A8C502|mswsock.dll |0000C502|mswsock | | (possible ServiceMain+7745) | | |00000020|03 |0013F024|71B62FEB|ws2_32.dll |00002FEB|WS2_32 | |recv | | |00000020|03 |0013F11C|0391D42F|myodbc5.dll |0000D42F|myodbc5 | | (possible SQLCopyDesc+4927) | | |00000020|03 |0013F138|0391EBC1|myodbc5.dll |0000EBC1|myodbc5 | | (possible SQLCancel+1361) | | |00000020|03 |0013F150|0392794B|myodbc5.dll |0001794B|myodbc5 | | (possible SQLGetCursorNameW+4955) | | |00000020|03 |0013F15C|4B7567F0|odbc32.dll |000067F0|ODBC32 | | (possible SQLParamOptions+379) | | |00000020|03 |0013F17C|4B7568C2|odbc32.dll |000068C2|ODBC32 | |SQLExecDirectW | | |00000020|03 |0013F208|004B6C8F|Plate_collect_and_forward.exe|000B6C8F|Graphics | |FreeMemoryContexts | | |00000020|03 |0013F29C|4B43A64A|msado15.dll |0000A64A|msado15 | | (possible DllGetClassObject+7071) | | |00000020|03 |0013F32C|749A4EEE|msdart.dll |00014EEE|CEXAutoBackupFile | | (possible CEXAutoBackupFile+303) | | |00000020|03 |0013F338|74998351|msdart.dll |00008351|MSDART | | (possible mpMalloc+350) | | |00000020|03 |0013F34C|77600CAC|oleaut32.dll |00010CAC|oleaut32 | |VariantChangeType | | |00000020|03 |0013F378|74998351|msdart.dll |00008351|MSDART | | (possible mpMalloc+350) | | |00000020|03 |0013F3B4|749984A6|msdart.dll |000084A6|MSDART | | (possible MpGetHeapHandle+299) | | |00000020|03 |0013F3CC|749984A6|msdart.dll |000084A6|MSDART | | (possible MpGetHeapHandle+299) | | |00000020|03 |0013F3D8|749984DD|msdart.dll |000084DD|MSDART | |MpHeapFree | | |00000020|03 |0013F3F0|749984DD|msdart.dll |000084DD|MSDART | |MpHeapFree | | |00000020|03 |0013F454|749980B9|msdart.dll |000080B9|`CLKRHashTableStats |(possible BucketSizes'|`2'.s_aBucketSizes+177) | | |00000020|03 |0013F490|749980B9|msdart.dll |000080B9|`CLKRHashTableStats |(possible BucketSizes'|`2'.s_aBucketSizes+177) | | |00000020|03 |0013F4AC|4B43A4A2|msado15.dll |0000A4A2|msado15 | | (possible DllGetClassObject+6647) | | |00000020|03 |0013F4C0|4B43A380|msado15.dll |0000A380|msado15 | | (possible DllGetClassObject+6357) | | |00000020|03 |0013F54C|749A4EEE|msdart.dll |00014EEE|CEXAutoBackupFile | | (possible CEXAutoBackupFile+303) | | |00000020|03 |0013F558|74998351|msdart.dll |00008351|MSDART | | (possible mpMalloc+350) | | |00000020|03 |0013F55C|74998192|msdart.dll |00008192|MSDART | | (possible UMSEnterCSWraper+178) | | |00000020|03 |0013F5A8|4B43A253|msado15.dll |0000A253|msado15 | | (possible DllGetClassObject+6056) | | |00000020|03 |0013F5F0|749984A6|msdart.dll |000084A6|MSDART | | (possible MpGetHeapHandle+299) | | |00000020|03 |0013F79C|749980B9|msdart.dll |000080B9|`CLKRHashTableStats |(possible BucketSizes'|`2'.s_aBucketSizes+177) | | |00000020|03 |0013F820|005A9A64|Plate_collect_and_forward.exe|001A9A64|ADODB |TCustomADODataSet |OpenCursor | | |00000020|03 |0013F8B4|00599C35|Plate_collect_and_forward.exe|00199C35|DB |TDataSet |SetActive | | |00000020|03 |0013F8D0|00599A80|Plate_collect_and_forward.exe|00199A80|DB |TDataSet |Open | | |00000020|04 |0013F8D4|005C8A0A|Plate_collect_and_forward.exe|001C8A0A|Ufunc | |UpdateOneRec |403[18] | |00000020|04 |0013F900|005C8CE3|Plate_collect_and_forward.exe|001C8CE3|Ufunc | |Handle_DB_OP_1000 |440[18]
查看任务管理器,发现程序占用300多个线程,耗用100多M内存,但cpu耗用为0. 似乎停止了在等待什么。
2,问题分析
那么究竟是什么程序找不到呢?EurekaLog提供的信息中并未提供。由于程序已经失去反应,也得不到其它具体信息。
但可以发现出错时执行的是数据库操作。因此,可以从数据库入手。
程序使用Mysql数据库,因此首先连上数据库查看有无反应。发现可以正常连接使用,因此数据库方面不存在问题。
那么是不是程序对数据库的连接被拒绝了呢?采用Navicat查看服务器监控中的进程列表,发现连接是存在的,但连接时间明显不对,才61秒,显然是刚刚建立起来的连接。
而且伴随着建立新连接,出现了SQL语句。
显然,程序试图连接上DB后,运行此查询语句,但却因为此查询语句被DB给强行断开了。于是程序不断尝试连接DB,DB不断强行断开,形成死结,程序被阻塞在数据库操作上。
到网上查询为什么MySQL会强行断开程序的连接,看到有以下说明:
Your SQL statement was too large,即查询的结果集超过 max_allowed_packet 。
max_allowed_packet 缺省是1048576,1M字节;我的查询中含有Blob字段,显然超出了这个限制!
3,原因验证
既然猜测程序无反应的原因在于查询结果集太大,那么就修改数据库,删除一部分数据。
当删除完成后,发现程序自动恢复运行了,主界面也正常出现了。可见原因就出现在查询结果集大小超出max_allowed_packet值。
顺便说一句,当程序恢复后,主界面上的日志显示了运行错误信息:
更新数据库表错误:[MySQL][ODBC 5.1 Driver][mysqld-5.5.19]MySQL server has gone away
MySQL server has gone away,这才是程序失去反应的真正原因。因此本文标题叫做《一次“MySQL server has gone away”故障及其解决》。
4,解决措施
对于出现的问题,显然关系到程序和数据库两个方面,因此也应该从这两方面来解决。
1),程序应避免将大量数据取到前端执行操作。SQL查询要进行优化。
2),数据库要根据使用要求修改配置参数值,一般可以将max_allowed_packet加大到20M。