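  • Two problems encountered on 2013-04-27: 503 errors and uneven CPU usage across a Couchbase cluster

    (Note: neither of these problems has anything to do with Alibaba Cloud.)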

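    1. 503 errors

    Today, between roughly 13:00 and 13:10, the site returned 503 errors. The cause was that the number of concurrent requests exceeded the queue length (Queue Length) of the IIS application pool, which was still at the IIS default of 1000 (see the figure below).

    We fixed it by raising Queue Length from 1000 to 2000 (the maximum allowed value is 65535).

    We later found that Performance Monitor can guide the choice of Queue Length: watch "HTTP Service Request Queues" -> "Arrival Rate".

    For example, the figure above shows a peak "Arrival Rate" of 400, so Queue Length should be set higher than 400.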
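
The rule of thumb above (Queue Length should exceed the peak Arrival Rate) can be sketched in a few lines of Python. This is only an illustration: the function name `suggest_queue_length` and the `headroom` multiplier are assumptions made for the example, not something the original setup used. The only figure taken from the post is the 65535 upper limit.

```python
import math

# Upper limit for the application pool Queue Length, as stated in the post.
IIS_MAX_QUEUE_LENGTH = 65535


def suggest_queue_length(arrival_rates, headroom=2.0):
    """Suggest a Queue Length from observed Arrival Rate samples.

    arrival_rates: readings of the Arrival Rate counter from Performance Monitor.
    headroom: hypothetical safety multiplier (an assumption, not from the post).
    """
    samples = list(arrival_rates)
    if not samples:
        raise ValueError("need at least one Arrival Rate sample")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")

    peak = max(samples)
    # Apply the safety margin, then make sure the result is strictly
    # greater than the observed peak, as the rule of thumb requires.
    candidate = max(math.ceil(peak * headroom), math.floor(peak) + 1)
    # Never exceed the documented maximum.
    return min(candidate, IIS_MAX_QUEUE_LENGTH)


if __name__ == "__main__":
    # Sample readings are made up; 400 is the peak mentioned in the post.
    print(suggest_queue_length([120, 400, 250]))
```

With a peak of 400 and the assumed headroom of 2.0, the sketch suggests 800. The margin is a judgment call, and a longer queue only buys time during a burst; it does not help when the worker process itself is failing.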

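    Here is the CPU monitoring chart for one of the web servers behind the load balancer at that time:

    (The red curve is % Processor Time; the green curve is Request Execution Time.)

    What went wrong on this cloud server at that moment? It looks as if the root cause of the 503 errors was abnormal CPU behavior on the cloud server; we have filed a ticket with Alibaba Cloud to find out.

    Update:

    After careful investigation, the 503 errors turned out to be caused by the application pool crashing, and the crash was caused by the Couchbase client; at the time we were adding and removing servers in the Couchbase cluster.

    The evidence comes from the Windows event log: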

    Exception: System.NullReferenceException
    Message: Object reference not set to an instance of an object.
    StackTrace:    at Hammock.RestClient.CompleteWithQuery(WebQuery query, RestRequest request, RestCallback callback, WebQueryAsyncResult result)
       at Hammock.RestClient.<>c__DisplayClass18.<BeginRequestImpl>b__15(Object sender, WebQueryResponseEventArgs args)
       at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
       at Hammock.Web.WebQuery.OnQueryResponse(WebQueryResponseEventArgs args)
       at Hammock.Web.WebQuery.HandleWebException(WebException exception)
       at Hammock.Web.WebQuery.GetAsyncResponseCallback(IAsyncResult asyncResult)
       at System.Net.LazyAsyncResult.Complete(IntPtr userToken)
       at System.Threading.ExecutionContext.runTryCode(Object userData)
       at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Net.ContextAwareResult.Complete(IntPtr userToken)
       at System.Net.HttpWebRequest.SetResponse(Exception E)
       at System.Net.ConnectionReturnResult.SetResponses(ConnectionReturnResult returnResult)
       at System.Net.Connection.CompleteConnectionWrapper(Object request, Object state)
       at System.Net.PooledStream.ConnectionCallback(Object owningObject, Exception e, Socket socket, IPAddress address)
       at System.Net.ServicePoint.ConnectSocketCallback(IAsyncResult asyncResult)
       at System.Net.LazyAsyncResult.Complete(IntPtr userToken)
       at System.Net.ContextAwareResult.Complete(IntPtr userToken)
       at System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped)
       at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)
    Application: w3wp.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: System.NullReferenceException
    Stack:
       at System.Net.ServicePoint.ConnectSocketCallback(System.IAsyncResult)
       at System.Net.LazyAsyncResult.Complete(IntPtr)
       at System.Net.ContextAwareResult.Complete(IntPtr)
       at System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
       at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
    Faulting application name: w3wp.exe, version: 7.5.7601.17514, time stamp: 0x4ce7afa2
    Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
    Exception code: 0xc0000005
    Fault offset: 0x000007ff0033cbed
    Faulting process id: 0x10b4
    Faulting application start time: 0x01ce42fb6c5d3e18
    Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
    Faulting module path: unknown
    Report Id: 30767fd7-aef7-11e2-8bf7-e5d3e0390d57
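
When reading a trace like the one above, it helps to separate framework frames from library frames, since the library frames show whose code was running when the process died. The following is a minimal sketch of that filtering. The helper name `third_party_frames` and the choice of "System." as the only framework prefix are assumptions for illustration.

```python
import re

# Matches "at Namespace.Type.Method(" anywhere in a line, so it also handles
# the first frame, which follows the "StackTrace:" label on the same line.
FRAME_RE = re.compile(r"(?:^|\s)at\s+([^\s(]+)\(")


def third_party_frames(stack_trace, framework_prefixes=("System.",)):
    """Return the method names in a .NET stack trace that are not framework code.

    framework_prefixes: namespaces treated as framework code (an assumption;
    extend it if your traces include other runtime namespaces).
    """
    prefixes = tuple(framework_prefixes)
    frames = []
    for line in stack_trace.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a stack frame line
        method = match.group(1)
        if not method.startswith(prefixes):
            frames.append(method)
    return frames


if __name__ == "__main__":
    sample = (
        "StackTrace:    at Hammock.RestClient.CompleteWithQuery(WebQuery query)\n"
        "   at Hammock.Web.WebQuery.OnQueryResponse(WebQueryResponseEventArgs args)\n"
        "   at System.Net.LazyAsyncResult.Complete(IntPtr userToken)\n"
    )
    for frame in third_party_frames(sample):
        print(frame)
```

Applied to the first trace in the log, this leaves only the Hammock.* frames, which is what points at a client library rather than at IIS itself.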

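    2. Uneven CPU usage across the Couchbase cluster

    (Couchbase admin console)

    (Output of the Linux top command)

    The cluster consists of two Couchbase servers, yet their CPU usage differs widely. The Couchbase version is 2.0.0.

    Googling turned up "High cpu usage in memcached process": this is a bug in Couchbase 2.0.0, and upgrading to the latest release, Couchbase 2.0.1, fixes it.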

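    Upgrade procedure:

    1. Download the package on both Couchbase servers: wget http://packages.couchbase.com/releases/2.0.1/couchbase-server-enterprise_x86_64_2.0.1.rpm

    2. Open the Couchbase admin console and remove one server from the cluster; for details, see couchbase-getting-started-upgrade-online

    3. Upgrade Couchbase to 2.0.1: rpm -U couchbase-server-enterprise_x86_64_2.0.1.rpm (it is best to restart the Couchbase service after the upgrade: service couchbase restart)

    4. Add the upgraded Couchbase server back to the cluster.

    5. Repeat the same upgrade on the other Couchbase server.

    After the upgrade, the problem was resolved.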
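
The ordering of the steps above matters: only one server leaves the cluster at a time, and it rejoins before the next one is removed. The sketch below lays that out as a checklist generator. It executes nothing; the function name `rolling_upgrade_plan` and the node names `cb1` and `cb2` are hypothetical, and the shell commands are copied from the steps as written. The service name is kept as a parameter because it may differ on a given install.

```python
import posixpath
from urllib.parse import urlparse

# Package URL exactly as given in the upgrade steps.
RPM_URL = (
    "http://packages.couchbase.com/releases/2.0.1/"
    "couchbase-server-enterprise_x86_64_2.0.1.rpm"
)


def rolling_upgrade_plan(nodes, rpm_url=RPM_URL, service_name="couchbase"):
    """Build an ordered checklist of (node, kind, action) tuples.

    kind is "run" for a shell command and "manual" for an admin console step.
    service_name defaults to the name used in the post (an assumption about
    the host; adjust it to the init script actually installed).
    """
    rpm_file = posixpath.basename(urlparse(rpm_url).path)
    plan = []

    # Step 1: download the package on every node up front.
    for node in nodes:
        plan.append((node, "run", f"wget {rpm_url}"))

    # Steps 2-5: take one node out at a time, upgrade it, and put it back
    # before touching the next node.
    for node in nodes:
        plan.append((node, "manual", "remove the server from the cluster in the admin console"))
        plan.append((node, "run", f"rpm -U {rpm_file}"))
        plan.append((node, "run", f"service {service_name} restart"))
        plan.append((node, "manual", "add the server back to the cluster in the admin console"))
    return plan


if __name__ == "__main__":
    for node, kind, action in rolling_upgrade_plan(["cb1", "cb2"]):
        print(f"[{node}] ({kind}) {action}")
```

Keeping the console steps as explicit "manual" entries reflects how the upgrade was described: removing and re-adding servers was done by hand in the admin console, not scripted.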

  • Original article: https://www.cnblogs.com/cmt/p/3047376.html