  • Two problems encountered on 2013-04-27: 503 errors and unbalanced CPU usage across a Couchbase cluster

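    (Note: neither of these problems has anything to do with Alibaba Cloud.)

    1. 503 errors

    Today, between roughly 13:00 and 13:10, we saw 503 errors. They occurred because the number of concurrent requests exceeded the Queue Length of the IIS application pool, which was still at the IIS default of 1000 (see the screenshot below).

    We fixed the problem by raising Queue Length from 1000 to 2000 (the maximum allowed value is 65535).

    We later found that the "Arrival Rate" counter under "HTTP Service Request Queues" in Performance Monitor can be used to decide on a Queue Length.

    For example, the figure above shows a maximum "Arrival Rate" of 400, so Queue Length should be set above 400.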
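The sizing rule above (Queue Length should exceed the peak "Arrival Rate") can be sketched as a small shell check. This is only an illustration: the function names `peak_arrival_rate` and `queue_length_ok` are hypothetical, and the samples are assumed to be whole-number readings copied out of Performance Monitor.

```shell
#!/usr/bin/env bash
# Sketch: compare a Queue Length against observed "Arrival Rate" samples.
# Assumption: every sample is a non-negative whole number.

# Print the largest of the integer samples given as arguments.
peak_arrival_rate() {
  local peak=0 sample
  for sample in "$@"; do
    if [ "$sample" -gt "$peak" ]; then
      peak=$sample
    fi
  done
  echo "$peak"
}

# Exit status 0 when the queue length (first argument) is greater than
# the peak of the remaining arguments, non-zero otherwise.
queue_length_ok() {
  local queue_length=$1
  shift
  local peak
  peak=$(peak_arrival_rate "$@")
  [ "$queue_length" -gt "$peak" ]
}
```

For example, `queue_length_ok 2000 120 400 310 && echo "Queue Length is above the observed peak"` prints the message, because 2000 is greater than the peak sample of 400.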

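    Here is the CPU monitoring graph of one of the web servers behind the load balancer at that time:

    (The red curve is % Processor Time; the green curve is Request Execution Time.)

    What went wrong on this cloud server at that time? It looks as if the root cause of the 503 errors was abnormal CPU behavior on the cloud server; we have filed a ticket with Alibaba Cloud to find out.

    Update:

    After careful investigation, we found that the 503 errors were caused by the application pool crashing at that time. The crash was triggered by the Couchbase client while we were adding servers to and removing servers from the Couchbase cluster.

    The evidence comes from the Windows event log: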

    Exception: System.NullReferenceException
    Message: Object reference not set to an instance of an object.
    StackTrace:    at Hammock.RestClient.CompleteWithQuery(WebQuery query, RestRequest request, RestCallback callback, WebQueryAsyncResult result)
       at Hammock.RestClient.<>c__DisplayClass18.<BeginRequestImpl>b__15(Object sender, WebQueryResponseEventArgs args)
       at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
       at Hammock.Web.WebQuery.OnQueryResponse(WebQueryResponseEventArgs args)
       at Hammock.Web.WebQuery.HandleWebException(WebException exception)
       at Hammock.Web.WebQuery.GetAsyncResponseCallback(IAsyncResult asyncResult)
       at System.Net.LazyAsyncResult.Complete(IntPtr userToken)
       at System.Threading.ExecutionContext.runTryCode(Object userData)
       at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Net.ContextAwareResult.Complete(IntPtr userToken)
       at System.Net.HttpWebRequest.SetResponse(Exception E)
       at System.Net.ConnectionReturnResult.SetResponses(ConnectionReturnResult returnResult)
       at System.Net.Connection.CompleteConnectionWrapper(Object request, Object state)
       at System.Net.PooledStream.ConnectionCallback(Object owningObject, Exception e, Socket socket, IPAddress address)
       at System.Net.ServicePoint.ConnectSocketCallback(IAsyncResult asyncResult)
       at System.Net.LazyAsyncResult.Complete(IntPtr userToken)
       at System.Net.ContextAwareResult.Complete(IntPtr userToken)
       at System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped)
       at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)
    Application: w3wp.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: System.NullReferenceException
    Stack:
       at System.Net.ServicePoint.ConnectSocketCallback(System.IAsyncResult)
       at System.Net.LazyAsyncResult.Complete(IntPtr)
       at System.Net.ContextAwareResult.Complete(IntPtr)
       at System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
       at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
    Faulting application name: w3wp.exe, version: 7.5.7601.17514, time stamp: 0x4ce7afa2
    Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
    Exception code: 0xc0000005
    Fault offset: 0x000007ff0033cbed
    Faulting process id: 0x10b4
    Faulting application start time: 0x01ce42fb6c5d3e18
    Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
    Faulting module path: unknown
    Report Id: 30767fd7-aef7-11e2-8bf7-e5d3e0390d57

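    2. Unbalanced CPU usage across the Couchbase cluster

    (Couchbase admin console)

    (Output of the Linux top command)

    The cluster consists of two Couchbase servers, yet their CPU usage differed greatly. The Couchbase version was 2.0.0.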
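To compare the two nodes without reading screenshots, the memcached CPU figure can be pulled out of `top` in batch mode. A minimal sketch, assuming top's default column layout (%CPU in the ninth field, command name in the last field); the function name `memcached_cpu` is hypothetical.

```shell
# Print the %CPU column for every memcached process found in
# `top -b -n 1` output read from standard input.
# Assumption: default top columns, so %CPU is field 9 and the
# command name is the last field on the line.
memcached_cpu() {
  awk '$NF == "memcached" { print $9 }'
}
```

Running `top -b -n 1 | memcached_cpu` on each Couchbase server gives one number per memcached process, which makes the imbalance between nodes easy to compare side by side.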

    A Google search turned up "High cpu usage in memcached process": this is a bug in Couchbase 2.0.0, and upgrading to Couchbase 2.0.1, the latest version at the time, fixes it.

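    Upgrade procedure:

    1. Download the installation package on both Couchbase servers: wget http://packages.couchbase.com/releases/2.0.1/couchbase-server-enterprise_x86_64_2.0.1.rpm

    2. In the Couchbase admin console, remove one server from the cluster; for details see couchbase-getting-started-upgrade-online

    3. Upgrade Couchbase to 2.0.1: rpm -U couchbase-server-enterprise_x86_64_2.0.1.rpm (after upgrading, it is best to restart the Couchbase service: service couchbase restart)

    4. Add the upgraded Couchbase server back to the cluster.

    5. Repeat the same upgrade on the other Couchbase server.

    After the upgrade, the problem was resolved.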
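The command-line parts of the procedure (steps 1 and 3) can be wrapped in a small script so that both servers are upgraded the same way. This is a sketch, not a tested upgrade tool: the helper names `run`, `download_package` and `upgrade_node` and the `DRY_RUN` switch are hypothetical, and removing the node from the cluster and adding it back (steps 2 and 4) still happen in the Couchbase admin console.

```shell
#!/usr/bin/env bash
# Sketch of the per-node commands from steps 1 and 3.
# Steps 2 and 4 (remove the node, add it back) are manual console
# operations and are intentionally not automated here.

PKG_URL="http://packages.couchbase.com/releases/2.0.1/couchbase-server-enterprise_x86_64_2.0.1.rpm"
PKG_FILE="${PKG_URL##*/}"   # file name part of the URL

# Service name exactly as written in step 3; adjust it if the init
# script on your system has a different name.
SERVICE_NAME="${SERVICE_NAME:-couchbase}"

# Hypothetical safety switch: with DRY_RUN=1 (the default) commands
# are printed with a leading "+" instead of being executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: download the package (can be done ahead of time).
download_package() {
  run wget "$PKG_URL"
}

# Step 3: upgrade in place, then restart the service.
# Call this only after the node has been removed from the cluster.
upgrade_node() {
  run rpm -U "$PKG_FILE"
  run service "$SERVICE_NAME" restart
}
```

With the default `DRY_RUN=1`, calling `download_package` and then `upgrade_node` only prints the commands, so the sequence can be reviewed first; setting `DRY_RUN=0` executes them for real.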

  • Original post: https://www.cnblogs.com/cmt/p/3047376.html