Maintenance windows and downtime: 99.99% availability
QQ's availability is 99.99%
http://www.sqlskills.com/blogs/paul/survey-target-uptime-planned-actual/
It’s been five years(!) since the last time I asked about your target uptimes for your critical SQL Server instances and I think we’d all be interested to see how things have changed.
Edit 6/2/14: The survey is closed now – see here for the results.
So I present four surveys to you. For your most critical SQL Server instance:
If it’s a 24×7 system, what’s the target uptime?
If it’s a 24×7 system, what’s your measured uptime over the last year?
If it’s not a 24×7 system, what’s the target uptime?
If it’s not a 24×7 system, what’s your measured uptime over the last year?
You’ll notice that the surveys are termed in percentages. Here’s what the percentages mean for a 24×7 system:
99.999% = 5.26 minutes of downtime per year
99.99% = 52.56 minutes of downtime per year
99.9% = 8.76 hours of downtime per year
99.5% = 1.825 days of downtime per year
99% = 3.65 days of downtime per year
98.5% = 5.475 days of downtime per year
98% = 7.3 days of downtime per year
95% = 18.25 days of downtime per year
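The figures in the list above follow from simple arithmetic: the downtime budget is the fraction of the year that the uptime percentage leaves uncovered. A minimal sketch of that calculation, assuming a 365-day year (the function names `allowed_downtime_minutes` and `format_downtime` are illustrative and not part of the original post):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a 24x7 system."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


def format_downtime(minutes: float) -> str:
    """Pick a readable unit (minutes, hours, or days)."""
    if minutes < 60:
        return f"{minutes:.2f} minutes"
    if minutes < 24 * 60:
        return f"{minutes / 60:.2f} hours"
    return f"{minutes / (24 * 60):.3f} days"


if __name__ == "__main__":
    for pct in (99.999, 99.99, 99.9, 99.5, 99.0, 98.5, 98.0, 95.0):
        budget = format_downtime(allowed_downtime_minutes(pct))
        print(f"{pct}% = {budget} of downtime per year")
```

For a system that is not 24×7, the same formula would apply with `MINUTES_PER_YEAR` replaced by the number of scheduled service minutes in the year.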
If your target uptime allows for planned maintenance downtime, then that doesn’t count as unplanned downtime, as long as your system was only down for the length of time allowed. But don’t cheat yourself and retroactively classify unplanned downtime as planned, so it doesn’t affect your actual, measured uptime.
For instance, if you have a 99.9% uptime goal for a 24×7 system, with a quarterly 4-hour maintenance window, then I would select 99.9% in the 24×7 target survey. For that same system, if the downtime was limited to the prescribed 4-hour window each quarter, and there was no other downtime *at all*, I would select 99.999% on the 24×7 measured uptime survey.
Basic advice is to use common sense in how you answer. If you say you have a 24×7 system but you have a 12-hour maintenance window each week, I wouldn’t classify that as a 24×7 system.
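The rule about planned maintenance can be made concrete with a small calculation. This is only a sketch under stated assumptions: planned downtime is ignored up to the agreed allowance, any overrun of the window is counted as unplanned, and the percentage is taken over a full 365-day year. The function name and parameters are hypothetical, not something the survey defines.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def measured_uptime_percent(
    unplanned_minutes: float,
    planned_used_minutes: float,
    planned_allowed_minutes: float,
) -> float:
    """Measured uptime where planned downtime only counts if it overruns.

    Assumption: time spent inside the agreed maintenance allowance is
    excluded; any time beyond the allowance is treated as unplanned.
    """
    overrun = max(0.0, planned_used_minutes - planned_allowed_minutes)
    counted_downtime = unplanned_minutes + overrun
    return 100.0 * (1.0 - counted_downtime / MINUTES_PER_YEAR)


# Quarterly 4-hour window: 4 windows x 240 minutes = 960 minutes allowed.
allowance = 4 * 4 * 60

# Downtime stayed inside the windows and nothing else went wrong.
within_window = measured_uptime_percent(0, allowance, allowance)

# One window overran by an hour, so that hour counts against uptime.
with_overrun = measured_uptime_percent(0, allowance + 60, allowance)
```

With no unplanned downtime and no overrun, the counted downtime is zero, which corresponds to choosing the top 99.999% bucket in the measured uptime survey.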
24×7 Systems
Survey 1: 24×7 system target uptime
Survey 2: 24×7 system measured uptime
Please be honest. Remember if you choose 99.999% that means you’re saying your system was up for all but 5 minutes in the last year.
Non-24×7 Systems
Survey 3: Non-24×7 system target uptime
Use ‘Other’ to answer if your answer is ‘No target or target unknown’.
Survey 4: Non-24×7 system measured uptime
Please be honest.
I’ll editorialize the results in a week or two.
Thanks!
http://www.sqlskills.com/blogs/paul/target-actual-uptime-survey-results/
Exactly five years ago I published survey results showing target uptime SLAs and actual uptime measurements. I re-ran the survey a few weeks ago to see what’s changed, if anything, in the space of five years, and here are the results.
24×7 Systems
Other responses:
1 x 99.95%
Non 24×7 Systems
Other responses:
7 x “No target or target unknown”
1 x “0830 – 1730 M-Sat”
Other values:
1 x “n/a”
Summary
Well, the good thing is that this survey had almost twice the number of respondents as the 2009 survey, but that could just be that a lot more people read my blog now than five years ago.
My takeaway from the data is that nothing has really changed over the last five years. Given the really low response rate to the survey (I usually get more than 200-300 responses for a typical survey), my inference is that the majority of you out there don’t have well-defined uptime targets (or recovery time objective service level agreements, RTO SLAs, or whatever you want to call them) and so didn’t respond to the survey. The same thing happens when surveying something like backup testing frequency, where you *know* you’re supposed to do it, but don’t do it often enough, so you feel guilty and don’t respond to the survey.
For those of you that responded, or didn’t respond and do have targets, well done! For those of you that don’t have targets, I don’t blame you, I blame the environment you’re in. From talking to a bunch of you, most DBAs I know who *want* to do something about HA/DR are prevented from doing so because their management doesn’t place enough importance on the subject. This is also shown by the demand for our various in-person training classes: IE2 on Performance Tuning is usually over-subscribed even though it runs 3-4 times per year, whereas IE3 on HA/DR has only sold out once even though we generally run it only once per year.
Performance is the number one thing on the collective minds of most I.T. management, not HA/DR planning, and that’s just wrong. Business continuity is so crucial, especially in this day and age of close competition where being down can cause fickle customers to move to a different store/service provider.
If you’re reading this and you know you don’t have well-defined uptime targets then I strongly encourage you to raise the issue with your management, as it’s likely that your entire HA/DR strategy is lacking too. For more information, you can read the results post from the survey five years ago (Importance of defining and measuring SLAs).
Don’t wait until disaster strikes to make HA/DR a priority.