zoukankan      html  css  js  c++  java
  • Notes on <High Performance MySQL> Ch9: Scaling and High Availability

    Chapter 9: Scaling and High Availability

    Scalability is the ability to add capacity as needed without reducing performance.

    Scaling MySQL

    Planning for Scalability

    Buying Time Before Scaling

    Optimize Performance:

                    If you’re facing performance limitations now, one of the first things you should do is enable and analyze the slow query log, and then see which queries you can optimize.

    Buy more powerful hardware

                   

    Scaling Up

    MySQL doesn’t tend to scale vertically very well, because it’s hard to get it to use many CPUs and disks effectively.

    Even if your master server can use many CPUs effectively, there’s little chance that you’ll be able to build a slave server that’s powerful enough to keep up. A heavily loaded master can easily do more work than a slave server with the same hardware can handle, because the slave’s replication thread can’t use multiple CPUs and disks efficiently.

    Single-server applications usually run into read limits first, especially if they run complicated read queries. Such queries are single-threaded inside MySQL, so they’ll use only one CPU.

    Scaling Out

    The simplest and most common way to scale out is to distribute your data across several servers with replication, and then use the slaves for read queries.

    The other common way to scale out is to partition your workload across multiple “nodes”. A node is the functional unit in your MySQL architecture.

    Functional partitioning

    Functional partitioning, or division of duties, means dedicating different nodes to different tasks.

    Functional partitioning usually dedicates individual servers or nodes to different applications, so each contains only the data its particular application needs.

    Another possible functional partitioning approach is to split a single application’s data by determining sets of tables you never join to each other.

    You still can’t scale functional partitioning indefinitely, because each functional area must scale vertically if it is tied to a single MySQL node.

    Data sharding

    Data sharding is the most common and successful approach for scaling today’s very large MySQL applications. You shard the data by splitting it into smaller pieces, or shards, and storing them on different nodes.

    Sharding works well when combined with some type of functional partitioning. Most sharded systems also have some “global” data that isn’t sharded at all. This global data is usually stored on a single node, often behind a cache such as memcached.

    Sharded applications often have a database abstraction library that eases the communication between the application and the sharded data store.

    A sharded data store may feel like an elegant solution, but it’s hard to build.

    Choosing a partitioning key

    The most important challenge with sharding is finding and retrieving data. How you find data depends on how you shard it.

    The goal is to make our most important and frequent queries touch as few shard as possible. The most important part of that process is choosing a partitioning key (or keys) for your data.

    Multiple partitioning keys.

    Querying across shards

    Querying across shards isn’t the only thing that’s harder with sharding. Maintaining data consistency is also difficult. Foreign keys won’t work across shards, so the normal solution is to check as needed in the application.

    Allocating data, shards, and nodes

    Shards and nodes don’t have to have a one-to-one relationship. It’s often a good idea to make a shard’s size much smaller than a node’s capacity, so you can store multiple shards on a single node.

    Fixed allocation

    There are two main ways to allocate data to the shards: the fixed and dynamic allocation strategies. Both require a partitioning function that takes a row’s partitioning key as input and returns the shard that holds the row.

    Fixed allocation uses a partitioning function that depends only on the partitioning key’s value.

    Dynamic allocation

    The alternative to fixed allocation is dynamic allocation that you store separately, as a per-unit-of-sharding mapping.

    Explicit allocation

    A third allocation strategy is to let the application choose each row’s desired shard explicitly when it creates the row.

    Rebalancing shards

    Generating globally unique IDs

    • Use auto_increment_increment and auto_increment_offset
    • Create a table in the global node
    • Use memcached

    There’s an incr() function in the memcached API that can increment a number atomically and return the result

    • Allocate numbers in batches
    • Use a combination of values
    • Use dual-column AUTO_INCREMENT keys (only works for MyISAM tables)
    • Use GUID values

    Tools for sharding

    One database abstraction layer with sharding support that exists already is Hiberate Shards. Hibernate Shards is a good way to store and retrieve data across many servers transparently, but it doesn’t provide some features, such as rebalancing shards and aggregating query results. It uses a fixed allocation strategy to allocate data to the shards.

    Another sharding system is HiveDB, an open source framework for sharding MySQL that tries to implement sharding’s core ideas in a clear and concise way.

    Scaling Back

    One of the simpler ways to deal with an increasing data size and workload is to archive and purge unneeded data.

    Keeping active data separate

    Splitting tables into parts

                   

    MySQL partitioning

    Time-based data partitioning

    Scaling by Clustering

    Clustering

    MySQL’s NDB Cluster storage engine is a distributed, in-memory, shared-nothing storage engine with synchronous replication and automatic data partitioning across the nodes.

    Federation

    In the database world, Federation generally means accessing one server’s data from another server.

    Load Balancing

    Load balancing has 5 common goals:

    -          Scalability

    -          Efficiency

    -          Availability

    -          Transparency

    -          Consistency

    Connecting Directly

    Splitting reads and writes in replication

    MySQL replication gives you multiple copies of your data and lets you choose whether to run a query on the master or a slave.

    The primary difficulty is how to handle stale data on the slave.

    Some of the most common methods of splitting reads and writes are as follows:

    -          Query-based split: The simplest split is to direct all writes and any reads that can never tolerate stale data to the active or master server

    -          Stale-data split:

    -          Session-based split: whether the user has changed any data. The user doesn’t have to see the most up-to-date data from other users but should see his or her own changes. You can implement this at the session level by flagging the session has made a change and directing the user’s read queries to the master for a certain period of time after that.

    -          Version-based split:

    -          Global version/session split

    -           

    Changing the application configuration

    Changing DNS names

    Moving IP addresses

    Introducing a Middleman

    Load balances

    Load-balancing algorithms

    -          Random

    -          Round-robin

    -          Least connections

    -          Fastest response

    -          Hashed

    -          Weighted

    -           

    Adding and removing servers in the pool

    You should configure the servers in the connection pool so that there is enough unused capacity to let you  take servers out for maintenance, or to handle the load when servers fail.

    Load Balancing with a Master and Multiple Slaves

    -          Functional partitioning

    -          Filtering and data partitioning

    -          Moving parts of writes to a slave

    -          Guaranteeing a slave is caught up

    -          Writes synchronization

    High Availability

    Planning for High Availability

    The most important principle of high availability is to find and eliminate single points of failure in your system. Any point in your system that isn’t redundant is a single point of failure.

    Adding Redundancy

    Shared-storage architectures

    Shared storage is a way to remove some single point of failure, usually with a SAN.

    Replicated-disk architectures

    The disk replication most commonly used for MySQL is DRBD, in combination with the tools from the Linux-HA project.

    DRBD is synchronous, block-level replication implemented as a Linux kernel module. It copies every block from a primary device over a network card to another server’s block device, and writes it there before committing the block on the primary device.

    Our favorite way to use DRBD is to replicate only the device that holds the binary logs. If the active node fails, you can start a log server on the passive node and use the recovered binary logs to bring all of the failed master’s slaves up to the latest binary log position.

    Synchronous MySQL replication

    Failover and Failback

    Failover is the process of removing a failed server and using another server instead. The most important part of failover is failback. If you can’t switch back and forth between servers at will, failover is a dead end and only postpones downtime.

     

    Promoting a slave or switching roles

    Virtual IP addresses or IP takeover

    You can achieve high availability by assigning a logical IP address to a MySQL instance that you expect to perform certain services. If the MySQL instance fails, you can move the IP address to a different  MySQL server.

    Floating IP addresses and IP takeover work well for failover between machines that are local to each other --  that is, on the same subnet.

    The MySQL Master-Master Replication Manager

    Middleman solutions

    You can use proxies, port forwarding, network address translation(NAT), and hardware load balancers for failover and failback.

    Handling failover in the application

  • 相关阅读:
    jquery的 $.Event()
    自动化构建种常用命令
    原生js实现addClass,removeClass,hasClass方法
    43.放苹果(递归练习)
    43.放苹果(递归练习)
    43.放苹果(递归练习)
    43.放苹果(递归练习)
    42.递归算法---数的划分
    42.递归算法---数的划分
    42.递归算法---数的划分
  • 原文地址:https://www.cnblogs.com/fangwenyu/p/2669421.html
Copyright © 2011-2022 走看看