  • [AWS] DynamoDB: Designing Partition Keys to Distribute Your Workload Evenly

    Read: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-uniform-load.html

    https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/

    DynamoDB stores and retrieves each item based on the primary key value, which must be unique. Items are distributed across 10-GB storage units, called partitions (physical storage internal to DynamoDB). Each table has one or more partitions.

    DynamoDB uses the partition key’s value as an input to an internal hash function. The output from the hash function determines the partition in which the item is stored. Each item’s location is determined by the hash value of its partition key.

    All items with the same partition key are stored together and, for composite primary keys, are ordered by the sort key value. DynamoDB splits partitions by sort key if the collection size grows beyond 10 GB.
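
    The placement rule described above can be sketched in a few lines of Python. This is only an illustration: DynamoDB's internal hash function is not public, so MD5 is used as a stand-in, and `NUM_PARTITIONS`, `partition_for`, and `store` are hypothetical names chosen for this example.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real partition count


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index using a stand-in hash (MD5)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def store(items):
    """Group (pk, sk, payload) items by partition, then by partition key.

    Items that share a partition key end up in the same collection,
    and each collection is ordered by sort key.
    """
    partitions = defaultdict(lambda: defaultdict(list))
    for pk, sk, payload in items:
        partitions[partition_for(pk)][pk].append((sk, payload))
    for collections in partitions.values():
        for collection in collections.values():
            collection.sort(key=lambda pair: pair[0])
    return partitions
```

    The same partition key always hashes to the same partition, which is why a single very popular key cannot be spread out by DynamoDB on its own.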

    Key points:

    • A poorly chosen PK can create "hot" partitions, which result in throttling once requests exceed the partition's I/O capacity (below that limit, no throttling occurs).
    • The more distinct PK values your workload accesses, the more those requests are spread across the partitioned space, so you use provisioned throughput more efficiently.

    Partition keys and request throttling

    DynamoDB evenly distributes provisioned throughput, measured in read capacity units (RCUs) and write capacity units (WCUs), among partitions and automatically supports your access patterns using the throughput you have provisioned. However, if your access pattern exceeds 3000 RCU or 1000 WCU for a single partition key value, your requests might be throttled with a ProvisionedThroughputExceededException error.

    Reading or writing above the limit can be caused by these issues:

    • Uneven distribution of data due to the wrong choice of partition key
    • Frequent access of the same key in a partition (the most popular item, also known as a hot key)
    • A request rate greater than the provisioned throughput
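
    As a rough illustration of the per-key limits quoted above, the following sketch flags partition key values whose request rate would exceed 3000 RCU or 1000 WCU. It assumes, for simplicity, that every request consumes exactly one capacity unit; `hot_keys` and its inputs are hypothetical names, not part of any AWS API.

```python
MAX_RCU_PER_KEY = 3000  # per-partition-key read limit quoted in the text
MAX_WCU_PER_KEY = 1000  # per-partition-key write limit quoted in the text


def hot_keys(reads_per_second: dict, writes_per_second: dict) -> list:
    """Return the partition keys whose read or write rate exceeds the limits.

    Both arguments map a partition key value to requests per second.
    Assumption: one request costs one capacity unit.
    """
    keys = set(reads_per_second) | set(writes_per_second)
    return sorted(
        key
        for key in keys
        if reads_per_second.get(key, 0) > MAX_RCU_PER_KEY
        or writes_per_second.get(key, 0) > MAX_WCU_PER_KEY
    )
```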

    Partition key value | Uniformity
    User ID, where the application has many users. | Good
    Status code, where there are only a few possible status codes. | Bad
    Item creation date, rounded to the nearest time period (for example, day, hour, or minute). | Bad
    Device ID, where each device accesses data at relatively similar intervals. | Good
    Device ID, where even if there are many devices being tracked, one is by far more popular than all the others. | Bad

    Compare the last two rows: both use Device ID as the partition key.

    • If each device is accessed evenly, Device ID is a good partition key.
    • If one device is far more popular than the rest, Device ID is a bad partition key.

    How does the PK affect performance?

    Consider a table whose partition key is the item's creation date, rounded to the nearest day, and whose sort key is an item identifier. On a given day, say 2014-07-09, all of the new items are written to that single partition key value (and its corresponding physical partition), so the writes are not spread evenly.
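
    A small simulation makes the contrast concrete. It compares 10,000 writes keyed by the creation date with the same number of writes keyed by a random UUID. The write count and the `key_spread` helper are assumptions made for this sketch.

```python
import uuid
from collections import Counter


def key_spread(keys):
    """Return (distinct key count, share of writes hitting the busiest key)."""
    counts = Counter(keys)
    busiest = max(counts.values())
    return len(counts), busiest / len(keys)


writes = 10_000  # hypothetical number of items written on one day

# Every item written on 2014-07-09 gets the same partition key value.
date_keys = ["2014-07-09"] * writes
# A unique ID per item gives each write its own partition key value.
id_keys = [str(uuid.uuid4()) for _ in range(writes)]

date_distinct, date_share = key_spread(date_keys)
id_distinct, id_share = key_spread(id_keys)
```

    With the date key, a single key value absorbs every write of the day. With unique IDs, no key value receives more than a tiny fraction of the traffic.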

    Recommendations for partition keys

    Use high-cardinality attributes

    Use a unique ID.

    Cache the popular items

    Use DAX (DynamoDB Accelerator).

    Add random numbers or digits from a predetermined range for write-heavy use cases

    Suppose that you expect a large volume of writes for a partition key (for example, more than 1,000 1-KB writes per second). In this case, use an additional prefix or suffix (a fixed number from a predetermined range, say 1–10) and add it to the partition key.

    Following is the recommended table layout for this scenario:

    • Partition key: Add a random suffix (1–10 or 1–100) to the InvoiceNumber, depending on the number of transactions per InvoiceNumber. For example, assume that a single InvoiceNumber contains up to 50,000 1-KB items and that you expect 5,000 writes per second. In this case, you can use the following formula to estimate the suffix range: (number of writes per second × roundup(item size in KB, 0) × 1 KB) / 1000. With these numbers the formula gives 5000 × 1 / 1000 = 5, so a minimum of five partitions is required to distribute the writes, and you might want to set the range to 1–5.
    • Sort key: ClientTransactionid

    Partition Key | Sort Key | Attribute1
    InvoiceNumber+Randomsuffix | ClientTransactionid | Invoice_Date
    121212-1 | Client1_trans1 | 2016-05-17 01.36.45
    121212-1 | Client1-trans2 | 2016-05-18 01.36.30
    121212-2 | Client2_trans1 | 2016-06-15 01.36.20
    121212-2 | Client2_trans2 | 2016-07-1 01.36.15
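
    The suffix-range estimate and the write-side key construction can be sketched as follows. The function names `suffix_range` and `sharded_key` are hypothetical, and the divisor 1000 is taken to be the per-partition-key limit of 1000 WCU mentioned earlier.

```python
import math
import random


def suffix_range(writes_per_second: int, item_size_kb: float,
                 per_key_write_limit: int = 1000) -> int:
    """Estimate how many suffixes are needed to stay under the per-key write limit.

    Implements (writes per second * roundup(item size in KB)) / 1000,
    rounded up to a whole number of suffixes, with a minimum of 1.
    """
    needed = writes_per_second * math.ceil(item_size_kb) / per_key_write_limit
    return max(1, math.ceil(needed))


def sharded_key(invoice_number: str, shards: int, rng=random) -> str:
    """Build a partition key such as '121212-3' with a random suffix in 1..shards."""
    return f"{invoice_number}-{rng.randint(1, shards)}"


shards = suffix_range(writes_per_second=5000, item_size_kb=1)  # 5 for the example above
key = sharded_key("121212", shards)
```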

    Because we have a random number appended to our partition key (1–5), we need to query the table five times for a given InvoiceNumber. Our partition key could be 121212-[1-5], so we need to query where partition key is 121212-1 and ClientTransactionid begins_with Client1. We need to repeat this for 121212-2, on up to 121212-5 and then merge the results.
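
    The read side of this pattern, one query per suffix followed by a merge, might look like the sketch below. To keep it runnable without AWS access, the table is an in-memory dict and `query_partition` is a hypothetical stand-in for a single DynamoDB Query with a begins_with condition on the sort key. The sample rows are the ones from the table above.

```python
def query_partition(table, partition_key, sort_key_prefix):
    """Stand-in for one Query: exact partition key, begins_with on the sort key."""
    return [
        item
        for item in table.get(partition_key, [])
        if item["ClientTransactionid"].startswith(sort_key_prefix)
    ]


def query_invoice(table, invoice_number, sort_key_prefix, shards=5):
    """Query every suffixed partition key for one invoice and merge the results."""
    results = []
    for suffix in range(1, shards + 1):
        partition_key = f"{invoice_number}-{suffix}"
        results.extend(query_partition(table, partition_key, sort_key_prefix))
    return sorted(results, key=lambda item: item["ClientTransactionid"])


# In-memory stand-in for the table, keyed by partition key.
table = {
    "121212-1": [
        {"ClientTransactionid": "Client1_trans1", "Invoice_Date": "2016-05-17 01.36.45"},
        {"ClientTransactionid": "Client1-trans2", "Invoice_Date": "2016-05-18 01.36.30"},
    ],
    "121212-2": [
        {"ClientTransactionid": "Client2_trans1", "Invoice_Date": "2016-06-15 01.36.20"},
        {"ClientTransactionid": "Client2_trans2", "Invoice_Date": "2016-07-1 01.36.15"},
    ],
}

client1_items = query_invoice(table, "121212", "Client1")
```

    The cost of spreading writes is visible here: every read for an invoice fans out into as many queries as there are suffixes, so the suffix range should be no larger than the write load requires.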

    Using a random number as the PK

    For example, consider the following schema layout of an InvoiceTransaction table. It has a header row for each invoice and contains attributes such as total amount due and transaction_country, which are unique for each invoice. Assuming we need to find the list of invoices issued for each transaction country, we can create a global secondary index with partition_key as trans_country. However, this approach leads to a hot key write scenario, because the number of invoices per country is unevenly distributed.

    Table Partition Key | Table Sort Key | Attribute1 | Attribute2 (GSI Partition_Key) | Attribute3 (GSI Sort Key) | Attribute4 | Attribute5
    InvoiceNumber | Sort_key attribute | Invoice_Date | Random prefix range | Trans_country | Amount_Due | Currency
    121212 | head | 2018-05-17 T1 | Random (1-N) | USA | 10000 | USD
    121213 | head | 2018-04-1 T2 | Random (1-N) | USA | 500000 | USD
    121214 | head | 2018-04-1 T2 | Random (1-N) | FRA | 500000 | EUR

    Following is the global secondary index (GSI) for the preceding scenario.

    GSI Partition Key | GSI Sort Key | Projected Attribute | Projected Attribute
    (Random range) | Trans_country | Invoice_Number | Other Data attributes
    1-N | USA | 121212 |
    1-N | USA | 121213 |
    1-N | FRA | 121214 |

    In the preceding example, you might want to identify the list of invoice numbers associated with the USA. In this case, you can issue a query to the global secondary index with partition_key = (1-N) and trans_country = USA.
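
    The sketch below illustrates this with an in-memory index: each invoice gets a random GSI partition key in 1..N when it is written, and a lookup by country checks every prefix. `N`, `with_gsi_prefix`, `build_gsi`, and `invoices_for_country` are hypothetical names, and the dict keyed by (prefix, country) only imitates a GSI with that partition key and sort key.

```python
import random

N = 10  # hypothetical size of the random prefix range


def with_gsi_prefix(invoice: dict, n: int = N, rng=random) -> dict:
    """Attach a random GSI partition key in 1..n to an invoice header row."""
    return {**invoice, "gsi_pk": rng.randint(1, n)}


def build_gsi(invoices):
    """Imitate the GSI: group rows by (random prefix, Trans_country)."""
    gsi = {}
    for invoice in invoices:
        gsi.setdefault((invoice["gsi_pk"], invoice["Trans_country"]), []).append(invoice)
    return gsi


def invoices_for_country(gsi, country: str, n: int = N):
    """Check every prefix 1..n for the given country and merge the invoice numbers."""
    found = []
    for prefix in range(1, n + 1):
        found.extend(row["InvoiceNumber"] for row in gsi.get((prefix, country), []))
    return sorted(found)


rows = [
    with_gsi_prefix({"InvoiceNumber": "121212", "Trans_country": "USA"}),
    with_gsi_prefix({"InvoiceNumber": "121213", "Trans_country": "USA"}),
    with_gsi_prefix({"InvoiceNumber": "121214", "Trans_country": "FRA"}),
]
usa_invoices = invoices_for_country(build_gsi(rows), "USA")
```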

    Antipatterns for partition keys

    Using sequences or unique IDs generated by the DB engine as the partition key:

    You cannot use TransactionID for any query purposes, so you lose the ability to use the partition key to perform a fast lookup of data.

    Partition key | Attribute1 | Attribute2
    TransactionID | OrderID | Order_Date
    1111111 | Customer1-1 | 2016-05-17 01.36.45
    1111112 | Customer1-2 | 2016-05-18 01.36.30
    1111113 | Customer2-1 | 2016-05-18 01.36.30

    You could add a GSI on OrderID to work around this, but GSIs support only eventually consistent reads and add cost for both reads and writes. Because you normally want to search by OrderID, a generated ID carries no meaning for your queries.

  • Original post: https://www.cnblogs.com/Answer1215/p/14785036.html