  System Design: How to Design Twitter

    Catalog

    • Clarify the requirements
    • Capacity Estimation
    • System APIs
    • High-level System Design
    • Data Storage
    • Scalability

    Step1: Clarify the requirements

    Clarify requirements and goals of the system

    • Requirements
    • Traffic size (e.g. Daily Active Users)

    Nobody expects you to design a complete system in 30-40 minutes.

    Discuss the functionalities and align with the interviewer on which components to focus on.

    Type1: Functional Requirement

    1. Tweet
      • a. Create
      • b. Delete
    2. Timeline/Feed
      • a. Home
      • b. User
    3. Follow a user
    4. Like a tweet
    5. Search tweets
      ...


    Type2: Non-Functional Requirement

    • Consistency
      • Every read receives the most recent write or an error
      • Sacrifice: settle for eventual consistency
    • Availability
      • Every request receives a response, without the guarantee that it contains the most recent write
      • The system should stay scalable, with low-latency performance
    • Partition tolerance (fault tolerance)
      • The system continues to operate despite an arbitrary number of messages being dropped by the network between nodes

    Step2: Capacity Estimation

    Assumption:
    - 200 million DAU, 100 million new tweets per day
    - Each user visits the home timeline 5 times and other users' timelines 3 times per day
    - Each timeline/page has 20 tweets
    - Each tweet is 280 bytes, plus 30 bytes of metadata
    - Per photo: 200 KB; 20% of tweets have images
    - Per video: 2 MB; 10% of tweets have videos; 30% of videos get watched

    Storage Estimate

    • Write size daily:
      • Text:
        • 100M new tweets * (280 + 30) bytes/tweet ≈ 31 GB/day
      • Image:
        • 100M * 20% * 200 KB = 4 TB/day
      • Video:
        • 100M * 10% * 2 MB = 20 TB/day
    • Total:
      • ≈ 24 TB/day
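
    These figures are easy to sanity-check. Below is a minimal back-of-the-envelope script; it uses only the assumptions listed above (Python, for illustration):

    # Daily write-side storage, from the assumptions above.
    NEW_TWEETS_PER_DAY = 100_000_000

    TWEET_BYTES = 280 + 30          # tweet text + metadata
    IMAGE_BYTES = 200_000           # 200 KB per photo
    VIDEO_BYTES = 2_000_000         # 2 MB per video
    IMAGE_RATIO, VIDEO_RATIO = 0.20, 0.10

    text_b  = NEW_TWEETS_PER_DAY * TWEET_BYTES                # ~31 GB/day
    image_b = NEW_TWEETS_PER_DAY * IMAGE_RATIO * IMAGE_BYTES  # ~4 TB/day
    video_b = NEW_TWEETS_PER_DAY * VIDEO_RATIO * VIDEO_BYTES  # ~20 TB/day

    GB, TB = 1e9, 1e12
    print(f"text  {text_b / GB:.0f} GB/day   image {image_b / TB:.0f} TB/day")
    print(f"video {video_b / TB:.0f} TB/day  total {(text_b + image_b + video_b) / TB:.1f} TB/day")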

    Bandwidth Estimate (Social Networking => read heavy)

    Daily read tweet volume:
    - 200M * (5 home visits + 3 user visits) * 20 tweets/page = 32B tweets/day

    Daily read bandwidth:

    • Text: 32B * 280 bytes / 86400 s ≈ 100 MB/s
    • Image: 32B * 20% * 200 KB / 86400 s ≈ 14 GB/s
    • Video: 32B * 10% * 30% watched * 2 MB / 86400 s ≈ 20 GB/s
    • Total: ≈ 35 GB/s
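
    The read side can be checked the same way; the exact arithmetic lands a couple of GB/s above the rounded 35 GB/s, which is fine at this precision. A continuation of the sketch above, with the constants restated so it runs on its own:

    # Daily read-side bandwidth, from the same assumptions.
    DAU = 200_000_000
    IMAGE_BYTES, VIDEO_BYTES = 200_000, 2_000_000
    IMAGE_RATIO, VIDEO_RATIO, WATCH_RATIO = 0.20, 0.10, 0.30
    SECONDS_PER_DAY = 86_400

    reads = DAU * (5 + 3) * 20    # 32B tweets read per day

    text_bw  = reads * 280 / SECONDS_PER_DAY                                      # ~0.1 GB/s
    image_bw = reads * IMAGE_RATIO * IMAGE_BYTES / SECONDS_PER_DAY                # ~14.8 GB/s
    video_bw = reads * VIDEO_RATIO * WATCH_RATIO * VIDEO_BYTES / SECONDS_PER_DAY  # ~22.2 GB/s

    print(f"total ~{(text_bw + image_bw + video_bw) / 1e9:.0f} GB/s")             # ~37 GB/s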

    Step3: System APIs

    postTweet(userToken, string tweet)
    
    deleteTweet(userToken, string tweetId)
    
    likeOrUnlikeTweet(userToken, string tweetId, bool like)
    
    readHomeTimeLine(userToken, int pageSize, opt string pageToken)
    
    readUserTimeLine(userToken, int pageSize, opt string pageToken)
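
    To make the pagination contract concrete, here is a minimal sketch of readHomeTimeLine using a cursor-style pageToken. The authenticate helper and the timeline/tweet stores are hypothetical names used only for illustration:

    import base64
    from dataclasses import dataclass

    @dataclass
    class Page:
        tweets: list            # up to pageSize tweet objects
        next_page_token: str    # opaque cursor; empty string when exhausted

    def read_home_timeline(user_token: str, page_size: int, page_token: str = "") -> Page:
        user_id = authenticate(user_token)   # hypothetical auth helper
        # The opaque token encodes the last tweet id already returned, so the
        # client never handles raw offsets and the server may change the
        # cursor format at any time.
        after = int(base64.b64decode(page_token)) if page_token else None
        ids = timeline_store.fetch(user_id, after=after, limit=page_size)   # hypothetical store
        next_token = (base64.b64encode(str(ids[-1]).encode()).decode()
                      if len(ids) == page_size else "")
        return Page(tweets=tweet_store.get_many(ids), next_page_token=next_token)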
    

    Step4: High-Level System Design

    • post tweets


    • user timeline (push/pull mode)


    https://medium.com/@winapp/read-fast-with-fan-out-write-f25257117297

    Home Timeline

    Fan out on write

    • Not efficient for users with a huge number of followers (like Taylor Swift)


    Hybrid Solution

    • Non-hot users:
      • fan out on write (push)
    • Hot users:
      • fan in on read (pull): at timeline-request time, read the hot users' recent tweets from the tweets cache and aggregate them with the precomputed results from non-hot users (see the sketch below)

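    A sketch of the hybrid path, assuming snowflake-style tweet ids (sortable by creation time); is_hot_user, the follower/followee stores, and both caches are illustrative names:

    FOLLOWER_THRESHOLD = 10_000   # assumed cutoff separating "hot" users

    def fan_out(author_id: int, tweet_id: int) -> None:
        if is_hot_user(author_id):   # e.g. follower count > FOLLOWER_THRESHOLD
            # Hot user: a single write into the author's own tweet cache;
            # followers pull it at read time (fan in on read).
            hot_tweet_cache.append(author_id, tweet_id)
        else:
            # Non-hot user: push the id into every follower's precomputed
            # home timeline (fan out on write).
            for follower_id in follower_store.followers(author_id):
                home_timeline_cache.prepend(follower_id, tweet_id)

    def home_timeline(user_id: int, limit: int = 20) -> list:
        pushed = home_timeline_cache.read(user_id, limit)         # precomputed part
        pulled = [t for hot in followee_store.hot_followees(user_id)
                    for t in hot_tweet_cache.recent(hot, limit)]  # pulled part
        # Time-sortable ids make the merge a plain sort by recency.
        return sorted(pushed + pulled, reverse=True)[:limit]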

    Step5: Data Storage


    Principles

    • SQL database:
      • e.g. user table
    • NoSQL database:
      • e.g. timelines
    • File system:
      • media files: image, audio, video
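
    As a small illustration of the split (the columns and key layout are assumptions, not a fixed schema):

    import sqlite3

    # SQL side: small, relational, consistency-sensitive data such as users
    # and follow relations.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            user_id    INTEGER PRIMARY KEY,
            handle     TEXT UNIQUE NOT NULL,
            created_at TEXT NOT NULL
        );
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL,
            followee_id INTEGER NOT NULL,
            PRIMARY KEY (follower_id, followee_id)
        );
    """)

    # NoSQL side: a timeline is an append-mostly list keyed by user id, a
    # natural fit for a key-value / wide-column store.
    home_timeline = {"user:42": [1003, 1002, 1001]}   # user_id -> recent tweet ids

    # File system / blob storage: media bytes live outside the database; the
    # tweet record keeps only a URL (e.g. "https://cdn.example.com/img/1003.jpg").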

    Step6: Scalability

    • Identify potential bottlenecks
    • Discuss solutions, focusing on tradeoffs
      • Data sharding
        • data store, cache
      • Load balancing
        • user <-> application server
        • application server <-> cache server
        • application server <-> db
      • Data caching
        • read heavy

    Sharding

    Why?

    • impossible to store/process all the data on a single machine

    How?

    • Break large tables into smaller shards on multiple servers

    Pros

    • Horizontal scaling

    Cons

    • Complexity (distributed queries, resharding, ...)

    Option 1: Shard by tweets' creation time

    Pros:

    • Limited shards to query

    Cons:

    • Hot/Cold data issue
    • New shards fill up quickly

    Option 2: Shard by hash(userId): store all of a user's data on a single shard

    Pros:

    • Simple
    • Query user timeline is straightforward

    Cons:

    • Home timeline still needs to query multiple shards (followees are spread out)
    • Non-uniform distribution of storage (some users tweet far more than others)
    • Hot users make hot shards
    • Availability: one shard failure takes all of its users' data offline

    Option 3: Shard by hash(tweetId)

    Pros:

    • uniform distribution
    • high availability

    Cons:

    • need to query all shards to generate a user/home timeline (mitigated by a cache layer)
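
    A sketch of option 3's routing and the scatter-gather read it forces (the shards list and its recent_tweets method are hypothetical):

    from itertools import chain

    NUM_SHARDS = 64

    def shard_of(tweet_id: int) -> int:
        # Uniform placement: consecutive tweet ids spread across shards.
        return hash(tweet_id) % NUM_SHARDS

    def user_timeline(user_id: int, limit: int = 20) -> list:
        # Scatter: any shard may hold some of this user's tweets...
        partials = (shard.recent_tweets(user_id, limit) for shard in shards)
        # ...gather: merge per-shard results, keep the newest overall.
        return sorted(chain.from_iterable(partials), reverse=True)[:limit]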

    Caching

    Why?

    • social networks have heavy read traffic
    • queries can be slow and costly

    How?

    • store hot/precomputed data in memory, so reads become much faster

    Timeline service

    • user timeline: user_id -> {tweet_id}
    • home timeline: user_id -> {tweet_id}
    • tweets: tweet_id -> tweet
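
    A minimal sketch of these three maps on top of Redis (Redis itself and the key names are assumptions; any in-memory KV store works):

    import redis

    r = redis.Redis()   # assumed local instance

    def on_new_tweet(author_id: int, tweet_id: int, tweet_json: str) -> None:
        r.set(f"tweet:{tweet_id}", tweet_json)      # tweets: tweet_id -> tweet
        r.lpush(f"user_tl:{author_id}", tweet_id)   # user timeline: user_id -> {tweet_id}
        r.ltrim(f"user_tl:{author_id}", 0, 799)     # cap each cached timeline

    def user_timeline_page(user_id: int, page: int = 0, size: int = 20) -> list:
        ids = r.lrange(f"user_tl:{user_id}", page * size, (page + 1) * size - 1)
        return [r.get(f"tweet:{int(i)}") for i in ids]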

    Topics:

    • caching policy
    • sharding
    • performance

    Reference

    https://www.youtube.com/watch?v=PMCdWr6ejpw&list=PLLuMmzMTgVK4RuSJjXUxjeUt3-vSyA1Or&index=1
