Time to optimize the application layer. From the current syscall statistics, the problems are:
- select accounts for far too much time.
- 13% of read system calls return an error; that needs to be explained.
- read is called too often. Enlarging the receive buffer should cut the number of reads, and the reworked TCP zero-copy receive API ("A reworked TCP zero-copy receive API") is another option.
- write is called even more often than read; a vectored (scatter/gather) interface should let us batch those writes.
- socket/bind/ioctl were each used 3178 times, so client plus server fds come to 3178 * 2 = 6356. Even counting setsockopt calls for TCP_NODELAY, SO_REUSEADDR, SO_RCVBUF/SO_SNDBUF, SO_KEEPALIVE, SO_REUSEPORT, TCP_DEFER_ACCEPT, SO_LINGER and the like, the call count should come nowhere near 278k.
- epoll_ctl is called quite frequently: about 52k times, far more than 3178.
- recvfrom has a high failure rate, about 50%.
- shutdown is still being called after close, and roughly 50% of those shutdown calls fail.
- Is futex really necessary here?
- restart_syscall costs too much time.
Current model:
There is clearly a bottleneck from contention over shared resources.
Test with the following model:
Results:
```
perf stat -p 9880 sleep 10

 Performance counter stats for process id '9880':

    30372.082735  task-clock-msecs   #      3.037 CPUs
          340714  context-switches   #      0.011 M/sec
             288  CPU-migrations     #      0.000 M/sec
             150  page-faults        #      0.000 M/sec
     65299950534  cycles             #   2149.999 M/sec  (scaled from 66.24%)
     12797366330  instructions       #      0.196 IPC    (scaled from 83.37%)
      3284418549  branches           #    108.139 M/sec  (scaled from 83.17%)
        15383662  branch-misses      #      0.468 %      (scaled from 83.69%)
       195263517  cache-references   #      6.429 M/sec  (scaled from 83.42%)
        29353131  cache-misses       #      0.966 M/sec  (scaled from 83.49%)

    10.000647442  seconds time elapsed

perf stat -p 9880 sleep 10

 Performance counter stats for process id '9880':

    30754.416664  task-clock-msecs   #      3.075 CPUs
          341624  context-switches   #      0.011 M/sec
             358  CPU-migrations     #      0.000 M/sec
             121  page-faults        #      0.000 M/sec
     66197865785  cycles             #   2152.467 M/sec  (scaled from 66.21%)
     12878834358  instructions       #      0.195 IPC    (scaled from 83.57%)
      3319059775  branches           #    107.921 M/sec  (scaled from 83.42%)
        15481326  branch-misses      #      0.466 %      (scaled from 83.72%)
       194900683  cache-references   #      6.337 M/sec  (scaled from 83.63%)
        28394751  cache-misses       #      0.923 M/sec  (scaled from 83.03%)

    10.001158953  seconds time elapsed
```
This change resolves the thundering herd and the multi-thread contention on the listen fd. For comparison, the previous stat results:
```
perf stat -p 9884 sleep 10

 Performance counter stats for process id '9884':

   200815.127256  task-clock-msecs   #     20.075 CPUs
         2456764  context-switches   #      0.012 M/sec
            1294  CPU-migrations     #      0.000 M/sec
            3583  page-faults        #      0.000 M/sec
    430791607582  cycles             #   2145.215 M/sec  (scaled from 66.57%)
     63233677155  instructions       #      0.147 IPC    (scaled from 83.19%)
     18174748495  branches           #     90.505 M/sec  (scaled from 83.19%)
        70154714  branch-misses      #      0.386 %      (scaled from 83.45%)
       806455643  cache-references   #      4.016 M/sec  (scaled from 83.36%)
       164527072  cache-misses       #      0.819 M/sec  (scaled from 83.44%)

    10.003181505  seconds time elapsed

perf stat -p 9884 sleep 10

 Performance counter stats for process id '9884':

   203748.965387  task-clock-msecs   #     20.373 CPUs
         2274598  context-switches   #      0.011 M/sec
            1768  CPU-migrations     #      0.000 M/sec
            3570  page-faults        #      0.000 M/sec
    438541182863  cycles             #   2152.360 M/sec  (scaled from 66.80%)
     63421130555  instructions       #      0.145 IPC    (scaled from 83.34%)
     18428625598  branches           #     90.448 M/sec  (scaled from 83.23%)
        69229085  branch-misses      #      0.376 %      (scaled from 83.41%)
       770039314  cache-references   #      3.779 M/sec  (scaled from 83.14%)
       158705951  cache-misses       #      0.779 M/sec  (scaled from 83.48%)

    10.000827457  seconds time elapsed
```
Analyzing the perf stat numbers: the change brings a clear improvement. Over the same 10 s window, task-clock drops from about 20 CPUs to about 3, context switches fall from roughly 2.4M to 340k, and IPC rises from 0.147 to 0.196. CPS is about 60k.
For the network architecture, the following directions seem worth considering: