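The algorithm service is deployed on k8s.
Requests pass through two proxy layers, so running into problems is not really a surprise: that is a lot of proxying.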
kong -> nginx-ingress -> svc
When accessed through kong, some requests return 502:
<html><head><title>502 Bad Gateway</title></head><body><center><h1>502 Bad Gateway</h1></center><hr><center>nginx/1.19.1</center></body></html>
Checking kong's logs turned up warnings:
2021/01/25 16:46:05 [warn] 166110#0: *3522559308 a client request body is buffered to a temporary file /usr/local/kong/client_body_temp/0000015840, client: 192.168.11.111, server: kong, request: "POST /slap/algo-api/3d1503b8ee0c4b8082d13a0d8f2f3173/general_sentiment HTTP/1.1", host: "cclient.github.com"
2021/01/25 16:46:06 [warn] 166118#0: *3522560171 a client request body is buffered to a temporary file /usr/local/kong/client_body_temp/0000015841, client: 192.168.11.111, server: kong, request: "POST /slap/algo-api/3d1503b8ee0c4b8082d13a0d8f2f3173/general_sentiment HTTP/1.1", host: "cclient.github.com"
2021/01/25 16:46:06 [warn] 166117#0: *3522561394 a client request body is buffered to a temporary file /usr/local/kong/client_body_temp/0000015842, client: 192.168.11.111, server: kong, request: "POST /slap/algo-api/3d1503b8ee0c4b8082d13a0d8f2f3173/general_sentiment HTTP/1.1", host: "cclient.github.com"
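After increasing client_body_buffer_size,
the warnings stopped, but some requests still returned 502.
Testing each layer: direct access through the nodeport gives no 502s;
through nginx-ingress, some 502s;
through kong, some 502s.
So the plan is to troubleshoot and tune one layer at a time.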
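The layer-by-layer comparison above can be made repeatable with a small script. This is only a sketch: the function names `probe` and `count_codes`, the request count, and the placeholder payload are my own assumptions, not something from the original test.

```shell
# Print "<status> <count>" for each distinct HTTP status code read from stdin.
count_codes() {
  awk 'NF {c[$1]++} END {for (k in c) print k, c[k]}' | sort
}

# Send the same POST repeatedly and print one status code per line.
# $1: URL to test, $2: number of requests (default 100, an arbitrary choice)
probe() {
  url=$1
  n=${2:-100}
  i=0
  while [ "$i" -lt "$n" ]; do
    # Placeholder payload; substitute the real request body.
    curl -s -o /dev/null -w '%{http_code}\n' \
      -H 'Content-Type: application/json' \
      -X POST -d '["test sentence"]' "$url"
    i=$((i + 1))
  done
}
```

For example, `probe "http://192.168.100.128:32359/sentiment" 200 | count_codes` for the nodeport, then the same against the nginx-ingress and kong URLs. The first layer whose output includes a 502 line is the one to look at.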
The original ingress definition:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  generation: 1
  name: statefulset-123456
  namespace: default
spec:
  rules:
  - host: cclient.github.com
    http:
      paths:
      - backend:
          serviceName: statefulset-123456
          servicePort: 80
        path: /api_v1/123456(/|$)(.*)
The service itself, accessed through the nodeport, returns these headers:
curl -D header_nodeport -H "Content-Type: application/json" -X POST -d '["我感觉就是翻新机。","苏宁的电脑就是比便宜好几百"]' "http://192.168.100.128:32359/sentiment"
HTTP/1.1 200 OK
Content-Length: 519
Content-Type: application/json
Connection: keep-alive
Keep-Alive: 5
The service accessed through nginx-ingress returns these headers:
curl -D header_nodeport -H "Content-Type: application/json" -X POST -d '["我感觉就是翻新机。","苏宁的电脑就是比便宜好几百"]' "http://matrix-paas.mlamp.cn/api_v1/123456/sentiment"
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Wed, 27 Jan 2021 10:36:37 GMT
Content-Type: application/json
Content-Length: 519
Connection: keep-alive
Vary: Accept-Encoding
The service accessed through the outer kong layer returns these headers:
curl -D header_nodeport -H "Content-Type: application/json" -X POST -d '["我感觉就是翻新机。","苏宁的电脑就是比便宜好几百"]' "http://matrix-paas.mlamp.cn/slap/algo-api/3d1503b8ee0c4b8082d13a0d8f2f3173/general_sentiment"
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 519
Connection: keep-alive
Server: nginx/1.19.1
Date: Wed, 27 Jan 2021 10:36:01 GMT
Vary: Accept-Encoding
X-Kong-Upstream-Latency: 92
X-Kong-Proxy-Latency: 16
Via: kong/2.0.4
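First priority: make nginx-ingress stop returning 502. I added a number of timeout and buffer annotations that map to the corresponding nginx settings, but none of them had any effect; there were still plenty of 502s: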
nginx.ingress.kubernetes.io/client-body-buffer-size: 100m
nginx.ingress.kubernetes.io/proxy-body-size: 100m
nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
nginx.ingress.kubernetes.io/proxy-next-upstream-timeout: "600"
nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "8"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
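I once wrote up a similar HTTP/1.0 vs. HTTP/1.1 problem, so going on that experience I tried changing the HTTP version. That earlier post:
nginx 配合jersey+netty的奇怪问题 - 资本主义接班人 - 博客园 (cnblogs.com)
As an experiment, I added one more annotation: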
nginx.ingress.kubernetes.io/proxy-http-version: "1.0"
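The 502s stopped appearing; problem solved.
One small doubt remains, because the backend service really does speak HTTP/1.1. It may be an nginx-ingress bug; for now the service is reachable, so I am not digging further (this is an old cluster, and nginx-ingress has long since stopped being the recommended choice).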
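The outer kong layer is too tightly integrated to set HTTP/1.0 for specific requests, and switching it to 1.0 globally would affect other services, so that part is shelved for now.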
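On the open question of why HTTP/1.0 helps: one possible explanation, which the notes above do not confirm, is an idle-timeout mismatch. The nodeport response carries `Keep-Alive: 5`, which suggests the backend closes idle connections after about 5 seconds. If the proxy reuses upstream connections over HTTP/1.1 and keeps them idle longer than that, it can send a request on a connection the backend is already closing, and that surfaces as a 502. With HTTP/1.0 toward the upstream, nginx does not reuse connections, so the race would not occur. The sketch below reads the backend's value from a header file saved with `curl -D` and compares it with the proxy's idle timeout. The function names are hypothetical, and the 60 seconds in the usage note is only my assumption about the nginx-ingress default; check the actual configuration.

```shell
# Extract the idle timeout (seconds) from a header file saved with `curl -D`.
# Accepts both "Keep-Alive: 5" and "Keep-Alive: timeout=5, max=100".
keepalive_seconds() {
  tr -d '\r' < "$1" \
    | grep -i '^keep-alive:' \
    | head -n 1 \
    | sed -E 's/^[^:]*:[[:space:]]*(timeout=)?([0-9]+).*/\2/'
}

# Report which side gives up on an idle connection first.
# $1: backend idle timeout in seconds
# $2: proxy upstream idle timeout in seconds (assumed; read it from your config)
compare_idle_timeouts() {
  if [ "$1" -lt "$2" ]; then
    echo "backend closes first"
  else
    echo "proxy closes first"
  fi
}
```

For example, `compare_idle_timeouts "$(keepalive_seconds header_nodeport)" 60`. If the backend closes first, raising the backend's keep-alive timeout above the proxy's, or lowering the proxy's, would be an alternative to forcing HTTP/1.0, though I have not tested that on this cluster.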