  • Long-term TCP sessions & MPTCP

    https://github.com/multipath-tcp/mptcp/issues/153


    The following issue is fixed with https://github.com/multipath-tcp/mptcp/commit/133537deb63d04e1dfb5af7fd82ed51ba243e518


    wapsi commented on 20 Nov 2016

    I'm using SSH port tunneling with MPTCP, and I've noticed that after several hours or days MPTCP stops working (traffic doesn't go through all available interfaces / gateways anymore, it uses only one path). Restarting this long-running SSH/TCP session fixes the issue and MPTCP "starts to work again".

    What could cause this? Is there any way to debug this problem? Is there any way to tell MPTCP to look up new paths again, or something similar? I see that there are some stats available under /proc/net/mptcp_net/ and /proc/net/mptcp_fullmesh, but is there anything like echo 1 > /proc/net/mptcp_net/discover_paths_again?
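    For reference, a minimal sketch of inspecting that state, assuming the v0.91 /proc layout (file names can differ between kernel builds):

    # local addresses known to the fullmesh path manager
    cat /proc/net/mptcp_fullmesh
    # one line per active MPTCP connection, similar in spirit to /proc/net/tcp
    cat /proc/net/mptcp_net/mptcp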

    My MPTCP settings:
    [ 0.412909] MPTCP: Stable release v0.91.2
    kernel.osrelease = 4.1.35.mptcp
    net.ipv4.tcp_allowed_congestion_control = lia reno cubic
    net.ipv4.tcp_available_congestion_control = lia reno balia wvegas cubic olia
    net.ipv4.tcp_congestion_control = lia (tried other ones too but the issue remains)
    net.core.wmem_max = 115343360
    net.core.rmem_max = 115343360
    net.ipv4.tcp_rmem = 10240 87380 115343360
    net.ipv4.tcp_wmem = 10240 87380 115343360
    net.mptcp.mptcp_binder_gateways =
    net.mptcp.mptcp_checksum = 0
    net.mptcp.mptcp_debug = 0
    net.mptcp.mptcp_enabled = 1
    net.mptcp.mptcp_path_manager = fullmesh
    net.mptcp.mptcp_scheduler = default
    net.mptcp.mptcp_syn_retries = 10
    net.mptcp.mptcp_version = 1

    @cpaasch
    Owner

    cpaasch commented on 22 Nov 2016

    Hello,

    Do you have a packet trace of this behavior? It might be that a NAT on the path is timing out.

    @titer

    Hi,

    I see the same behavior here, using MPTCP to aggregate two DSL links. My local gateway is connected to both DSL routers (NAT'ed in both cases) and maintains a long-running, MPTCP-enabled OpenVPN connection to a relay router, through which traffic gets routed. I am fairly happy with that MPTCP setup btw; it has been running for a couple of years now and has proved effective at hiding glitches on either DSL link and at aggregating bandwidth with about 90% efficiency.

    Sometimes a subflow dies, e.g. after one of the DSL routers restarts and ends up with a new public IP address. My current workaround is to run a background task that detects that and bounces OpenVPN - but if there is a better way to handle it, I am interested.
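    A minimal sketch of such a watchdog (the expected subflow count, the relay address:port, and the restart command are placeholders, not the actual script):

    #!/bin/sh
    # Watchdog sketch: bounce OpenVPN when subflows towards the relay die.
    EXPECTED=4                    # 2 local x 2 remote addresses
    PEER="203.0.113.10:1194"      # relay router IP:port (placeholder)
    COUNT=$(netstat -n | grep ^tcp | grep ESTABLISHED | grep -c " $PEER ")
    if [ "$COUNT" -lt "$EXPECTED" ]; then
        systemctl restart openvpn@relay    # or however you restart OpenVPN
    fi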

    Running Debian kernel 4.1.35.mptcp on both endpoints

    @cpaasch
    Owner

    cpaasch commented on 22 Nov 2016

    Can you give the below patch a try? (didn't test it at all! just compiled ;))

    You might have to tweak the tcp_retries* sysctls to get a faster subflow timeout.
    When loading the path manager you have to set the module parameter create_on_err to 1. Module parameters are in /sys/module/mptcp_fullmesh/parameters.
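    For example, with the patch below applied, something like this (a sketch; the parameter is also writable at runtime because it is registered with mode 0644):

    # load the path manager with subflow re-creation on timeout enabled
    modprobe mptcp_fullmesh create_on_err=1
    # or flip it on a running system
    echo 1 > /sys/module/mptcp_fullmesh/parameters/create_on_err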

    diff --git a/include/net/mptcp.h b/include/net/mptcp.h
    index cb5e4cf76b23..e66b8aa295ca 100644
    --- a/include/net/mptcp.h
    +++ b/include/net/mptcp.h
    @@ -230,6 +230,7 @@ struct mptcp_pm_ops {
     	void (*release_sock)(struct sock *meta_sk);
     	void (*fully_established)(struct sock *meta_sk);
     	void (*new_remote_address)(struct sock *meta_sk);
    +	void (*subflow_error)(struct sock *meta_sk, struct sock *sk);
     	int  (*get_local_id)(sa_family_t family, union inet_addr *addr,
     			     struct net *net, bool *low_prio);
     	void (*addr_signal)(struct sock *sk, unsigned *size,
    diff --git a/net/mptcp/mptcp_ctrl.c b/net/mptcp/mptcp_ctrl.c
    index 6045ba160225..853310cbc5d9 100644
    --- a/net/mptcp/mptcp_ctrl.c
    +++ b/net/mptcp/mptcp_ctrl.c
    @@ -610,13 +610,13 @@ EXPORT_SYMBOL(mptcp_select_ack_sock);
     static void mptcp_sock_def_error_report(struct sock *sk)
     {
     	const struct mptcp_cb *mpcb = tcp_sk(sk)->mpcb;
    +	struct sock *meta_sk = mptcp_meta_sk(sk);
     
     	if (!sock_flag(sk, SOCK_DEAD))
     		mptcp_sub_close(sk, 0);
     
     	if (mpcb->infinite_mapping_rcv || mpcb->infinite_mapping_snd ||
     	    mpcb->send_infinite_mapping) {
    -		struct sock *meta_sk = mptcp_meta_sk(sk);
     
     		meta_sk->sk_err = sk->sk_err;
     		meta_sk->sk_err_soft = sk->sk_err_soft;
    @@ -633,6 +633,9 @@ static void mptcp_sock_def_error_report(struct sock *sk)
     			tcp_done(meta_sk);
     	}
     
    +	if (mpcb->pm_ops->subflow_error)
    +		mpcb->pm_ops->subflow_error(meta_sk, sk);
    +
     	sk->sk_err = 0;
     	return;
     }
    diff --git a/net/mptcp/mptcp_fullmesh.c b/net/mptcp/mptcp_fullmesh.c
    index 71eb2d4ad2d4..61fda6e1be3e 100644
    --- a/net/mptcp/mptcp_fullmesh.c
    +++ b/net/mptcp/mptcp_fullmesh.c
    @@ -95,6 +95,10 @@ static int num_subflows __read_mostly = 1;
     module_param(num_subflows, int, 0644);
     MODULE_PARM_DESC(num_subflows, "choose the number of subflows per pair of IP addresses of MPTCP connection");
     
    +static int create_on_err __read_mostly = 0;
    +module_param(create_on_err, int, 0644);
    +MODULE_PARM_DESC(create_on_err, "recreate the subflow upon a timeout");
    +
     static struct mptcp_pm_ops full_mesh __read_mostly;
     
     static void full_mesh_create_subflows(struct sock *meta_sk);
    @@ -1370,6 +1374,24 @@ static void full_mesh_create_subflows(struct sock *meta_sk)
     	}
     }
     
    +static void full_mesh_subflow_error(struct sock *meta_sk, struct sock *sk)
    +{
    +	const struct mptcp_cb *mpcb = tcp_sk(meta_sk)->mpcb;
    +
    +	if (!create_on_err)
    +		return;
    +
    +	if (mpcb->infinite_mapping_snd || mpcb->infinite_mapping_rcv ||
    +	    mpcb->send_infinite_mapping ||
    +	    mpcb->server_side || sock_flag(meta_sk, SOCK_DEAD))
    +		return;
    +
    +	if (sk->sk_err != ETIMEDOUT)
    +		return;
    +
    +	full_mesh_create_subflows(meta_sk);
    +}
    +
     /* Called upon release_sock, if the socket was owned by the user during
      * a path-management event.
      */
    @@ -1799,6 +1821,7 @@ static struct mptcp_pm_ops full_mesh __read_mostly = {
     	.release_sock = full_mesh_release_sock,
     	.fully_established = full_mesh_create_subflows,
     	.new_remote_address = full_mesh_create_subflows,
    +	.subflow_error = full_mesh_subflow_error,
     	.get_local_id = full_mesh_get_local_id,
     	.addr_signal = full_mesh_addr_signal,
     	.add_raddr = full_mesh_add_raddr,
    
    @wapsi

    wapsi commented on 22 Nov 2016

    Hmmm... A packet trace is quite difficult to capture, because it can take anywhere from 1 hour to 2 days until this happens. The capture file will be HUGE...

    Yes, I have NAT between these MPTCP boxes. Here are the NAT TCP timeout settings (it's a Linux box):

    [root@firewall ~]# sysctl -a|grep conntrack_tcp_timeout
    net.netfilter.nf_conntrack_tcp_timeout_close = 10
    net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
    net.netfilter.nf_conntrack_tcp_timeout_established = 432000
    net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
    net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
    net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
    net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
    net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
    net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
    net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
    

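    A quick way to see whether the NAT is actually forgetting idle subflows is to watch its conntrack table (a sketch; needs conntrack-tools, and port 22 is a placeholder for the SSH server port):

    # on the NAT box: list tracked TCP flows towards the SSH port
    conntrack -L -p tcp 2>/dev/null | grep 'dport=22'
    # the number after the protocol is the remaining timeout in seconds;
    # once an idle subflow's entry expires, the NAT silently drops it even
    # though both endpoints still show the connection as ESTABLISHED
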
    And I've opened the SSH tunnel with ServerAliveInterval 10 and ServerAliveCountMax 3 on the client side, and ClientAliveInterval 10, ClientAliveCountMax 3 and TCPKeepAlive yes on the server side, so if I understand those settings correctly, they should avoid TCP timeout issues.
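    For reference, those options map to something like this (the host alias is a placeholder):

    # client side, ~/.ssh/config
    Host mptcp-server
        ServerAliveInterval 10
        ServerAliveCountMax 3

    # server side, /etc/ssh/sshd_config
    ClientAliveInterval 10
    ClientAliveCountMax 3
    TCPKeepAlive yes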

    Here are some stats from netstat commands (I have 3 gateways and 3 subflows, and I exclude ^mptcp connections from this list because I want to list the subflows):

    $ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth0 IP):"
    9
    $ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth1 IP):"
    9
    $ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth2 IP):"
    9
    

    And after several hours those counts look something like:

    $ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth0 IP):"
    7
    $ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth1 IP):"
    5
    $ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth2 IP):"
    4
    

    So some of the subflows have really dropped out. Now, if I restart the SSH sessions, all the TCP subflows get established again. Another approach is to run the following commands:

    $ ip link set dev eth0 multipath off ; sleep 1 ; ip link set dev eth0 multipath on
    $ ip link set dev eth1 multipath off ; sleep 1 ; ip link set dev eth1 multipath on
    $ ip link set dev eth2 multipath off ; sleep 1 ; ip link set dev eth2 multipath on
    

    And then new subflows are established again using all available gateways.
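    The same workaround as a single loop over all multipath interfaces (the same commands as above, just condensed):

    for dev in eth0 eth1 eth2; do
        ip link set dev "$dev" multipath off
        sleep 1
        ip link set dev "$dev" multipath on
    done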

    Update: I'll try with the patch you just sent.

    @cpaasch
    Owner

    cpaasch commented on 22 Nov 2016

    Unfortunately, the keepalives are not a safe solution (in today's implementation of MPTCP in Linux), because we chose to keep only the MPTCP connection alive. That means TCP keepalives are sent on at most one single subflow, so the other subflows time out.

    The keepalive handling is probably something we should rethink.
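    For reference, the kernel-side knobs that shape those keepalives (the values are only illustrative; per the above, they still protect at most one subflow):

    sysctl -w net.ipv4.tcp_keepalive_time=60     # idle time before the first probe
    sysctl -w net.ipv4.tcp_keepalive_intvl=10    # interval between probes
    sysctl -w net.ipv4.tcp_keepalive_probes=3    # unanswered probes before the connection is killed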

    @wapsi

    Just tested with your patch applied and the create_on_err parameter set:

    $ cat /sys/module/mptcp_fullmesh/parameters/create_on_err
    1
    

    sysctl mptcp settings used:

    net.mptcp.mptcp_binder_gateways = 
    net.mptcp.mptcp_checksum = 0
    net.mptcp.mptcp_debug = 0
    net.mptcp.mptcp_enabled = 1
    net.mptcp.mptcp_path_manager = fullmesh
    net.mptcp.mptcp_scheduler = default
    net.mptcp.mptcp_syn_retries = 10
    net.mptcp.mptcp_version = 1
    

    and some TCP subflows still get disconnected (after ~5 hours). Again, if I run:

    $ ip link set dev eth0 multipath off ; sleep 1 ; ip link set dev eth0 multipath on
    $ ip link set dev eth1 multipath off ; sleep 1 ; ip link set dev eth1 multipath on
    $ ip link set dev eth2 multipath off ; sleep 1 ; ip link set dev eth2 multipath on
    

    the situation will be fixed and new subflows will be opened using all available gateways.

    I applied your patch only on the client side; I assumed it is only needed there.

    @cpaasch
    Owner

    cpaasch commented on 23 Nov 2016

    Yes, it's only needed on the client side.

    You should also change the tcp_retries sysctls to get faster timeouts (with the default net.ipv4.tcp_retries2 = 15, a dead subflow only hits ETIMEDOUT after roughly 15 minutes, and that timeout is what triggers the re-creation in the patch):

    sysctl -w net.ipv4.tcp_retries2=3
    

    Please also take a packet trace to check whether you really get a timeout.

    @cpaasch
    Owner

    cpaasch commented on 29 Nov 2016

    @wapsi & @titer - Do you have an update?

    cpaasch added the enhancement label and removed the question label on 29 Nov 2016

    @wapsi

    wapsi commented on 29 Nov 2016

    I'm not able to get a valid packet trace at the moment. If I try to capture with tcpdump, the file gets so huge before the first subflow drops that it doesn't fit on my MPTCP router's HDD. Do you have any tips on how to do this "sensibly"?

    @cpaasch
    Owner

    cpaasch commented on 2 Dec 2016

    @wapsi You can limit the size of the packet trace with the option -s 150. Then, if that's not enough, add -C 100 -W 10 -w capture. This limits each file to 100MB and overwrites the oldest file when rotating.
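    Putting those options together (the interface and the capture filter are placeholders for your setup):

    # rotating ring buffer: 10 files x 100MB, 150-byte snaplen per packet
    tcpdump -i eth0 -s 150 -C 100 -W 10 -w capture 'tcp port 22'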

    @djbobo

    djbobo commented on 18 Dec 2016

    Hi,
    I'm seeing the same behavior on long running tcp connections.

    Server with two interfaces - public (routable) IPv4
    Client with two interfaces - masqueraded IPv4

    The initial connection originates from the client (behind NAT), and the full mesh is established as expected, in this case 2x2. However, after some time one subflow drops and the connection remains at 3 subflows.

    Using OpenVPN (TCP) instead of SSH.

    I was using your Debian kernel build from https://dl.bintray.com/cpaasch/deb. Building the patched kernel now and will get back.

    @cpaasch
    Owner

    cpaasch commented on 19 Dec 2016

    It would be good if someone can test the patch and confirm whether it really solves the problem.

    @djbobo

    I've been running the patched kernel for 9 hours now.
    I'd like to wait a little longer before I confirm.

    Everything looks good so far.

    matttbe added a commit that referenced this issue on 1 Feb

    @cpaasch
    Owner

    cpaasch commented on 10 Feb

    Fixed with 133537d

    cpaasch closed this on 10 Feb

