THIS POST REPRESENTS ONLY PERSONAL OPINIONS.
introduction
IPIP (IP in IP) is one of the common network traffic encapsulation methods for keeping a private subnet isolated from the underlying network infrastructure. It wraps the original IP header and its payload, such as TCP, in another, outer IP header. Compared with other tunnel technologies it carries the lowest overhead: one extra IP header, which is at least 20 bytes. The drawback of IPIP is that only unicast is supported, so protocols that rely on broadcast or multicast, such as OSPF and RIP, cannot run over it. Also, IPIP tunnel traffic is not allowed on some cloud providers, such as Microsoft Azure.
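To make the overhead concrete, here is a minimal sketch of the arithmetic, assuming an outer IPv4 header without options (20 bytes). The names ipip_inner_mtu and ipip_outer_len are made up for this post and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the outer IPv4 header has no options, so it is 20 bytes. */
#define IPIP_OVERHEAD 20

/* Largest inner packet that still fits in one outer packet on a link
 * with the given MTU. Returns 0 if the link cannot even hold the
 * outer header. */
size_t ipip_inner_mtu(size_t link_mtu)
{
    if (link_mtu <= IPIP_OVERHEAD)
        return 0;
    return link_mtu - IPIP_OVERHEAD;
}

/* Length of the packet on the wire after encapsulation. */
size_t ipip_outer_len(size_t inner_len)
{
    /* The IPv4 total-length field is 16 bits wide. */
    assert(inner_len <= 65535 - IPIP_OVERHEAD);
    return inner_len + IPIP_OVERHEAD;
}
```

On a link with a 1500-byte MTU, ipip_inner_mtu(1500) leaves 1480 bytes for the inner packet.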
ipip module
The ipip module relies on the tunnel4 and ip_tunnel modules.
core@ip-10-0-7-77 ~ $ lsmod |grep ipip
ipip 16384 0
tunnel4 16384 1 ipip
ip_tunnel 24576 1 ipip
The ipip module initializes the fallback tunnel interface, tunl0, and provides the handler for IPIP packets.
The tunnel4 module provides an abstraction layer for the IPIP protocol: it receives IPIP packets and calls the registered handlers, such as the one from ipip, to deal with them.
The ip_tunnel module provides the generic core shared by IPIP and GRE: on receive it finds the matching tunnel interface, and on transmit it builds the outer IP header and sends the packet out.
So basically these modules depend on each other in such a way:
   recv messages               transmit messages
    +---------+             +---------------------+
    |ip_tunnel|             | dev_hard_start_xmit |
    +---------+             +---------------------+
         ^                     |               |
         |                     |               |
     +-------+             +---v----+     +----v----+
     | ipip  |             |  ipip  |     |   gre   |
     +-------+             +------------------------+
         ^                             |
         |                             |
     +-------+                   +-----v-----+
     |tunnel4|                   | ip_tunnel |
     +---^---+                   +-----------+
         |
         |
+-----------------+
|ip_local_deliver |
+-----------------+
workflow
In this section, we walk through how IPIP traffic received by a node (a machine) is handled at each stage and finally delivered to the recipient according to the inner destination IP address, and how the reply datagram is wrapped with the IPIP tunnel header and sent out of the node.
The following code is based on v4.14.96, referenced from https://elixir.bootlin.com
packets in
Because the IPIP tunnel works at the IP layer, once an IPIP packet reaches the destination machine the kernel polls it from the receive queue and dispatches it by its Ethernet protocol type, e.g. IP, ARP, or VLAN. An IPIP packet is an ordinary IP packet at this point, so it follows the normal Linux receive path and is delivered locally.
ip_local_deliver reassembles IP fragments if necessary and then passes the packet through the rules of the NF_INET_LOCAL_IN Netfilter hook:
int ip_local_deliver(struct sk_buff *skb)
{
    /*
     * Reassemble IP fragments.
     */
    struct net *net = dev_net(skb->dev);
    if (ip_is_fragment(ip_hdr(skb))) {
        if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER))
            return 0;
    }
    return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN,
                   net, NULL, skb, skb->dev, NULL,
                   ip_local_deliver_finish);
}
After that, we come to ip_local_deliver_finish, which hands the packet over to the handler registered for its IP protocol number, such as TCP (6), UDP (17), or IP-in-IP (4).
static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
    __skb_pull(skb, skb_network_header_len(skb));
    rcu_read_lock();
    {
        int protocol = ip_hdr(skb)->protocol;
        const struct net_protocol *ipprot;
        int raw;
    resubmit:
        raw = raw_local_deliver(skb, protocol);
        ipprot = rcu_dereference(inet_protos[protocol]);
        if (ipprot) {
            int ret;
            if (!ipprot->no_policy) {
                if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
                    kfree_skb(skb);
                    goto out;
                }
                nf_reset(skb);
            }
            ret = ipprot->handler(skb);
            if (ret < 0) {
                protocol = -ret;
                goto resubmit;
            }
            __IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
        } else {
            if (!raw) {
                if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
                    __IP_INC_STATS(net, IPSTATS_MIB_INUNKNOWNPROTOS);
                    icmp_send(skb, ICMP_DEST_UNREACH,
                              ICMP_PROT_UNREACH, 0);
                }
                kfree_skb(skb);
            } else {
                __IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
                consume_skb(skb);
            }
        }
    }
out:
    rcu_read_unlock();
    return 0;
}
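The dispatch-and-resubmit loop in ip_local_deliver_finish can be sketched in userspace roughly like this. It is a simplified model, not kernel code: protos[] stands in for inet_protos[], and register_proto, deliver and the two demo handlers are hypothetical names. Raw sockets, the xfrm policy check and the statistics counters are left out.

```c
#include <assert.h>
#include <stddef.h>

#define PROTO_MAX 256

/* Same contract as the handlers called above: a negative return value
 * means "resubmit this packet as protocol -ret". */
typedef int (*proto_handler)(void *pkt);

/* Stand-in for inet_protos[]: one handler slot per IP protocol number. */
static proto_handler protos[PROTO_MAX];

void register_proto(int protocol, proto_handler h)
{
    assert(protocol >= 0 && protocol < PROTO_MAX);
    protos[protocol] = h;
}

/* Returns the protocol number whose handler finally took the packet,
 * or -1 when no handler is registered for it. */
int deliver(void *pkt, int protocol)
{
    for (;;) {
        proto_handler h = NULL;
        int ret;

        if (protocol >= 0 && protocol < PROTO_MAX)
            h = protos[protocol];
        if (!h)
            return -1;
        ret = h(pkt);
        if (ret >= 0)
            return protocol;
        protocol = -ret;    /* the "goto resubmit" case */
    }
}

/* Hypothetical handler: takes the packet and counts the call. */
int demo_ipip_handler(void *pkt)
{
    ++*(int *)pkt;
    return 0;
}

/* Hypothetical handler: pretends to strip a wrapper, then asks for the
 * packet to be resubmitted as protocol 4 (IP-in-IP). */
int demo_encap_handler(void *pkt)
{
    ++*(int *)pkt;
    return -4;
}
```

Here the "packet" is just an int call counter, which is enough to show that a resubmitted packet passes through two handlers.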
As said earlier, the tunnel4 module is responsible for receiving IPIP packets and delivering them to a list of registered handlers.
In its implementation, tunnel4 registers the IPIP protocol handler with inet_add_protocol(&tunnel4_protocol, IPPROTO_IPIP):
static const struct net_protocol tunnel4_protocol = {
    .handler = tunnel4_rcv,
    .err_handler = tunnel4_err,
    .no_policy = 1,
    .netns_ok = 1,
};
It then delivers the IPIP packets to the handler list with tunnel4_rcv:
static int tunnel4_rcv(struct sk_buff *skb)
{
    struct xfrm_tunnel *handler;
    if (!pskb_may_pull(skb, sizeof(struct iphdr)))
        goto drop;
    for_each_tunnel_rcu(tunnel4_handlers, handler)
        if (!handler->handler(skb))
            return 0;
    icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
drop:
    kfree_skb(skb);
    return 0;
}
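The handler list walked by for_each_tunnel_rcu can be pictured as a linked list that is tried in order until one handler accepts the packet. The sketch below is a userspace model under the assumption that handlers are kept in priority order. tunnel_register, tunnel_rcv and the demo handlers are hypothetical names, and RCU, locking and the ICMP error are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct tunnel_handler {
    int (*handler)(const void *pkt);    /* returns 0 when it takes the packet */
    struct tunnel_handler *next;
    int priority;                       /* lower value is tried first */
};

static struct tunnel_handler *handlers; /* head of the handler list */

/* Insert h in priority order. Returns -1 if that priority is taken. */
int tunnel_register(struct tunnel_handler *h)
{
    struct tunnel_handler **pp;

    assert(h != NULL && h->handler != NULL);
    for (pp = &handlers; *pp; pp = &(*pp)->next) {
        if ((*pp)->priority > h->priority)
            break;
        if ((*pp)->priority == h->priority)
            return -1;
    }
    h->next = *pp;
    *pp = h;
    return 0;
}

/* Offer the packet to each handler in turn. Returns 0 once one of them
 * accepts it, or -1 if nobody does (the point where the code above
 * sends an ICMP error and frees the skb). */
int tunnel_rcv(const void *pkt)
{
    struct tunnel_handler *h;

    for (h = handlers; h; h = h->next)
        if (!h->handler(pkt))
            return 0;
    return -1;
}

/* Hypothetical handlers for the demo. */
int demo_reject(const void *pkt) { (void)pkt; return -1; }
int demo_accept(const void *pkt) { (void)pkt; return 0; }
```

A handler that returns non-zero simply passes the packet on to the next one, which is how several tunnel types can share one IP protocol number.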
Finally, we get to the ipip module. It registers its IPIP packet handler during init with xfrm4_tunnel_register(&ipip_handler, AF_INET):
static int ipip_rcv(struct sk_buff *skb)
{
    return ipip_tunnel_rcv(skb, IPPROTO_IPIP);
}
static int ipip_tunnel_rcv(struct sk_buff *skb, u8 ipproto)
{
    struct net *net = dev_net(skb->dev);
    struct ip_tunnel_net *itn = net_generic(net, ipip_net_id);
    struct metadata_dst *tun_dst = NULL;
    struct ip_tunnel *tunnel;
    const struct iphdr *iph;
    iph = ip_hdr(skb);
    tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
                              iph->saddr, iph->daddr, 0);
    if (tunnel) {
        ......
        return ip_tunnel_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error);
    }
    return -1;
drop:
    kfree_skb(skb);
    return 0;
}
Unlike GRE, IPIP is a keyless protocol, so ip_tunnel_lookup finds the tunnel (device) by addresses alone: the tunnel's local IP must match the packet's outer destination IP, and its remote IP must match the outer source IP. If no tunnel matches, the fallback tunnel is used. The fallback tunnel is initialized by the ipip module, so it is always there.
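A simplified sketch of that lookup rule, covering only the exact local/remote match and the fallback. The real ip_tunnel_lookup does more than this. struct tunnel, tunnel_lookup and the IP4 macro are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its dotted-quad parts. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, minimal view of a tunnel device. */
struct tunnel {
    const char *name;
    uint32_t local;     /* our end of the tunnel */
    uint32_t remote;    /* the peer's end of the tunnel */
};

/* Match the tunnel's local address against the outer destination and
 * its remote address against the outer source. If nothing matches,
 * return the fallback tunnel. */
const struct tunnel *tunnel_lookup(const struct tunnel *tunnels, size_t count,
                                   const struct tunnel *fallback,
                                   uint32_t outer_src, uint32_t outer_dst)
{
    size_t i;

    assert(fallback != NULL);
    for (i = 0; i < count; i++) {
        if (tunnels[i].local == outer_dst && tunnels[i].remote == outer_src)
            return &tunnels[i];
    }
    return fallback;
}
```

A packet from 10.0.0.2 to 10.0.0.1 matches a tunnel configured with local 10.0.0.1 and remote 10.0.0.2. A packet from any other source falls through to the fallback.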
ip_tunnel_rcv decapsulates the packet and hands the inner packet to the tunnel device through netif_rx. From there the inner packet goes through the same receive path described earlier, this time with the inner header's protocol, and is either forwarded to its destination or received by a local process.
As the steps above show, handling one IPIP packet takes at least two passes through the Netfilter receive path: PREROUTING and LOCAL_IN for the outer packet, then PREROUTING and LOCAL_IN or FORWARD for the inner packet.
packets out
Every network device is set up with a set of operations, defined as net_device_ops, for transmitting packets, changing the device MTU, and so on. The IPIP tunnel device adds the encapsulation overhead when packets are transmitted through it.
static const struct net_device_ops ipip_netdev_ops = {
    .ndo_init = ipip_tunnel_init,
    .ndo_uninit = ip_tunnel_uninit,
    .ndo_start_xmit = ipip_tunnel_xmit,
    .ndo_do_ioctl = ipip_tunnel_ioctl,
    .ndo_change_mtu = ip_tunnel_change_mtu,
    .ndo_get_stats64 = ip_tunnel_get_stats64,
    .ndo_get_iflink = ip_tunnel_get_iflink,
};
To encapsulate a packet as an IPIP packet, we first need to route it to the IPIP tunnel device. The tunnel device takes the outer header's IP addresses from the tunnel's remote and local parameters when they are set. Otherwise it looks up the route: the next hop of the inner destination IP address becomes the outer destination IP address, and the outer source IP address is then chosen according to the outer destination IP address.
static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb,
                                    struct net_device *dev)
{
    struct ip_tunnel *tunnel = netdev_priv(dev);
    const struct iphdr *tiph = &tunnel->parms.iph;
    u8 ipproto;
    switch (skb->protocol) {
    case htons(ETH_P_IP):
        ipproto = IPPROTO_IPIP;
        break;
#if IS_ENABLED(CONFIG_MPLS)
    case htons(ETH_P_MPLS_UC):
        ipproto = IPPROTO_MPLS;
        break;
#endif
    default:
        goto tx_error;
    }
    if (tiph->protocol != ipproto && tiph->protocol != 0)
        goto tx_error;
    if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP4))
        goto tx_error;
    skb_set_inner_ipproto(skb, ipproto);
    if (tunnel->collect_md)
        ip_md_tunnel_xmit(skb, dev, ipproto);
    else
        ip_tunnel_xmit(skb, dev, tiph, ipproto);
    return NETDEV_TX_OK;
tx_error:
    kfree_skb(skb);
    dev->stats.tx_errors++;
    return NETDEV_TX_OK;
}
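The address selection described before ipip_tunnel_xmit can be sketched as below. This is only a plain-C reading of that rule, assuming that 0 means "not configured". pick_outer_addrs and its types are hypothetical, and the two route lookups are passed in as ready-made values rather than performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunnel parameters; 0 means the address is not configured. */
struct tunnel_params {
    uint32_t local;
    uint32_t remote;
};

struct outer_addrs {
    uint32_t src;
    uint32_t dst;
};

/* inner_nexthop: next hop that routing gives for the inner destination.
 * route_src:     source address the route towards the outer destination
 *                would select.
 * Both are assumed to have been looked up by the caller. */
struct outer_addrs pick_outer_addrs(const struct tunnel_params *p,
                                    uint32_t inner_nexthop,
                                    uint32_t route_src)
{
    struct outer_addrs o;

    assert(p != NULL);
    /* A configured remote wins; otherwise fall back to the next hop. */
    o.dst = p->remote ? p->remote : inner_nexthop;
    /* A configured local wins; otherwise follow the route to o.dst. */
    o.src = p->local ? p->local : route_src;
    return o;
}
```

With both parameters set, the routing inputs are ignored. With neither set, as on a fallback device without a fixed peer, the outer header is driven entirely by routing.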
After the packet has been encapsulated, it is sent to ip_local_out, which first runs the rules of the NF_INET_LOCAL_OUT Netfilter hook and then outputs the packet through the device chosen by the route, following the normal Linux transmit path.
void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
                   __be32 src, __be32 dst, __u8 proto,
                   __u8 tos, __u8 ttl, __be16 df, bool xnet)
{
    int pkt_len = skb->len - skb_inner_network_offset(skb);
    struct net *net = dev_net(rt->dst.dev);
    struct net_device *dev = skb->dev;
    struct iphdr *iph;
    int err;
    skb_scrub_packet(skb, xnet);
    skb_clear_hash_if_not_l4(skb);
    skb_dst_set(skb, &rt->dst);
    memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
    /* Push down and install the IP header. */
    skb_push(skb, sizeof(struct iphdr));
    skb_reset_network_header(skb);
    iph = ip_hdr(skb);
    iph->version = 4;
    iph->ihl = sizeof(struct iphdr) >> 2;
    iph->frag_off = ip_mtu_locked(&rt->dst) ? 0 : df;
    iph->protocol = proto;
    iph->tos = tos;
    iph->daddr = dst;
    iph->saddr = src;
    iph->ttl = ttl;
    __ip_select_ident(net, iph, skb_shinfo(skb)->gso_segs ?: 1);
    err = ip_local_out(net, sk, skb);
    if (unlikely(net_xmit_eval(err)))
        pkt_len = 0;
    iptunnel_xmit_stats(dev, pkt_len);
}
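For readers who want to see the outer header as bytes, here is a userspace sketch that mirrors the fields iptunnel_xmit fills in. It is an illustration, not the kernel path: ipip_encap, ipv4_checksum and put_be32 are hypothetical helpers, addresses are passed in host byte order, the identification field is left at 0, and the sketch fills in the total length and header checksum itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_HDR_LEN 20    /* IPv4 header without options */
#define PROTO_IPIP   4     /* IP-in-IP protocol number */

/* Internet checksum: one's-complement sum of the 16-bit words of a
 * header with an even length. */
uint16_t ipv4_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write an outer IPv4 header followed by a copy of the inner packet
 * into out. Returns the total length, or 0 if it does not fit.
 * out and inner must not overlap. */
size_t ipip_encap(uint8_t *out, size_t out_cap,
                  const uint8_t *inner, size_t inner_len,
                  uint32_t src, uint32_t dst,
                  uint8_t tos, uint8_t ttl, int df)
{
    size_t total = IPV4_HDR_LEN + inner_len;
    uint16_t csum;

    assert(out != NULL && inner != NULL);
    if (total > out_cap || total > 0xffff)
        return 0;

    memset(out, 0, IPV4_HDR_LEN);
    out[0] = 0x45;                      /* version 4, ihl 5 (20 bytes) */
    out[1] = tos;
    out[2] = (uint8_t)(total >> 8);     /* total length, big-endian */
    out[3] = (uint8_t)(total & 0xff);
    out[6] = df ? 0x40 : 0x00;          /* Don't Fragment flag */
    out[8] = ttl;
    out[9] = PROTO_IPIP;
    put_be32(out + 12, src);
    put_be32(out + 16, dst);
    csum = ipv4_checksum(out, IPV4_HDR_LEN);
    out[10] = (uint8_t)(csum >> 8);
    out[11] = (uint8_t)(csum & 0xff);

    memcpy(out + IPV4_HDR_LEN, inner, inner_len);
    return total;
}
```

A quick sanity check on the result is to run ipv4_checksum over the finished 20-byte header: a valid header sums to 0.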