1. Network-layer services and protocols
1.1. Host-to-host communication Services
* sender: encapsulates segments into datagrams, passes to link layer
* Receiver: delivers segments to transport layer protocol
1.2. Network-layer protocols run in every Internet device: hosts and routers
1.3. Routers
* examines header fields in all IP datagrams passing through it
* moves datagrams from input ports to output ports to transfer datagrams along end-end path
2. IPv4 Datagram format
3. IP fragmentation/ reassembly
3.1. Network links have an MTU
* MTU: the maximum amount of data that a link-layer frame can carry
* different link types, different MTUs
3.2. Large IP datagram divided (fragmented) within the network
* one datagram becomes several datagrams
* “reassembly” only at destination
* IP header bits used to identify, order related fragments
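The fragmentation rule above can be sketched in a few lines of Python. This is a minimal illustration, not a full IPv4 implementation: it assumes a 20-byte header with no options, and the function name `fragment` and the dictionary keys are hypothetical. The offset is expressed in 8-byte units, which is how the IPv4 fragment offset field counts.

```python
def fragment(total_len, mtu, header_len=20):
    """Split an IPv4 datagram of total_len bytes into fragments that fit the MTU."""
    payload = total_len - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        data = min(max_data, payload - offset)
        more_fragments = offset + data < payload
        fragments.append({
            "length": data + header_len,  # total length of this fragment
            "offset": offset // 8,        # fragment offset field
            "mf": int(more_fragments),    # "more fragments" flag
        })
        offset += data
    return fragments


# Illustrative values: a 4000-byte datagram crossing a link with a 1500-byte MTU.
frags = fragment(4000, 1500)
for f in frags:
    print(f)
```

All fragments of one datagram share the same identifier, so the destination can group them and use the offsets to put them back in order.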
4. IP addressing
4.1. Interface
Connection between host/router and physical link
* routers typically have multiple interfaces
* host typically has one or two interfaces
4.2. IP address
32-bit identifier associated with each interface
4.3. Dotted-decimal IP address notation
223.1.1.1 = 11011111 00000001 00000001 00000001, each number is one byte
4.4. Structure
IP addresses have structure:
* subnet part
* host part
4.5. Subnets
(1) Definition
Device interfaces that can physically reach each other without passing through an intervening router (that is, a network with no other router in between)
(2) IP addresses have structure:
* subnet part: devices in same subnet have common high order bits
* host part: remaining low order bits
(3) Picture of subnets.
* IP addressing assigns an address to the subnet, e.g., 223.1.1.0/24
The /24 notation is the subnet mask: it indicates that the leftmost 24 bits of the 32-bit address define the subnet address
* The 223.1.1.0/24 subnet consists of three host interfaces and one router interface
* Any additional hosts attached to this subnet must have an address of the form 223.1.1.xxx
* /24 means the high-order 24 bits are the subnet part; the remaining 32 - 24 = 8 bits are the host part
* subnet mask: /24
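As a quick check of the dotted-decimal notation and the subnet/host split described above, here is a small sketch using Python's standard `ipaddress` module. The second address, 223.1.2.1, is an assumed example of a host on a different subnet.

```python
import ipaddress

subnet = ipaddress.ip_network("223.1.1.0/24")
addr = ipaddress.ip_address("223.1.1.1")

# Dotted-decimal to binary, one byte per group.
bits = format(int(addr), "032b")
dotted_bits = " ".join(bits[i:i + 8] for i in range(0, 32, 8))
print(dotted_bits)

# With a /24 mask the high-order 24 bits are the subnet part,
# the low-order 8 bits are the host part.
subnet_part = int(addr) >> 8
host_part = int(addr) & 0xFF

print(subnet.netmask)                                   # mask in dotted-decimal form
print(addr in subnet)                                   # same subnet
print(ipaddress.ip_address("223.1.2.1") in subnet)      # different subnet
```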
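4.6. Classful addressing
The network part of an IP address was restricted to a length of 8, 16, or 24 bits, called class A, B, and C networks respectively
Classful addressing wastes address space. For example, if a class A network is given to an organization with 2,000 hosts, roughly 16,775,000 addresses cannot be used by any other organization (an address must match that class's subnet mask to belong to the network). CIDR was introduced for this reason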
4.7. CIDR
Classless Inter-Domain Routing
* variable-length subnet masking allowing arbitrary-length prefixes
* address format: a.b.c.d/x, where the x most significant bits are the network prefix, and the remaining 32-x bits form the host identifier
* subnet mask 255.254.0.0 = /15
notations: IP address/mask length is equivalent to IP address/subnet mask: e.g., 12.4.0.0/15 or 12.4/15 is equivalent to 12.4.0.0/255.254.0.0
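The equivalence between a prefix length and a dotted-decimal mask can be verified with a short sketch. The helper name `prefix_to_mask` is hypothetical; the `ipaddress` calls are from the Python standard library.

```python
import ipaddress


def prefix_to_mask(x):
    """Return the dotted-decimal subnet mask for an /x prefix (0 <= x <= 32)."""
    mask = (0xFFFFFFFF << (32 - x)) & 0xFFFFFFFF  # x one-bits followed by 32-x zero-bits
    return str(ipaddress.IPv4Address(mask))


mask_15 = prefix_to_mask(15)
print(mask_15)

# The two notations describe the same block of addresses.
by_length = ipaddress.ip_network("12.4.0.0/15")
by_mask = ipaddress.ip_network("12.4.0.0/255.254.0.0")
print(by_length == by_mask, by_length.num_addresses)
```

With x = 15 the remaining 32 - 15 = 17 bits identify hosts, so the block holds 2^17 addresses.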
5. Questions
5.1. how does an ISP get block of addresses?
A: ICANN: Internet Corporation for Assigned Names and Numbers http://www.icann.org/
* allocates IP addresses, through 5 regional registries (RRs) (who may then allocate to local registries)
* manages DNS root zone, including delegation of individual TLD (.com, .edu , …) management
5.2. are there enough 32-bit IP addresses?
* ICANN allocated last chunk of IPv4 addresses to RRs in 2011
* NAT helps with IPv4 address space exhaustion
* IPv6 has 128-bit address space
The DHCP section covers this in more detail
6. IPv6
6.1. Motivation
(1) initial motivation: 32-bit IPv4 address space would be completely allocated
(2) additional motivation
* speed processing/forwarding: 40-byte fixed length header
* enable different network-layer treatment of “flows”
6.2. Address format
* 128-bit addresses
* eight groups of four hexadecimal digits, separated by colons ‘:’
* Leading zeros in each group may be omitted and groups of zeros can be omitted using ‘::’
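The two shortening rules can be tried out with the standard `ipaddress` module. The address below is an assumed example taken from the 2001:db8::/32 documentation range; note that '::' can appear only once in an address, otherwise the expansion would be ambiguous.

```python
import ipaddress

full = "2001:0db8:0000:0000:0000:ff00:0042:8329"
addr = ipaddress.IPv6Address(full)

short_form = addr.compressed  # leading zeros dropped, one run of zero groups replaced by '::'
long_form = addr.exploded     # all eight groups, four hex digits each

print(short_form)
print(long_form)
print(addr.max_prefixlen)     # address length in bits
```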
6.3. Datagram format
(1) What’s missing (compared with IPv4):
* no checksum (to speed processing at routers)
* no fragmentation/reassembly
* no options (the removal of the options field results in a fixed-length, 40-byte IP header)
6.4. Transition from IPv4 to IPv6
tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers (“packet within a packet”)
7. Get IP address
7.1. Q: How does a network get IP address for itself (network part of address)
A: gets allocated portion of its provider ISP’s address space
7.2. Hierarchical addressing: route aggregation
hierarchical addressing allows efficient advertisement of routing information:
7.3. Q: How does a host get IP address within its network (host part of address)?
(1) Manual assignment
* Too many options!
* How do you know if an IP address isn’t taken?
* Repeat every time you connect?
(2) Dynamic assignment
DHCP: Dynamic Host Configuration Protocol: dynamically get address from a server
* DHCP background
Allows a host to join an IP network without having a pre-configured IP address
* Runs over UDP/ IP
* Temporarily binds IP address and other parameters to DHCP client
* Provides framework for passing configuration information to hosts
* DHCP assigns a unique IP address
* Simplifies installation and configuration of end systems
* Allows for manual and automatic IP address assignment
* DHCP can return more than just the allocated IP address on the subnet:
* address of first-hop router (default router) for client
* name and IP address of local DNS server
* network mask (indicating network versus host portion of address)
* Used by
* ISPs to minimise set-up costs
* LANs and organisational networks
* Goal: allow host to dynamically obtain its IP address from network server when it joins network
* can renew its lease on address in use
* allows reuse of addresses (only hold address while connected/on)
* support for mobile users who join/leave network
* Plug-and-play
* DHCP overview
* host broadcasts DHCP discover msg [optional]
* DHCP server responds with DHCP offer msg [optional]
* host requests IP address: DHCP request msg
* DHCP server sends address: DHCP ack msg
Typically, on home networks, DHCP server is co-located in the router
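The four-message exchange above can be modelled as a toy simulation. This is only a sketch under simplifying assumptions: there is no real networking, no broadcast, and no lease timer, and the class `DhcpServer`, the function `obtain_address`, the message dictionaries, and the address pool are all hypothetical names and values chosen for illustration.

```python
class DhcpServer:
    """Toy DHCP server: hands out addresses from a fixed pool (no timers, no network I/O)."""

    def __init__(self, pool, lease_seconds=3600):
        self.free = list(pool)   # addresses not yet leased
        self.leases = {}         # client MAC -> leased address
        self.lease_seconds = lease_seconds

    def handle(self, msg):
        mac = msg["mac"]
        if msg["type"] == "DISCOVER":
            # Re-offer an existing lease, otherwise the first free address.
            addr = self.leases.get(mac) or (self.free[0] if self.free else None)
            if addr is None:
                return None      # pool exhausted: no offer is sent
            return {"type": "OFFER", "mac": mac, "yiaddr": addr}
        if msg["type"] == "REQUEST":
            addr = msg["requested"]
            if addr in self.free:
                self.free.remove(addr)
                self.leases[mac] = addr
            if self.leases.get(mac) == addr:
                return {"type": "ACK", "mac": mac, "yiaddr": addr,
                        "lease": self.lease_seconds}
            return {"type": "NAK", "mac": mac}
        return None


def obtain_address(server, mac):
    """Client side: discover, take the offer, request it, wait for the ack."""
    offer = server.handle({"type": "DISCOVER", "mac": mac})
    if offer is None:
        return None
    reply = server.handle({"type": "REQUEST", "mac": mac,
                           "requested": offer["yiaddr"]})
    return reply["yiaddr"] if reply["type"] == "ACK" else None


server = DhcpServer(["223.1.2.4", "223.1.2.5"])
first = obtain_address(server, "aa:aa:aa:aa:aa:aa")
second = obtain_address(server, "bb:bb:bb:bb:bb:bb")
third = obtain_address(server, "cc:cc:cc:cc:cc:cc")
print(first, second, third)
```

A client that already holds a lease gets the same address back when it asks again, which loosely mirrors lease renewal.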
* DHCP leases
Address usage
* Once half the lease time has expired, the client starts trying to renew the lease by sending a unicast DHCPREQUEST message to the DHCP server that granted the original lease
* After the lease has expired, the client must stop using the address and acquire a new one by broadcasting a DHCPDISCOVER message
* If there is more than one DHCP server, the client can select the best “offer”
Address Leases
* Manual Lease: Network manager explicitly assigns all IP addresses
* Automatic Lease: DHCP server permanently assigns some addresses and dynamically others
* Dynamic Lease: DHCP server dynamically assigns IP addresses for a specific period of time when permanent address is not required
* Example
(1) Send request (host)
* Connecting laptop will use DHCP to get IP address, address of first-hop router, address of DNS server.
* DHCP REQUEST message encapsulated in UDP, encapsulated in IP, encapsulated in Ethernet
* Ethernet frame broadcast (dest: FFFFFFFFFFFF) on LAN, received at router running DHCP server
* IP datagram extracted from Ethernet frame, UDP segment extracted from IP datagram, DHCP REQUEST message extracted from UDP segment
(2) Return msg (router with DHCP)
* DHCP server formulates DHCP ACK containing client’s IP address, IP address of first-hop router for client, name & IP address of DNS server
* encapsulated DHCP server reply forwarded to client, demuxing up to DHCP at client
* client now knows its IP address, name and IP address of DNS server, IP address of its first-hop router
* DHCP msg format
(For reference only)
DHCPDISCOVER: Broadcast by a client to find available DHCP servers
DHCPOFFER: Response from a server to a DHCPDISCOVER and offering IP address and other parameters
DHCPREQUEST: Message from a client to servers that does one of the following:
Requests the parameters offered by one of the servers and declines all other offers
Verifies a previously allocated address after a system or network change (a reboot for example)
Requests the extension of a lease on a particular address
DHCPACK: Acknowledgement from server to client with parameters, including IP address
DHCPNAK: Negative acknowledgement from server to client, indicating that the client's lease has expired or that a requested IP address is incorrect
DHCPDECLINE: Message from client to server indicating that the offered address is already in use
DHCPRELEASE: Message from client to server canceling remainder of a lease and relinquishing network address
DHCPINFORM: Message from a client that already has an IP address (manually configured for example), requesting further configuration parameters from the DHCP server
* DHCP pros and cons
Pros
* Relieves the network administrator of manual configuration
* Devices can be moved from network to network and automatically obtain valid configuration parameters for the current network
* IP addresses are only allocated when needed
It is possible to re-use IP addresses after lease
Especially considering mobile clients
Reduction in the total number of addresses in use
Cons
Server Issues
* A machine to run the DHCP server continually is required
When DHCP server is unavailable, clients may be unable to access network
Security Problems
* Uses UDP, an unreliable and insecure protocol
* DHCP is an unauthenticated protocol
When connecting to a network, the user is not required to provide credentials in order to obtain a lease
Malicious users with physical access to the DHCP-enabled network can instigate a denial-of-service attack on DHCP servers by requesting many leases from the server, thereby depleting the number of leases that are available to other DHCP clients
* 32-bit IP addresses are not enough
* ICANN allocated the last chunk of IPv4 addresses to the RRs (regional registries) in 2011
* Not that many unique addresses
2^32 = 4,294,967,296 (just over four billion)
Plus, some are reserved for special purposes
And, addresses are allocated in large blocks
* And, many devices need IP addresses
Computers, tablets, routers, watches, fridges, …
* Solutions include
Short-term solution: NAT and DHCP help with IPv4 address exhaustion
Long-term solution: IPv6 has 128-bit address space
8. NAT: Network Address Translation
8.1. Overview
Local network uses just one IP address as far as outside world is concerned
advantages:
* just one IP address needed from provider ISP for all devices
* can change addresses of devices in local network without notifying outside world
* can change ISP without changing addresses of devices in local network
* security: devices inside local net not directly addressable, visible by outside world
8.2. Implementation
NAT router must (transparently):
* outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #)
remote clients/servers will respond using (NAT IP address, new port #) as destination address
* remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
* incoming datagrams: replace (NAT IP address, new port #) in destination fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table
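The three duties listed above map directly onto a translation table. The sketch below is a simplified model, not a real NAT: it ignores protocols, timeouts, and port exhaustion, and the class name `Nat`, its method names, the starting port 5001, and the example addresses are all illustrative assumptions.

```python
import itertools


class Nat:
    """Toy NAT: rewrites (IP, port) pairs using a translation table."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self.next_port = itertools.count(first_port)
        self.out = {}    # (LAN IP, LAN port) -> WAN-side port
        self.back = {}   # WAN-side port -> (LAN IP, LAN port)

    def outgoing(self, src_ip, src_port, dst_ip, dst_port):
        key = (src_ip, src_port)
        if key not in self.out:
            # First datagram of this flow: pick a new port and remember the pair.
            port = next(self.next_port)
            self.out[key] = port
            self.back[port] = key
        return (self.wan_ip, self.out[key], dst_ip, dst_port)

    def incoming(self, src_ip, src_port, dst_ip, dst_port):
        lan = self.back.get(dst_port)
        if lan is None or dst_ip != self.wan_ip:
            return None  # no table entry: the datagram cannot be delivered
        return (src_ip, src_port, lan[0], lan[1])


nat = Nat("138.76.29.7")
sent = nat.outgoing("10.0.0.1", 3345, "128.119.40.186", 80)
reply = nat.incoming("128.119.40.186", 80, "138.76.29.7", 5001)
print(sent)
print(reply)
```

The `None` result for an unknown port also illustrates why a host outside cannot simply open a connection to a server behind the NAT.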
8.3. Example
The figure shows the full round trip of one message: it is sent, translated, and the reply is returned
8.4. Overview again
* 16-bit port number field:
60,000 simultaneous connections with a single WAN-side address!
* NAT has been controversial:
routers “should” only process up to layer 3
address “shortage” should be solved by IPv6
violates end-to-end argument
NAT possibility must be taken into account by app designers, e.g., P2P apps
* NAT traversal: what if client wants to connect to server behind NAT?
9. what’s inside a router
* input ports, switching, output ports
* buffer management, scheduling
9.1. Two key network-layer functions
forwarding: move packets from a router’s input link to appropriate router output link
routing: determine route taken by packets from source to destination
Routing algorithms
9.2. Interplay between routing and forwarding
9.3. Router architecture overview
High-level view of generic router architecture
(1) Input port functions
* physical layer: bit-level reception
* link layer: e.g., Ethernet
* decentralized switching:
using header field values, lookup output port using forwarding table in input port memory (“match plus action”)
destination-based forwarding: forward based only on destination IP address (traditional)
input port queuing: if datagrams arrive faster than forwarding rate into switch fabric
(2) Destination-based forwarding
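Different IP address ranges map to different interfaces, but the ranges can partly overlap; in that case longest prefix matching is used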
Longest prefix matching
when looking for forwarding table entry for given destination address, use longest address prefix that matches destination address. (most specific match)
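Longest prefix matching can be sketched as a linear scan over the table that keeps the most specific matching entry. Real routers use specialised hardware or tries rather than a scan; the function name `longest_prefix_match`, the table layout, and the prefixes and interface numbers below are illustrative assumptions.

```python
import ipaddress


def longest_prefix_match(table, dest):
    """Return the interface of the most specific prefix that contains dest."""
    dest = ipaddress.ip_address(dest)
    best = None  # (network, interface) of the longest match seen so far
    for prefix, iface in table:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


# Hypothetical forwarding table: (prefix, output interface).
table = [
    ("200.23.16.0/21", 0),
    ("200.23.24.0/24", 1),
    ("200.23.24.0/21", 2),
    ("0.0.0.0/0", 3),      # default route, matches everything
]

for dest in ["200.23.22.161", "200.23.24.170", "200.23.25.1", "8.8.8.8"]:
    print(dest, "->", longest_prefix_match(table, dest))
```

The address 200.23.24.170 matches both the /24 and the second /21 entry; the /24 wins because it is the longer prefix.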
(3) Switching fabrics
transfer packets from input ports to appropriate output ports
switching rate: rate at which packets can be transferred from inputs to outputs
three major types of switching fabrics: memory, bus, and interconnection network
(4) Input port queuing
If switch fabric slower than input ports combined -> queueing may occur at input queues
* queueing delay and loss due to input buffer overflow!
Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward
(5) Output port queuing
Buffering required when datagrams arrive from fabric faster than link transmission rate.
* Datagrams can be lost due to congestion, lack of buffers
Scheduling discipline chooses among queued datagrams for transmission
* Priority scheduling – who gets best performance, network neutrality
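Textbook p. 210
When there is not enough memory to buffer an incoming packet, the router must either drop the arriving packet (the drop-tail policy) or remove one or more packets that are already queued. In some cases it is advantageous to drop a packet before the buffer is full, because this sends a congestion signal to the sender.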
(6) Packet scheduling
deciding which packet to send next on link
There are three approaches:
* FCFS (first come, first served)
A single queue (first in, first out / first come, first served)
* Packets transmitted in order of arrival
* also known as: First-in-first-out (FIFO)
* Simple implementation
* No flow prioritization
* No fairness, no isolation
* Work-conserving scheduler
* Priority scheduling
First, arriving traffic is classified into priority classes
* any header fields can be used for classification
Then, a packet is sent from the highest-priority queue that has buffered packets
* FCFS within priority class
Arrival order: 1, 2, 3, 4, 5, split into two classes (red > green). Service order: 1, 3, 2, 4, 5 (a packet already being transmitted cannot be interrupted, so 2 finishes before 4)
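* Round Robin (RR) scheduling
arriving traffic classified, queued by class
* any header fields can be used for classification
server cyclically, repeatedly scans class queues, sending one complete packet from each class (if available) in turn, so the classes take turns going first (for example red, then green, then blue)
For the example above, the service order would be 1, 2, 3, 4, 5: red 1, then green 2, then red 3, then 4 (4 is already queued and green 5 has not arrived yet), and finally 5
* There is also weighted fair queuing (WFQ); see textbook p. 213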
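The FCFS, priority, and round-robin disciplines can be compared with a small non-preemptive simulation. This is a sketch under stated assumptions: the arrival times, the fixed service time of 2 time units, and the names `simulate`, `fcfs`, `priority`, and `make_round_robin` are all hypothetical, chosen so that packets 1, 3, 4 are red and 2, 5 are green as in the example above.

```python
def simulate(packets, choose, service_time=2):
    """packets: list of (id, arrival_time, class). choose(queue) picks the next packet."""
    pending = sorted(packets, key=lambda p: p[1])  # stable sort keeps id order on ties
    queue, order, now = [], [], 0
    while pending or queue:
        # Move every packet that has arrived by `now` into the queue.
        while pending and pending[0][1] <= now:
            queue.append(pending.pop(0))
        if not queue:
            now = pending[0][1]   # link idle: jump to the next arrival
            continue
        pkt = choose(queue)
        queue.remove(pkt)
        order.append(pkt[0])
        now += service_time       # non-preemptive: a transmission always completes
    return order


def fcfs(queue):
    return queue[0]               # oldest packet first


def priority(queue, rank={"red": 0, "green": 1}):
    # min() returns the first minimal element, which gives FCFS within a class.
    return min(queue, key=lambda p: rank[p[2]])


def make_round_robin(classes):
    state = {"next": 0}           # index of the class whose turn it is

    def choose(queue):
        for step in range(len(classes)):
            idx = (state["next"] + step) % len(classes)
            for pkt in queue:
                if pkt[2] == classes[idx]:
                    state["next"] = (idx + 1) % len(classes)
                    return pkt
        raise ValueError("queued packet has an unknown class")

    return choose


# (id, arrival time, class); the arrival times are assumed for illustration.
packets = [(1, 0, "red"), (2, 1, "green"), (3, 1, "red"), (4, 5, "red"), (5, 7, "green")]

fcfs_order = simulate(packets, fcfs)
priority_order = simulate(packets, priority)
rr_order = simulate(packets, make_round_robin(["red", "green"]))
print(fcfs_order, priority_order, rr_order)
```

Under these assumed arrivals, only the priority discipline reorders the packets: packet 3 overtakes packet 2 because both are waiting when packet 1 finishes.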
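10. How forwarding tables and flow tables are computed
The forwarding table (in destination-based forwarding) and the flow table (in generalized forwarding) are the principal elements linking the network layer's data plane and control plane. These tables define a router's local data-plane forwarding behavior. There are two possible approaches to computing, maintaining, and installing them:
10.1. Per-router control plane
Individual routing algorithm components in each and every router interact in the control plane
Each router has a routing component that communicates with the routing components in other routers to compute the values of its forwarding table
10.2. Software-Defined Networking (SDN) control plane (logically centralized controller)
Remote controller computes, installs forwarding tables in routers
A logically centralized controller computes and distributes the forwarding tables for every router to use. The controller interacts with a control agent (CA) in each router via a well-defined protocol to configure and manage that router's forwarding table. The CA's job is to communicate with the controller and do as the controller commands. The CA does not take an active part in computing the forwarding table (the remote controller computes it); this is the key difference between per-router control and logically centralized control.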
11. Routing protocols
Routing protocol goal: determine “good” paths (equivalently, routes), from sending hosts to receiving host, through network of routers
Here “good” usually means least cost
path: sequence of routers packets traverse from given initial source host to final destination host
good: least “cost”, “fastest”, “least congested”
(routing: a “top-10” networking challenge!)
11.1. Graph abstraction: link costs
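The path-selection problem can be abstracted as the shortest-path problem in graph theory
Here G = (N, E) denotes a graph with a set N of nodes and a set E of edges. If two nodes are joined by an edge, such as (x, y) in the figure above, they are neighbors of each other. Two terms to know: least-cost path and shortest path
11.2. Routing algorithm classification
There are three classifications: centralized/decentralized routing algorithms; static/dynamic routing algorithms; load-sensitive/load-insensitive algorithms (the last one is not in the slides)
* A centralized routing algorithm needs to know the connectivity between all nodes and all link costs before it starts. It takes this information as input, and the computation itself can run at a single site (such as the logically centralized Software-Defined Networking (SDN) controller described earlier). Algorithms with global state information are called link-state (LS) algorithms; see 11.3.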
global: all routers have complete topology, link cost info
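* In a decentralized routing algorithm, routers compute the least-cost path in an iterative, distributed manner. Through iterative computation and exchange of information with neighboring nodes, a node gradually computes the least-cost path to a destination or set of destinations. One such decentralized algorithm is the distance-vector (DV) algorithm (better suited to the per-router control described earlier); see 11.4.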
decentralized: iterative process of computation, exchange of info with neighbors
routers initially only know link costs to attached neighbors
* In a static routing algorithm, routes change very slowly over time, usually through manual adjustment (for example, a person editing a link cost by hand)
static: routes change slowly over time
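* A dynamic routing algorithm changes routing paths as the network traffic load (among other things) changes. A dynamic algorithm can run periodically or in response to cost changes; the drawback is that it is more susceptible to problems such as oscillations, where costs keep changing.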
dynamic: routes change more quickly
periodic updates or in response to link cost changes
11.3. Link-state routing algorithm
(1) centralized: network topology, link costs known to all nodes
* accomplished via “link state broadcast”
* all nodes have same info
(2) computes least cost paths from one node (“source”) to all other nodes
* gives forwarding table for that node
(3) iterative: after k iterations, know least cost path to k destinations
(4) Dijkstra’s algorithm
* Example
Result: populate the forwarding table
* Overview of Dijkstra’s algorithm
* oscillations possible
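A compact sketch of Dijkstra's algorithm, followed by the step that turns the result into a forwarding table for the source node. The six-node graph and its link costs are assumed for illustration, and the helper names `dijkstra` and `first_hops` are hypothetical. A binary heap stands in for the "pick the closest unfinished node" step.

```python
import heapq


def dijkstra(graph, source):
    """Return (least cost to each node, predecessor of each node on its least-cost path)."""
    dist = {source: 0}
    prev = {}
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue              # stale heap entry for an already finished node
        done.add(u)
        for v, cost in graph[u].items():
            candidate = d + cost
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, prev


def first_hops(prev, source):
    """Forwarding table for source: destination -> neighbor to forward to."""
    table = {}
    for dest in prev:
        hop = dest
        while prev[hop] != source:
            hop = prev[hop]       # walk back along the path toward the source
        table[dest] = hop
    return table


# Assumed undirected example graph: each link is listed once and mirrored below.
links = [("u", "v", 2), ("u", "x", 1), ("u", "w", 5), ("v", "x", 2), ("v", "w", 3),
         ("x", "w", 3), ("x", "y", 1), ("w", "y", 1), ("w", "z", 5), ("y", "z", 2)]
graph = {}
for a, b, cost in links:
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost

dist, prev = dijkstra(graph, "u")
forwarding = first_hops(prev, "u")
print(dist)
print(forwarding)
```

Each time a node is popped and marked done, one more least-cost path is final, which matches the "after k iterations, k destinations" property.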
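11.4. Distance-vector algorithm
The distance-vector algorithm is distributed, iterative, and asynchronous.
Distributed: each node receives information from one or more directly attached neighbors, performs a calculation, and then distributes the results back to its neighbors;
Iterative: this process continues until no more information is exchanged between neighbors (the algorithm is self-terminating);
Asynchronous: it does not require all nodes to operate in lockstep with each other.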
(1) Bellman-Ford equation overview
* Example
(2) Distance vector algorithm
Key idea:
* from time to time, each node sends its own distance vector estimate to neighbors
* when x receives a new DV estimate from any neighbor, it updates its own DV using the B-F equation:
$D_x(y) \leftarrow \min_v \{ c_{x,v} + D_v(y) \}$ for each node $y \in N$
* under minor, natural conditions, the estimate $D_x(y)$ converges to the actual least cost $d_x(y)$
* Example
Iteration
Computation
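The update rule can be exercised with a small simulation. As a simplification, this sketch runs in synchronous rounds (every node updates from the vectors its neighbors held in the previous round), whereas the real algorithm is asynchronous. The three-node topology, its link costs, and the function name `distance_vector` are assumptions made for illustration.

```python
INF = float("inf")


def distance_vector(costs):
    """costs[x][v] is the cost of the direct link x-v. Returns (vectors, rounds run)."""
    nodes = sorted(costs)
    # Initially each node knows only the cost to itself and to attached neighbors.
    D = {x: {y: (0 if x == y else costs[x].get(y, INF)) for y in nodes} for x in nodes}
    rounds = 0
    changed = True
    while changed:
        changed = False
        # Vectors as the neighbors advertised them at the end of the previous round.
        snapshot = {x: dict(D[x]) for x in nodes}
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # Bellman-Ford equation: minimum over neighbors v of c(x,v) + D_v(y).
                best = min(costs[x][v] + snapshot[v][y] for v in costs[x])
                if best != D[x][y]:
                    D[x][y] = best
                    changed = True
        rounds += 1
    return D, rounds


# Assumed example: three nodes with link costs x-y = 2, y-z = 1, x-z = 7.
costs = {
    "x": {"y": 2, "z": 7},
    "y": {"x": 2, "z": 1},
    "z": {"x": 7, "y": 1},
}
D, rounds = distance_vector(costs)
print(D["x"], rounds)
```

Node x starts out believing z costs 7 and learns in the first round that going through y costs only 3; the second round changes nothing, so the exchange stops.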
11.5. Comparison of LS and DV algorithms
12. Intra-ISP routing
12.1. Problems when all routers run the same routing algorithm
(1) scale: hundreds of millions of routers:
* can’t store all destinations in routing tables!
* routing table exchange would swamp links!
(2) administrative autonomy:
* Internet: a network of networks
* each network admin may want to control routing in its own network
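12.2. Solution
(1) Textbook explanation
The problems above can both be solved by organizing routers into autonomous systems (ASes), each consisting of a group of routers that are usually under the same administrative control (often the routers in an ISP and the links that interconnect them form a single AS, although some ISPs partition their network into several ASes). The routing algorithm running within an autonomous system is called an intra-autonomous system routing protocol.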
(2) Internet approach to scalable routing
aggregate routers into regions known as “autonomous systems”(AS, a.k.a. “domains”, identified by an ASN number)
* intra-AS (a.k.a. “intra-domain”): routing within the same AS (“network”)
* all routers in AS must run same intra-domain protocol
* routers in different AS can run different intra-domain routing protocols
* gateway router: at “edge” of its own AS, has link(s) to router(s) in other ASes
* inter-AS (a.k.a. “inter-domain”): routing among ASes
* gateways perform inter-domain routing (as well as intra-domain routing)
12.3. Interconnected ASes
(1) Inter-AS routing: a role in interdomain forwarding
suppose router in AS1 receives datagram destined outside of AS1:
Question: router should forward packet to gateway router in AS1, but which one?
Answer
AS1 inter-domain routing must:
* learn which destinations reachable through AS2, which through AS3
* propagate this reachability info to all routers in AS1
(2) Intra-AS routing: routing within an AS
most common intra-AS routing protocols:
* RIP: Routing Information Protocol [RFC 1723]
classic DV: DVs exchanged every 30 secs
* EIGRP: Enhanced Interior Gateway Routing Protocol
DV based
formerly Cisco-proprietary for decades (became open in 2013 [RFC 7868])
* OSPF: Open Shortest Path First [RFC 2328]
link-state routing
IS-IS protocol (ISO standard, not RFC standard) essentially same as OSPF
(3) OSPF (Open Shortest Path First) routing
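* “open”: publicly available
* classic link-state
each router floods OSPF link-state advertisements (directly over IP rather than using TCP/UDP) to all other routers in entire AS
multiple link cost metrics possible: bandwidth, delay
each router has full topology, uses Dijkstra’s algorithm to compute forwarding table
* security: all OSPF messages authenticated (to prevent malicious intrusion)
* Advantages (including the security point above):
Security: exchanges are authenticated, so only trusted routers can take part in the OSPF protocol within an AS.
Multiple same-cost paths: when several paths to a destination have the same cost, multipath forwarding is supported.
Integrated support for unicast and multicast routing.
Support for hierarchy within a single AS: an OSPF autonomous system can be configured hierarchically into areas, each running its own OSPF link-state routing algorithm.
* Hierarchical OSPF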
13. BGP
OSPF is an intra-AS protocol, whereas communication between ASes goes through BGP
13.1. Internet inter-AS routing: BGP
* Obtain prefix reachability information from neighboring ASes (13.2)
* Determine the “best” route to that prefix (13.2)
13.2. Example
(1) BGP path advertisement
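(2) BGP path advertisement (more)
When more than one path is available
* Here AS1 router 1c chooses path AS3, X; this is BGP “determining the best route”. One algorithm that can make this choice (and the simplest one) is hot potato routing.
Hot potato routing is a selfish algorithm: it only cares about, and tries to reduce, the cost inside its own AS, and ignores the other parts of the end-to-end cost outside its AS.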
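Hot potato routing reduces to picking the candidate whose gateway is cheapest to reach inside the local AS. The sketch below assumes a router that has learned two routes to X via iBGP; the gateway names, the AS paths, the intra-AS costs, and the function name `hot_potato` are all illustrative values, not taken from the notes.

```python
def hot_potato(candidate_routes, intra_as_cost):
    """Pick the route whose gateway has the least intra-AS cost from this router."""
    return min(candidate_routes, key=lambda route: intra_as_cost[route["gateway"]])


# Hypothetical view from one router: two gateways in its own AS can both reach X.
candidates = [
    {"prefix": "X", "gateway": "2c", "as_path": ["AS3"]},
    {"prefix": "X", "gateway": "2a", "as_path": ["AS1", "AS3"]},
]
# Assumed least-cost distances to each gateway, as computed by the intra-AS protocol.
intra_as_cost = {"2a": 201, "2c": 263}

chosen = hot_potato(candidates, intra_as_cost)
print(chosen["gateway"], chosen["as_path"])
```

In this example the chosen route has the longer AS path: the router hands the packet off as quickly as it can and does not look at the cost beyond its own AS.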
13.3. BGP messages
13.4. Example: how to update the forwarding table once BGP path is advertised
Steps.
* recall: 1a, 1b, 1d learn via iBGP from 1c: “path to X goes through 1c”
* at 1d: OSPF intra-domain routing: to get to 1c, use interface 1
* at 1d: to get to X, use interface 1
* at 1a: OSPF intra-domain routing: to get to 1c, use interface 2
* at 1a: to get to X, use interface 2
13.5. Difference between Intra-AS routing and Inter-AS routing
13.6. BGP: achieving policy via advertisements
13.7. BGP route selection
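Before looking at how the “best route” is selected, some basic BGP terminology is needed. When a router advertises a prefix over a BGP connection, it includes several BGP attributes with the prefix. In BGP terms, a prefix together with its attributes is called a route. Two of the more important attributes are AS-PATH and NEXT-HOP.
The AS-PATH attribute contains the list of ASes through which the advertisement has passed. For example, there are two routes from AS1 to subnet x, with AS-PATH “AS2 AS3” and AS-PATH “AS3”. BGP also uses AS-PATH to prevent loops: if a router sees its own AS in the path list, it rejects the advertisement.
The NEXT-HOP attribute provides the critical link between the inter-AS and intra-AS routing protocols. NEXT-HOP is the IP address of the router interface that begins the AS-PATH:
As in the figure above, each router in AS1 knows two BGP routes to prefix x (NEXT-HOP; AS-PATH; x):
IP address of the leftmost interface of router 2a; AS2 AS3; x
IP address of the leftmost interface of router 3a; AS3; x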
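The two uses of AS-PATH can be sketched as follows: rejecting advertisements that already contain the local AS, and, as one common selection criterion (real BGP applies several rules in sequence, typically starting with local policy), preferring the shortest AS-PATH. The route dictionaries, the textual NEXT-HOP labels, and the function names `accept_route` and `shortest_as_path` are hypothetical.

```python
def accept_route(own_as, route):
    """Loop prevention: reject an advertisement whose AS-PATH already contains our AS."""
    return own_as not in route["as_path"]


def shortest_as_path(routes):
    """One selection criterion: prefer the route that crosses the fewest ASes."""
    return min(routes, key=lambda route: len(route["as_path"]))


# The two routes to prefix x known inside AS1 (NEXT-HOP written as a label, not an IP).
routes = [
    {"prefix": "x", "next_hop": "2a", "as_path": ["AS2", "AS3"]},
    {"prefix": "x", "next_hop": "3a", "as_path": ["AS3"]},
]
# An advertisement that has already passed through AS1 would form a loop.
looped = {"prefix": "x", "next_hop": "2a", "as_path": ["AS2", "AS1", "AS3"]}

usable = [r for r in routes + [looped] if accept_route("AS1", r)]
best = shortest_as_path(usable)
print(len(usable), best["next_hop"], best["as_path"])
```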