Lecture Notes by Anthony Zhang.

CS456

Computer Networks.

Sergey Gorbunov
Section 002
Email: sgorbunov@uwaterloo.ca
Office Hours: Mondays 11am-12pm
Mondays/Wednesdays 2:30pm-3:50pm

11/9/17

Course content on LEARN and Piazza. Three assignments, worth 40% overall. 20% midterm and 40% final.

Textbook: Computer Networking: A Top-Down Approach, 7th edition.

The internet can be thought of as a network of networks, such as mobile networks, regional ISPs, and home networks. The internet consists of connected computers known as hosts, running network apps that communicate over communication links like fiber/copper/satellite. To implement these communications, we use packet switches like routers and switches to forward packets to the right places.

A protocol is an agreed-upon standard for communication, defining the format and order of messages and the actions to take when messages are sent or received. An access network connects end users to the rest of the internet, such as DSL (internet over phone lines) and cable (many channels of data encoded on one wire at different frequencies - frequency division multiplexing). Devices in a network have to share access to the cable/DSL modem, using a network switch, or more commonly, a router.

Hosts send packets of data by breaking the data up into units of length L. The packets are transmitted at the transmission rate R, determined by the link capacity (bandwidth). At the core of a network is a mesh of interconnected routers, making sure every packet gets to the right place. When we visit a website, our local router checks whether the website is on the local network, and if not, forwards the packet to its default gateway, which is the next network along the way to the destination. Every router has a default gateway, which is where a packet gets forwarded if the router doesn't explicitly know what to do with it.

Routers receive entire packets and store them before processing them and transmitting them to the next link - store and forward. This lets the router check the packet for errors and, since it has all of the packet's metadata, implement traffic shaping. Since it takes L/R time to transmit or receive a packet, it takes 2L/R time to receive and then retransmit one, assuming no propagation delay. To implement store-and-forward, it's generally necessary to queue up incoming packets, and to drop packets if the queue fills up.
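As a rough illustration, here's a small Python calculation of the L/R and 2L/R timing described above (the packet size, link rate, and hop count are made-up example values):

    # Transmission delay is L/R per link; with store-and-forward, a packet must be
    # fully received before it is retransmitted, so each hop adds another L/R.
    L = 1500 * 8      # example packet length: 1500 bytes, in bits
    R = 10e6          # example link rate: 10 Mb/s
    hops = 2          # sender -> router -> receiver crosses 2 links
    print("one link:", L / R, "s")            # 0.0012 s
    print("end to end:", hops * L / R, "s")   # 0.0024 s = 2L/R, ignoring propagation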

The key functionality of the network core is routing (determining the path from source to destination using routing algorithms) and forwarding (moving packets from the input of the router to the correct output). Other networks use circuit switching, where there are reserved, unshared routes between nodes. This is less efficient but has much better guarantees for things like bandwidth and latency - we don't have issues with congestion, packet loss, or delays like with packet switching, but we waste resources when the network isn't at capacity. Circuit switching can use either frequency division multiplexing (channels send data on different frequencies) or time division multiplexing (channels take turns sending data on all frequencies, sort of like packet switching).

Circuit switching vs. packet switching is more of a reserved resources vs. on-demand resources distinction. For applications like audio/video, we'd like to get circuit-like properties over a packet network - this is still an unsolved problem.

13/9/17

Assignment 1 posted, due October 2, writing a simple socket program.

End users usually connect to the internet via access networks provided by access ISPs, and then the millions of access ISPs are connected to each other. ISPs could be connected to each other in a ring, but this has reliability issues, makes routes between hosts very long, and doesn't scale up, since the amount of traffic each ISP would have to handle increases linearly with the number of ISPs. ISPs could instead all be fully connected with each other, but this is impractical, since it would require millions of connections per ISP.

What we end up doing is using higher-tier ISPs that each connect many access ISPs. These higher-tier ISPs can then connect to each other via peering links, agreeing to forward each other's packets. Internet exchange points (IXPs) provide shared infrastructure for these peering links and help make sure their resources are used efficiently. Due to the tiering, the internet can be considered a network of networks. There are also many tiers in between global and access ISPs, like regional networks and content provider networks (like Google CDN, Akamai). So at the center, we have a small number of well-connected Tier 1 ISPs (there are about 7 of them now, like Level3, Sprint, AT&T), and content providers like Google and Akamai connect to those.

Packet delay and loss usually occur at routers. Routers have to queue up packets when the incoming packet rate exceeds the output link's capacity, and if the queue becomes full, the router has to start dropping packets, causing packet loss. Packet loss and delay also commonly happen due to bad connections, when packets simply get lost in transmission or are delayed because we have to retransmit them. Specifically, routers introduce queueing delay (time spent waiting in the incoming packet queue), transmission delay (time spent transmitting the packet on the output channel, a function of the packet size and link bandwidth), processing delay (time spent for the router to determine where to send a packet), and propagation delay (time spent for the packet to travel along the outgoing connection, a function of the link's physical length and the propagation speed, usually close to the speed of light).
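A toy numeric example of summing the four per-router delay components (all values are made up):

    # Total per-router (nodal) delay = processing + queueing + transmission + propagation.
    d_proc  = 20e-6            # processing delay: 20 microseconds to examine the header
    d_queue = 1e-3             # queueing delay: 1 ms waiting in the input queue
    L, R    = 1500 * 8, 10e6   # 1500-byte packet on a 10 Mb/s output link
    d_trans = L / R            # transmission delay: time to push all bits onto the link
    d_prop  = 100e3 / 2e8      # propagation delay: 100 km of cable at ~2e8 m/s
    print("nodal delay:", d_proc + d_queue + d_trans + d_prop, "s")  # about 2.7 ms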

The queueing delay depends on the average incoming packet rate a, the link bandwidth R, and the average packet length L. The ratio La/R is called the traffic intensity: when it is well below 1, the average queueing delay is small; as it approaches 1, delays grow rapidly; and once it exceeds 1, work arrives faster than the link can serve it, so the queueing delay tends to infinity as the router can no longer keep up.
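For instance, with made-up numbers, the traffic intensity can be computed directly:

    # Traffic intensity La/R: how close the link is to being overloaded.
    L = 1500 * 8      # average packet length in bits
    a = 500           # average arrival rate in packets per second
    R = 10e6          # link bandwidth in bits per second
    print(L * a / R)  # 0.6: delays stay modest; as this approaches 1, delays blow up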

Real delays on the internet can be inspected using tools like traceroute. Traceroute uses ICMP to find individual delays from the source to each router along the way to a destination. Essentially, for each router i along the path, we send three probe packets that will be dropped specifically at router i, which then responds with an ICMP message. Traceroute times how long it took for the three replies to come back, then shows the delays and the hosts at which our packets were dropped. The packets won't always take the same path, but this gives us a general idea of what routing delays are like.

Packets have what's called a TTL (time to live), which determines the number of routers the packet can pass through. This ensures that if we end up with a cyclic route, packets don't go back and forth forever - instead, packets get their TTL decremented each time they're retransmitted, and are dropped if their TTL reaches 0. We can also request that the router that dropped the packet notify us via ICMP that it dropped our packet. The TTL is a hop count rather than a real-world time because how long a packet has been alive doesn't really reflect how much it burdens the network - how many routers it passes through does.

To make sure the traceroute packets go specifically to router i and not any other routers, we set the ICMP packet's TTL to i and request that we get notified if the packet is dropped. After i hops, the packet gets to router i and we get an ICMP packet back telling us that router i dropped it.
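A rough sketch of this mechanism in Python (not the real traceroute implementation: it sends a single empty UDP probe per hop rather than three ICMP echoes, needs root privileges for the raw ICMP receive socket, and the hostname is just an example):

    # Send probes with increasing TTL and report which router sent back the ICMP
    # "time exceeded" message for each TTL value.
    import socket, time

    def traceroute(dest_name, max_hops=30, timeout=2.0):
        dest_addr = socket.gethostbyname(dest_name)
        for ttl in range(1, max_hops + 1):
            recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
            send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            recv.settimeout(timeout)
            send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)  # die after ttl hops
            start = time.time()
            send.sendto(b"", (dest_addr, 33434))  # conventional traceroute port
            router = None
            try:
                _, (router, _) = recv.recvfrom(512)  # router that dropped the probe
                print(ttl, router, round((time.time() - start) * 1000, 1), "ms")
            except socket.timeout:
                print(ttl, "*")
            finally:
                send.close()
                recv.close()
            if router == dest_addr:  # the destination itself answered, so we're done
                break

    # traceroute("example.com")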

Throughput is the rate at which data is actually transferred between sender and receiver. Usually the bottleneck is at the access ISP and server-to-ISP connections, rather than in things like peering links - the lowest-throughput link in a route determines the overall throughput of the entire route.
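As a trivial illustration (the link rates are made-up numbers):

    # End-to-end throughput is limited by the slowest (bottleneck) link on the path.
    link_rates_mbps = [1000, 400, 50]   # e.g. server-to-ISP, core, access link
    print("throughput <=", min(link_rates_mbps), "Mb/s")  # the 50 Mb/s access link wins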

The internet is organized into many layers, to allow different components to operate at simple levels of abstraction. The 7-layer OSI model (application, presentation, session, transport, network, data link, physical) is the main way to look at these layers.

Generally, lower layers will add headers/footers to the higher layers' units during transportation, and then strip them out when they arrive.
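A toy sketch of that encapsulation idea, using a few layers of the stack as examples (the header and trailer contents are placeholders, not real header formats):

    # Each layer wraps the unit it gets from the layer above with its own header
    # (and sometimes a trailer); the receiver strips them off in reverse order.
    def transport(app_msg):  return b"[TCP hdr]" + app_msg                 # segment
    def network(segment):    return b"[IP hdr]" + segment                  # datagram
    def link(datagram):      return b"[Eth hdr]" + datagram + b"[Eth trl]" # frame

    wire = link(network(transport(b"GET / HTTP/1.1\r\n\r\n")))
    print(wire)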

The internet wasn't originally designed with much security, assuming mutually trusting users on a network - a lot of work has taken place since to make sure the internet still works today.

18/9/17

Network applications are those that run on end systems (not network core devices), and communicate over networks as part of their core functionality. For example, web browsers, email clients, and text messaging apps.

The client-server architecture is probably the most common one in use today. Servers are always-on hosts that generally live in datacenters and have static IP addresses. Clients might be intermittently connected, use dynamic IPs, and initiate connections to servers. Another common architecture is P2P, where there are no centralized servers and arbitrary end systems communicate directly, like with BitTorrent.

Processes are programs running on hosts. Processes on the same host communicate using IPC, while processes on different hosts exchange messages over the network, usually using sockets. Clients are the initiators of communication, while servers wait to be contacted. A process's socket is identified by the host's IP address together with a port number.
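A minimal sketch of the server side of that pattern with Python sockets - roughly the kind of program Assignment 1 asks for (the port number is an arbitrary example value):

    # Server side: wait to be contacted on a known port, then echo one message back.
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("0.0.0.0", 5000))       # listen on port 5000 on all interfaces
    server.listen(1)
    conn, client_addr = server.accept()  # blocks until a client initiates a connection
    print("client:", client_addr)        # the remote socket is identified by (IP, port)
    conn.sendall(conn.recv(1024))        # echo the client's message back
    conn.close()
    server.close()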

Most commonly, network apps need data integrity, timing, throughput, and security.

TCP provides reliable transport (data arrives complete and in order), flow control (sender won't overwhelm receiver), and congestion control (sender won't overwhelm the network). It's connection-oriented and doesn't provide timing/throughput/security guarantees. In contrast, UDP doesn't provide any of the above, but is much faster and has lower computational/network overhead. It's also datagram-oriented rather than stream-oriented. Application layer protocols like SMTP, Telnet, HTTP, or FTP are based on TCP, while others like HTTP streaming, RTP, and SIP might use UDP.
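For contrast, a minimal sketch of a TCP client and a UDP client (the addresses and ports are placeholder example values, not real services):

    import socket

    # TCP: connection-oriented, reliable, in-order byte stream.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
        tcp.connect(("127.0.0.1", 5000))
        tcp.sendall(b"hello over TCP")
        print(tcp.recv(1024))

    # UDP: connectionless datagrams, no delivery/ordering guarantees, less overhead.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.sendto(b"hello over UDP", ("127.0.0.1", 5001))
        data, addr = udp.recvfrom(1024)
        print(data)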

TCP can be secured by running SSL/TLS on top of it at the application layer. In addition to encryption, SSL/TLS provides data integrity and endpoint authentication.

Web

Web pages are made of objects, such as HTML files, PNGs, and MP3s, which are addressable by their URL and might reference each other by URL. Objects are requested by a web browser (client) from a web server, using HTTP. The client initiates a TCP connection on port 80 (443 for SSL/TLS), and the server accepts the TCP connection and exchanges HTTP messages before the connection is closed. HTTP is stateless - by default, fulfilling a request doesn't require us to know about previous requests.

For non-persistent HTTP, we need two round trips per object - one to set up the TCP connection, and one for the HTTP request and response - and the connection is then closed. In persistent HTTP, the server leaves the connection open afterward, saving the connection-setup round trip on future requests.

Telnet opens a raw TCP connection to a host. For example, you can do telnet some_host 80 and then type in an HTTP GET request, and you'll get an HTTP response back.
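The same exchange can be scripted; this sketch does in Python what the telnet session does by hand (the host name is a placeholder):

    # Open a TCP connection to port 80 and send a raw HTTP GET request.
    import socket

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect(("example.com", 80))
        s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        response = b""
        while chunk := s.recv(4096):   # read until the server closes the connection
            response += chunk
    print(response.split(b"\r\n")[0])  # status line, e.g. b'HTTP/1.1 200 OK'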

User state in HTTP is usually kept via cookies - Set-Cookie headers in HTTP responses associate values with that host in the user's browser, and those values are sent back via the Cookie header in future HTTP requests.
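As a toy illustration of that round trip (the header values are made up):

    # The server's response sets a cookie; the browser stores it and attaches it
    # to later requests to the same host.
    response_headers = {"Set-Cookie": "session_id=abc123; Path=/"}
    stored_cookie = response_headers["Set-Cookie"].split(";")[0]
    next_request_headers = {"Cookie": stored_cookie}
    print(next_request_headers)  # {'Cookie': 'session_id=abc123'}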

Demo with Wireshark capturing HTTP messages over TCP, showing the headers at the application/transport/network layers.

Caching proxies are often used to shorten RTTs by being closer/on faster networks, and to reduce load on application servers. They access the page on your behalf, pass the responses back, and save them to return them directly in future requests, bypassing the application server.
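The core behaviour is just a lookup table keyed by URL; a minimal sketch (the fetch function is a hypothetical stand-in for the real request to the origin server):

    # Serve from the local cache on a hit; otherwise fetch from the origin and store it.
    cache = {}

    def get_via_proxy(url, fetch_from_origin):
        if url not in cache:                    # miss: full trip to the application server
            cache[url] = fetch_from_origin(url)
        return cache[url]                       # hit: answered locally, no origin RTT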

Email

Email works using user agents (e.g., Outlook, Thunderbird), which connect to mail servers. User agents communicate with mail servers using POP (authentication and download), IMAP (authentication, download, and on-server message manipulation), or HTTP (for webmail services like GMail), and mail servers communicate with each other using SMTP (simple mail transfer protocol), on top of TCP on port 25. SMTP is a plain text protocol much like HTTP, with 7-bit ASCII messages, CRLF line endings, and status codes/phrases in responses.

Just like with the web, you can do telnet some_host 25 and then type in SMTP requests, and you'll get an SMTP response back.
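The same conversation can also be driven from Python's smtplib instead of telnet (the host and addresses are placeholders):

    # Connect to an SMTP server on port 25 and hand it a message to relay.
    import smtplib

    with smtplib.SMTP("some_host", 25) as server:
        server.helo("client.example.com")
        server.sendmail("alice@example.com", ["bob@example.com"],
                        "Subject: hello\r\n\r\nThis is the message body.\r\n")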

Creative Commons License This work by Anthony Zhang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Copyright 2013-2017 Anthony Zhang.