You will analyze the TCP packets exchanged between Gustavus's web server and my laptop as I was downloading a file. Because I used dummynet to randomly discard some fraction of the packets arriving at my laptop, you will be able in particular to observe TCP's response to lost segments. Because we have ten packet traces collected using one loss rate and ten others collected using a different loss rate, you will be able to observe the impact of loss rate on throughput (or, more precisely, on "goodput": the throughput of useful data).
I recently captured the packets while downloading the same file (http://gustavus.edu/+max/max.jpg) 20 times: 10 times with loss rate A and 10 times with loss rate B. Because I was using Gustavus's actual network as well as my dummynet, there may be some other variations between the traces beyond the two different loss rates. If I had collected all the traces with one loss rate first, and then all of the other, I might have introduced some systematic confounding effect. (For example, consider what would happen if some other user had started heavily loading the network halfway through my data collection.) Therefore, I used a pseudo-random number generator to calculate a randomized ordering of the 20 trials. To keep your job simple, I turned off SACK.
The packet traces were captured upstream from the artificially introduced packet loss, and so should include all packets that either host transmitted, unless there happens to have been some unintentional packet loss within Gustavus's network, which seems unlikely. For the packets generated by my laptop (mostly pure ACKs), I expect that all of them reached Gustavus's server, because our network doesn't drop many packets. For the packets captured as they arrived at my laptop, on the other hand, some were discarded prior to TCP processing, with effects you will see.
The packet traces are available in a single file, traces.tar.gz, which can be unpacked to produce a directory called traces containing the 20 individual trace files: 10 with loss rate A (a0, a1, a2, a3, a4, a5, a6, a7, a8, and a9) and 10 with loss rate B (b0, b1, b2, b3, b4, b5, b6, b7, b8, and b9).
As described in the next section, you will do some broad analysis of all 20 traces, and then more in-depth analysis on one particular pair of traces. I will assign each of you a number from 0 to 9. If you are student number 5, for example, you will do your in-depth analysis on traces a5 and b5. We can later pool together the whole class's results from the in-depth analyses.
You will analyze the previously captured packet traces using Wireshark. When you start up Wireshark on our Linux machines, it may ask you for the root password, which is needed for packet capture. Since we will be using pre-captured files, you can click on the button that says to run unprivileged.
When you open up one of the trace files in Wireshark, it displays a scrollable list of all the packets, including initial connection setup, the transmission (and sometimes retransmission) of data and acknowledgments, and the connection tear-down. The first thing you should find out is the total elapsed time.
Next, find out how many packets from the server are included in the trace. To focus on just the packets from the server, you can apply a display filter of
tcp.srcport == 80
After doing this, you can locate the number of displayed packets at the bottom of the Wireshark window. (The total number of packets is also shown.)
Next, in the Statistics menu, select TCP Stream Graph and from that submenu select Time-Sequence Graph (Stevens). By examining the resulting graph for significant horizontal gaps (a couple tenths of a second or more), identify the timeouts that occurred. (Alternatively, you could look for the timeouts while scrolling through the list of packets. However, I think it is valuable to see the visual representation of the data, to get some qualitative feel for how the timeouts are sized and located.)
The final piece of information you should collect from all twenty trace files is the number of packets that are retransmissions. Some of these are triggered by the timeouts you just identified, but others are triggered by repeated duplicate acknowledgments, i.e., the fast retransmit feature of TCP. Wireshark contains a TCP analysis module which, if it were working correctly, would identify packets as being "retransmissions" or "fast retransmissions." However, this feature is currently buggy; some retransmissions are falsely marked as "out of order" TCP segments instead. (In the past, I've also seen some fast retransmissions marked instead as being normal retransmissions. I don't know whether that bug was fixed.)
Since my dummynet was not configured to reorder packets, and the Gustavus network has a simple enough structure that it shouldn't reorder packets, I would assume that our packet traces do not contain any genuinely out-of-order segments. As such, you can find the total number of retransmitted packets simply by adding the number marked as "out of order" to the number marked as either kind of retransmission. Actually, you don't have to do the counting, as Wireshark will do it for you.
The first step is to apply another display filter to show just those packets from the server that are marked as either retransmissions or out of order. (Packets marked as fast retransmissions are also marked as retransmissions, so they will be included.) This display filter can be expressed as
tcp.srcport == 80 && (tcp.analysis.retransmission || tcp.analysis.out_of_order)
Having done this, record the number at the bottom of the window indicating the quantity of displayed packets.
For the two trace files that are specific to you, you should clear any filtering you have applied and then examine the packets preceding and following each timeout. At a minimum, you need to determine how long each timeout lasts. For extra credit, you can also look at the packets retransmitted in the aftermath of the timeout to see how many of them seem to have been needlessly retransmitted. That will allow a more precise estimate of the packet loss rate.
For each of the 20 traces, use the elapsed time you determined along with the size of the downloaded file, 229174 bytes, to compute the "goodput", that is, the throughput of application-level data. (Raw throughput, which Wireshark can tell you, includes all the bytes transmitted, even if they are retransmissions or protocol overhead. We want to exclude those.)
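The goodput arithmetic is simple enough to do by hand, but with 20 traces a small helper may save you from slips. The following is only a sketch: the function name is my own invention, and it assumes you have recorded each trace's elapsed time in seconds. The file size is the 229174 bytes given above.

```python
FILE_SIZE_BYTES = 229174  # size of the downloaded file, as stated in the assignment


def goodput_bytes_per_sec(elapsed_seconds, file_size_bytes=FILE_SIZE_BYTES):
    """Return application-level throughput in bytes per second.

    elapsed_seconds is the total elapsed time you read off the trace.
    (Hypothetical helper; not part of Wireshark or any library.)
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return file_size_bytes / elapsed_seconds
```

If you prefer bits per second, multiply the result by 8; whichever unit you choose, use it consistently for all 20 traces.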
One key question is whether the two loss rates result in significantly different goodputs. Looking at the numbers, what can you say qualitatively? Are those from one loss rate uniformly smaller than from the other, or is there some overlap? If there is overlap, do the two groups of numbers still seem to be clearly distinguishable, or might they all plausibly be coming from the same population? You may also want to look at the numbers graphically. For example, you could plot each of the 20 goodputs against a number from 1 to 20 indicating the particular packet trace's position in my randomized data collection sequence (which was b0, a0, b1, b2, b3, a1, a2, b4, a3, b5, a4, a5, b6, a6, a7, a8, b7, b8, b9, a9), color coding each point in the graph to indicate whether it was from an A or B trace. You can then see how different the vertical range of the points in one color looks from the vertical range of the points in the other color. Putting the points in order by data collection sequence also has the advantage that if there was some major confounding effect, you might spot a sign of it.
To get a quantitative handle on whether the goodputs might plausibly have come from a single population, you can use a statistical test. The simplest applicable test would be Fisher's exact test for 2x2 contingency tables. The idea is to identify which ten goodputs are the smallest ten and which ten are the largest ten. Call these "small" and "large" goodputs respectively. Then make a 2x2 table containing four counts: how many small goodputs were measured with loss rate A, how many small ones with loss rate B, how many large ones with A, and how many large ones with B. You can arrange this as follows:
| small A | small B |
| large A | large B |
Now take these four numbers and plug them into a program for computing Fisher's exact test. If the resulting two-tailed p value is very small, then you have strong evidence that loss rates A and B really do have different goodputs. You can find programs for this test on the web, in the form of interactive web pages.
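If you would rather see what such a program computes, here is a minimal sketch in Python using only the standard library. It assumes the common definition of the two-tailed p value: holding the row and column totals fixed, sum the hypergeometric probabilities of every table that is no more probable than the observed one. The function name and argument order are my own choices; the arguments follow the table layout above (small A, small B, large A, large B). Other programs may define "two-tailed" slightly differently, so treat this as an illustration rather than a reference implementation.

```python
from math import comb


def fisher_exact_two_tailed(small_a, small_b, large_a, large_b):
    """Two-tailed Fisher's exact test p value for a 2x2 table of counts."""
    row1 = small_a + small_b      # total "small" goodputs
    row2 = large_a + large_b      # total "large" goodputs
    col1 = small_a + large_a      # total traces with loss rate A
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)  # smallest feasible top-left count
    highest = min(row1, col1)     # largest feasible top-left count
    p_observed = prob(small_a)
    # The tiny relative tolerance treats floating-point near-ties as ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))
```

As a sanity check, a perfectly separated table (all ten small goodputs from one loss rate) should give 2 divided by the number of ways to choose 10 items from 20, and a perfectly balanced table (five in each cell) should give a p value of 1.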
Next, you can calculate the fraction of server-originated packets that are retransmissions, and the ratio of timeouts to server-originated packets. Are the two loss rates distinct in these regards? (You can test this in the same way as you tested for difference of goodput.) If they are, is the loss rate with more retransmissions the same as the one with more timeouts and the same as the one with lower goodput, as one would expect?
You can get a crude approximation to what loss rates A and B might have been by using the number of server-originated packets that were retransmissions as an estimate of the number of lost packets. (This overstates the actual number of lost packets because after a timeout has caused a lost packet to be retransmitted, a few subsequent non-lost packets may also be retransmitted before the relevant ACK arrives.) Sum together from all the A traces the number of packets estimated in this way to be lost, and divide that by the sum over the A traces of the total server-originated packets. That will give you an estimate of loss rate A. Loss rate B can be estimated the same way.
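Note that this is a pooled ratio (sum of retransmissions over sum of packets), not an average of ten per-trace ratios. A sketch of the calculation follows; the function name and the shape of its input, a list of (retransmitted packets, server-originated packets) pairs with one pair per trace, are assumptions of mine for illustration.

```python
def estimate_loss_rate(per_trace_counts):
    """Pooled loss-rate estimate for one group of traces.

    per_trace_counts: iterable of (retransmitted, server_packets) pairs,
    one pair per trace, using the counts read from Wireshark.
    (Hypothetical helper for illustration.)
    """
    counts = list(per_trace_counts)
    total_retransmitted = sum(retx for retx, _ in counts)
    total_server_packets = sum(pkts for _, pkts in counts)
    if total_server_packets == 0:
        raise ValueError("no server-originated packets recorded")
    return total_retransmitted / total_server_packets
```

Call it once with the ten A pairs and once with the ten B pairs. Because of the overcounting described above, the results are best read as upper-leaning approximations.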
For the two traces where you examined the timeouts in depth, calculate the total time spent in the timeouts. Suppose these timeouts were eliminated, but the traces were otherwise unchanged. How much higher would each of the goodputs have been?
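One way to frame this what-if calculation is to subtract the total timeout duration from the elapsed time and recompute the goodput. The sketch below does exactly that; the function name and arguments are hypothetical, and it assumes you have a list of the timeout durations (in seconds) for the trace. The file size is the 229174 bytes given earlier.

```python
def goodput_without_timeouts(elapsed_seconds, timeout_durations,
                             file_size_bytes=229174):
    """Goodput (bytes/second) if the time spent in timeouts were removed.

    timeout_durations: list of timeout lengths in seconds for one trace.
    (Hypothetical helper; assumes the rest of the trace is unchanged.)
    """
    time_in_timeouts = sum(timeout_durations)
    remaining = elapsed_seconds - time_in_timeouts
    if remaining <= 0:
        raise ValueError("timeouts account for all of the elapsed time")
    return file_size_bytes / remaining
```

Dividing this result by the actual goodput for the same trace tells you by what factor the goodput would have improved.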
If you chose to tackle the extra-credit opportunity, use your findings to calculate a more accurate estimate of the loss rates in your particular A and B traces. If multiple students take this option, we'll be able to aggregate their data to provide our best estimate of the A and B loss rates.
Be sure that your report does not assume the reader already knows what you did. You may assume reasonable background knowledge of networking and should refer to external sources of information (such as RFCs or program documentation) where appropriate. Present quantitative data clearly, using well formatted tables (with aligned decimal points) and graphs.