Probable congestion events on 2004-08-17 between 16:05 and 16:20 UTC

Initial OWAMP Summary View

Initially, Jeff saw a coordinated rise in latency variation on the "worst 10" page (which I don't think we captured). He sent a message to the Abilene Planning list asking whether anyone knew of an event; the message contained snapshots of the OWAMP data.

The Denver to Kansas City link (in that direction) was the one link common to all paths that saw variation.

Link Utilization

I looked at the SNMP data a day later and saw some spikes that looked suspiciously like TCP testing from Caltech to CERN. (Later, I was able to verify that such testing occurred, although I could not find out exactly what was being tested.) Unfortunately, the 5-minute data for the time I was interested in had already been folded into 30-minute data for the week. However, we have two sets of SNMP queries running: one at 5-minute intervals (which is what you get when you click on the weathermap), and one done by SNAPP at 1-minute intervals, whose data is kept around for a while. I was able to look at the SNAPP data. Indeed, at 1-minute resolution we see 9 Gbps of traffic, so it is likely that there was congestion at finer time scales.

[NOTE: the SNAPP data is in EST (UTC-5) instead of UTC. So, 11am EST = 4pm (16:00) UTC.] This utilization occurred just after noon EDT (daylight saving time was in effect), so the test was mid-day for most of the Eastern US, and there was significant background traffic.

Looking at a wider time range, it appears that TCP is ramping up, although there was no TCP sawtooth during the time in question (and it would not have oscillated that quickly anyway).
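Because the SNAPP timestamps are in EST while everything else here is in UTC, it is easy to be off by an hour when lining up the two data sets. The following is a minimal Python sketch of the conversion, assuming the SNAPP times are naive local timestamps at a fixed UTC-5 offset; the function name `snapp_to_utc` is made up for illustration and is not part of SNAPP.

```python
from datetime import datetime, timedelta, timezone

# Assumption: SNAPP timestamps use a fixed UTC-5 offset (EST), with no
# daylight-saving adjustment, as described in the note above.
EST = timezone(timedelta(hours=-5), "EST")

def snapp_to_utc(naive_est: datetime) -> datetime:
    """Attach the fixed EST offset to a naive timestamp and convert to UTC."""
    return naive_est.replace(tzinfo=EST).astimezone(timezone.utc)

# The window of interest, expressed as it would appear on the SNAPP graphs.
start = snapp_to_utc(datetime(2004, 8, 17, 11, 5))
end = snapp_to_utc(datetime(2004, 8, 17, 11, 20))
print(start.isoformat(), end.isoformat())
```

So 11:05 to 11:20 on the SNAPP graphs corresponds to the 16:05 to 16:20 UTC window in the OWAMP data.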

Fine-grained OWAMP Analysis

Jeff kindly gave me the raw OWAMP data for the time in question. The text version (~500K, 9000-ish entries) is available if you wish to look. The format of each line is

<coarse-seqno>.<fine-seqno> <date+time> <delay in millisec> <error in sec>
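If you would rather script against the text file than use a spreadsheet, a line in this format can be split roughly as follows. This is only a sketch: the exact layout of the date+time field is not spelled out here, so the code keeps it as raw text, and the sample line is invented for illustration rather than taken from the data.

```python
from typing import NamedTuple

class OwampSample(NamedTuple):
    coarse_seqno: int
    fine_seqno: int
    timestamp: str   # raw text; the exact date+time layout is an unknown here
    delay_ms: float  # delay in milliseconds
    error_s: float   # error estimate in seconds

def parse_line(line: str) -> OwampSample:
    """Parse '<coarse>.<fine> <date+time> <delay ms> <error s>'."""
    # Split from the right so a timestamp containing spaces stays in one piece.
    head, delay, error = line.strip().rsplit(None, 2)
    seqno, timestamp = head.split(None, 1)
    coarse, fine = seqno.split(".", 1)
    return OwampSample(int(coarse), int(fine), timestamp, float(delay), float(error))

# Hypothetical input line, made up to exercise the parser.
sample = parse_line("12.345 2004-08-17 16:05:00.123 35.2 0.0001")
print(sample)
```

Splitting from the right is a deliberate choice: the last two fields are known to be single numbers, so the parser does not need to know whether the date and time are separated by a space.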

I massaged it a bit, imported the data into Excel (2MB XLS file), and then created a time-series graph. There were seven deviations from the minimum delay, and I zoomed in on each of them. In each graph, the X axis is seconds since 16:00 UTC, and the Y axis is the delay in milliseconds. The raw measurements were scheduled by a Poisson process with an average rate of 10 per second.
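I picked out the deviations by eye from the graph, but the same idea can be sketched in a few lines of Python: take the minimum delay as the baseline, flag samples that sit well above it, and merge flagged samples that are close together in time into one event. The threshold and gap values below are arbitrary assumptions, not the criteria used for the graphs, and the demo samples are invented.

```python
def find_deviations(samples, threshold_ms=1.0, max_gap_s=5.0):
    """Group samples well above the minimum delay into events.

    samples: list of (seconds_since_1600_utc, delay_ms), sorted by time.
    Returns a list of (start_s, end_s, peak_delay_ms) tuples.
    threshold_ms and max_gap_s are illustrative defaults only.
    """
    if not samples:
        return []
    baseline = min(delay for _, delay in samples)
    events = []
    for t, delay in samples:
        if delay - baseline <= threshold_ms:
            continue  # close enough to the minimum; not a deviation
        if events and t - events[-1][1] <= max_gap_s:
            # Near the previous flagged sample: extend that event.
            start, _, peak = events[-1]
            events[-1] = (start, t, max(peak, delay))
        else:
            events.append((t, t, delay))
    return events

# Invented samples: two short excursions above a 35 ms floor.
demo = [(300.0, 35.0), (300.1, 35.1), (300.2, 41.0), (300.3, 44.5),
        (300.4, 35.0), (420.0, 39.0), (420.1, 35.0)]
events = find_deviations(demo)
print(events)
```

On the invented samples this reports two events, one around 300 seconds and one at 420 seconds after 16:00 UTC. Using the minimum as the baseline matches the way the graphs were read, but it assumes the path's propagation delay did not change during the window.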

A few interesting observations:

We are looking for other interesting observations, or root cause thoughts. Send your ideas to Matt!

Last update: 3 Sep 2004, Matt Zekauskas