Requirements of Network Monitoring for the Grid

Robin Tasker (r.tasker@dl.ac.uk)
22 February 2001

1. Scope of Network Monitoring

Network monitoring will be used in two distinct ways with respect to the Grid. It will be used to provide background measurements of network performance, which will be of value to network managers and those tasked with the provision of network services for Grid applications. In addition, it will be used to better understand the impact of Grid applications on the operation of networks. This will include the storage of performance figures for subsequent use by applications to adjust their behaviour accordingly.

In essence these monitoring activities may be separated on the basis of time. Background monitoring looks over days, weeks and months at the behaviour of the network whilst immediate monitoring provides a snapshot of existing conditions within the network. One can envisage that the former will be used to manage the network and to ensure sufficient provision, while the latter will be used either directly by end users wishing to run a particular application or more likely by the application itself to adjust, in real-time, its usage of network resources.

The subject of network monitoring is well documented and many communities have active programmes in place. Of particular relevance is the work undertaken by the HEP community and within Internet2. The following references are of value,

http://www.slac.stanford.edu/comp/net/wan-mon.html

http://www.caida.org/home/

and present a number of techniques for monitoring and the visualisation of the resulting data. The following reference provides a detailed comparison between a variety of network monitoring techniques,

http://www.slac.stanford.edu/comp/net/wan-mon/iepm-cf.html

 

2. Metrics for Network Monitoring

The IETF IP Performance Metrics (IPPM) WG has produced RFC 2330, which describes a framework for IP performance metrics. This RFC provides a valuable discussion of the issues relating to network monitoring metrics and can be found at

http://www.ietf.cnri.reston.va.us/rfc/rfc2330.txt?number=2330

The following network monitoring metrics are considered and a general discussion of their use can be found at

http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html#variable

2.1 Packet Loss

Packet loss provides a good measure of the quality of the route between end points. However, the manner in which applications deal with such loss will vary greatly depending upon their use of TCP or UDP and the particular requirements of the application.
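As a minimal sketch of how a loss figure can be derived, assume that each probe carries a sequence number and that the replies received are recorded by the same number. The function name and the sequence-number scheme are illustrative assumptions and are not taken from any of the tools discussed in this note.

```python
def packet_loss_percent(sent_seqs, received_seqs):
    """Percentage of probes sent for which no reply was received."""
    sent = set(sent_seqs)
    if not sent:
        raise ValueError("no probes were sent")
    lost = sent - set(received_seqs)  # probes with no matching reply
    return 100.0 * len(lost) / len(sent)


# Ten probes sent, replies to probes 3 and 8 never arrived.
example_loss = packet_loss_percent(range(10), [0, 1, 2, 4, 5, 6, 7, 9])
```

Duplicate or late replies are ignored here; a real tool would also need a timeout after which a probe is declared lost.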

2.2 Round Trip Time (RTT)

The RTT is the time taken to traverse the path from the source to the destination and back. Formally, given a packet p, the time at which the last byte of p departs from the source, t(D), and the time at which the last byte of the corresponding reply arrives back at the source, t(A), RTT = t(A) - t(D).

The RTT is the sum of the propagation time between the end points and the queuing delays introduced at each hop along the path between the sites. It is therefore determined by the distance between the end points of the route, the number of router hops on the route and the delay encountered at each hop.

It is a "there and back" measurement which has the benefit that timing measurements are confined to the source, i.e. there is no need for clock synchrony between source and destination. On this basis it is a simple metric.

2.3 One-Way Delay

The one-way delay metric has been developed within the IPPM WG of the IETF and is dealt with comprehensively in RFC 2679, A One-way Delay Metric for IPPM,

http://www.ietf.cnri.reston.va.us/rfc/rfc2330.txt?number=2679

In essence, one-way delay measures the path between source and destination and is the sum of the propagation delays of the data links and the delay introduced at each router hop on the path. One-way delay measurement requires external clock sources (like GPS or NTP - depending on the precision required) for synchronization and the co-ordination of source and destination processing to make the measurements.

One-way delay measurement has the benefit of measuring a specific path through the Internet, and recognises that asymmetric routing commonly occurs within the Internet. However, for an application, the communication between it and the remote client is what matters: regardless of the particular routes taken by the traffic, it is the "there and back" characteristics that count.
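The dependence of one-way delay on clock synchronisation, and the absence of that dependence for RTT, can be illustrated with a small worked example. All figures below are assumed for illustration only: true delays of 40 ms out and 60 ms back (an asymmetric route), and a destination clock running 5 ms ahead of the source clock.

```python
def one_way_delay(sent_timestamp, arrival_timestamp):
    """Apparent one-way delay: receiver's timestamp minus sender's timestamp."""
    return arrival_timestamp - sent_timestamp


# Assumed example figures, in seconds.
true_out, true_back, clock_offset = 0.040, 0.060, 0.005

t0 = 100.0                          # probe leaves source (source clock)
t1 = t0 + true_out + clock_offset   # probe arrives (destination clock)
t2 = t1                             # reply leaves at once (destination clock)
t3 = t2 - clock_offset + true_back  # reply arrives (source clock)

measured_out = one_way_delay(t0, t1)   # true delay plus the 5 ms clock error
measured_back = one_way_delay(t2, t3)  # true delay minus the 5 ms clock error
measured_rtt = t3 - t0                 # source clock only: the error cancels
```

The two one-way figures are each wrong by the clock offset, in opposite directions, while the RTT is unaffected. This is why one-way measurement needs an external clock source whose accuracy is well below the delays being measured.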

2.4 RTT / One-way Delay relative to a Given Route

The RTT / One-way Delay is dependent upon the route taken across the network, and route variations are likely to introduce differing transit delays. For example, a provider may choose to route traffic on a particular link because of capacity and take no account of distance and associated delay; or, under fault conditions, route flapping will have an indeterminate effect on transit delay.

2.5 Variation in RTT (frequency distribution of RTT)

A plot of the frequency distribution of RTT measurements can be used to provide a reasonable estimate (the inter-quartile range) of jitter, see

http://www.slac.stanford.edu/comp/net/wan-mon/resp-jitter.html and

http://icfamon.dl.ac.uk/cgi-bin/frequency.pl?sites=ns2.slac.stanford.edu&style=linespoints&days=1

for examples of such work and rationale. The benefit of this approach is that variation in RTT can be readily computed using the data gathered in the simple measurement of RTT.
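A minimal sketch of the inter-quartile range calculation is given below. It uses the quartile convention of the Python standard library (statistics.quantiles with its default "exclusive" method); the convention used by the tools referenced above may differ, and the sample values are invented for illustration.

```python
import statistics


def rtt_interquartile_range(rtts_ms):
    """Inter-quartile range (Q3 - Q1) of a set of RTT samples, in ms."""
    q1, _median, q3 = statistics.quantiles(rtts_ms, n=4)
    return q3 - q1


# Invented RTT samples in milliseconds.
samples_ms = [10, 11, 12, 13, 14, 15, 16, 17]
iqr_ms = rtt_interquartile_range(samples_ms)
```

Because the quartiles ignore the tails of the distribution, a handful of very slow replies has little effect on the result, which is part of the appeal of this estimate over a simple max-minus-min spread.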

2.6 Jitter

Jitter is the variation in arrival times of successive packets from a source to a destination. It is formally defined by the IETF as the "instantaneous packet delay variation" (IPDV) and is the difference in one-way delay experienced by successive packets, i and i+1, on a one-way transit from source to destination.

The measurement of one-way delay and derived IPDV provides a means whereby a more rigorous characterisation of the Internet can be developed.
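Given a series of one-way delay measurements for consecutive packets, the IPDV series follows directly as the difference between each delay and the one before it. The sketch below assumes the delays are already available as a list in sending order; the function name and sample values are illustrative.

```python
def ipdv(one_way_delays):
    """Delay differences between consecutive packets: d[i+1] - d[i]."""
    return [later - earlier
            for earlier, later in zip(one_way_delays, one_way_delays[1:])]


# Invented one-way delays in milliseconds for four consecutive packets.
variations = ipdv([10, 12, 11, 15])
```

Since each value is a difference of two delays measured with the same pair of clocks, a constant clock offset between source and destination cancels out of the IPDV even though it distorts the individual delays.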

2.7 Volume (number of (Grid) bytes exchanged)

This is the measure of the total number of bytes exchanged per specified Grid activity. Traffic volume estimated for each transaction on a daily/weekly/monthly basis would provide value.

2.8 Per Flow Application Throughput

This metric defines the throughput (bytes/s) measured for a specific Grid application between specified end-points, i.e. destination address (DA), source address (SA) and port. Typically it will be based on the measurement of the actual data transferred during the exchange, i.e. a "passive" measurement, and not on additional (test) data inserted into the network.

However, the capability for "active" measurement of throughput will also be required, through the injection of test traffic using tools such as netperf. This will allow knowledge of network capability to be recorded and problems identified ahead of use by real Grid application traffic.
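A passive per-flow figure can be sketched as follows, assuming that transfer records are available as tuples of source address, destination address, port, byte count, start time and end time. This record layout is a hypothetical one chosen for illustration and is not the output format of any particular tool.

```python
from collections import defaultdict


def per_flow_throughput(records):
    """Map (src_addr, dst_addr, port) to throughput in bytes per second.

    records: iterable of (src_addr, dst_addr, port, n_bytes, t_start, t_end).
    """
    totals = defaultdict(lambda: [0, float("inf"), float("-inf")])
    for src, dst, port, n_bytes, t_start, t_end in records:
        entry = totals[(src, dst, port)]
        entry[0] += n_bytes                # bytes seen for this flow
        entry[1] = min(entry[1], t_start)  # earliest start time
        entry[2] = max(entry[2], t_end)    # latest end time
    return {flow: total / (last - first)
            for flow, (total, first, last) in totals.items()
            if last > first}


example_records = [
    ("a", "b", 5001, 1000, 0.0, 1.0),
    ("a", "b", 5001, 3000, 1.0, 2.0),
    ("a", "c", 21, 500, 0.0, 0.5),
]
example_rates = per_flow_throughput(example_records)
```

Idle gaps between transfers of the same flow are counted as elapsed time here, so the result is an average over the whole observation period rather than a peak rate.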

2.9 Aggregate Network Throughput

This metric defines the aggregated throughput within the network between source and destination end points. This may measure the current utilisation as a rate (volume/time) or as a proportion of the total capacity of the path across the network.
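The two forms of the metric are related by a simple calculation, sketched below. The function name and the example link capacity are assumptions for illustration.

```python
def utilisation(bytes_transferred, interval_s, capacity_bits_per_s):
    """Return (rate in bits/s, rate as a proportion of path capacity)."""
    rate_bits_per_s = 8 * bytes_transferred / interval_s
    return rate_bits_per_s, rate_bits_per_s / capacity_bits_per_s


# Assumed example: 1.5 Mbytes carried in one minute over a 2 Mbit/s path.
rate, proportion = utilisation(1_500_000, 60, 2_000_000)
```

The proportion is only as meaningful as the capacity figure supplied, which for a multi-hop path is the capacity of its narrowest link.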

 

3. Output from Monitoring

It is envisaged that each Grid site will maintain a common set of monitoring tools and associated data which is readily accessible to the HEP community. These tools will provide detailed information on the defined metrics so that both the immediate and more long-term needs of Grid users and network operators are met. The precise style of output will depend on the tools selected but the availability of raw data (for further analysis etc) and its visualisation will be crucial. The synthesis of these tools to provide the Internet2 concept of the network weather map needs to be investigated.

 

4. Available (Existing) Monitoring Tools

4.1 Basic Tools

The well known tools - traceroute (NIKHEF), pathchar, netperf, etc. - will be used to provide the measurement of the basic metrics.

Many network monitoring tools make use of ICMP, which may be subject to different traffic management strategies from TCP- or UDP-based traffic. For example, under congestion conditions ICMP traffic will often be preferentially dropped or dealt with at a reduced priority. However, work has been published comparing the use of Ping and TCP Syn/Ack packet exchanges to characterise a specified network route. The results were broadly equivalent, which suggests that, to a first approximation, the use of ICMP-based tools provides reliable measurements.

4.2 PingER

See http://www-iepm.slac.stanford.edu/pinger/ and related links.

The use within the HEP community of PingER is well established. It is used to measure the response time (round trip time in milliseconds (ms)), the packet loss percentage, the variability of the response time both short term (time scale of seconds) and longer term, and the lack of reachability, i.e. no response for a succession of pings.

PingER data is stored locally to allow Web-based access to provide an analysis of packet loss, RTT and the frequency distribution of RTT measurements in both graphical and tabular format. The data is also collected centrally to allow site-by-month or site-by-day history tables for all (or selected) sites as seen from the local monitor point.

The PingER project has a well established infrastructure involving hundreds of sites in many countries all over the world and is particularly focused on the HEP/ESnet communities.

There is clearly a danger that, through ICMP rate limiting or ping discard policy, this approach will either give invalid results or no results at all. This is well recognised but to date comparisons between PingER, Surveyor and the RIPE box,

http://www.slac.stanford.edu/comp/net/wan-mon/surveyor-vs-pinger.html

suggest that the concerns are unfounded. However, this offers no guarantee for the future.

4.3 Surveyor

See http://www.advanced.org/csg-ippm/ and related links.

Surveyor is a measurement infrastructure that is currently being deployed at participating sites around the world. It is based on standards work being done in the IETF's IPPM WG. Surveyor measures the performance of the Internet paths among participating organizations. The project is also developing methodologies and tools to analyze the performance data.

One-way delay and packet loss are measured for each of the paths by sending time stamped test packets from one end of the specified path to the other. The one-way delay is computed by subtracting the timestamp in the packet from the time of arrival at the destination machine. 

Test packets are UDP packets of size 40 bytes (including the IP and UDP headers). The test packets are sent as a Poisson stream, usually at an average rate of two packets per second. The lost packets are treated as having an "infinite" delay. The values are accurate to within 50 microseconds.  For each of the paths, records consisting of the tuple

<time-the-packet-was-sent, one-way-delay-observed>

are kept in a central database. In addition, for each of the paths, for each day, summary statistics are computed for each minute in the day and are stored in a "summary database". The metric documents define several summary statistics. Currently, the following summary statistics are stored, 

Minimum delay during the minute
50th percentile delay during the minute
90th percentile delay during the minute
Percentage packet loss during the minute
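The sampling and summarisation steps described above can be sketched as follows. This is not Surveyor's code: the function names are hypothetical, the nearest-rank percentile convention is an assumption, and only the ideas taken from the description (a Poisson stream of probes, lost packets counted as infinite delay, four statistics per minute) are intended to be faithful.

```python
import random
from collections import defaultdict

LOST = float("inf")  # a lost packet is recorded as an "infinite" delay


def poisson_send_times(rate_per_s, duration_s, seed=None):
    """Send times of a Poisson stream: exponential gaps with mean 1/rate."""
    rng = random.Random(seed)
    times, t = [], rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


def nearest_rank(sorted_delays, pct):
    """Nearest-rank percentile (an assumed convention) of an ascending list."""
    rank = max(1, (pct * len(sorted_delays) + 99) // 100)  # integer ceiling
    return sorted_delays[rank - 1]


def summarise_by_minute(records):
    """records: iterable of (send_time_s, one_way_delay_s) tuples."""
    by_minute = defaultdict(list)
    for sent_at, delay in records:
        by_minute[int(sent_at // 60)].append(delay)
    summary = {}
    for minute, delays in sorted(by_minute.items()):
        delays.sort()  # lost packets (infinite delay) sort to the end
        lost = sum(1 for d in delays if d == LOST)
        summary[minute] = {
            "min": delays[0],
            "p50": nearest_rank(delays, 50),
            "p90": nearest_rank(delays, 90),
            "loss_pct": 100.0 * lost / len(delays),
        }
    return summary
```

Treating a lost packet as an infinite delay means that loss shows up in the upper percentiles automatically: once more than 10% of the packets in a minute are lost, the 90th percentile delay for that minute becomes infinite.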

Using the summary database, summary reports are generated every day as plots showing these statistics. Currently, there are three types of plots,

Percentiles of delay 
Percentage loss 
Histogram of delay 

4.4 RIPE NCC Test Traffic Measurements

See http://www.ripe.net/ripencc/mem-services/ttm/index.html

The goal of the Test Traffic project is to provide independent measurements of connectivity parameters, such as delays and routing vectors, in the Internet. The project implements the metrics discussed in the IETF IPPM working group (see RFCs 2330, 2679, 2680 and related documents). Work on this project started in April 1997 and, over the last few years, it has been shown that the setup is capable of routinely measuring delays, losses and routing vectors on a large scale. The Test Traffic project is being moved to a service offered by RIPE NCC to the entire community.

4.5 MRTG

The Multi Router Traffic Grapher (MRTG) is a tool to monitor the traffic load on network links. It is described here,

http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/

MRTG generates HTML pages containing GIF images which provide a live visual representation of this traffic. MRTG consists of a Perl script which uses SNMP to read the traffic counters of routers, logs the traffic data and creates graphs representing the traffic on the monitored network connection. These graphs are embedded into web pages which can be viewed from any modern Web browser.

4.6 Traceping

Traceping uses packet loss as its metric of network quality. It has been found, in every case investigated, that the variations in packet loss recorded by traceroute and ping, and hence by Traceping, reflect changes in the real performance experienced at the user level. Traceping is described here,

http://av9.physics.ox.ac.uk:8097/www/traceping_description.html

No measurements or estimates are made of either the systematic or random errors on the packet loss data. The numbers should simply be interpreted as qualitative indicators of the state of a network connection: the higher the numbers, the lower the quality.

Currently Traceping is a VMS-specific tool, so its value is strictly limited, although there is a proposal to make it more generally applicable.

4.7 iperf and Throughput Measurements

Iperf is a tool for measuring maximum TCP and UDP bandwidth and is described here,

http://dast.nlanr.net/Projects/Iperf/

A paper showing the results of using iperf between several HEP institutes can be found at

http://www-iepm.slac.stanford.edu/monitoring/bulk/

4.8 Netflow and cflowd

cflowd was developed to collect and analyze the information available from NetFlow flow-export. It is described here,

http://www.caida.org/tools/measurement/cflowd/

It allows the user to store the information and enables several views of the data. It produces port matrices, AS matrices, network matrices and pure flow structures. The amount of data stored depends on the configuration of cflowd and varies from a few hundred Kbytes to hundreds of Mbytes in one day per router. 
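To illustrate what a port matrix is, the sketch below totals the bytes exchanged between each pair of source and destination ports. The flow record layout (a dictionary with src_port, dst_port and bytes keys) is a hypothetical one used only for illustration; it is not the cflowd or NetFlow record format.

```python
from collections import Counter


def port_matrix(flows):
    """Total bytes per (source port, destination port) pair."""
    matrix = Counter()
    for flow in flows:
        matrix[(flow["src_port"], flow["dst_port"])] += flow["bytes"]
    return matrix


# Invented flow records.
example_flows = [
    {"src_port": 40001, "dst_port": 80, "bytes": 1200},
    {"src_port": 40002, "dst_port": 21, "bytes": 300},
    {"src_port": 40001, "dst_port": 80, "bytes": 800},
]
example_matrix = port_matrix(example_flows)
```

AS and network matrices follow the same pattern with a different key, for example the source and destination AS numbers in place of the port pair.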

 

5. Proposals

In the first instance the following six sites,

Bologna, CERN, Daresbury, Lyon, NIKHEF and RAL

will provide a "pre-testbed" mesh of sites to install network monitoring equipment and software so that an understanding and experience of associated issues may be gained. Furthermore the network monitoring products will be made more generally available to inform the discussion on the precise requirements for the DataGRID. Such input will allow WP7 to tailor the network monitoring tools and their products more appropriately and also to begin the process of making the required metrics available, in the correct format, to users of the GRID.

A further purpose of the pre-testbed approach is to assess the applicability of the different techniques to the scale of the emerging testbed. Inevitably it will be the analysis, presentation and availability of the products of network monitoring that will determine the best approach within the DataGRID.

It is expected that as experience and understanding (from the perspective of GRID requirements) is gained in network monitoring, so the number of sites included will increase with the intent eventually to cover all sites involved in the GRID testbed developments.

For each metric, and hence each monitoring tool selected for use, the statistical analysis to be adopted (e.g. average, frequency distribution, standard deviation etc.) and the frequency and duration of measurement will be specified and recommended. Doing so will provide coherence in the deployment of specific tools and their output across the GRID infrastructure and make the comparison of output more straightforward.

To carry out this work each site will make a dedicated computer available which will be used to trial the use of network monitoring tools. The location of both this computer and the RIPE NCC TTM box will be as close as possible to the GRID applications for the particular site.

WP7 agreed that the deployment of the following monitoring tools would take place with the intention of fully meshed operation by the end of March 2001.

5.1 Background Network Monitoring

5.1.1 RIPE NCC Test Traffic Measurements

The RIPE NCC TTM box will be deployed by each of the six pre-testbed sites identified. Experience from their operation will be used to inform the discussion of their general applicability across the testbed.

The RIPE equipment will by default collect data from all other such boxes on the network. It is proposed here to provide a view of the network characteristics of the set of sites within the pre-testbed and their interconnections, i.e. a full mesh of sites. Less than full mesh monitoring would provide a limited characterisation of the network and if this approach is to be recommended in the testbed and DataGRID, issues relating to scale need to be understood.
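The scaling issue can be made concrete by counting the paths in a full mesh. Because one-way measurements are directional, each ordered pair of sites is a separate path, so n sites give n(n-1) paths. The sketch below enumerates them for the six pre-testbed sites; the function name is illustrative.

```python
from itertools import permutations

SITES = ["Bologna", "CERN", "Daresbury", "Lyon", "NIKHEF", "RAL"]


def full_mesh_paths(sites):
    """All ordered (source, destination) pairs of distinct sites."""
    return list(permutations(sites, 2))


pre_testbed_paths = full_mesh_paths(SITES)  # 6 * 5 = 30 directed paths
```

The count grows quadratically: six sites give 30 paths, whereas a testbed of 20 sites would give 380, each with its own stream of test traffic and stored results.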

5.1.2 PingER

It makes sense to make use of existing expertise within the HEP community with the PingER toolset to provide long term monitoring. Many HEP sites already make use of this software and further deployment is both straightforward and of low cost. The identified sites will co-ordinate their deployment of PingER to ensure that the monitoring represents a full mesh of connectivity and that they have a simple means of presenting the data so collected to the community.

In the US, PingER is being used to provide network monitoring for the Particle Physics Data Grid collaboration,

http://www.cacr.caltech.edu/ppdg/ and  http://www-iepm.slac.stanford.edu/monitoring/ppdg/

and direct collaboration with this effort would seem to be appropriate.

5.1.3 Throughput Measurements

WP7 recognise that the measurement of throughput remains very much an open issue. However, in the first instance the sites will attempt the use of iperf to measure throughput with respect to the number of TCP streams and the TCP window size. The results of such trials will be made available to the community as a demonstration of capability. Sites are encouraged to trial other tools across this pre-testbed and to make the output readily available.
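One reason the number of streams and the window size are the parameters of interest is that a single TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput cannot exceed window/RTT. The sketch below computes this upper bound only; it ignores packet loss, slow start and the capacity of the links themselves, and the example figures are assumptions.

```python
def max_tcp_throughput_bits_per_s(window_bytes, rtt_s, streams=1):
    """Upper bound on TCP throughput imposed by window size and RTT."""
    return streams * window_bytes * 8 / rtt_s


# Assumed example: 64 Kbyte window over a path with a 125 ms RTT.
single_stream = max_tcp_throughput_bits_per_s(65536, 0.125)
four_streams = max_tcp_throughput_bits_per_s(65536, 0.125, streams=4)
```

On this assumed path a single stream is limited to roughly 4 Mbit/s whatever the link speed, which is why trials varying both the window size and the number of parallel streams are informative.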

5.2 Immediate Network Monitoring

In the first instance a single view/access point of the available tools needs to be produced to allow a GRID user access to determine the "health" of the network. Such a snapshot of the network will likely include route information between specified end points; the characterisation of the network using, for example, pathchar; and the means of measuring throughput via, for example, MRTG. The pre-testbed sites are encouraged to develop this concept to demonstrate capability and to allow WP7 to further refine the ideas based upon their experience and input from the users of these products.
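Throughput figures of the kind MRTG presents are derived from successive readings of an interface's SNMP octet counter. A minimal sketch of that calculation is given below. It is not MRTG's own code: the function name is hypothetical, the 300 second polling interval is an assumed example value, and the counter is assumed to be a 32-bit one that wraps to zero at most once between readings.

```python
def counter_rate(prev_count, curr_count, interval_s, counter_bits=32):
    """Average bytes/s between two readings of a wrapping octet counter."""
    delta = (curr_count - prev_count) % (2 ** counter_bits)  # handles one wrap
    return delta / interval_s


# Two readings taken 300 s apart, without and with a counter wrap.
steady = counter_rate(1000, 301000, 300)
wrapped = counter_rate(2 ** 32 - 1000, 2000, 300)
```

On a fast link a 32-bit counter can wrap more than once within a long polling interval, in which case the computed rate will be too low; shorter intervals or wider counters avoid this.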