Wednesday, March 2, 2016

Rate, Volume, Utilization, and Parsecs

But wait, the parsec is not a unit of time, but a unit of distance! Wait, what? All arguments aside about how the Millennium Falcon could make the Kessel run in a shorter distance through enormous gravitational shears, knowing your unit is extremely important.
I work in network monitoring and one of the main reports my tools provide measures how much an interface is used. Because the tool is better than poo, it presents the utilization in several different units. First, let's review the units. Each of these units are SI units, so standard SI prefixes apply when talking about larger bases of the base unit:
  • Bytes - measures the total number of octets that were transmitted (or received depending on p.o.v.)
  • Bits per second - measures the number of 0's and 1's that were transmitted (or received depending on p.o.v.) in a single second.
  • Percent utilization (%) - measures the percentage of a period of time that the interface was transmitting (or receiving).
Let's break it down.

Bytes

This one is pretty simple and is referred to as VOLUME. It's simply the total number of Bytes transmitted (or received) during the measurement window. An SNMP polling station would poll the octet counter at a regular interval. Every time the octet counter is polled, the delta between the previous poll results and the current poll results represents the total number of Bytes during the measurement interval.
V = B1 - B0
Polling too frequently will result in small values. Whenever rollups happen, the individual data points should be summed (integrate over the rollup interval). As long as rollups are done that way, the poll rate is less consequential.
Rollover is accounted for by assuming that a lower number than the previous measurement is caused by rollover and the new measurement (measured from 0) is added to whatever remained between the previous measurement and the max limit of the counter.

Layman's example

This is similar to tracking how many miles a car travels. You simply take a reading of the odometer before beginning a trip and another at the end of the trip. The difference is the total miles the trip entailed. You could take readings more often. You'd just need to add up all your measurements at the end of the trip to get the total for the trip.

Bits per second

Bits per second is a simple count measured over a unit of time, making it a RATE. It counts the number of bits that went through the interface, then normalizes the count over a standard unit of time, the second. It is calculated like this:
R = (Δ bits) / (Δ time)
That is, you take the total number of bits and divide it over the total time of the measurement. This is usually done through SNMP by looking at the octet counters. The NMS will poll the sysUpTime and the octet counters at a certain time (T0 and B0). It will then poll the sysUpTime and octet counters at some other time in the future (T1 and B1). The RATE is calculated by dividing the difference between these two measurements (and adjusting the octet counters to get it into bits instead of bytes 8 bits = 1 Byte):
R = 8 (B1 - B0) / T1 - T0
The resulting unit is bits/second and represents an average of the number of bits transmitted per second over the measurement interval (T1-T0). When doing the rollup, average is the most common descriptor. In addition, min, max, standard deviation, variance, and 50th, 75th, and 90th percentiles would be useful.
If you're already gathering VOLUME, you'll notice that B1 - B0 used in the RATE calculation comes from the volume calculation. That's on purpose and is why it is said that RATE is derived from the VOLUME measurement. In fact, if the polling interval is fairly regular, the rate can be said to be approximately linearly proportional to the volume.

Layman's example

This is not really any different than measuring the speed of your car while on a trip. You take a reading of the odometer and the clock at the beginning of the trip and again at the end of the trip. The difference in miles, divided by the total time of the trip (in hours in this case) will give you an average speed in mph. You could increase the resolution of your measurements by taking a reading and performing the calculation every 5 minutes. This would give you a data point describing the average speed for every 5 minutes of your trip.

Percent Utilization

Percent UTILIZATION measures how much capacity is used and is reported in percentage of the total capacity available. This is calculated by dividing the current RATE by the total rate capable by the interface. Alternatively, it could be calculated by dividing the VOLUME by the total volume capability of the interface. The latter requires a bit more derivation, so most use the former.
This metric requires knowledge about the interface's capabilities. This is usually obtained by polling the bandwidth statement (ifspeed) of the interface (1.3.6.1.2.1.2.2.1.5), which is in bits per second (bps). Once obtained, the percent UTILIZATION can be calculated like this:
U = 8 (B1 - B0) / T1 - T0 / ifSpeed
You may notice that a part of this formula looks the same as the RATE calculation. It is. Simplifying the formulas:
U = 8 (B1 - B0) / T1 - T0 / ifSpeed * 100
R = 8 (B1 - B0) / T1 - T0
U = R / ifSpeed * 100
Since the UTILIZATION formula involves dividing a rate (in bps) by a speed (in bps), the result is unitless. This means that the unit can be thought of as % (percentage). Rollups of UTILIZATION should be treated the same way as rollups for RATE. You should also notice that the percent utilization should be linearly proportional to the rate, given a constant bandwidth capacity of the interface.

Layman's example

This calculation is similar to calculating how close a driver is to the speed limit. By dividing the current speed (derived using the formulas above for speed) by the total allowable speed, you can calculate what percentage of the limit the car is currently traveling. When driving, moving at 100% of the speed limit is actually good. You are actually making the most of the available resource. The only time 100% utilization is a problem is when you need to do something else with that speed (i.e. other cars on the road not travelling at the same speed). The same actually holds true for networking. Utilization of 100% is not bad until you need some percentage of those resources for another task.