Friday, March 30, 2012

Understanding SuperAgent Network Regions

I've found that many people don't understand the concept of regions in a network definition in SuperAgent.  Given the power of a region to make defining networks easier and give more granular reports, I'm actually quite surprised that it hasn't been evangelized a bit more.  So, here's my explanation:

SuperAgent organizes data into buckets.  SA could store the analysis data for every single client IP address in its own bucket in the database, but that's kind of the point of MTP.  Also, reports that granular are only helpful if you already know where the problem is.  In addition, if you think about it, storing the analysis of two client IP addresses in two individual buckets in the database doesn't make sense if those two clients are connected to the same switch, which uses the same router to get to the WAN, which comes into the same network hardware in the datacenter.  If the two clients use all the same network hardware, measuring two different network round trip times for them is virtually impossible.  Think about it: the only thing that differs is the client's NIC, which doesn't really affect SA metrics, because modern technologies like TCP offload engine (TOE) bring the ACK turnaround time on the NIC down to sub-millisecond.

Ok, so there's the reason to summarize networks according to the network path.  If a bunch of IP addresses use the same network path to get back to the servers monitored by SA, there's not much value in storing the analysis on a per-IP basis.

However, for groups of IP addresses that do use different network infrastructure, it is imperative to separate them so that the differentiating network hardware can be isolated and therefore identified and troubleshot (troubleshooted?).

Therefore SA provides the ability to define client networks.  Each client network instructs SA how to group IP address blocks together and treat them as one unit for analysis and storage.  Each network definition should only contain the IP addresses that share all of their network infrastructure.

This is nice because it cuts down on the amount of configuration required in SA.  To illustrate, let me give an example.  A US company has decided that its IP address scheme is to allocate an entire /10 block of IP addresses to each time zone (e.g. 10.0.0.0/10 for Eastern, 10.64.0.0/10 for Central, 10.128.0.0/10 for Mountain, and 10.192.0.0/10 for Pacific).  It then decides to allocate an entire /19 block of IP addresses to each site within that time zone (e.g. 10.30.0.0/19 for NYC, 10.74.32.0/19 for Chicago, and 10.200.128.0/19 for LAX, among others).  This is actually really easy to configure in SA.  The networks would be defined in SA as follows:

Network Name    Network         Mask
EST             10.0.0.0        10
CST             10.64.0.0       10
MST             10.128.0.0      10
PST             10.192.0.0      10
NYC             10.30.0.0       19
...             ...             19
CHI             10.74.32.0      19
...             ...             19
LAX             10.200.128.0    19
...             ...             19

Each of the time zone IP address blocks should be configured so that any clients in the time zone that don't match a site definition still get categorized somewhere.  Any traffic showing up in those networks is an indicator that a site is missing.  On a side note, the time zone networks could be given their own network type, and special, tighter thresholds could be applied so that incidents trip immediately for any amount of NRTT.  A special network incident response could be set up to send an email to the SA admin to notify him/her that traffic has been seen on a time zone network (indicating a site network definition that is either missing or incomplete).
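
To make the catch-all idea concrete, here is a minimal sketch in Python of how a client IP would land in either a site network or its time zone network.  It assumes the most specific (longest prefix) matching definition wins, which is what the catch-all behavior described above implies; it is not SA's actual code, and the classify function and the NETWORKS dictionary are names made up for illustration.

```python
import ipaddress

# Network definitions from the example table above (name -> CIDR block).
NETWORKS = {
    "EST": "10.0.0.0/10",
    "CST": "10.64.0.0/10",
    "MST": "10.128.0.0/10",
    "PST": "10.192.0.0/10",
    "NYC": "10.30.0.0/19",
    "CHI": "10.74.32.0/19",
    "LAX": "10.200.128.0/19",
}


def classify(client_ip, networks=NETWORKS):
    """Return the name of the most specific network containing client_ip.

    Assumption: the longest matching prefix wins. Returns None when no
    definition matches at all.
    """
    ip = ipaddress.ip_address(client_ip)
    best_name, best_prefix = None, -1
    for name, cidr in networks.items():
        net = ipaddress.ip_network(cidr)
        if ip in net and net.prefixlen > best_prefix:
            best_name, best_prefix = name, net.prefixlen
    return best_name


print(classify("10.30.5.20"))  # inside the NYC /19
print(classify("10.40.1.1"))   # Eastern block, but no site defined for it
```

The second address falls through to EST, which is exactly the kind of traffic that should prompt the admin to go looking for a missing site definition.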

While this is great, the network administrators at the US-based company decided that a standard of 32 VLANs should be implemented at every site.  Each VLAN should be a /24 subnet and each VLAN has a standard use (floor 1, floor 2, floor 3, printers, servers, wireless, etc.).  With the networks above defined in SA, the network administrator won't be able to differentiate between bad performance on a wireless VLAN and bad performance on a wired VLAN.  At this point the administrator has two options: 1) rebuild all the network definitions, defining every single /24 subnet, or 2) define 32 regions in each of the site network definitions.  The better option is #2.  Here's why:

Defining 32 regions on a /19 network definition in SA is equivalent to defining all 32 /24 sub-subnets within that /19 network.  It's shorthand.  Once defined, the /19 network definition will have a plus sign (+) next to it.  When the admin clicks it, he can see that SA actually has 32 networks defined within that /19.  The nice thing is that they are all grouped together according to site (/19 network).
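
The arithmetic behind that shorthand is easy to check.  A /19 leaves 5 more bits before the /24 boundary, and 2^5 = 32.  Here's a quick sketch using Python's standard ipaddress module; the expand_regions helper is a name made up for illustration, and this only shows the subnet math, not how SA stores regions internally.

```python
import ipaddress


def expand_regions(site_cidr, region_prefix=24):
    """List the equal-sized subnets (regions) that make up a site block."""
    site = ipaddress.ip_network(site_cidr)
    return list(site.subnets(new_prefix=region_prefix))


regions = expand_regions("10.30.0.0/19")
print(len(regions))             # 32
print(regions[0], regions[-1])  # 10.30.0.0/24 10.30.31.0/24
```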

One disadvantage is that every sub-subnet (region) initially gets the same name that was assigned to the /19 network.  This can be overcome by expanding the /19 (hitting the plus sign) and renaming the VLANs as necessary, since each region can be named individually.  The way to avoid that manual renaming is to use option 1 and create a CSV containing all the /24 networks, each with a site name prefix and a VLAN designator (name and/or VLAN number).
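
If you go the CSV route, generating the file by hand for every site gets old fast.  Below is a rough Python sketch that builds the rows for you.  A few loud assumptions: the column layout (Network Name, Network, Mask) simply mirrors the table earlier in this post and is not SA's documented import format, the VLAN labels and the order they map onto the /24s are invented for the example, and any VLAN without a label just gets a number.

```python
import csv
import io
import ipaddress

# Hypothetical labels keyed by position of the /24 inside the site block.
# Positions without a label fall back to a numeric VLAN designator.
VLAN_NAMES = {
    0: "Floor1",
    1: "Floor2",
    2: "Floor3",
    3: "Printers",
    4: "Servers",
    5: "Wireless",
}


def site_rows(site_name, site_cidr):
    """Yield one [name, network, mask] row per /24 in the site block."""
    site = ipaddress.ip_network(site_cidr)
    for index, subnet in enumerate(site.subnets(new_prefix=24)):
        label = VLAN_NAMES.get(index, f"VLAN{index + 1}")
        yield [f"{site_name}-{label}", str(subnet.network_address), subnet.prefixlen]


def build_csv(sites):
    """Return CSV text covering every /24 of every site in `sites`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Network Name", "Network", "Mask"])  # assumed header
    for site_name, site_cidr in sites.items():
        writer.writerows(site_rows(site_name, site_cidr))
    return buffer.getvalue()


csv_text = build_csv({
    "NYC": "10.30.0.0/19",
    "CHI": "10.74.32.0/19",
    "LAX": "10.200.128.0/19",
})
print(csv_text.splitlines()[1])  # NYC-Floor1,10.30.0.0,24
```

Check the column order and header against whatever your SA version actually expects before importing anything.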