Friday, May 27, 2011

SNMP Polling vs. Traps

SNMP has been around for decades.  Many manufacturers build SNMP agents into their products so that a network management system (NMS) can monitor their status.  There are two ways to monitor a device with SNMP: 1) regular active polling of the device by the NMS, and 2) traps sent by the device to the NMS.  Unfortunately, many people seem to know about only one method or the other.  I'm regularly approached with requests to monitor a particular set of devices via SNMP.  I ask which metrics they'd like to monitor (approaching from method #1), and they usually respond by handing me the MIB and saying, "We want to monitor everything."  After a short discussion about what they expect to happen, their requests usually come down to wanting NV to be able to receive any and all traps defined in the MIB.  Oh, the humanity.

First of all, any SNMP receiver can receive any trap from any device, usually with minimal configuration (mostly firewalls and ACLs).  Whether it can do anything useful with that trap is another matter entirely.  Out of the box, NV will receive traps from any device that has UDP connectivity to it.  Using the MIBs already compiled, NV will try to interpret the trap.  If you want better interpretation, compile in a MIB that describes the trap.
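
To make the gap between receiving a trap and understanding it concrete, here is a minimal Python sketch.  It is not how NV works internally; it only assumes a plain SNMPv1/v2c message layout (a BER SEQUENCE holding the version, the community string, and then the PDU).  The function names read_tlv, parse_header, and receive_one are made up for illustration.  Anything past the header, such as the varbinds and the OIDs that say what actually happened, still needs a MIB to be meaningful.

```python
import socket

def read_tlv(buf, pos):
    """Read one BER tag-length-value; return (tag, value_bytes, next_pos)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes that follow.
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length

def parse_header(datagram):
    """Return (version, community, pdu_tag) from an SNMPv1/v2c message.

    version is 0 for SNMPv1 and 1 for SNMPv2c.
    pdu_tag is 0xA4 for an SNMPv1 trap and 0xA7 for an SNMPv2 trap.
    """
    tag, body, _ = read_tlv(datagram, 0)
    if tag != 0x30:
        raise ValueError("not a BER SEQUENCE")
    _, version, pos = read_tlv(body, 0)
    _, community, pos = read_tlv(body, pos)
    pdu_tag = body[pos]
    return int.from_bytes(version, "big"), community.decode("ascii", "replace"), pdu_tag

def receive_one(sock):
    """Block until one datagram arrives; return (sender_ip, parsed_header)."""
    data, (host, _port) = sock.recvfrom(65535)
    return host, parse_header(data)
```

To try it, create a socket.socket(socket.AF_INET, socket.SOCK_DGRAM), bind it to the trap port, and call receive_one on it.  The standard trap port is UDP 162, which usually requires elevated privileges to bind, so a high port is easier for experiments.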

However, traps are not the best way to go.  Active polling is better for several reasons.  Let me draw an analogy: traps are how a typical college kid communicates with his parents, only calling when he needs something.  While this does count as communication, the parents aren't very well informed of the kid's progress in school.  Active polling is like a good son who calls his parents every Sunday afternoon and fills them in on his progress.  Obviously, the good son maintains better communication with his parents and is more likely to get the help he needs when, or even before, he needs it.

Active polling has several advantages: 1) since polled metrics can be stored, historical analysis can be performed, 2) given that history, thresholds can be set just above 'normal' values to flag a problem while it is forming, before it becomes apparent, 3) active polling usually involves some sort of discovery process, which can help identify devices that come online without the proper SNMP trap target configured, and 4) with NV in particular, configuring notifications and alarms based on active polling is much easier and more straightforward than doing so with traps.
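
Advantages 1 and 2 can be sketched in a few lines of Python.  This is an illustration under my own assumptions, not NV's method: "normal" is taken to be the mean of the stored history, and "just above normal" is the mean plus k standard deviations.  The PolledMetric class and its fetch callable (which stands in for an SNMP GET against the device) are hypothetical names.

```python
from statistics import mean, pstdev

class PolledMetric:
    """Stores polled values and flags samples well above the historical norm."""

    def __init__(self, fetch, k=3.0, min_history=5):
        self.fetch = fetch              # callable returning the current value
        self.k = k                      # standard deviations above the mean
        self.min_history = min_history  # samples needed before alarming
        self.history = []

    def threshold(self):
        """Mean of the stored history plus k standard deviations (an assumed rule)."""
        return mean(self.history) + self.k * pstdev(self.history)

    def poll(self):
        """Fetch one sample, compare it to the history so far, then store it."""
        value = self.fetch()
        alarm = len(self.history) >= self.min_history and value > self.threshold()
        self.history.append(value)
        return value, alarm
```

Calling poll() on a schedule builds the history as a side effect, which is exactly what a trap-only setup never gets.  A real deployment would also need to decide whether alarming samples should be kept out of the baseline; this sketch stores everything.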

Traps do have one advantage: if something goes wrong that isn't being polled, the device can still send out a trap to report it.