Friday, July 5, 2019

Discovering Enumerated Properties

tl;dr - the scripts are here and here.

In the same vein as the previous post, this post will talk about SNMP enumeration. While the previous post talked about polling enumerated values and how to display them as intuitively as possible, this post will talk about polling more static values and using them as properties.

There are two levels of properties that can be obtained via SNMP: device level properties (that pertain to the whole device) and instance level properties (that pertain to a particular thing on the device, of which there may be more than one).  Polling those properties is easy; this post will go over how to improve the quality of the data stored so that the data can be intuitively used.

Polling properties vs. data points

Polling properties is easy. Knowing which properties to poll as properties and which to poll as data points is a separate discussion entirely. Suffice it to say that things that represent a characteristic about an object should be properties and things that represent a behavior should be data points. Depending on the tool, only data points can be alerted upon, so that may influence a decision to make something that would normally be a property also a data point.  In other cases, sometimes data points can't be used to influence enablement of monitoring; so sometimes data points need to be properties too.
Assuming you're past the point where you've looked at the MIB and figured out which OIDs should be properties and which should be data points, the next thing to look at is whether or not the OIDs you will be storing as properties have enumerations.

Enumerations

An enumeration is used in SNMP to keep the simple in Simple Network Management Protocol. Instead of passing back a string comprised of ASCII characters, a map is created that connects a meaningful string to a single integer, which is passed back to the NMS. Passing back a single integer is much simpler than passing back a string of characters of varying length.
For example, the admin status of an interface is found at 1.3.6.1.2.1.2.2.1.7. It's not a particularly good example of a property since it lends itself more to a data point, but having it as a property can allow for advanced filtering which may only be available using properties.
ifAdminStatus OBJECT-TYPE
    SYNTAX  INTEGER {
                up(1),       -- ready to pass packets
                down(2),
                testing(3)   -- in some test mode
            }
    MAX-ACCESS  read-write
    STATUS      current
    DESCRIPTION
            "The desired state of the interface.  The testing(3) state
            indicates that no operational packets can be passed.  When a
            managed system initializes, all interfaces start with
            ifAdminStatus in the down(2) state.  As a result of either
            explicit management action or per configuration information
            retained by the managed system, ifAdminStatus is then
            changed to either the up(1) or testing(3) states (or remains
            in the down(2) state)."
    ::= { ifEntry 7 }
You can see in the definition of the OID that it has a custom syntax that allows for one of three values. When this OID is polled, either a 1, a 2, or a 3 is returned. It's up to the NMS to interpret the different values to understand the meaning. If the NMS tool has MIB compilation, this may have a certain level of automation. If it doesn't, interpretation is up to you. Obviously, storing a 1, 2, or 3 as a property could be sufficient, but it would be better to store the interpreted meaning making use by humans easier and more intuitive.

Interpretation of an instance property enumeration

So how do we interpret this? It's actually pretty easy, but it involves some scripting to process the returned data. It also involves some research into the MIB to find out all the meanings. Let's start with a simple case of interfaces. In most cases a MIB will contain a "Table" OID with an "Entry" child OID containing all the instances with whatever OIDs go along with the instances. In the case of interfaces (ignoring the ifXTable), the table is called "ifTable" at 1.3.6.1.2.1.2.2. You'll see that there is a child OID called "ifEntry" at 1.3.6.1.2.1.2.2.1.  This kind of table typically has an index along with perhaps some properties and data points as columns. Each row is an instance. We're going to ignore the data points for now and focus on the OIDs that would do well stored as properties. MTU(4), speed(5), and MAC address(6) are good items to store as properties. They require no interpretation. However, ifType(3) and ifAdminStatus(7) require some interpretation to be useful.
LogicMonitor's multi-instance datasource can be configured to use a groovy script (yes, it's a real thing) to do auto-discovery, which is the mechanism that discovers poll instances and sets properties per poll instance. The concept is pretty simple, use the groovy SNMP libraries to retrieve the data, use groovy to interpret the data, then just print the data to standard output, one line per poll instance.
I recently wrote a script to do this. It's all self documented with explanations and sample output and everything. Some points to consider at the following lines:
  1. This line defines the address of the Entry table. Everything we will be polling happens to live under this branch of the OID tree.
  2. This is where we define which column of the ifEntry table contains the name that we should be using for each of our instances.
  3. This line shows the information from the MIB added to the script so that the script can interpret the meaning from the returned value for ifAdminStatus
  4. This line shows the information from the MIB added to the script so that the script can interpret the meaning from the returned value for ifOperStatus
  5. This line shows that we're still polling ifMtu as a property, but there is no interpretation available for the value. We could alternatively put ["1500":"Default (1500)"] instead of [:] to tell the script to add some meaning to the most common value of MTU
  6. ifPhysAddress is the MAC address and needs no interpretation
  7. This line shows an interpretation that isn't defined in the MIB. Instead of just passing the raw value through, we can provide our own interpretation of the speed to give some more intuitive values for common speeds. If the speed of the interface isn't in our map, the speed itself will be stored as the value. If it does happen to match on eof
  8. This is where we start to define the enumeration for ifType. Turns out ifType has over 200 different enumerated values. Each of these is defined in the script so that the proper type name can be stored as a property.
  9. Here's where you can see some sample output against a device here in my house. Notice that there's one line for each port (both physical and logical).
  10. This line is a good example showing the interpreted values of ifAdminStatus, ifOperStatus, ifSpeed, and ifType.

Interpretation of device level properties

Polling and storing device level properties is a bit simpler mainly because looping through instances is not required. We do have to provide each OID, the name we want the property stored under, and the interpretation, if any. All of this comes from the MIB. The script to do this is here. This example comes from the mGuard MIB, which is from some work I did recently. However, the OIDs can be replaced with any OID from any MIB (notice there's not a baseOID), as long as the OID returns a single value because the script does an SNMP get on that OID.

Conclusion

That's about it. These two scripts can be used to add real meaning to device and instance level properties. The only things that have to change are the data that come from the MIB itself. Perhaps one of these days I'll get around to writing a MIB parser that will output this information for all OIDs in the MIB. Yeah, when I have time.

Tuesday, July 2, 2019

Visualizing Status Codes

If you follow me on LinkedIn, you will have noticed that I changed jobs and moved from Houston to Austin. I'm now working for LogicMonitor as a Sales/Monitoring Engineer. It's a great job and I'm enjoying it a lot more than previous jobs. One of the advantages of this job is that I should have much more opportunity, desire, and content for blog posts.

If this is your first time here, know that this blog is not written for you. It's written for me. I increasingly need more and more reminders of how to do things. That goes especially for things that I devise since no one else knows it unless I tell them. This blog is primarily a place for me to keep those things written down.

Anyway, on to this blog post. LogicMonitor monitors IT infrastructure. After collecting data through various mechanisms, it stores the data in a big database in the cloud and then provides a cloud hosted front end website to display the data. Part of the display is graphs. Many times, the metrics being graphed lend themselves to being plotted on a Cartesian coordinated graph. However, sometimes the metric being polled is a status code. A good example of this is license status on Sophos' XG Firewall. This metric is found at .1.3.6.1.4.1.21067.2.1.3.4.1.
asSubStatus OBJECT-TYPE
    SYNTAX          SubscriptionStatusType
    MAX-ACCESS      read-only
    STATUS          current
    DESCRIPTION     " "
    ::= { liAntispam 1 }
The syntax is "SubscriptionStatusType", which is an enumerated type meaning that only a number is returned, but that number has a meaning depending on the different values returned. Looking at the syntax definition in the MIB will help illustrate:
SubscriptionStatusType ::= TEXTUAL-CONVENTION
        STATUS            current
        DESCRIPTION       "enumerated type for subscription status"
        SYNTAX INTEGER {
                 trial          ( 1 ),
                 unsubscribed   ( 2 ),
                 subscribed     ( 3 ),
                 expired        ( 4 )
        }
So each different value returned indicates a particular state of the license subscription. It's not like a percentage where 100% is good and 0% is bad and there might be values in between. It's not like a rate, where a high number is fast and a low number is slow. It only has discreet values and values in between don't actually have any meaning.
Normally, without putting in much effort, someone might easily create a graph that just plots this number, putting time on the x-axis and the value retuned on the y-axis. This results in what you see here:
As you can see, it's not very helpful. It's a flat line because the status has been the same for the entire time range. That's ok. However, there's no real indicator of meaning. Some effort was made to add the meaning to the data point description (which appears in the tooltip). However, the enumeration is so long that it doesn't really fit in the tooltip. What does a 3 mean? Also, it's not illustrated here, but what happens if the value changes from 3 to 4? There would be two flat lines, one at 3 before the change and one at 4 after the change. But what would be shown at the change? Would it be a vertical line from 3 to 4? Would it be slightly slanted? Also not illustrated here, but what happens when larger timeframes are chosen and values are aggregated together (most often using an average)? Imagine that line at 3 that transitions to 4. What if that happened in middle of the quarter and you viewed it at the end of the quarter? If this status was polled every hour, that would mean 2190 data points to display! That's too many. Almost every graphing solution would attempt to decrease the data points by grouping points and averaging every group. In the case of a quarterly timeframe, it might simplify by averaging all data points for a single day together. This could be fine for most days, except for the one where there was a change. That would show an average of 3's and 4's, yielding a value of 3.5. WTH does 3.5 mean? It gets worse if you have a 2 that transitions to a 3 which then later transitions to a 4. You could end up with an average of 3, indicating no problem at all!?!?  It's not intuitive; and graphs need to be intuitive.

So, what do we do?

Well, we might be tempted to normalize the data. This is actually a very good idea. Let me explain: normalizing the data transforms it into a scale that is more intuitive. For example, we might say that we will normalize the data using the following rules:
  1. A value of 3 is good, so we'll call that 1
  2. Any other value is bad, so we'll call that 0
Pretty cool. That's a pretty good one. Any time everything is ok, we would plot a 1. Any other status is undesirable and we would plot a 0. Transitions are still ugly and potentially troublesome. If we make one tweak, it could allow us to put some context around the resulting values. What if we changed it to 100% instead of 1 and 0% instead of 0? If we did that, we could actually put some additional meaning behind the resulting values. Thing about it, if everything is good, you're plotting 100%. What does that mean? It means that for 100% of the timeframe displayed, the status was good. If the status is 4, we'd see a line down at 0% meaning that for 100% of the timeframe displayed, the status was not good, or conversely: the status was good 0% of the timeframe. We could also create another data point which is the inverse of our normalized data:
  1. If value is 3, plot 100, else plot 0
  2. Plot (100 - the value from above)
Doing this would also let us use a more intuitive graph called a stacked area graph. We could plot our normalized data using a pleasant color like blue or green and plot the inverse data using a warning color like red or orange. This would give us two series of data that compliment each other and when plotted as a stacked graph would look like the second graph here (the first graph is a status code plot for reference):
See how much more intuitive that is? The first graph in the above picture is a plot of the raw status code. How easy is it to know when things aren't good (without reading the axis labels)? Doable but not instantly intuitive. Imagine you had 12 of these kinds of graphs on a single dashboard on a wall. Would you want to take the time and effort to read the axis labels of each one to know how things are going? No. Now look at the second graph. It's pretty easy to tell that there were two different problems between 8pm and 11am and 1pm and 5pm.  We could even remove the y-axis labels and you'd still probably be able to tell with a glance how things are doing. Imagine 12 different graphs like this. How easy is it to see if there's a problem? Just look for the red!

There are only two drawbacks. Have you noticed? We see that there are two problems, but are they the same problem? Actually, they're not. Also, are they the same severity of problem? The morning problem is that status goes from "subscribed" to "expired". The afternoon problem is that the status goes from "subscribed" (did you notice that it returned to a good status at noon?) to "trial". The morning problem is worse than the afternoon problem.  There's a way to visualize this so that it all looks good. Let me explain:

Essentially we want to normalize the data but still keep as much detail as possible. We'll need to have four series, each with its own color. For any one data point, we'll only have a value of 100% in one of the series. All the others will be 0%. Meaning that for the timeframe that data point represents, whichever series has a value of 100% indicates the status for that moment. Let's look at the normalization rules:
  1. If the status code is 1, return 100, else return 0 (100 here means Trial status)
  2. If the status code is 2, return 100, else return 0 (100 here means Unsubscribed status)
  3. If the status code is 3, return 100, else return 0 (100 here means Subscribed status)
  4. If the status code is 4, return 100, else return 0 (100 here means Expired status)
This is what it would look like (graph on the right, first two shown for reference):
Notice how the problem in the morning is highlighted with a red and the problem in the afternoon is highlighted with a yellow? Easy to tell that there are two problems, that they are different, and that the morning problem is the more severe. 

This is what the final version would look like in the LogicMonitor web gui (not interesting I know since the status code didn't change the whole time I was building this):


Here's how it's built in the GUI:

Monday, June 3, 2019

How to share a ton of stuff with someone else


If you have the stuff to share:

  1. Download and install Resilio Sync
  2. After opening the app, click the plus sign in the top left corner and select "Standard Folder"
  3. Browse to the folder you want to share with someone else and click "Open"
  4. A new entry will appear in your list of folders. At the right end of this entry will appear three dots (when you mouseover the row). Click the three dots and select "Copy Read Only key" or "Copy Read & Write key" depending on whether or not you want the sharer to be able to change what's in the shared folder. They key is now in your clipboard.
  5. Send the key to the person you want to share with.

If you have received a key:


  1. Download and install Resilio Sync
  2. After opening the app, click the plus sign in the top left corner and select "Enter key or link"
  3. Paste in the key that was sent to you
  4. Browse to the folder you want to synchronize and select Open.
  5. Go grab a coffee and chips and wait until the synchronization finishes.


Disclaimer: using any protocol to transmit/receive data that you are not legally allowed to transmit/receive is obviously illegal. I'm not responsible if you use this to do something illegal.

Tuesday, April 16, 2019

One Trailer for Each Piece of the Saga

Saw an article bringing together all the trailers for all the Star Wars productions. They did the episodes 1-9 first, then the ancillary productions. I decided to put them into a YouTube playlist. You're welcome.

Don't forget the one that you can't find on YouTube, it's after Episode 6, before 7.

Friday, March 22, 2019

Web GL Globe

I have spent some time now working in the Oil Industry. I've been working in technology, so I haven't had much to do with actual oil. However, I did come across this really cool visualization of oil imports and exports when I was searching for a way to visualize some network performance data. It's based on a bit of code by Google called the WebGL Globe. WebGL Globe is built on a technology called WebGL, which is like OpenGL except that it runs natively in the browser. It allows for interaction like what you would expect from a 3D graphics game but right in the browser with web code. Some of the examples are pretty cool and load extremely quickly because they don't really use the normal document structure. (Some other examples of really neat uses of WebGL and/or three.js, a library to facilitate using WebGL: Galactic Neighbors Internet Safety game for Kids by Google Lubricious)

The concept is pretty expansive when you think about the kinds of data that can be shown. WebGL Globe combines data that has several dimensions of information encoded and visualized:

  1. Node Location - latitude and longitude
  2. Node connections - showing that two nodes are connected in some way (shown as an arc when you click on a country in the World of Oil visualization).
  3. Connection intensity - this one can have multiple dimensions depending on how creative you get. For example:
    1. the line color itself can indicate some sort of status (red/green/yellow/orange). It's even possible that you could show percentages of the arc length as certain colors to indicate distribution of the status (i.e. 10% is bad while 90% is good might mean a line that is mostly green with a segment of length 10% that is red).
    2. the thickness of the arc can be another dimension, indicating something like volume
    3. the maximum altitude of the arc can be another, indicating sample size
All this is neat, but what good does it do anyone. Well, if you've watched my Analyzing TCP Application Performance video, you know that monitoring of response time is the most important part of any infrastructure monitoring. If you're doing application response time monitoring, you should have data for each transaction showing how each transaction performed. That's a huge amount of data (big data anyone?). 

Visualizing that data requires a few steps of summarization and grouping. First of all, every transaction's metrics should be retained individually so that you can dive into the details if needed. Second, you can start grouping by user then by summarizing by time buckets (1 minute or 5 minute). Another level of grouping that can be done is to group by network location. This can legitimately be done because most users at a particular location are going to share a very high percentage of the network path that gets them to the services they are consuming. 

Let me rephrase: You should have data that describes sources, destinations, and the performance between them. Sound familiar? What I'm envisioning is taking this data and plotting it out using WebGL Globe. Each network location and each service hosting location is a node (hm, could cloud services actually be represented as clouds?!?). They'd be connected with arcs. Thickness of the arcs could represent the number of transactions between those two nodes. Height of the arc could represent the recent negative variability in performance or a way to highlight the selected arc. Color of the arc could show a measure of the deviance in performance. 

Click on a node and you'd see the arcs from all the user locations consuming services from that site (if any) and the arcs to all the service hosting locations consumed by that site (if any) pop up in altitude (normally they'd be displayed at sea level or hidden based on a GUI setting). This is similar to the action you see in the Global Oil visualization when you click on a country (you see the countries connected to it). 

From there, if you saw that all the arcs were showing problems, you would know (from problem domain isolation) that there is a problem common to all users at that site. You could click on the node again and get into performance stats for the infrastructure at that location (from a tool like LogicMonitor). Following the colors should get you to the root of the problem pretty quickly.

Alternatively, if you saw a problem in only one of the arcs (or a small selection) follow problem domain isolation tactics and click on the arc having the problem. That should dive you into the infrastructure connecting those two nodes so you can find the problem.

Nodes themselves can have problems that don't evidence themselves outside the node. If that's the case, you should be going into the node to look at why there are performance issues. You'd know to go to there because the node icon itself would be showing some color indicator that there's a problem.

Friday, March 8, 2019

Using Docker

Just a couple articles that have been camping out in my browser since I found them. They were very useful in helping me get docker doing useful stuff, like Ansible.

https://blog.docker.com/2016/09/build-your-first-docker-windows-server-container/

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04

alpine/git          latest    a1d22e4b51ad      10 days ago     27.5MB
ansible-docker      latest    25b39c3ffd15      2 weeks ago     153MB
ubuntu              latest    47b19964fb50      4 weeks ago     88.1MB

Ansible-docker is a container I modified to suit my needs. You can use it by either cloning the git repository and building from source or just pull it from the hub:
docker pull sweenig/ansible-docker

I used this to know how to push images to docker: https://ropenscilabs.github.io/r-docker-tutorial/04-Dockerhub.html

alpine/git is the easiest way I've found to run git on Windows (hint, it's not actually running in Windows but in Linux inside the container).

Thursday, March 7, 2019

Object Oriented Programming

Today's blog post may have been posted before, but it's a really good one. If you are looking to get into object oriented programming, you should give this a look: https://inventwithpython.com/blog/2014/12/02/why-is-object-oriented-programming-useful-with-a-role-playing-game-example/.