Thursday, March 29, 2012

Understanding SA Discovery and Pruning/Grooming

First of all, a little conceptual history around SuperAgent:
SuperAgent was meant to automate the task of analyzing packet captures for essential metrics indicating server or network latency.  An engineer wanted a better way to do it than by hand, and SA was born.  Since its inception, it has grown by leaps and bounds in its capabilities.  Despite the growth, one major concept has remained: SA is meant to automate a manual process for your top applications.  This is not a scalability issue.  It's something fundamental to the thought process behind every revision of the product.  SA is meant to analyze the transactions of applications of interest to determine where latency lies.

With the most recent version, SA added a feature that automatically discovers and configures applications.  This opened up a whole new area of SA since admins no longer had to manually configure the applications they were interested in.  All they had to do was identify the servers that might be involved and SA did the rest.  Expectations began to rise since admins could now easily expand the bounds of what was considered an 'application of interest'.

In order to prevent performance problems that might arise in very complex environments, the developers imposed a limit on the discovery process.  When the discovery process has discovered and configured 1000 servers or 1000 applications (whichever comes first), a pruning process begins.  This algorithm reevaluates the active combinations every 5 minutes to determine which 1000 servers and which 1000 applications will remain in the configuration.  This doesn't affect any applications configured by the administrator and shouldn't affect the largest, most active applications.  Administrators have to understand that this is by design and that the applications configured in SA don't necessarily represent all the applications hosted by a server.
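
To make the pruning idea concrete, here is a minimal sketch in Python.  It is not SA's actual algorithm, which isn't documented here.  It assumes a single "activity" number per entry, that admin-configured entries are always kept, and that they don't count against the limit.  The names Entry, activity, admin_configured and prune are all made up for illustration.

```python
from dataclasses import dataclass

MAX_AUTO = 1000  # default limit described in the post


@dataclass(frozen=True)
class Entry:
    name: str
    activity: int                   # hypothetical activity measure (assumption)
    admin_configured: bool = False  # True if an administrator set it up by hand


def prune(entries, limit=MAX_AUTO):
    """Keep all admin-configured entries plus the `limit` most active
    auto-discovered ones (illustrative ranking, not SA's real logic)."""
    manual = [e for e in entries if e.admin_configured]
    auto = [e for e in entries if not e.admin_configured]
    # Most active auto-discovered entries first; the rest get pruned.
    auto.sort(key=lambda e: e.activity, reverse=True)
    return manual + auto[:limit]


if __name__ == "__main__":
    sample = [
        Entry("app-a", 5),
        Entry("app-b", 50),
        Entry("app-c", 1),
        Entry("app-d", 0, admin_configured=True),
    ]
    for kept in prune(sample, limit=2):
        print(kept.name)
```

With a limit of 2, the quiet auto-discovered app-c is dropped while the idle but admin-configured app-d survives.  That is the same effect described above: small, quiet applications on a busy server can disappear from the configuration even though the server still hosts them.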

Luckily, the server and application limits can be raised with a simple query in the database.  To view the current limits execute the following query:
select * from parameter_descriptions where parameter like 'maxNumAuto%';
Updating those values will change the limits.  Remember, those limits were put into place to prevent performance problems.  Also, SA hasn't been tested by CA's QA department with any limit other than 1000, so if you run into any problems after changing those limits, you'll get pushback from support because of it.  The limits are among the things included in the CIG, which is basically required for every case, so support will know that you changed them.

I have increased the limits by 500 in some cases, just to push the envelope a little.  I didn't experience severe, immediate problems.  If you need much more than that, consider more infrastructure (read: more SA master consoles).