Channel: THWACK: Popular Discussions - Alert Lab

Alert Trigger Suppression vs Alert Suppression


Alert Suppression is a topic that comes and goes in the forums, and I thought I would contribute my version of site-based Alert Suppression. My method is a little different in that I prefer to prevent the alert from triggering in the first place.

 

WARNING: This is a 'hack', so I wouldn't recommend this for the uninitiated. Information is given "AS IS". Please read completely before attempting.

 

Purpose: 

This is a very basic site-based Alert Trigger Suppression that prevents alerts from occurring when a remote Site's WAN link is down. Simply put, I don't want to receive Node Up/Down alerts from a Site's devices that sit behind a router or firewall when that Site's WAN link goes down.

 

This assumes that each remote site has a single point of failure, e.g. 1 router, 1 firewall, 1 HSRP address, etc. (This can be modified to handle locations with redundant links.)

 

Prerequisites and Ground Work:

 - In this example, you will need to add the following Custom Node Properties:

Custom Property Name - Custom Property Type

Required:          Site        - Text
Required:          WAN_Link    - Yes / No

---- Add the following if you have alerts going to many different customers / groups ----

Optional:          Alert_Email - Text
Optional:          MailTo1     - Text
Optional:          MailTo2     - Text

 

The 'Site' custom property needs to be populated on all devices.

The 'WAN_Link' custom property needs to be "checked off" (set to 'True') on each router, firewall, HSRP Node that handles the WAN link for each Site. In this basic scenario, make sure you only have 1 device designated per SITE.

(Just a quick note about the HSRP nodes: most people recommend monitoring the HSRP IP address as an ICMP-only node. I prefer to monitor it as an SNMP node without the interfaces. This allows for an easy way to verify which router / switch is the primary.)
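
One way to confirm that every Site has exactly one designated WAN node is a grouping query like the sketch below. It assumes, as the trigger query later in this post does, that the custom properties Site and WAN_Link are columns on the Nodes table; check your own schema before relying on it.

```sql
-- Sketch: list every Site whose number of WAN_Link nodes is not exactly 1.
-- Assumes Site and WAN_Link are columns on the Nodes table.
SELECT N.Site,
       SUM(CASE WHEN N.WAN_Link = 1 THEN 1 ELSE 0 END) AS WanNodeCount
FROM Nodes N
GROUP BY N.Site
HAVING SUM(CASE WHEN N.WAN_Link = 1 THEN 1 ELSE 0 END) <> 1
ORDER BY N.Site
```

A Site that shows a count of 0 has no WAN node to join against, so its devices would never trigger the global alert. A count above 1 means a down node can be returned once per WAN node that is up.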

 

Optional: 

The 'Alert_Email' custom property needs to be populated with the following string without the quotes: "${Node.MailTo1};${Node.MailTo2}"   - This can be done quickly with a simple SQL Update command. You can also set this as a default value so that any new nodes added to the system will automatically populate with this string.
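
A minimal sketch of such an update, assuming Alert_Email is a column on the Nodes table (the same assumption the trigger query later in this post makes for Site and WAN_Link). The value is stored as the literal macro text, not as an email address.

```sql
-- Sketch: fill Alert_Email only on nodes where it is still empty.
-- The stored value is the literal macro string from the post.
-- Assumes Alert_Email is a column on the Nodes table.
UPDATE Nodes
SET Alert_Email = '${Node.MailTo1};${Node.MailTo2}'
WHERE Alert_Email IS NULL OR Alert_Email = ''
```

Running a Select with the same Where clause first shows which nodes the update would touch.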

 

Optional: 

The 'MailTo1' & 'MailTo2' custom properties can be populated with email addresses - 'MailTo1' should always be populated. As a best practice, the email addresses should always be a distribution list. As a side benefit, you can expose the custom property 'MailTo2' from the Orion Website for modification.
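
Since 'MailTo1' should always be populated, a query along these lines can list the nodes that still need an address. It is only a sketch and assumes MailTo1 and Site are columns on the Nodes table.

```sql
-- Sketch: nodes that have no primary alert address yet.
-- Assumes MailTo1 and Site are columns on the Nodes table.
SELECT N.NodeID, N.Caption, N.Site
FROM Nodes N
WHERE N.MailTo1 IS NULL OR LTRIM(RTRIM(N.MailTo1)) = ''
ORDER BY N.Site, N.Caption
```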

 

Creating the Alerts:

You will need a minimum of 2 alerts: one basic alert to let you know when your Remote Site WAN Link is down, and another alert which handles everything else. The second alert is where the 'hack' comes into play.

 

Alert 1 - Create an Advanced Alert for WAN Node Up/Down
          - Trigger when Node Status = Down and WAN_Link = Yes
          - Reset when Node Status = Up or Node Status = Unmanaged
          - Suppress
               - When your SW Server's Gateway IP Address is not Up
               - When your SW Server's WAN connection is not Up
          - Configure Time of Day (TOD) and Actions to your needs.

 

Alert 2 - Create an Advanced Alert for Node Up/Down
          - Name it 'Global Node Up/Down Alert'
          - Trigger when Node Status = Down
          - Reset when Node Status = Up or Node Status = Unmanaged
          - Suppress
               - When your SW Server's Gateway IP Address is not Up
               - When your SW Server's WAN connection is not Up
          - Configure Time of Day (TOD) and Actions to your needs.

 

Just as a side note: by monitoring the SW Server's Gateway as an ICMP-only node and adding it to the Alert Suppression section in the manner outlined above, you can quickly disable all alerts globally by unmanaging that ICMP-only node through the web console. This assumes your SW server is not multi-homed / multi-pathed.

 

The 'hack': - Modify the SQL Query  in Alert Definitions

     - Open SolarWinds Database Manager
     - Open your NetPerfMon database
     - Open table 'AlertDefinitions'
          - Click on the Query button
          - In the Query box, enter:

               Select *
               From AlertDefinitions
               Where AlertName = 'Global Node Up/Down Alert'

          - Click the Refresh button

 

Just as a precaution, make sure only 1 result is displayed. We are going to copy the following SQL query into the 'TriggerQuery' field. Improper pasting can affect multiple records.

 

 

Select A.NodeId AS NetobjectID, A.Caption AS Name
From Nodes A, Nodes U
Where
(
     (A.Status = '2') AND
     (
          (A.Site = U.Site) AND
          (U.WAN_Link = '1') AND
          (U.Status = '1')
     )
)

 

 

          - Copy the SQL query above.
          - Check the Read-Write check box near the Refresh button
          - Go to the field 'TriggerQuery' (expand the row size for easier viewing)
          - Remove the current query and paste the new query
          - Move the cursor to the last line of the new SQL query and hit Enter

 

That's it.

 

How the Query Works:

 

Nodes A, Nodes U          <-- A and U become aliases of the Nodes table. This allows for a self-join.
A.Status = '2'            <-- node is Down
     AND
     A.Site = U.Site      <-- pairs the down node with the nodes that share its site code
     U.WAN_Link = '1'     <-- keeps only the WAN node for the Site
     U.Status = '1'       <-- WAN node is Up

 

This alert reports a down node only while the WAN node for its Site is up. If the WAN node is not up, no alert triggers. Unmanaging the WAN node for a Site also suppresses all Up/Down alerts from every device at that Site, because the WAN node's status is then no longer Up.
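
To see the self-join in action without touching the real database, the sketch below runs the same logic against a table variable. It assumes Microsoft SQL Server 2008 or later; the table variable and every row in it are made-up sample data, not part of the Orion schema.

```sql
-- Self-contained T-SQL sketch: a table variable stands in for the Nodes table.
-- All rows are made-up sample data. Status 1 = Up, 2 = Down, as in the query above.
DECLARE @Nodes TABLE (
    NodeID   INT,
    Caption  VARCHAR(50),
    Site     VARCHAR(20),
    WAN_Link BIT,
    Status   INT
);

INSERT INTO @Nodes (NodeID, Caption, Site, WAN_Link, Status) VALUES
    (1, 'NYC-RTR1', 'NYC', 1, 1),   -- WAN node, up
    (2, 'NYC-SW1',  'NYC', 0, 2),   -- down while its WAN node is up
    (3, 'LA-RTR1',  'LA',  1, 2),   -- WAN node, down
    (4, 'LA-SW1',   'LA',  0, 2);   -- down behind a down WAN node

-- Same join and filters as the trigger query, pointed at the sample table.
SELECT A.NodeID AS NetObjectID, A.Caption AS Name
FROM @Nodes A, @Nodes U
WHERE A.Status = 2
  AND A.Site = U.Site
  AND U.WAN_Link = 1
  AND U.Status = 1;
-- Returns one row: 2, NYC-SW1
```

Neither LA row is returned, including the down WAN node LA-RTR1 itself, which is why the separate WAN Node Up/Down alert (Alert 1) is still needed.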

 

Important Note:  At this point, the 'Global Node Up/Down Alert' CANNOT be changed or modified from Alert Manager. Any time a change is made to this alert, you will have to paste the custom SQL query back into the TriggerQuery field.
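
One way to check whether the custom query is still in place after someone has touched the alert is to read the field back. This sketch uses only the table, column, and alert names already mentioned in this post.

```sql
-- Sketch: read back the stored trigger query for the modified alert.
SELECT AlertName, TriggerQuery
FROM AlertDefinitions
WHERE AlertName = 'Global Node Up/Down Alert'
```

If the returned TriggerQuery no longer contains the "Nodes A, Nodes U" self-join, the custom query has been overwritten and needs to be pasted in again.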

 

 

As you can see, the query itself is very simple and can be easily modified. One of the first modifications you'll want to make is to exclude local devices. As an example, if your SolarWinds Server is located at site 'CORP', your new query would look like this:

 

Select A.NodeId AS NetobjectID, A.Caption AS Name
From Nodes A, Nodes U
Where
(
     (A.Status = '2') AND
     (A.Site <> 'CORP') AND
     (
          (A.Site = U.Site) AND
          (U.WAN_Link = '1') AND
          (U.Status = '1')
     )
)

 

With this new query in place, you will need to add a new Up/Down Alert for Nodes belonging to Site 'CORP'. You can expand the query to exclude Sites that do not have a single point of failure, etc.  The rest is up to you.
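
As one possible direction for Sites with redundant links, the sketch below replaces the self-join with an EXISTS test, so a down node is returned once whenever at least one WAN node at its Site is up. This is not the query from the post, and whether the alert engine accepts this form in the 'TriggerQuery' field is untested; treat it as a starting point and leave the comment lines out when pasting.

```sql
-- Sketch: variant trigger query for Sites with more than one WAN node.
-- A down node is returned once if at least one WAN node at its Site is up.
-- 'CORP' is the local site from the example above.
Select A.NodeId AS NetobjectID, A.Caption AS Name
From Nodes A
Where
(
     (A.Status = '2') AND
     (A.Site <> 'CORP') AND
     EXISTS
     (
          Select 1
          From Nodes U
          Where (U.Site = A.Site) AND
                (U.WAN_Link = '1') AND
                (U.Status = '1')
     )
)
```

With the original self-join, a Site that has two WAN nodes up would return each down node twice; the EXISTS form avoids that because it only asks whether a matching WAN node exists.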

 

Custom Property ${Alert_Email} explained: 

There are a couple of things I dislike about the way Advanced Alerts work. First, there is no global setting for the Reply-To address. Second, and most annoying, there is no programmatic logic for where the email alerts go.

The first problem is easily overcome by always copying an existing alert or importing a template. The second is overcome by removing the fixed email destination from the alert. That's where the ${Alert_Email} custom property is used.

 

When creating the Email Alert Action, always populate the "To:" field with ${Node.Alert_Email}

(The period is left off the end of that sentence to prevent confusion.)

 

The email alerts will now go to the email addresses populated in the Custom Property fields 'MailTo1' and 'MailTo2'.

Now 1 alert can send emails to hundreds of different email addresses. You'll have to do a little legwork populating the 'MailTo1' and 'MailTo2' fields, but once that is in place, you won't have to create a different alert for every individual group. My philosophy is to keep the number of queries the system has to make as low as possible.

 

${Node.Alert_Email} also works with Interface, UNDP and APM alerts, as long as the APM's associated Node has the right email address. Configuring APMs and APM alerts is a discussion for another time.

 

As I said in the beginning, this is a basic site-based Alert Trigger Suppression. What you do with this information is up to you. It is being shared "AS IS".

 

- v

