Tip: This documentation is for the old interface of AlertSite. If you use the new interface (AlertSite UXM), please see SLA Monitoring.

SLA Management

A Service Level Agreement (SLA) is an understanding between two parties, usually a service provider and a customer, in which an expected level of service is formally defined. It can be legally binding and refers to the contracted performance, or delivery time, of the service. How this is defined varies depending on the service being provided. For example, consider an online advertiser who provides ad content (the service provider) and the owner of the website on which the ad is placed (the customer). The customer requires the advertisement to be delivered within 2 seconds. This requirement would be defined in the SLA, with some penalty, usually financial, if the measured response time exceeds 2 seconds from multiple locations. The specific terms, responsibilities, and requirements are worked out between the parties and are different for each situation.

AlertSite calculates response times measured from multiple monitoring locations simultaneously. This helps you verify SLA compliance and compare actual performance with designated SLA objectives, operating periods, and compliance reporting exclusions.

AlertSite provides a simple way for customers to configure SLA objectives for site uptime, availability, and response time that match the SLA contract for devices that are using the SLA (MultiPOP) monitoring type. This guide assumes the reader is familiar with how to use and navigate through the AlertSite console.

Terminology

  • Uptime: your website is "up" when any monitoring location can successfully access your site (the check returns a non-error code) during the monitoring interval
  • Availability: the percentage of successful measurements out of the total measurements of your site from all your monitoring locations during the report time frame
  • Response Time: the time it takes the monitoring location to send a GET request to your website and receive the response
  • Error Correlation Technology (ECT): a proprietary AlertSite feature that recognizes errors at all monitoring locations simultaneously and correlates the results for accurate reporting

The values are averages over the selected time frame. For example, say you were monitoring from 3 locations over a time frame in which each location took 100 measurements. One of your locations was unable to access your site 5 times during that period (95% availability from that location). The other 2 locations always had access (100% availability from each of those 2 locations). Because at least one location reached the site in every monitoring interval, your uptime would be 100%, while the availability would be 98.33%:

(100+100+95)/(100+100+100) = 295/300 = 98.33%
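
The difference between the two figures can be sketched in a few lines of Python. This is only an illustration of the definitions above, not AlertSite's implementation; the data layout (one list of pass/fail results per location, aligned by monitoring interval) and the location names are assumptions made for the example.

```python
# Hypothetical data: one list per monitoring location, one boolean per
# monitoring interval (True means the check succeeded).
def availability(results_by_location):
    """Percentage of successful checks across all locations."""
    total = sum(len(r) for r in results_by_location.values())
    successes = sum(sum(r) for r in results_by_location.values())
    return 100.0 * successes / total


def uptime(results_by_location):
    """Percentage of intervals in which at least one location succeeded."""
    intervals = list(zip(*results_by_location.values()))
    up = sum(1 for interval in intervals if any(interval))
    return 100.0 * up / len(intervals)


results = {
    "location_a": [True] * 100,
    "location_b": [True] * 100,
    "location_c": [False] * 5 + [True] * 95,  # 5 failed checks
}

print(round(availability(results), 2))  # 98.33
print(uptime(results))                  # 100.0
```

Uptime stays at 100% because the two healthy locations cover every interval in which the third location failed.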

Configuration

First, create a site device in your AlertSite Console by navigating to the Configuration > Sites screen, clicking the Add a new site button, and filling in the online form. The Site Type must be either an SLA Performance plan or Usage Based Monitoring plan, and the Monitoring Type must be SLA (MultiPOP).

New_SLA_Dropdown.PNG


Click the Locations button in the upper right of the configuration screen to open the device's Locations table. You can select as many locations as you like from the list, but a minimum of 2 monitoring locations is required. You can also choose to rotate among your selected monitoring locations.


SLA Objectives

To show that your site is operating within the SLA requirements, you need to set up SLA Objectives. AlertSite uses its proprietary Error Correlation Technology (ECT) to report when all monitoring locations detect an error simultaneously, indicating that the site is unavailable. ECT also determines whether the site is up, meaning that at least one location performed a successful test during the monitoring interval. Uptime statistics are especially useful in managing SLAs because they reveal whether the web service was available at all. Setting objectives enables you to show that your site was in compliance during any selected time frame within your data retention period.

The Configuration: SLA Objectives page allows you to set service-level objectives for uptime, availability, and overall response time. The optional secondary response time threshold defines the “Frustrated” Apdex level. In addition, you can define operating periods for SLA compliance reporting, for example, to leave out downtime for scheduled periodic maintenance. One-time exclusions for single-event downtimes can also be defined.
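
For readers unfamiliar with Apdex, the sketch below shows the standard Apdex formula with an explicit Frustrated boundary. It assumes that the response time objective acts as the Satisfied limit and that the secondary threshold is the point above which a measurement counts as Frustrated; how AlertSite maps its settings onto these levels is not stated here, so the function and parameter names are hypothetical.

```python
def apdex(response_times, satisfied_max, frustrated_min):
    """Standard Apdex score: (satisfied + tolerating / 2) / total.

    satisfied_max  -- assumed to correspond to the response time objective
    frustrated_min -- assumed to correspond to the secondary threshold
    """
    satisfied = sum(1 for t in response_times if t <= satisfied_max)
    frustrated = sum(1 for t in response_times if t > frustrated_min)
    tolerating = len(response_times) - satisfied - frustrated
    return (satisfied + tolerating / 2) / len(response_times)


# Made-up response times in seconds.
samples = [1.2, 3.0, 4.9, 6.5, 8.0, 25.0]
score = apdex(samples, satisfied_max=5.0, frustrated_min=20.0)
print(round(score, 2))  # 0.67
```

Here three samples are Satisfied, two are Tolerating, and one is Frustrated, giving (3 + 2/2) / 6.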

The Operating Periods section in the SLA Manager screen defines both inclusion and exclusion periods. Only time periods listed in this section are included in SLA Reports; time intervals not defined here are excluded from SLA Reports. Setting Operating Periods and One-Time Exclusion Periods does not halt monitoring. To halt monitoring, use Blackouts.
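
The effect of operating periods on reporting can be pictured as a filter over measurements that were all collected normally. The following is a minimal sketch under assumed names and an assumed Monday through Friday, 08:00 to 18:00 schedule; it does not reflect how AlertSite stores or evaluates operating periods.

```python
from datetime import datetime, time

# Hypothetical operating periods as (weekday, start, end); Monday is 0.
OPERATING_PERIODS = [(day, time(8, 0), time(18, 0)) for day in range(5)]


def in_operating_period(timestamp):
    """Return True if the measurement falls inside a listed period."""
    return any(
        timestamp.weekday() == day and start <= timestamp.time() < end
        for day, start, end in OPERATING_PERIODS
    )


# (timestamp, check succeeded) pairs; all of them were monitored.
measurements = [
    (datetime(2016, 3, 7, 9, 30), True),   # Monday, inside the period
    (datetime(2016, 3, 7, 22, 0), False),  # Monday, outside the hours
    (datetime(2016, 3, 6, 12, 0), False),  # Sunday, no period defined
]

# Only measurements inside an operating period reach the SLA Report.
reported = [m for m in measurements if in_operating_period(m[0])]
print(len(reported))  # 1
```

The two failed checks still exist as monitoring data; they are simply left out of the SLA calculation.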

The SLA Objectives screen displays monitored devices that are configured with SLA Objectives. For example, if an account has 14 devices, 5 of which have SLA Objectives configured, the Configuration: SLA Objectives screen lists those 5:

With_and_Without_SLA_MP.PNG


To configure objectives for other devices, click the Add a new SLA button. The screen will present the list of account devices that have not been configured with SLA objectives, for example, the remaining 9:

Add_New_SLA.PNG


The figure below displays the SLA Objectives for transaction monitor SLA Transaction, with a minimum of 99.8% Availability, 99.9% Uptime, and a Response Time expected to be less than or equal to 5 seconds over an operating period of Monday through Friday. There are no exclusions, either weekly or one-time:

SLA_Txn_SLO.PNG


SLA Reports

The SLA Detail Report is available only for devices that have SLA Objectives configured. It displays a table showing whether or not your site was in compliance with the set objectives, along with the number of errors and checks in the selected reporting time frame.

In the report below, the monitor SLA Transaction was configured with 99.80% Availability, 99.90% Uptime, and 14.00 seconds expected Response Time:

SLA_Detail_Rpt.PNG


As the values in the Errors / Checks column above show, out of 1274 checks done during the selected time frame, there were 165 Response Time errors, meaning the response time exceeded the objective 165 times. However, since the actual average response time, 12.9462 seconds, is still lower than the objective of 14.00 seconds, the site is in SLA compliance. Note that the number of Uptime checks shown is lower than the number of Availability and Response Time checks. This is because Uptime only requires that one location access the site successfully during a monitoring interval, so not every location's check is counted in the Uptime statistics.
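
The distinction between individual Response Time errors and overall compliance can be illustrated as follows. This is a sketch based only on the description above (errors are counted per check, compliance is judged on the average); the function name and sample values are invented for the example.

```python
def response_time_summary(response_times, objective_seconds):
    """Count checks over the objective and compare the average to it."""
    errors = sum(1 for t in response_times if t > objective_seconds)
    average = sum(response_times) / len(response_times)
    return {
        "errors": errors,
        "checks": len(response_times),
        "average": average,
        "compliant": average <= objective_seconds,
    }


# Made-up response times in seconds against a 14.00 second objective.
summary = response_time_summary([10.0, 12.0, 16.0, 11.0], 14.0)
print(summary["errors"], summary["checks"])  # 1 4
print(summary["average"])                    # 12.25
print(summary["compliant"])                  # True
```

One check exceeded the objective, yet the average of 12.25 seconds keeps the sample in compliance, which mirrors the report discussed above.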

For Availability and Uptime, clicking the value in the Errors/Checks column displays a section for annotating Availability errors. The time stamp, error status code, and monitoring location where the error occurred are provided, along with a free-form field for customer notes.


Notification

You can control the sensitivity of alerts you receive when your site is out of compliance, above and beyond normal notification of timeouts, keyword errors, or TCP connection errors. To do this, use the SLA (MultiPOP) notification option on the Account > Manage Accounts screen. You can choose to be notified if 1, 2, 3, or all of your monitoring locations detect an out-of-compliance condition:

SLA (MultiPOP) notification


If you want to be notified when, say, at least 2 of your monitoring locations have detected an availability error during the monitoring interval, select Send notification when TWO locations detect an error from the dropdown. If an error is detected from only 1 location during the interval, you would not be notified. The condition would be reported as an error in the SLA Report, but you would not receive an alert.
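
The notification rule amounts to a simple threshold on the number of locations reporting an error in the same interval. The sketch below assumes the setting is represented as 1, 2, 3, or the string "all"; the function and location names are hypothetical and serve only to illustrate the behavior described above.

```python
def should_notify(error_locations, monitored_locations, setting):
    """Decide whether an alert is sent for one monitoring interval.

    setting -- 1, 2, 3, or "all" (assumed representation of the dropdown)
    """
    if setting == "all":
        required = len(monitored_locations)
    else:
        required = setting
    return len(error_locations) >= required


locations = {"location_a", "location_b", "location_c"}

print(should_notify({"location_a"}, locations, 2))                # False
print(should_notify({"location_a", "location_b"}, locations, 2))  # True
print(should_notify({"location_a", "location_b"}, locations, "all"))  # False
```

In the first case the error would still appear in the SLA Report even though no alert is sent.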

Exclusions

SLA Exclusions exclude measurements gathered in an SLA device during a specified time interval from your SLA Report. SLA Exclusions should not be confused with Blackouts. Blackouts actually disable monitoring (or notifications) for a device during a specified time interval.

With an SLA Exclusion, monitoring takes place but those measurements are not included in SLA calculations. A Blackout does not have to be applied to that SLA device for that time period in order to exclude those measurements from the SLA Report. However, if you do have Blackouts set for an SLA device, they reduce the number of checks displayed in the Errors/Checks column under Service Level Objectives in the SLA Manager screen.

If you need to remove monitoring checks from the SLA Report, you can add a One-Time Exclusion Period. For example, say you forgot to configure a Blackout for a planned UPS replacement and monitoring was done during the 10 minutes the system was down. You can configure a One-Time Exclusion Period to remove that time period from the SLA Report, and document the event right in the report.
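
The UPS example can be sketched as removing one time window from data that has already been collected. The dates, tuple layout, and function name below are assumptions chosen for illustration and are not part of AlertSite.

```python
from datetime import datetime, timedelta

# Hypothetical one-time exclusion: start, end, and a note for the report.
outage_start = datetime(2016, 3, 7, 2, 0)
exclusion = (
    outage_start,
    outage_start + timedelta(minutes=10),
    "Planned UPS replacement",
)


def apply_exclusion(measurements, exclusion):
    """Drop measurements taken inside the exclusion window."""
    start, end, _note = exclusion
    return [(ts, ok) for ts, ok in measurements if not (start <= ts < end)]


# (timestamp, check succeeded) pairs recorded while monitoring kept running.
measurements = [
    (datetime(2016, 3, 7, 1, 55), True),
    (datetime(2016, 3, 7, 2, 0), False),   # during the outage
    (datetime(2016, 3, 7, 2, 5), False),   # during the outage
    (datetime(2016, 3, 7, 2, 10), True),
]

kept = apply_exclusion(measurements, exclusion)
print(len(kept))                    # 2
print(all(ok for _ts, ok in kept))  # True
```

Unlike a Blackout, the exclusion is applied after the fact: the failed checks were recorded, but they no longer count against the SLA objectives.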

Applying SLA Objectives to Other Devices

If other SLA devices need the same SLA Objectives, set up one device first, then go to Configuration > Bulk Settings and use that device as a template for the others. For details, go to the Bulk Settings help page and click SLA Settings in the bullet list at the top.


© 2016 SmartBear Software