SLA Management

Overview

A Service Level Agreement (SLA) is an agreement between two parties, usually a service provider and a customer, that formally defines an expected level of service. It can be legally binding and typically specifies the contracted performance, or delivery time, of the service. How this is defined varies with the service being provided. For example, consider an online advertiser that provides ad content (the service provider) and the owner of the website where the ad is placed (the customer). The customer requires the advertisement to be delivered within 2 seconds. The SLA would state this requirement, along with a penalty, usually financial, if the measured response time from multiple locations exceeds 2 seconds. The specific terms, responsibilities, and requirements are negotiated between the parties and differ for each situation.

AlertSite measures response times from multiple monitoring locations simultaneously. This lets you verify SLA compliance by comparing actual performance with your designated SLA objectives, operating periods, and compliance reporting exclusions.

For devices that use the SLA (MultiPOP) monitoring type, AlertSite provides a simple way to configure SLA objectives for site uptime, availability, and response time that match your SLA contract. This guide assumes you are familiar with using and navigating the AlertSite console.

Terminology

  • Uptime: your website is "up" when any monitoring location can successfully access your site (returns a non-error code) during the monitoring interval
  • Availability: the percentage of successful measurements out of the total measurements of your site from all your monitoring locations during the report time frame
  • Response Time: the time it takes a monitoring location to access your website and complete the GET request
  • Error Correlation Technology (ECT): A proprietary AlertSite feature that recognizes errors at all monitoring locations simultaneously and correlates the results for accurate reporting

These values are calculated over the selected time frame. For example, suppose you monitor from 3 locations over a time frame that includes 100 measurements per location. One location was unable to access your site 5 times during that period (95% availability from that location), while the other 2 locations always had access (100% availability from each). Your uptime would be 100%, because at least one location reached the site in every interval, while your availability would be 98.33%:

(100 + 100 + 95) / 3 = 295 / 3 ≈ 98.33%
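The difference between uptime and availability in the example above can be reproduced from raw per-location results. This is a hypothetical model, not AlertSite's implementation: it assumes each location produces exactly one pass/fail result per monitoring interval, stored as a list of booleans aligned by interval, and the function names are illustrative.

```python
from typing import Dict, List

def uptime_pct(results: Dict[str, List[bool]]) -> float:
    """Share of intervals in which at least one location reached the site."""
    intervals = list(zip(*results.values()))  # one tuple of per-location results per interval
    up = sum(1 for interval in intervals if any(interval))
    return 100.0 * up / len(intervals)

def availability_pct(results: Dict[str, List[bool]]) -> float:
    """Successful measurements divided by all measurements, across every location."""
    total = sum(len(r) for r in results.values())
    ok = sum(sum(r) for r in results.values())
    return 100.0 * ok / total

# The example from the text: 3 locations, 100 measurements each,
# and one location that fails 5 times.
results = {
    "location_a": [True] * 100,
    "location_b": [True] * 100,
    "location_c": [False] * 5 + [True] * 95,
}
print(f"Uptime: {uptime_pct(results):.2f}%")              # Uptime: 100.00%
print(f"Availability: {availability_pct(results):.2f}%")  # Availability: 98.33%
```

Pooling all measurements, as the Terminology definition of availability describes, gives the same result as averaging the per-location percentages here, because every location took the same number of measurements.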

Configuration

First, create a site device in the AlertSite Console: navigate to the Configuration > Sites screen, click the Add a new site button, and fill in the online form. The Site Type must be either an SLA Performance plan or a Usage Based Monitoring plan, and the Monitoring Type must be SLA (MultiPOP).

[Image: New_SLA_Dropdown.PNG]


Select your monitoring locations from the device's Locations table, which you open by clicking the Locations button in the upper right of the configuration screen. You can select as many locations as you like, but at least 2 monitoring locations are required. You can also choose to rotate among your selected monitoring locations.


SLA Objectives

To prove that your site is operating within the SLA requirements, you need to set up SLA Objectives. AlertSite uses its proprietary Error Correlation Technology (ECT) to report the site as unavailable only when all monitoring locations detect an error simultaneously. ECT likewise determines whether the site is up. Uptime statistics are especially useful in managing SLAs because they accurately reveal whether the web service was available at all. Setting objectives enables you to show that your site was in compliance during any selected time frame within your data retention period.

The Configuration: SLA Objectives page lets you set service-level objectives for uptime, availability, and overall response time. You can also define operating periods for SLA compliance reporting, even after the fact; for example, to account for downtime during scheduled periodic maintenance. One-time exclusions for single-event downtime can also be defined.

The Operating Periods section of the SLA Manager screen defines both inclusion and exclusion periods. Only the time intervals listed in this section are included in the SLA Report; intervals not defined here are excluded from it. Setting Operating Periods and One-Time Exclusion Periods does not stop monitoring; to stop monitoring, use Blackouts.
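One way to picture an operating period is as a filter applied to each measurement's timestamp: measurements are still collected around the clock, but only those inside an operating period count toward the SLA Report. The sketch below is a hypothetical model, not an AlertSite setting or API; the `OperatingPeriod` structure (a set of weekdays plus an inclusive daily time window) and the `in_operating_period` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, List

@dataclass(frozen=True)
class OperatingPeriod:
    """Hypothetical model of one operating-period entry."""
    weekdays: FrozenSet[int]  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start: time = time.min    # start of the daily window (inclusive)
    end: time = time.max      # end of the daily window (inclusive)

def in_operating_period(ts: datetime, periods: List[OperatingPeriod]) -> bool:
    """True if a measurement taken at `ts` falls inside any operating period."""
    return any(
        ts.weekday() in p.weekdays and p.start <= ts.time() <= p.end
        for p in periods
    )

# Monday through Friday, all day.
mon_fri = OperatingPeriod(weekdays=frozenset(range(5)))

print(in_operating_period(datetime(2012, 6, 4, 10, 30), [mon_fri]))  # Monday: True
print(in_operating_period(datetime(2012, 6, 9, 10, 30), [mon_fri]))  # Saturday: False
```

A measurement for which the filter returns False is not discarded by monitoring; in this model it is simply left out when the report is computed.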

The SLA Objectives screen is only available for devices that are configured with the SLA (MultiPOP) monitoring type. This illustration shows two SLA devices with configured SLA Objectives, and one without:

[Image: SLA_Objectives.PNG]


The figure below shows the configuration of the device SLA Home Page: a minimum of 99% Availability, a minimum of 98% Uptime, and a Response Time objective of 0.20 seconds or less, over an operating period of Monday through Friday. There are no exclusions, either weekly or one-time:

[Image: SLA_Manager.PNG]


With this configuration, the SLA Report (available only for SLA devices that have SLA Objectives configured) displays a table showing whether your site complied with each objective, along with the number of errors and checks in the selected reporting time frame:

[Image: SLA_Detail_Rpt.PNG]


As the values in the Errors / Checks column above show, out of 27974 checks in the selected time frame there were 9 Response Time errors; that is, the response time exceeded the objective 9 times. However, because the actual average response time is still below the objective, the site is in SLA compliance. Note that the number of Uptime checks shown is lower than the number of Availability and Response Time checks. This is because uptime only requires one location to access the site successfully, so only that one successful result is included in the uptime statistics.
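The point that a handful of slow checks need not break compliance can be made concrete. In this hypothetical sketch, every response-time check slower than the objective is counted as an error, but the response-time objective is judged on the average, as the explanation above describes. The objective values are taken from the SLA Home Page example (99% availability, 98% uptime, 0.20 seconds); the `SlaObjectives` class, the `evaluate` function, and the response times are illustrative assumptions, not AlertSite data or code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

@dataclass
class SlaObjectives:
    """Hypothetical container for the three objectives set in the SLA Manager."""
    min_availability_pct: float
    min_uptime_pct: float
    max_avg_response_s: float

def evaluate(obj: SlaObjectives, availability_pct: float, uptime_pct: float,
             response_times_s: List[float]) -> dict:
    """Check measured values against the objectives.

    Every check slower than the objective counts as a response-time error,
    but compliance is judged on the average response time.
    """
    avg = mean(response_times_s)
    result = {
        "availability_ok": availability_pct >= obj.min_availability_pct,
        "uptime_ok": uptime_pct >= obj.min_uptime_pct,
        "response_errors": sum(1 for t in response_times_s if t > obj.max_avg_response_s),
        "avg_response_s": round(avg, 3),
        "response_ok": avg <= obj.max_avg_response_s,
    }
    result["compliant"] = (result["availability_ok"] and result["uptime_ok"]
                           and result["response_ok"])
    return result

# Objectives from the SLA Home Page example; the measurements are made up.
objectives = SlaObjectives(min_availability_pct=99.0, min_uptime_pct=98.0,
                           max_avg_response_s=0.20)
times = [0.12, 0.15, 0.14, 0.31, 0.13, 0.16, 0.25, 0.11, 0.14, 0.12]
report = evaluate(objectives, availability_pct=99.5, uptime_pct=100.0,
                  response_times_s=times)
print(report)
# 2 checks exceed 0.20 s, but the average (about 0.163 s) is within the
# objective, so report["compliant"] is True.
```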


Notification

You can control the sensitivity of the alerts you receive when your site is out of compliance, in addition to the normal notifications for timeouts, keyword errors, and TCP connection errors. From the Preferences section of the Account > Manage Accounts screen, you can choose to be notified when 2, 3, or all of your monitoring locations detect an out-of-compliance condition:

[Image: Manage_Acct_Prefs.png]

If you want to be sure you are notified when, say, 2 of your monitoring locations detect a response time higher than the configured goal during the monitoring interval, select Send notification when TWO locations detect an error from the dropdown. If only 1 location detects an out-of-bounds response time during the interval, you are not notified: the condition is still reported as an error in the SLA Report, but no alert is sent.
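The location-count setting behaves like a threshold applied to each monitoring interval. This is a hypothetical sketch of that decision, not AlertSite's alerting code: `should_notify` and its inputs are illustrative, and the `threshold` argument stands in for the dropdown choice of 2, 3, or all locations.

```python
from typing import Dict, Union

def should_notify(interval_errors: Dict[str, bool],
                  threshold: Union[int, str]) -> bool:
    """Decide whether one monitoring interval triggers an alert.

    interval_errors maps each location name to True if that location detected
    an out-of-compliance condition (for example, a response time above the
    goal). threshold is 2, 3, or "all", mirroring the Preferences dropdown.
    """
    failing = sum(interval_errors.values())
    if threshold == "all":
        return failing > 0 and failing == len(interval_errors)
    return failing >= threshold

interval = {"location_a": True, "location_b": False, "location_c": False}
print(should_notify(interval, 2))      # False: only one location saw the error
interval["location_b"] = True
print(should_notify(interval, 2))      # True: two locations agree
print(should_notify(interval, "all"))  # False: location_c is still fine
```

In this model a False result suppresses only the alert; the individual errors would still be counted in the SLA Report.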

Exclusions

SLA Exclusions remove the measurements that an SLA device gathers during a specified time interval from your SLA Report. Do not confuse SLA Exclusions with Blackouts: Blackouts actually disable monitoring (or notifications) for a device during a specified time interval.

With an SLA Exclusion, monitoring continues, but the resulting measurements are not included in SLA calculations. You do not need to apply a Blackout to the SLA device for that time period in order to exclude its measurements from the SLA Report. However, any Blackouts you do set for an SLA device reduce the number of checks displayed in the Errors/Checks column under Service Level Objectives in the SLA Manager screen.

If you need to remove monitoring checks from the SLA Report, add a One-Time Exclusion Period. For example, suppose you forgot to configure a Blackout for a planned UPS replacement, and monitoring continued during the 10 minutes the system was down. You can configure a One-Time Exclusion Period to remove that time period from the SLA Report and document the event directly in the report.
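A One-Time Exclusion Period changes what the report counts, not what was monitored. The sketch below is a hypothetical model rather than AlertSite code: it drops every measurement whose timestamp falls inside the exclusion window before computing availability. The 1-minute check interval, the date, the data layout, and the helper names are illustrative assumptions; only the 10-minute UPS outage comes from the example above.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Measurement = Tuple[datetime, bool]  # (timestamp, check succeeded)

def availability(measurements: List[Measurement]) -> float:
    """Percentage of successful checks."""
    return 100.0 * sum(ok for _, ok in measurements) / len(measurements)

def apply_exclusion(measurements: List[Measurement],
                    start: datetime, end: datetime) -> List[Measurement]:
    """Keep only measurements outside the half-open window [start, end)."""
    return [(ts, ok) for ts, ok in measurements if not (start <= ts < end)]

# Hypothetical hour of 1-minute checks; the site is down for 10 minutes
# (02:20 through 02:29) while a UPS is replaced.
base = datetime(2012, 6, 4, 2, 0)
outage_start = base + timedelta(minutes=20)
outage_end = outage_start + timedelta(minutes=10)

checks: List[Measurement] = []
for i in range(60):
    ts = base + timedelta(minutes=i)
    checks.append((ts, not (outage_start <= ts < outage_end)))

print(f"{availability(checks):.2f}%")  # 83.33%
print(f"{availability(apply_exclusion(checks, outage_start, outage_end)):.2f}%")  # 100.00%
```

All 60 checks still exist in `checks`; only the copy passed to the availability calculation omits the excluded window.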

Applying SLA Objectives to Other Devices

If other SLA devices need the same SLA Objectives, you can set up one device and then use it as a template for the others from the Configuration > Bulk Settings screen. For details, see the SLA Settings section of the Bulk Settings help page.


© 2012 by AlertSite, a SmartBear Business