SLA Management
Overview
A Service Level Agreement (SLA) is an understanding between two parties, usually a service provider and a customer, in which an expected level of service is formally defined. It can be legally binding and refers to the contracted performance, or delivery time, of the service. How this is defined varies depending on the service being provided. For example, consider an online advertiser who provides ad content as the service provider and the owner of the website the ad is placed on as the customer. The customer requires the advertisement to be delivered within 2 seconds. This would be defined in the SLA, with some penalty, usually financial, if the actual response time exceeds 2 seconds from multiple locations. The specific terms, responsibilities, and requirements are worked out between the parties and are different for each situation.
AlertSite calculates response times measured from multiple monitoring locations simultaneously, which helps verify SLA compliance and allows actual performance to be compared against designated SLA objectives, operating periods, and compliance reporting exclusions.
AlertSite provides a simple way for customers to configure SLA objectives for site uptime, availability, and response time that match the SLA contract for devices that are using the SLA (MultiPOP) monitoring type. This guide assumes the reader is familiar with how to use and navigate through the AlertSite console.
Terminology
- Uptime: your website is "up" when any monitoring location can successfully access your site (returns a non-error code) during the monitoring interval
- Availability: the percentage of successful measurements out of the total measurements of your site from all your monitoring locations during the report time frame
- Response Time: the time it takes the monitoring location to access your website and return from the GET request
- Error Correlation Technology (ECT): A proprietary AlertSite feature that recognizes errors at all monitoring locations simultaneously and correlates the results for accurate reporting
The values are an average over the selected time frame. For example, say you were monitoring from 3 locations, checking a time frame that included 100 measurements from each location. One of your locations was unable to access your site 5 times during that period (95% availability from that location). The other 2 locations always had access (100% availability from each of those 2 locations). Your uptime would be 100%, because at least one location could always reach the site, while the availability would be 98.33%:
(100+100+95)/3 = 295/3 = 98.33%
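The arithmetic above can be reproduced with a short Python sketch. This is only an illustration of the definitions in the Terminology section, not AlertSite's implementation. The function names and the data layout (one list of pass/fail results per location, one entry per monitoring interval) are assumptions made for the example.

```python
from typing import Dict, List


def availability_pct(results: Dict[str, List[bool]]) -> float:
    """Successful measurements as a percentage of all measurements from all locations."""
    total = sum(len(checks) for checks in results.values())
    successful = sum(sum(checks) for checks in results.values())
    return 100.0 * successful / total


def uptime_pct(results: Dict[str, List[bool]]) -> float:
    """Percentage of monitoring intervals in which at least one location succeeded."""
    intervals = list(zip(*results.values()))  # one tuple of per-location results per interval
    up = sum(1 for interval in intervals if any(interval))
    return 100.0 * up / len(intervals)


# Hypothetical data: 3 locations, 100 intervals each; location "c" fails 5 times.
results = {
    "a": [True] * 100,
    "b": [True] * 100,
    "c": [False] * 5 + [True] * 95,
}

print(round(availability_pct(results), 2))  # 98.33
print(uptime_pct(results))                  # 100.0
```

Because every location contributes the same number of measurements here, pooling all measurements gives the same result as averaging the three per-location percentages.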
Configuration
First, create a site device in your AlertSite Console by navigating to the Configuration → Sites screen, clicking the Add a new site button, and filling in the online form. The Site Type must be either an SLA Performance plan or a Usage Based Monitoring plan, and the Monitoring Type must be SLA (MultiPOP).
You can select as many locations as you like from the device's Locations table, accessed by clicking the Locations button in the upper right of the configuration screen, but a minimum of 2 monitoring locations is required. You can also elect to rotate among your selected monitoring locations.
SLA Objectives
In order to show that your site is operating within the SLA requirements, you need to set up SLA Objectives. AlertSite uses its proprietary Error Correlation Technology (ECT) to report when all monitoring locations detect an error simultaneously, which indicates that the site is unavailable. ECT also determines whether the site is up. Uptime statistics are especially useful in managing SLAs because they reveal whether the web service was available at all. Setting objectives enables you to show that your site was in compliance during any selected time frame within your data retention period.
The Configuration: SLA Objectives page allows you to set service-level objectives for uptime, availability, and overall response time. In addition, operating periods can be defined for specific time periods after the fact for SLA compliance reporting, for example, to account for downtime during scheduled periodic maintenance. One-time exclusions for single-event downtimes can also be defined.
The Operating Periods section in the SLA Manager screen defines both inclusion and exclusion periods. Only time intervals listed in this section are included in the SLA Report; time intervals not listed are excluded. Setting Operating Periods and One-Time Exclusion Periods does not halt monitoring. To halt monitoring, use Blackouts.
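As a rough illustration of how operating periods act as a reporting filter rather than a monitoring switch, the Python sketch below keeps every measurement but only reports those that fall inside a listed period. The Monday through Friday schedule, the `OPERATING_PERIODS` structure, and the function name are hypothetical. They do not reflect how AlertSite stores or evaluates this setting.

```python
from datetime import datetime, time

# Hypothetical operating periods: weekday index (Monday=0) -> (start, end) time of day.
# Days that are not listed are treated as excluded from the report.
OPERATING_PERIODS = {day: (time(0, 0), time(23, 59, 59)) for day in range(5)}


def in_operating_period(ts: datetime) -> bool:
    """Return True if the measurement timestamp falls inside a listed operating period."""
    window = OPERATING_PERIODS.get(ts.weekday())
    if window is None:
        return False
    start, end = window
    return start <= ts.time() <= end


# Monitoring ran on all three days; only the weekday measurements are reported.
measurements = [
    datetime(2024, 1, 5, 14, 30),  # Friday
    datetime(2024, 1, 6, 14, 30),  # Saturday
    datetime(2024, 1, 8, 9, 0),    # Monday
]
reported = [ts for ts in measurements if in_operating_period(ts)]
print(len(reported))  # 2
```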
The SLA Objectives screen is only available for devices that are configured with the SLA (MultiPOP) monitoring type. This illustration shows two SLA devices with configured SLA Objectives, and one without:
The figure below displays how the device SLA Home Page is configured, with a minimum of 99% Availability, 98% Uptime, and a Response Time expected to be less than or equal to 0.20 seconds over an operating period of Monday through Friday. There are no exclusions, either weekly or one-time:
With this configuration, an SLA Report, available only for SLA devices that have SLA Objectives configured, will display a table showing whether or not your site was in compliance with the objectives and the number of errors and checks in the selected reporting time frame:
As the values in the Errors / Checks column above show, out of 27974 checks done during the selected time frame, there were 9 Response Time errors, i.e., the response time was higher than the response time goal 9 times. However, since the actual average response time is still lower than the objective, the site is in SLA compliance. Note that the number of Uptime checks shown is lower than the number of Availability and Response Time checks. This is because Uptime only requires one location to access the site successfully during each monitoring interval, so not every location's measurement is counted in the Uptime statistics.
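The distinction between individual Response Time errors and overall compliance can be sketched in Python as follows. The 0.20 second goal comes from the example configuration. The sample values and the `response_time_summary` helper are hypothetical, and the sketch assumes compliance is judged on the average over the reporting time frame, as described above.

```python
from statistics import mean

RESPONSE_TIME_GOAL_S = 0.20  # objective from the example configuration


def response_time_summary(samples, goal=RESPONSE_TIME_GOAL_S):
    """Count checks that exceeded the goal and judge compliance on the average."""
    average = mean(samples)
    return {
        "errors": sum(1 for s in samples if s > goal),  # individual checks over the goal
        "checks": len(samples),
        "average": average,
        "compliant": average <= goal,  # compliance is based on the average, not each check
    }


# Hypothetical samples (seconds): one slow check among nine fast ones.
samples = [0.12] * 9 + [0.35]
summary = response_time_summary(samples)
print(summary["errors"], summary["checks"], summary["compliant"])  # 1 10 True
```

One check exceeds the goal and is counted as an error, but the average stays under 0.20 seconds, so the objective is still met.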
Notification
You can control the sensitivity of alerts you receive when your site is out of compliance, above and beyond normal notification of timeouts, keyword errors, or TCP connection errors. In the Preferences section of the Account → Manage Accounts screen, you can elect to be notified if 2, 3, or all of your monitoring locations detect an out-of-compliance condition:
If you want to be sure that you are notified when, say, 2 of your monitoring locations have detected a response time higher than the configured goal during the monitoring interval, you would select Send notification when TWO locations detect an error from the dropdown. If an out-of-bounds response time is detected from only 1 location during the interval, you would not be notified. The condition would be reported as an error in the SLA Report, but you would not receive an alert.
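The threshold behavior described above can be expressed as a small Python sketch. The `should_notify` function and its arguments are hypothetical and only model the decision rule. They are not part of any AlertSite API.

```python
from typing import Dict, Union


def should_notify(location_errors: Dict[str, bool], threshold: Union[int, str]) -> bool:
    """Decide whether to alert for one monitoring interval.

    location_errors maps a location name to True if that location detected an
    out-of-compliance condition. threshold is 2, 3, or the string "all".
    """
    failing = sum(1 for failed in location_errors.values() if failed)
    required = len(location_errors) if threshold == "all" else threshold
    return failing > 0 and failing >= required


# One location over the goal: recorded as an error in the report, but no alert.
print(should_notify({"a": True, "b": False, "c": False}, 2))     # False
# Two locations over the goal: the alert is sent.
print(should_notify({"a": True, "b": True, "c": False}, 2))      # True
# With "all", two out of three failing locations is not enough.
print(should_notify({"a": True, "b": True, "c": False}, "all"))  # False
```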
Exclusions
SLA Exclusions exclude measurements gathered in an SLA device during a specified time interval from your SLA Report. SLA Exclusions should not be confused with Blackouts. Blackouts actually disable monitoring (or notifications) for a device during a specified time interval.
With an SLA Exclusion, monitoring takes place but those measurements are not included in SLA calculations. A Blackout does not have to be applied to that SLA device for that time period in order to exclude those measurements from the SLA Report. However, if you do have Blackouts set for an SLA device, they reduce the number of checks displayed in the Errors/Checks column under Service Level Objectives in the SLA Manager screen.
If you need to remove monitoring checks from the SLA Report, you can add a One-Time Exclusion Period. For example, say you forgot to configure a Blackout for a planned UPS replacement and monitoring was done during the 10 minutes the system was down. You can configure a One-Time Exclusion Period to remove that time period from the SLA Report, and document the event right in the report.
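The effect of the UPS example on the report can be sketched in Python. The dates, the 10-minute window, and the check results are made up for illustration. The sketch assumes the exclusion simply drops measurements whose timestamps fall inside the window before errors and checks are counted.

```python
from datetime import datetime

# Hypothetical one-time exclusion covering the 10-minute UPS replacement.
exclusion_start = datetime(2024, 3, 12, 2, 0)
exclusion_end = datetime(2024, 3, 12, 2, 10)


def is_excluded(ts: datetime) -> bool:
    """Return True if the measurement falls inside the one-time exclusion window."""
    return exclusion_start <= ts < exclusion_end


# (timestamp, success) pairs; monitoring kept running during the outage.
checks = [
    (datetime(2024, 3, 12, 1, 55), True),
    (datetime(2024, 3, 12, 2, 0), False),
    (datetime(2024, 3, 12, 2, 5), False),
    (datetime(2024, 3, 12, 2, 10), True),
]

kept = [(ts, ok) for ts, ok in checks if not is_excluded(ts)]
errors = sum(1 for _, ok in kept if not ok)
print(f"{errors} / {len(kept)}")  # 0 / 2
```

The two failed checks were still recorded, but they no longer count against the SLA once the exclusion is applied.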
Applying SLA Objectives to Other Devices
If you have other SLA devices that need the same SLA Objectives, once you have set up one device you can go to Configuration → Bulk Settings and use that device as a template for the others. For details, go to the Bulk Settings help page and click SLA Settings in the bullet list at the top.