In calculating the pass rate, we defined failures to include the following:
We ignored call-specific application errors such as issues with the returned content and client-side HTTP status code 4xx warnings caused by authentication problems such as expired tokens.
An API that fails a call may pass if called again immediately, when the outage is transitory. However, our methodology still gives a general indication of availability issues.
The traditional telecommunications standard for service availability is five 9s – at least 99.999% uptime or just five minutes of downtime in a year. Of the 32 services we analyzed, only DocuSign beat the five 9s standard with perfect performance over the entire 24 months.
Table 1: Number of APIs impacted

Availability | Number of Services | Range in minutes/24 months |
---|---|---|
100% | 1 | 0 minutes of outage |
99.999% (five 9s or better) | 0 | Less than ~10 minutes of outage |
99.99% (four 9s or better) | 6 | ~10 to ~105 minutes of outage |
99.9% (three 9s or better) | 20 | ~105 to ~1053 minutes of outage |
Less than 99.9% | 5 | ~1053 to ~10,526 minutes of outage |
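The outage ranges in Table 1 follow from simple arithmetic on the availability percentage. The sketch below reproduces them under one assumption of ours: that the 24-month window spans 731 days, which is what the ~1053 and ~10,526 minute figures imply. The function name `outage_minutes` is illustrative and not part of any APImetrics tooling.

```python
MINUTES_PER_DAY = 24 * 60


def outage_minutes(availability_pct: float, days: int = 731) -> float:
    """Outage minutes over `days` that correspond to a given availability.

    The 731-day default is an assumption inferred from the table's
    figures, not a value stated in the report.
    """
    unavailable_fraction = (100.0 - availability_pct) / 100.0
    return unavailable_fraction * days * MINUTES_PER_DAY


if __name__ == "__main__":
    # Boundaries of the availability bands used in Table 1.
    for pct in (99.999, 99.99, 99.9, 99.0):
        print(f"{pct}% -> ~{outage_minutes(pct):,.1f} minutes over 24 months")
    # The "five minutes of downtime in a year" rule of thumb for five 9s.
    print(f"99.999% -> ~{outage_minutes(99.999, days=365):.2f} minutes per year")
```

Passing `days=365` gives the per-year figure behind the traditional five 9s rule of thumb, while the default covers the full study window.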
Five of the 32 major corporate services studied scored less than three 9s over the whole 24-month period. We observed a difference of nearly three hours in unscheduled downtime between two leading file management services.
APImetrics uses CASC, our patented quality scoring system, to compare the quality of different APIs. CASC (Cloud API Service Consistency) blends multiple factors to derive a "credit rating" for an API, benchmarked against our unmatched historical dataset of API test call records.
Through analysis of CASC scores, we have established a quality baseline:
Score | Number of Services: 2018 | Number of Services: 2019 | Number of Services: 2020 |
---|---|---|---|
9.00+ | 11 | 21 | 28 |
8.00-8.99 | 28 | 25 | 18 |
7.00-7.99 | 12 | 3 | 4 |
6.00-6.99 | 0 | 1 | 1 |
5.00-5.99 | 0 | 0 | 0 |
4.00-4.99 | 0 | 0 | 0 |
3.00-3.99 | 1 | 2 | 0 |
There is a marked tendency for services in the Yellow Zone (CASC score of 7.00 to 7.99) to have moved into the highest two bands in the Green Zone in 2020. This is evidence of both improved network infrastructure and a general improvement in the performance of API backends as best practices and more reliable software percolate through the API ecosystem.
Some calls will be faster than others because of the nature of the backend processing involved, so total call duration, even over a sample size of tens of millions of calls, can only give a partial view of the behavior of the APIs.
But we can look at other factors, such as various components of the call, to see if they give us interesting data about the health of the cloud and cloud APIs in general. We are fortunate to have directly comparable datasets of APIs from 2017–2020 for analysis.
Over the past two years, DNS lookup times have improved dramatically.
The median DNS lookup time across all clouds and regions is now 12 ms, falling to 4 ms for AWS and for Europe. However, individual locations vary (some have median DNS lookup times of 4 ms, others are slower than 12 ms), and some individual APIs have much slower DNS lookup times. It is crucial to optimize DNS performance on a location/API basis to ensure the best API quality and user experience.
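As a rough way to check DNS lookup times for a single API from a single location, a team could time name resolution directly. The following is a minimal sketch using only the Python standard library; it is not how APImetrics measures DNS time, and the function name `median_dns_lookup_ms` and the host `api.example.com` are placeholders. Repeated lookups may be served from a local resolver cache, so the numbers are indicative at best.

```python
import socket
import statistics
import time


def median_dns_lookup_ms(hostname, samples=5, resolver=socket.getaddrinfo):
    """Return the median wall-clock time, in ms, of `samples` lookups.

    `resolver` defaults to socket.getaddrinfo; it is a parameter so a
    different resolver (or a stub in tests) can be swapped in.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        resolver(hostname, 443)  # resolve only; no connection is opened
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


if __name__ == "__main__":
    # Placeholder host; substitute the API endpoint being monitored.
    host = "api.example.com"
    print(f"{host}: median DNS lookup {median_dns_lookup_ms(host):.1f} ms")
```

Running the same script from each regional data center gives a per-location view that can be compared against the 12 ms and 4 ms medians quoted above.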
We have observed that just having a CDN provider such as Cloudflare or Akamai in place does not automatically improve DNS performance for API calls made from regional data centers. All network engineering teams should monitor this on an ongoing basis and adjust performance criteria.