API Cloud Performance Analysis Report

A Report from APImetrics & API.expert

Dr Paul M Cray

Key Metrics

Pass Rates

In calculating the pass rate, we defined failures to include the following:

  • 5xx server-side errors
  • Network errors in which no response is returned
  • Content errors where the API did not return the correct content, e.g., an empty JSON body or incorrect data returned
  • Slow errors in which a response is received after an exceptionally long period
  • Redirect errors in which a 3xx redirect HTTP status code is returned

We ignored call-specific application errors such as issues with the returned content and client-side HTTP status code 4xx warnings caused by authentication problems such as expired tokens.

An API that fails may pass if called again immediately, when the outage is transitory. Even so, our methodology gives a general indication of availability issues.
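
The pass/fail rules above can be sketched as a simple classifier. This is a minimal illustration: the record fields (status code, body, elapsed time) and the slow-call threshold are assumptions for the example, not APImetrics' actual schema or cutoffs.

```python
# Sketch of the pass/fail rules described above. The field names and the
# slow-call threshold are illustrative assumptions, not APImetrics' schema.

SLOW_THRESHOLD_MS = 10_000  # assumed cutoff for an "exceptionally long" call

def classify(status_code, body, elapsed_ms):
    """Return 'fail', 'warning', or 'pass' per the rules above."""
    if status_code is None:
        return "fail"      # network error: no response returned at all
    if 500 <= status_code < 600:
        return "fail"      # 5xx server-side error
    if 300 <= status_code < 400:
        return "fail"      # 3xx redirect counted as a failure
    if not body:
        return "fail"      # content error, e.g. an empty JSON body
    if elapsed_ms > SLOW_THRESHOLD_MS:
        return "fail"      # slow error
    if 400 <= status_code < 500:
        return "warning"   # client-side 4xx (e.g. expired token): ignored
    return "pass"

def pass_rate(results):
    """Pass rate over (status_code, body, elapsed_ms) records,
    excluding ignored 4xx warnings from the denominator."""
    outcomes = [classify(*r) for r in results]
    counted = [o for o in outcomes if o != "warning"]
    return sum(1 for o in counted if o == "pass") / len(counted)
```

Note that 4xx warnings are removed from both numerator and denominator, matching the report's decision to ignore client-side authentication problems.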

Five 9s

The traditional telecommunications standard for service availability is five 9s – at least 99.999% uptime, or only around five minutes of downtime in a year. Of the 32 services we analyzed, only DocuSign beat the five 9s standard, with perfect performance over the entire 24 months.

Table 1: Number of APIs impacted

Availability                 | Number of Services | Range in minutes/24 months
100%                         | 1                  | 0 minutes of outage
99.999% (five 9s) or better  | 0                  | Less than ~10 minutes of outage
99.99% (four 9s) or better   | 6                  | ~10 to ~105 minutes of outage
99.9% (three 9s) or better   | 20                 | ~105 to ~1053 minutes of outage
Less than 99.9%              | 5                  | ~1053 to ~10,526 minutes of outage

Five of the 32 major corporate services studied over the whole 24-month period scored less than three 9s. We observed nearly three hours' difference in unscheduled downtime between two leading file management services.
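
The downtime ranges in Table 1 follow directly from the availability percentages. A short sketch of the arithmetic, assuming the 24-month window is 731 days (two years including one leap day), which reproduces the table's ~1053-minute figure for three 9s:

```python
# Downtime budget implied by an availability level over the report's
# 24-month window. The 731-day window length is an assumption that
# matches the ~1053-minute three-9s figure in Table 1.

WINDOW_MINUTES = 731 * 24 * 60  # 1,052,640 minutes in 24 months

def downtime_minutes(availability):
    """Minutes of outage allowed at a given availability fraction."""
    return (1.0 - availability) * WINDOW_MINUTES

# Thresholds matching Table 1: five 9s -> ~10.5 min,
# four 9s -> ~105 min, three 9s -> ~1053 min.
for nines in (5, 4, 3):
    availability = 1 - 10 ** -nines
    print(f"{nines} nines: {downtime_minutes(availability):,.1f} minutes")
```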

Quality

APImetrics uses CASC, our patented quality scoring system, to compare the quality of different APIs. CASC (Cloud API Service Consistency) blends multiple factors to derive a "credit rating" for an API, benchmarked against our unmatched historical dataset of API test call records.

Through analysis of CASC scores, we have established a quality baseline:

  • Scores of 8.00 or above indicate a healthy, well-functioning API that will give users few problems.
  • Scores between 6.00 and 7.99 indicate significant issues that will lead to a degraded user experience and increased engineering support costs.
  • Scores below 6.00 are considered poor; urgent attention is required.

Table 2: CASC scores by number of services

Score     | Services: 2018 | Services: 2019 | Services: 2020
9.00+     | 11             | 21             | 28
8.00–8.99 | 28             | 25             | 18
7.00–7.99 | 12             | 3              | 4
6.00–6.99 | 0              | 1              | 1
5.00–5.99 | 0              | 0              | 0
4.00–4.99 | 0              | 0              | 0
3.00–3.99 | 1              | 2              | 0
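
The quality baseline above amounts to a simple banding function. A minimal sketch, using the band descriptions from the text (the function itself is illustrative, not APImetrics' scoring code):

```python
# Sketch of the CASC quality baseline described above. Band labels are
# paraphrased from the text; the thresholds are as stated in the report.

def casc_band(score):
    """Map a CASC score to the report's quality baseline."""
    if score >= 8.00:
        return "healthy"   # well-functioning API, few problems for users
    if score >= 6.00:
        return "degraded"  # significant issues, higher support costs
    return "poor"          # urgent attention required
```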

There is a marked tendency for services in the Yellow Zone (CASC score of 7.00 to 7.99) to have moved into the highest two bands in the Green Zone in 2020. This is evidence of both improved network infrastructure and a general improvement in the performance of API backends as best practices and more reliable software percolate through the API ecosystem.

Latency

Some calls will be faster than others because of the nature of the backend processing involved, so total call duration, even over a sample size of tens of millions of calls, can only give a partial view of the behavior of the APIs.

But we can look at other factors, such as various components of the call, to see if they give us interesting data about the health of the cloud and cloud APIs in general. We are fortunate to have directly comparable datasets of APIs from 2017–2020 for analysis.

  • For every cloud and region, the average total time in 2019 and 2020 was significantly lower than in 2018.
  • Within rounding error, the median DNS time for all clouds and regions in 2019 was the same, at 12 ms; in 2020 it was again 12 ms for all clouds except AWS and all regions except Europe, both of which had a median DNS time of 4 ms.
  • AWS was the fastest cloud by median total time in 2019 and 2020; in 2018, it was the third slowest.

Over the past two years, DNS lookup times have improved dramatically.

The median DNS lookup time across all clouds and regions is now 12 ms or 4 ms (AWS and Europe). However, some variations exist between individual locations (some with median DNS lookup times of 4 ms, others slower than 12 ms), and some individual APIs have much slower DNS lookup times. It is crucial to optimize DNS performance on a location/API basis to ensure the best API quality and user experience.
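
The per-location DNS check suggested above can be sketched as follows. The (cloud, region, dns_ms) record shape is an assumption for illustration; the 12 ms baseline comes from the medians reported in the text.

```python
# Sketch of a location-level DNS check: compute the median DNS lookup
# time per (cloud, region) so slow locations stand out against the
# 12 ms baseline from the text. Record shape is an illustrative assumption.
from collections import defaultdict
from statistics import median

def median_dns_by_location(records):
    """records: iterable of (cloud, region, dns_ms) tuples."""
    by_location = defaultdict(list)
    for cloud, region, dns_ms in records:
        by_location[(cloud, region)].append(dns_ms)
    return {loc: median(times) for loc, times in by_location.items()}

def slow_locations(records, baseline_ms=12):
    """Locations whose median DNS time exceeds the baseline."""
    medians = median_dns_by_location(records)
    return {loc: m for loc, m in medians.items() if m > baseline_ms}
```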

We have observed that simply having a CDN provider such as Cloudflare or Akamai in place does not automatically improve DNS performance for API calls made from regional data centers. Network engineering teams should monitor this on an ongoing basis and tune their performance criteria accordingly.