Let me tell you a story of mishaps.
The Requirement
We were tasked with displaying some information on a website from a certain well known search engine (amongst other things) provider. This was specifically reviews of places. There is a fixed set of places, known well in advance, that is unlikely to be added to or removed from. The reviews were to be merged into one list – newest first.
That’s not too difficult. The provider supplies APIs so that we can access this information, and there are quite a few examples of how to do it. It comes down to a simple request to download some data and manipulate it in a way that suits our code. Repeat for each place, merge the lists, and display.
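The flow is simple enough to sketch. This is only an illustration – `fetch_reviews` is a hypothetical stand-in for the real per-place API call, with dummy data instead of an HTTP request:

```python
from datetime import datetime


def fetch_reviews(place_id):
    """Hypothetical stand-in for the provider's reviews API.
    In reality this would be one HTTP request per place."""
    # Dummy data for illustration only.
    return [
        {"place": place_id, "text": "Great!", "time": datetime(2023, 5, 1)},
        {"place": place_id, "text": "OK.", "time": datetime(2023, 4, 1)},
    ]


def merged_reviews(place_ids):
    """One request per place, then merge into a single newest-first list."""
    all_reviews = []
    for place_id in place_ids:
        all_reviews.extend(fetch_reviews(place_id))
    return sorted(all_reviews, key=lambda r: r["time"], reverse=True)
```

Note that it really is one request per place – which is where the trouble starts.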
As we try to be good developers, the code to do this was placed inside a reusable control. This lets us use the code wherever it is required on the website.
Problem 1
It turns out that even if the control has its visibility set to false it is still rendered.
Problem 2
We cannot ask for the reviews for all the places in one go. We must ask for each place individually.
There is a cost per API request – about 2 pence. There are about 15 places we need to get the information for, so we have to make 15 requests. That costs us about 30 pence each time we want to render this list of reviews.
We also know that this information that we are downloading does not change very often – we are very unlikely to get more than a couple of additional reviews in a week.
This allows us to cache the results in a number of files. If we check when the file was created, we can just use its content if it was created less than a week ago – otherwise we can go and get the data and write it into the file again. We would know it is up to date. Or at least as up to date as it really needs to be.
Problem 3
If you just overwrite a file, the created date does not change.
What actually happens?
Each time a page is requested we are rendering the control to display the reviews.
Each time the control is rendered we want the reviews for about 15 places.
The file where we are caching the data always appears to be old (as the created date hasn’t changed) so we make the call to the API.
Total cost to get the results for all the places? 30 pence every time a page is requested – whether the review list was required or not, and even if someone requested a page that doesn’t exist.
That doesn’t sound like a lot but I don’t think you would want an additional charge every time any page on your website was viewed.
If you start a crawl of your site with thousands of pages the numbers add up quickly.
Imagine we are monitoring your website every minute to make sure it is working. We would do that by loading the home page and making sure a piece of text is present. Every time we do that it costs you 30 pence. Total cost? 30 pence x 60 x 24 = £432 per day. That’s just for one monitor. Everyone knows that just having one monitor is not ideal. (A little side-hint: if you are running your website in a cloud provider, monitor it from a different cloud provider)
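The arithmetic above, worked in pence to keep it exact (the 30 pence per page figure is carried through from the earlier example):

```python
COST_PER_PAGE_PENCE = 15 * 2  # 15 places x 2 pence per API request
CHECKS_PER_DAY = 60 * 24      # one uptime check per minute

# Integer pence throughout, converted to pounds only at the end.
daily_cost_pounds = COST_PER_PAGE_PENCE * CHECKS_PER_DAY / 100
print(daily_cost_pounds)  # 432.0 - £432 per day for a single monitor
```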
The solutions?
Problem 1 - only go and get the data if we want to display it
Problem 2 – ask the API provider to rewrite their API so that it will accept a list of place ids
Problem 3 - when looking at the date of the file, use the last time it was written to
We have implemented 1 and 3. Problem 2 is still a problem.
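A sketch of solution 3, assuming a simple one-file-per-place cache (the file layout and function names here are illustrative, not our actual implementation). The key point is using the last-modified time – which overwriting a file *does* update – rather than the created date:

```python
import json
import os
import time

CACHE_MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # one week


def cached_reviews(place_id, fetch, cache_dir="review-cache"):
    """Return cached reviews if the cache file was written within the
    last week; otherwise call `fetch` and rewrite the file."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{place_id}.json")
    if os.path.exists(path):
        # getmtime is the last write time, not the created date,
        # so rewriting the file resets the clock as intended.
        age = time.time() - os.path.getmtime(path)
        if age < CACHE_MAX_AGE_SECONDS:
            with open(path) as f:
                return json.load(f)
    reviews = fetch(place_id)
    with open(path, "w") as f:
        json.dump(reviews, f)
    return reviews
```

With this in place, the second and subsequent renders within a week never touch the API.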
However, there’s another slightly subtle problem.
Problem 4
The API provider specifically says that we must not cache the results of the API. Seems surprising, really.
We can, however, cache the rendered output for performance reasons. And so that’s what we’ll do.
How will we stop API usage costing a fortune?
Imagine that you did not use any clever caching techniques to make your website quicker. What can you do to stop your API usage costs escalating?
Other than reviewing and testing your code to ensure that you are not making simple mistakes…
Quotas
We use various paid-for APIs, and we take the time to ensure that we have set up quotas and/or limits. This prevents a rogue piece of code, or a leaked API key, from running up a large bill.
We have set up reasonable limits on all the pay-per-use APIs that we use. Our quota limits are based on the busiest day we can find for the API key, multiplied up to give the maximum calls per day.
For example, if a website has caused 2,000 address look-ups in a day, we set the limit to 20,000. That allows the website to have a very busy day. We would get notified if the limit was approached on that busy day and would, once we had confirmed the usage was genuine, boost the limit up further. If not genuine, then there’s only so much it can cost.
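As a tiny sketch, with the 10x multiplier implied by the example above (the multiplier is our own choice of headroom, not anything the providers mandate):

```python
QUOTA_MULTIPLIER = 10  # chosen headroom over the busiest observed day


def daily_quota(busiest_day_calls, multiplier=QUOTA_MULTIPLIER):
    """Set the API's daily limit to a multiple of the busiest day seen."""
    return busiest_day_calls * multiplier


print(daily_quota(2_000))  # 20000, as in the address look-up example
```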
By default, the cost-per-usage APIs don’t have limits set. Those companies didn’t get rich by not charging people.
Actions
If you are using any sort of API (or service) that costs you money per usage, then go and set limits right now.
Monitor your usage. If you see an increase in usage, then you may want to review it.
Set up a policy to review that limit annually.