Zareef Ahmed : Bigdata and Devops Consultant + Programmer for cloud

How to deal with cache nightmares?

The Internet is a place where data is generated in one location (usually servers) and then served to clients through browsers and mobile apps.

Lots of machines are involved in transferring this data from origin to destination. To improve user experience and to save computing resources, developers use caching. 

I will not go into the definition of a cache, as you must already be familiar with it if you are reading this article. For background, you can visit https://en.wikipedia.org/wiki/Cache_(computing)

In this article, I will walk you through how caching usually works and how, in some cases, it is out of the developers' control. I will also tell you what you can do when things are not in your control.

The cache can be generated at multiple levels.

I am dividing these levels into two broad categories. The first is normally in the control of developers; the second is usually not in their direct control.

In control of developers.

  1. DB level: Database engines can store query results in a query cache. These caches are usually time-bound, so entries expire on their own. Changing the query is another way to bypass them. This is usually not a big problem for applications.
  2. Custom cache: An application developer may devise a custom caching layer and store cached data in the engine of their choice. They can use plain text files, memory, or off-the-shelf solutions like Redis or Memcached. To bust this kind of cache, you need to consult the respective developer.
  3. Web server cache: A web server can also cache content, depending on its configuration. Only certain types of content are cached. You can usually bust this kind of cache by restarting the server or manually deleting content from the cache folders. Consult the respective web server manual for more details.
  4. CDN: Some applications use content delivery networks with edge locations around the world. These caches are governed by rules set up by architects or developers. This kind of cache (if present) can be dealt with using the respective CDN's documentation.
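To make the "custom cache" idea concrete, here is a minimal sketch of a time-bound in-memory cache with an explicit bust method. The class name `TTLCache` and its methods are hypothetical, chosen only for illustration; a real application would more likely use Redis or Memcached as mentioned above.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds.

    Hypothetical illustration only, not a production cache.
    """

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        # Remember the value together with the moment it stops being valid.
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def bust(self, key: Optional[str] = None) -> None:
        """Remove one key, or everything when no key is given."""
        if key is None:
            self._store.clear()
        else:
            self._store.pop(key, None)
```

Because the developer owns both the expiry time and the `bust` method, this kind of cache stays fully under the developer's control.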

Not in control of developers 

  1. ISP edge locations: Internet Service Providers do a great deal of data optimization by caching static content across their networks. These edge locations are around the world. If content is regularly requested by an ISP's customers, the ISP tries to keep a copy at a location near those users. These caches are difficult to bust. There are only two ways available:
    1. Setting Expires headers on your content: ISPs usually honor this header, but it is only advisory and they are not obliged to enforce it. It is recommended to use an Expires header, but you should not rely on it.
    2. Changing the URL/URI of the resource: This is a sure-shot way. In this case, your content will be treated as new.
  2. Company or organization proxy servers: Many organizations use their own proxy servers, which also act as cache servers. These servers need to be handled the same way as edge locations. Changing the URL/URI is the best way to deal with them.
  3. Browser cache: Browsers also cache content to improve user experience, and they honor Expires headers too. You should use them, but when you release a new version of your website, this cache becomes a big issue. Users can delete this cache from their browsers. But, and this is a big BUT, you should not rely on cache deletion from the browser to release your new website. Expecting every user of your website to bust their cache is a recipe for disaster.
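As a rough sketch of the expiry headers mentioned above, the helper below builds the header values a server might attach to a static response. The function name `cache_headers` is hypothetical. The `Cache-Control: max-age` header is not discussed in this article; it is included here as an assumption because HTTP/1.1 caches generally prefer it over `Expires` when both are present.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def cache_headers(max_age_seconds: int, now: Optional[float] = None) -> Dict[str, str]:
    """Return expiry headers for a response that may be cached for max_age_seconds.

    `now` is a Unix timestamp; it defaults to the current time and is a
    parameter only so the result is reproducible.
    """
    if now is None:
        now = time.time()
    return {
        # Relative lifetime, understood by HTTP/1.1 caches.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Absolute expiry date in the HTTP date format (always GMT).
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

These headers only ask intermediaries and browsers to expire content; they give no way to recall a copy that has already been cached, which is why the URL change remains the fallback.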

What should you do when the cache is not in your control?

You can easily deal with the cache when it is under your control. You need to come up with a strategy when it is not. Let me tell you what the industry does to handle these cache nightmares.

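  1. Implementation of ETags: This is an industry standard for data that has not changed. The server tags each response with an identifier, the client sends that identifier back on its next request, and the server can answer 304 Not Modified instead of resending the body. You can implement this to improve the user experience. This is usually a good way to handle dynamic data.
  2. URL/URI change: This is the most widely used, and the only reliably effective, way of dealing with static data like CSS, JavaScript, and images.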
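The ETag approach can be sketched as follows. This is a simplified illustration under stated assumptions: the function names `make_etag` and `respond` are hypothetical, the tag is derived from a hash of the body (one common choice, not the only one), and the comparison ignores weak validators, `*`, and comma-separated lists that a real `If-None-Match` header may contain.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag value from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a request.

    if_none_match is the value of the client's If-None-Match header,
    or None when the client sent no such header.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match.strip() == etag:
        # The client's copy is still current: send no body.
        return 304, headers, b""
    return 200, headers, body
```

The client still makes a request every time, but an unchanged resource costs only a small 304 response rather than a full download.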

Let me explain with an example:



Suppose you have a stylesheet that is being referred to in your HTML page.

<html>

<head>

<link rel="stylesheet" type="text/css" href="theme.css">

</head>

</html>




Next time, whenever you make a change to theme.css, just append a version number to its URL.

<html>

<head>

<link rel="stylesheet" type="text/css" href="theme.css?v=2">

</head>

</html>

By doing so, browsers, proxy servers, and edge locations will treat this as a new URI and fetch fresh data from the server. You can do the same with JavaScript files. Image files should be treated the same way.

How should I choose the version number for CSS and JavaScript?
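If your pages are generated by code, appending the version can be automated. The helper below is a minimal sketch; the name `versioned_url` is hypothetical, and the query parameter `v` simply follows the example above. It replaces an existing `v` parameter rather than adding a second one, and leaves other query parameters alone.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, version: int) -> str:
    """Return url with its `v` query parameter set to the given version."""
    parts = urlsplit(url)
    # Keep every existing parameter except an older `v`.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "v"
    ]
    query.append(("v", str(version)))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )
```

For example, `versioned_url("theme.css", 2)` gives `theme.css?v=2`, and calling it again with version 3 on that result gives `theme.css?v=3`.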

It's entirely up to you. I recommend using an incrementing number: whenever someone changes the file, they should increase the number, and the new number should be reflected everywhere the file is referenced. Some people just use a random number. I do not recommend that, because if the number changes on every request, the cached copy will never be used.
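One way to keep the incrementing number consistent everywhere a file is referenced is to store the numbers in a single place and have every page look them up. The sketch below assumes a hypothetical `AssetManifest` class and made-up starting versions; it is one possible arrangement, not a prescribed tool.

```python
from typing import Dict


class AssetManifest:
    """Single source of truth for the version number of each static file."""

    def __init__(self, versions: Dict[str, int]) -> None:
        # Copy so later bumps do not modify the caller's dictionary.
        self._versions = dict(versions)

    def url(self, name: str) -> str:
        """Return the file name with its current version appended."""
        return f"{name}?v={self._versions[name]}"

    def bump(self, name: str) -> int:
        """Increase the version after the file has changed; return the new number."""
        self._versions[name] = self._versions.get(name, 0) + 1
        return self._versions[name]


# Hypothetical starting versions, for illustration only.
manifest = AssetManifest({"theme.css": 2, "app.js": 7})
```

Between releases the URL stays the same, so caches keep doing their job; the URL changes only when someone calls `bump` after editing the file, which is exactly what a random number fails to guarantee.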