Part 1 – High availability cache

This is the first of a two-part blog, giving an introduction to the high availability cache at the front-end of the website.

The new responsive TfL website is not just a mobile make-over; it has been redeveloped from the ground up. It is a brand-new, structurally redesigned, responsive website built for the modern needs of the travelling public in London.

High availability cache

Living in a fast-moving age can sometimes make us impatient, especially so with the internet and website page loads – no-one wants to see the dreaded “buffer face” expression. That’s why we designed the new TfL website with a high availability front end cache (Varnish), partnered with an auto-scaling solution (via AWS cloud) for the back-end components. This formidable combination powers the TfL website, serving page loads quickly and repeat page loads even quicker, 24/7 and 365 days a year.

Varnish is a web accelerator which allows our website to sustain very high traffic and load many times faster by caching static and dynamic content. We use it in the new TfL website at the web application tier to cache all incoming requests, and it’s capable of handling 2,000 to 3,000 requests per second.

Delivering content from cache is much faster than fetching it all the way through the infrastructure from a web server at the back-end of the website. It’s rather like having to go to the back of the shop each time you want a sweet. It’s better to have them stored up front because we know, and you know, that you’ll always want them!

What gives us the performance gains is that the cache can store responses from the back-end for future use, quickly serving the next request for the same content directly from cache without placing any needless load on the back-end servers. As a result, the load on the back-end is significantly reduced, response times improve, and more requests can be served per second. One of the things that makes the cache so fast is that it keeps its data entirely in memory rather than on hard disk.
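To make the idea concrete, here is a minimal Python sketch of a cache that sits in front of a back-end and keeps responses in memory. It is an illustration only and not how Varnish is implemented or configured; the class name `InMemoryCache`, the `fetch_from_backend` callable and the `ttl_seconds` lifetime are all made-up names for this example.

```python
import time
from typing import Callable, Dict, Tuple


class InMemoryCache:
    """Toy front-end cache that keeps back-end responses in a dict (in memory)."""

    def __init__(
        self,
        fetch_from_backend: Callable[[str], str],  # hypothetical back-end call
        ttl_seconds: float,                        # assumed lifetime of an entry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_backend
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            # Cache hit: answer from memory, the back-end is not touched.
            self.hits += 1
            return entry[1]
        # Cache miss (or expired entry): ask the back-end and remember the answer.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    def slow_backend(url: str) -> str:
        time.sleep(0.2)  # stand-in for the trip through the infrastructure
        return f"<html>page for {url}</html>"

    cache = InMemoryCache(slow_backend, ttl_seconds=60)
    for _ in range(3):
        started = time.monotonic()
        cache.get("/status")
        print(f"served in {time.monotonic() - started:.3f}s")
    print("hits:", cache.hits, "misses:", cache.misses)
```

Only the first request pays for the trip to the back-end; repeat requests inside the lifetime are answered from the dictionary. A real cache also has to decide what is safe to cache and for how long, which this sketch leaves out.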

This enables our web pages to load very quickly, and repeat page requests are typically served twice as fast, even at high-volume peak commute times. This is exactly what the travelling public wanted and what you’d expect from a top website.

Figure: TfL.gov.uk high-level design using a high availability cache

A lot of Londoners and visitors to London rely on tfl.gov.uk, and any outage can be a real pain. So if, for example, there was a problem with the back-end of the website, rather than showing a “Sorry website is unavailable” holding page, we can serve up the last good version of the home page for a period whilst we fix the back-end. That’s a great feature to have because we are a 24/7 service and need to be available 100% of the time.
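The “last good version” behaviour can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the Varnish configuration used on the site: `LastGoodCache`, `BackendDown` and `grace_seconds` are hypothetical names, and unlike a real cache this sketch asks the back-end on every request so that the fallback logic stays easy to follow.

```python
import time
from typing import Callable, Dict, Tuple


class BackendDown(Exception):
    """Raised by the (hypothetical) back-end call when it cannot answer."""


class LastGoodCache:
    """Remembers the last good response and serves it while the back-end is down."""

    def __init__(
        self,
        fetch_from_backend: Callable[[str], str],  # hypothetical back-end call
        grace_seconds: float,                      # assumed: how long a stale copy may be served
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_backend
        self._grace = grace_seconds
        self._clock = clock
        self._last_good: Dict[str, Tuple[float, str]] = {}  # url -> (stored_at, body)

    def get(self, url: str) -> str:
        now = self._clock()
        try:
            body = self._fetch(url)
        except BackendDown:
            saved = self._last_good.get(url)
            if saved is not None and now - saved[0] <= self._grace:
                # Back-end is failing: fall back to the last good copy.
                return saved[1]
            # Nothing saved, or the saved copy is too old: pass the failure on.
            raise
        # Back-end answered: remember this response as the new last good copy.
        self._last_good[url] = (now, body)
        return body
```

While the back-end is healthy, every response is recorded as the latest good copy. When it fails, visitors keep getting that copy until the grace period runs out, after which the error is passed on.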

In part 2 of this blog, I’ll look more at what we have done to help the back-end of our website cope better under load.

Posted by Tariq

Agile DevOps, Digital Transformation, Scrum Master, Product Owner, Product Manager at TfL Online

7 Comments

  1. What do you use to update varnish’s list of backends when a new AWS node comes online?

    Currently we don’t use any auto-scaling in our backends, so I’ve found using puppet to generate the VCL based on queries to puppetDB works well. However, we’re considering using auto-scaling at some point, and I’m worried that having to wait for the next puppet run before the VCL gets updated with new backends might take too long to take full advantage of auto-scaling.

    What are you using to solve this, and how long does it take before varnish starts sending traffic to a brand new backend?

    1. Hello Luke, I double-checked with our Solutions Architect (Dan Mewett). We have developed a simple bash script that is run by cron on a regular basis. It checks whether the generated fragment has changed since the last time it was run and, if so, reloads the VCL with the new back-end configuration. I’ll see if we can get a picture of the design uploaded to the blog above, thanks.

  2. Not quite sure what has happened to the website but the 108 bus appears to have disappeared from the website. The journey planner won’t use it and suggests a change of buses even when the 108 goes directly between two points and the 108 bus timetable no longer seems to be available on the website.

    1. Andrew, that’s a good point – we’ll find out where it’s gone. It’s likely to be a data thing.

      1. I’ve checked into this and it was a data issue. It has been corrected and will be deployed into the site shortly.
