Enhancing resilience to overcome website downtime during severe weather events
Collaborating with the Met Office to tackle downtime on the National Severe Weather Warning Service website, caused by spikes in internet traffic during severe weather events.
The opportunity
The Met Office National Severe Weather Warning Service (NSWWS) warns of the impacts caused by severe weather. The severity of the Met Office weather warnings is based on the level of impact and the likelihood of these impacts occurring. The combination of level of impact and likelihood results in a red, amber or yellow weather warning colour.
Red weather warnings are the most severe (high impact and high likelihood), followed by amber and then yellow. Warnings can be issued for a range of conditions: for example, rain, thunderstorms, wind, snow, lightning, ice, extreme heat, and fog.
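As a rough illustration of how impact and likelihood can combine into a warning colour, here is a minimal Python sketch. The `Level` scale, the `warning_colour` function and every threshold in it are assumptions made for this example; they are not the Met Office's actual warning matrix.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Hypothetical four-step scale used for both impact and likelihood."""
    VERY_LOW = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4


def warning_colour(impact: Level, likelihood: Level) -> Optional[str]:
    """Map an (impact, likelihood) pair to a colour.

    The thresholds are illustrative assumptions, not the real NSWWS rules.
    """
    if impact == Level.HIGH and likelihood == Level.HIGH:
        return "red"        # most severe: high impact and high likelihood
    score = impact * likelihood
    if score >= 8:
        return "amber"
    if score >= 3:
        return "yellow"
    return None             # below the (assumed) threshold for any warning


print(warning_colour(Level.HIGH, Level.HIGH))
print(warning_colour(Level.MEDIUM, Level.MEDIUM))
```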
The release of weather warnings typically drives large volumes of traffic to the Met Office website over a very short period of time, as happened during 2018’s “Beast from the East”.
The challenge
Spikes in internet traffic placed significant strain on the Met Office’s public website. The existing solution could not scale to meet demand quickly enough, resulting in significant downtime, with large parts of the website, including NSWWS, becoming unavailable.
How Made Tech helped
Working in partnership, the blended team identified a series of possible causes of the poor performance during traffic spikes triggered by weather warnings:
- Ineffective implementation of maps
- Overly complicated calls to the Content Management System (CMS)
- Manual scaling required by third-party CMS provider
- Delays and miscommunications about manual scaling, which left CMS resources out of step with demand; for example, Kubernetes pods were reduced to ‘normal’ levels prematurely, resulting in additional downtime
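The premature scale-down described in the last point is the kind of problem an automated policy with a cooldown is meant to avoid. The following Python sketch is a hypothetical illustration only: the `Scaler` class, its capacity figure and its cooldown value are assumptions, and it does not represent the CMS provider's process or the scaling rules used on the project.

```python
import math
from dataclasses import dataclass


@dataclass
class Scaler:
    """Scale up immediately, but only scale down after a quiet period."""
    requests_per_pod: int = 100      # assumed capacity of one pod
    min_pods: int = 2                # assumed 'normal' baseline
    cooldown_seconds: float = 600.0  # assumed quiet period before shrinking
    current_pods: int = 2
    last_busy_at: float = 0.0        # last time load justified the current size

    def observe(self, requests_per_second: float, now: float) -> int:
        needed = max(self.min_pods,
                     math.ceil(requests_per_second / self.requests_per_pod))
        if needed >= self.current_pods:
            # Demand matches or exceeds capacity: grow (or hold) right away.
            self.current_pods = needed
            self.last_busy_at = now
        elif now - self.last_busy_at >= self.cooldown_seconds:
            # Demand has stayed low for the whole cooldown: safe to shrink.
            self.current_pods = needed
        return self.current_pods


scaler = Scaler()
print(scaler.observe(1000, now=0))    # spike: scales up
print(scaler.observe(100, now=60))    # brief lull: holds capacity
print(scaler.observe(100, now=700))   # sustained lull: scales down
```

Holding capacity through short lulls costs a little more in the moment, but it avoids removing pods while a traffic spike is still in progress.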
Improving website resilience
Improving website resilience during weather warnings was our primary concern. The requests from the maps were, understandably, complicated; this made them difficult to cache, and they rapidly drained available resources from the CMS, making other parts of the website unresponsive.
After reviewing the implementation, we were able to simplify the requests originating in the maps and, therefore, improve the cacheability of the requests to the CMS. The improved cacheability reduced the number of calls that ‘missed’ the cache and needed to be served by the CMS during traffic spikes.
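The source does not detail how the map requests were simplified. One common way to improve cacheability is to normalise request URLs so that equivalent requests share a single cache key. The Python sketch below assumes hypothetical parameter names (`bbox`, `zoom`, `layer`) and an assumed rounding precision; it illustrates the general idea rather than the change made on the project.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters that actually affect the response.
CACHEABLE_PARAMS = {"bbox", "zoom", "layer"}


def normalise(url: str) -> str:
    """Return a canonical form of the URL to use as a cache key."""
    parts = urlsplit(url)
    params = []
    for key, value in parse_qsl(parts.query):
        if key not in CACHEABLE_PARAMS:
            continue  # drop parameters that do not change the response
        if key == "bbox":
            # Round coordinates so near-identical map views share a key
            # (one decimal place is an assumed precision).
            value = ",".join(f"{float(c):.1f}" for c in value.split(","))
        params.append((key, value))
    params.sort()  # parameter order should not create distinct keys
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


a = normalise("https://example.test/map?zoom=6&bbox=-1.23,50.11,0.49,51.02&session=abc")
b = normalise("https://example.test/map?bbox=-1.21,50.09,0.51,50.98&zoom=6&session=xyz")
print(a == b)
```

With keys normalised like this, two slightly different map views resolve to the same cached response instead of each triggering a separate call to the CMS.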
Building decoupled, auto-scaling resilience
The work to improve the resilience highlighted the tight coupling of the NSWWS pages with other, less critical parts of the website. Made Tech’s solution was to build a standalone NSWWS website decoupled from the main website.
We rebuilt the NSWWS website using our established best practices:
- Test-driven development to assure code quality and service segregation
- A decoupled, hexagonal architecture
- Infrastructure as code, allowing us to create consistent environments and track changes in version control
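To show how a hexagonal (ports and adapters) design supports test-driven development, here is a minimal sketch. It is written in Python for brevity rather than the Kotlin used on the project, and all names (`WeatherWarning`, `WarningSource`, `ListWarningsForRegion`, `InMemoryWarningSource`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class WeatherWarning:
    colour: str      # "red", "amber" or "yellow"
    condition: str   # e.g. "snow"
    region: str


class WarningSource(Protocol):
    """Port: how the core asks for warnings, whatever the backing system."""
    def current_warnings(self) -> List[WeatherWarning]: ...


class ListWarningsForRegion:
    """Use case: depends only on the port, never on a concrete data source."""
    _SEVERITY = {"red": 0, "amber": 1, "yellow": 2}

    def __init__(self, source: WarningSource) -> None:
        self._source = source

    def execute(self, region: str) -> List[WeatherWarning]:
        matches = [w for w in self._source.current_warnings()
                   if w.region == region]
        # Most severe warnings first.
        return sorted(matches, key=lambda w: self._SEVERITY[w.colour])


class InMemoryWarningSource:
    """Adapter used in tests; a production adapter would call a real feed."""
    def __init__(self, warnings: List[WeatherWarning]) -> None:
        self._warnings = list(warnings)

    def current_warnings(self) -> List[WeatherWarning]:
        return list(self._warnings)


source = InMemoryWarningSource([
    WeatherWarning("yellow", "wind", "London"),
    WeatherWarning("red", "snow", "London"),
    WeatherWarning("amber", "rain", "Wales"),
])
for warning in ListWarningsForRegion(source).execute("London"):
    print(warning.colour, warning.condition)
```

Because the use case only knows about the port, it can be driven by tests against the in-memory adapter long before any real data source or web framework is wired in.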
We also decided to develop the new NSWWS website using Kotlin because it is an ergonomic, JVM-based language that enabled faster development whilst remaining compatible with the Met Office’s existing Java ecosystem and expertise.
The new NSWWS website runs on its own independent, automatically scaling AWS infrastructure to deliver resilience during internet traffic spikes.
The results
We released the new NSWWS website, with no noticeable impact to users, and our highly integrated approach means that the warnings appear as part of the main site.
Coping with the UK’s weather
To users, the new NSWWS website appears unchanged as part of the main site; however, it runs on its own independent, decoupled, auto-scaling infrastructure, serving customers regardless of, and without affecting, the performance of other parts of the website. Since the service went into production, numerous weather warnings have been issued as storms battered the United Kingdom throughout the winter of 2023-24.
The standalone service has seamlessly handled the increased load. In fact, the new solution has performed so well that we are reducing the specifications of the underlying infrastructure and, therefore, the overall cost to the Met Office of providing the service.
The outlook moving forward
We’re working to ensure the NSWWS website continues to deliver for millions of users every day and we’re applying the success of this decoupling activity to other parts of the Met Office website (for example, forecast pages).
By reducing dependencies across the website, we’ll not only create services that are cheaper to run and easier to support, but also deliver a robust and resilient public website that supports the Met Office’s mission to help people ‘make better decisions to stay safe and thrive’ when it matters most.