AWS outage 2025: Why your business needs a multi-region cloud strategy
As a cloud partner, we love the cloud and believe it’s the future. But it’s important to also look at the less positive side of things and get a bit of a reality check.
The morning of October 20, 2025, gave everyone a reminder of how the modern internet actually works.
A problem in just one Amazon Web Services region affected hundreds of apps and websites around the world, from Snapchat and Roblox to major banks like Lloyds Banking Group. The event showed how much money businesses stand to lose when services go down, and revealed how connected all our digital services really are. Nothing is perfect, not even the cloud.
And it wasn’t a random glitch either: it was a DNS resolution issue in the critical US-EAST-1 region that spread globally. For businesses, it’s a warning to look closely at how their own systems work. It’s no longer enough to focus on keeping services running; we need to build systems that keep working even when something goes wrong.
The problem shows how much we depend on the cloud
On October 20, 2025, something went wrong at Amazon Web Services (AWS), and suddenly websites and apps started failing all over the world.
The problem started in the morning (UK time), and it quickly became clear how much our digital world depends on these cloud services. Hundreds of companies couldn’t serve their customers, and millions of people couldn’t use the services they rely on every day.
Major services stopped working worldwide
The effects of the problem spread quickly to almost every type of business. Downdetector, a website that tracks when services aren’t working, recorded millions of problem reports covering more than 500 different companies. This showed just how many services depend on AWS to keep running.
Impact on social and gaming: Snapchat, Roblox, and Fortnite
Some of the biggest services that stopped working were social media and games. People reported that Snapchat, Roblox, and Fortnite were either completely down or having serious problems. Roblox games showed as unavailable, and Fortnite players couldn’t log in at all.
Banks and money services: Lloyds Bank, Halifax, and Coinbase
Banks and financial services were hit too. Lloyds Banking Group, which includes Halifax and Bank of Scotland, confirmed their services weren’t working because of the AWS problem. Many customers couldn’t check their accounts online or through their banking apps. Coinbase, where people buy and sell cryptocurrency, also reported that users couldn’t access their accounts.
Work tools and government services: Slack, Canva, and HMRC
The tools many of us use for work were affected as well. Slack (for team messages) and Canva (for creating designs) had problems, affecting businesses around the world. Even government services in the UK were hit, with HMRC (the tax authority) confirming that its online services weren’t working properly and its phone lines were jammed with calls from people trying to get help.
Some affected services
The complete list of impacted services is huge, covering almost every sector. Reports from Downdetector and company statements confirmed problems for a wide range of platforms. Here’s just a sample of the well-known services that had problems:
- Social media & messaging: Snapchat, Reddit, Signal, Slack, Zoom, Facebook, WhatsApp, Tinder
- Games & entertainment: Roblox, Fortnite, PlayStation Network, Epic Games Store, Pokémon Go, Disney+, Hulu, Tidal, Crunchyroll, Apple TV
- Banking & money: Lloyds Bank, Halifax, Bank of Scotland, Coinbase, Robinhood, Venmo, Chime
- Work & business tools: Canva, Perplexity AI, Asana, Airtable, Xero, Smartsheet
- Amazon’s own services: Amazon.com (shopping), Alexa, Prime Video, Ring doorbells
- Government & utilities: HMRC (UK), GOV.UK, T-Mobile, Verizon
- Other big names: McDonald’s (app), Duolingo, Wordle, The New York Times, Lyft, United Airlines, Delta Air Lines
The real business cost when digital services fail
The number of affected services shows this wasn’t just a technical problem; it was a major business problem.
A reminder of how outages hurt business
This incident shows how directly digital services affect business operations. For online stores, banks, and app-based services, downtime means lost revenue, halted operations, and disruption across the entire business. Airlines reported customers couldn’t check in, and Post Office branches couldn’t process Amazon Click and Collect packages.
How outages affect customer trust and daily life
Beyond the immediate financial loss, widespread outages erode customer trust. The problem affected things people use every day, like Amazon’s Alexa assistant and Ring doorbells.
Users reported seeing “failed to connect” errors that meant they couldn’t see live video from their doorbells. Other people had their cards declined when trying to pay for things, showing how deeply these services are now part of our daily lives.
What actually went wrong
To understand why so many services stopped working, we need to look at where the problem started. This wasn’t a complete failure of all AWS systems. It was a specific problem in one important location that then spread to affect services around the world.
The problem started in Virginia (US-EAST-1)
Amazon’s status updates quickly showed that the problem was in just one place: the US-EAST-1 Region, which is located in Northern Virginia. This region is one of the oldest and largest AWS locations, which means many services and applications depend on it in some way, even if their main systems are somewhere else.
Amazon reports issues related to DNS resolution
Amazon said the problem “appears to be related to DNS resolution.” DNS (Domain Name System) works like a phone book for the internet. When you type in a web address, DNS translates that name into the actual numerical address (IP address) that computers use to find each other. When DNS has problems, applications can’t find the services they need to work properly.
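To make the phone-book analogy concrete, here is a minimal Python sketch that asks the operating system’s resolver for a hostname’s addresses using the standard library’s socket.getaddrinfo. The function name resolve is our own, and the sketch only shows what a client sees; it says nothing about how AWS runs DNS internally. When resolution fails, the client gets an error before it ever sends a request to the service.

```python
import socket


def resolve(hostname: str, port: int = 443) -> list[str]:
    """Look up the IP addresses for hostname; return [] if DNS resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as err:
        # The kind of failure seen during the outage: the name cannot be turned
        # into an address, so the application never even reaches the service.
        print(f"DNS resolution failed for {hostname}: {err}")
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("dynamodb.us-east-1.amazonaws.com") normally returns the regional DynamoDB endpoint’s current addresses; this is the kind of lookup Amazon said was failing.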
One database service caused a chain reaction
The DNS problem specifically affected “the DynamoDB API endpoint in US-EAST-1.” DynamoDB is a database service that many applications use to store and retrieve data. An API endpoint is simply the address that applications use to talk to that service.
When that address couldn’t be resolved, every application relying on it lost access to the database, whether it was fetching user data, processing logins, or updating game information.
How one small problem spread so widely
This incident clearly shows what happens in a chain reaction failure, where a problem in one small part triggers problems across a much larger system.
Many global apps actually depend on US-EAST-1
Many applications that seem to be spread across multiple regions still have hidden dependencies on services in US-EAST-1. Even if a company’s main services are in Europe or Asia, they might use a global service (like user authentication) that itself depends on US-EAST-1. This single point of failure meant that teams around the world, including those at UK banks, were affected by a problem in Virginia.
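One way to hunt for these hidden dependencies is to write down which component depends on which, then walk the graph backwards from a failed component to find everything that breaks with it. The sketch below does this with a breadth-first search. The service names are hypothetical; in practice the graph would come from your own architecture inventory or tracing data.

```python
from collections import deque


def blast_radius(depends_on: dict[str, list[str]], failed: str) -> set[str]:
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the graph: for each component, record which services depend on it.
    dependents: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search outward from the failed component.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


# Hypothetical example: a European app whose login goes through a
# "global" auth service that actually stores its data in US-EAST-1.
services = {
    "eu-banking-app": ["eu-west-2-api", "global-auth"],
    "eu-west-2-api": ["eu-west-2-db"],
    "global-auth": ["us-east-1-dynamodb"],
    "reporting": ["eu-west-2-db"],
}
```

Here blast_radius(services, "us-east-1-dynamodb") returns {"global-auth", "eu-banking-app"}: the European app goes down even though none of its own components run in Virginia, while "reporting" is unaffected.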
It got worse before it got better
As Amazon’s engineers worked to fix the DNS issue, two more problems made things worse:
- A huge “backlog of queued requests” built up. When the system started working again, it had to handle all the failed and repeated connection attempts at once, which slowed everything down.
- New “EC2 instances” (virtual servers) couldn’t be launched in the affected region. This meant that systems designed to automatically add more capacity during problems couldn’t do so, making recovery even harder for affected companies.
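Clients can avoid feeding a backlog like this by not retrying immediately and in lockstep. A common technique, described in AWS’s own Builders’ Library guidance, is capped exponential backoff with random “jitter”. The sketch below is a generic Python version; the function and parameter names are our own, and treating ConnectionError as the only retryable error is an assumption.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 0.1, max_delay: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, retrying on ConnectionError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of retrying forever
            # The window doubles each attempt (0.1s, 0.2s, 0.4s, ...) up to max_delay;
            # sleeping a random time inside it spreads clients out instead of
            # having them all retry at the same instant.
            window = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, window))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the behavior can be tested without real waiting. Capping both the delay and the number of attempts matters: unlimited retries are exactly what turns a brief failure into a long backlog.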
Why the internet depends on just a few big companies
This event is just the latest example showing how much the modern internet depends on a small number of large infrastructure providers. While this setup offers benefits like lower costs and more features, it also means that a problem in one place can affect services worldwide.
This isn’t the first time a single problem has caused widespread failures. Similar incidents have happened before, following the same pattern: a local technical issue at one major provider caused global downtime for its customers.
Other examples of major outages
Experts have been warning about this growing interdependence for years. In July 2024, a software update from cybersecurity company CrowdStrike caused a massive worldwide disruption that grounded flights and stopped business operations.
In October 2021, a configuration mistake brought down all of Meta’s services (Facebook, Instagram, WhatsApp). And in June 2021, a bug at content delivery network Fastly took down many major websites.
Why we all use the same cloud services
The cloud model is efficient because many organizations can use the same underlying services. AWS calls itself the “world’s most comprehensive and broadly adopted cloud.”
This approach lets millions of companies use advanced, scalable infrastructure without building everything themselves. The downside is that when a core service in a major region like US-EAST-1 has a problem, the effects spread far and wide.
Focus on building systems that can handle failures
This event offers an important lesson: organizations need to change how they think about their cloud setup, moving from just trying to avoid downtime to building systems that keep working even when parts fail.
Accepting that problems will happen sometimes
As cloud systems grow more complex, small errors or misconfigurations, such as the fault behind this DNS issue, can have big impacts. Experts point out that problems like this will eventually happen in any large system. A good cloud strategy therefore needs to assume that regional disruptions will occur occasionally.
Why you need services in multiple regions
The main point is that resilience has to be built into applications from the start. This outage shows why it’s risky to rely on a single AWS region. A resilient system keeps working during a regional failure by running services in more than one region and automatically sending traffic to a healthy region when there’s a problem. That way, if one region goes down, your business can keep running from another.
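A quick back-of-the-envelope calculation shows why a second region helps, and why hidden dependencies matter. The availability figures below are illustrative, not AWS SLA numbers, and the formula assumes that regions fail independently of each other.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability (0.0 to 1.0)."""
    return (1 - availability) * HOURS_PER_YEAR


def combined_availability(region_availabilities: list[float]) -> float:
    """Availability when the service stays up as long as ANY region is up.

    Assumes regions fail independently. Hidden shared dependencies, such as a
    global service that lives in a single region, quietly break this assumption.
    """
    # The service is down only if every region is down at the same time.
    return 1 - prod(1 - a for a in region_availabilities)
```

At an illustrative 99.9% availability, a single region still allows about 8.76 hours of downtime a year. Two independent regions at 99.9% each give 99.9999%, or roughly 32 seconds a year. If both regions secretly share a dependency pinned to US-EAST-1, the independence assumption fails and the real figure can be no better than that shared dependency’s availability.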
Build systems that don’t break when one part fails
The biggest lesson from this widespread disruption is the need for proactive, resilient design. Instead of just reacting to outages, organizations can build systems designed to handle them. This event should push leaders to make resilience a core part of their business strategy.
Important questions to ask about your own systems
This incident should make you look carefully at how your own systems are set up. Business and tech leaders should ask these basic questions to see if they’re ready for problems:
Are your important services all in one AWS region?
The US-EAST-1 problem showed the danger of “hidden connections,” where even global apps depend on a single region for key functions. You should carefully check to find all services and data paths that could stop working if one region fails.
Can your systems switch regions automatically when problems happen?
If your primary region becomes unresponsive, can your traffic automatically and seamlessly reroute to a secondary, healthy region? A resilient architecture isn’t just about being in multiple regions; it’s about having robust automation that manages the failover process without requiring human intervention.
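At its core, automated failover is a loop of health checks plus a rule for choosing where traffic goes. The sketch below is a simplified, in-process illustration; the FailoverRouter name and the threshold of three failed checks are hypothetical. In production this job is usually handled at the DNS or load-balancer level, for example with Route 53 failover routing and health checks, rather than in application code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailoverRouter:
    """Send traffic to the first healthy region in priority order (hypothetical sketch)."""
    regions: list[str]                 # priority order, primary first
    check: Callable[[str], bool]       # health probe: True if the region answers
    failure_threshold: int = 3         # consecutive failed checks before failing over
    failures: dict[str, int] = field(default_factory=dict, init=False)

    def run_health_checks(self) -> None:
        """Probe every region and update its consecutive-failure count."""
        for region in self.regions:
            if self.check(region):
                self.failures[region] = 0
            else:
                self.failures[region] = self.failures.get(region, 0) + 1

    def is_healthy(self, region: str) -> bool:
        return self.failures.get(region, 0) < self.failure_threshold

    def active_region(self) -> str:
        """Return the highest-priority region that is still considered healthy."""
        for region in self.regions:
            if self.is_healthy(region):
                return region
        raise RuntimeError("no healthy region available")
```

Requiring several consecutive failures avoids failing over because of a single dropped probe. This sketch fails back to the primary after one successful check; real setups often require several successes before sending traffic back.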
When was your disaster recovery plan last tested?
A disaster recovery (DR) plan that exists only on paper isn’t a plan at all. Regular, realistic testing of failover mechanisms, including deliberately injecting failures (a practice often called “chaos engineering”), is the only way to make sure they’ll work during a high-stress, real-world event.
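A DR drill can be as simple as deliberately breaking one region and checking where requests end up. The sketch below injects an outage into one region of a simulated setup; all names are hypothetical, and a real exercise would inject faults into actual infrastructure (for example with a managed fault-injection service) rather than into in-memory handlers.

```python
from typing import Callable

Handler = Callable[[int], str]


def call_with_failover(regions: list[str], request: int,
                       handlers: dict[str, Handler]) -> tuple[str, str]:
    """Send request to the first region, in priority order, that answers."""
    for region in regions:
        try:
            return region, handlers[region](request)
        except ConnectionError:
            continue  # this region is down; try the next one
    raise RuntimeError("all regions failed")


def injected_outage(request: int) -> str:
    """Chaos stand-in for a region that has gone dark."""
    raise ConnectionError("injected regional outage")


def run_dr_drill(regions: list[str], handlers: dict[str, Handler],
                 failed_region: str, requests: int = 100) -> dict[str, int]:
    """Knock out failed_region, replay traffic, and count which region served it."""
    chaos_handlers = {**handlers, failed_region: injected_outage}
    served_by: dict[str, int] = {}
    for request in range(requests):
        region, _ = call_with_failover(regions, request, chaos_handlers)
        served_by[region] = served_by.get(region, 0) + 1
    return served_by
```

With handlers for "us-east-1" and "eu-west-1", a drill that knocks out "us-east-1" should report every request served by "eu-west-1". If the drill raises instead, the failover path is broken, and it is far better to learn that during a test than during a real outage.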
How a cloud partner like Revolgy can help
Answering these questions and putting solutions in place is hard work that never really ends. This is where having an expert cloud partner makes a big difference, helping you prevent problems instead of just reacting to them.
Building systems that work in multiple regions
A partner like Revolgy can help design and build truly resilient, multi-region systems. We look at how your apps are connected and how your infrastructure is set up to remove single points of failure and make sure your services can handle regional outages.
Keeping everything running
Building resilient systems isn’t something you do once and forget about. It takes continuous monitoring, management, and improvement to stay ahead of new challenges. Revolgy’s Cloud Operations team provides the technical know-how to manage your cloud environment and keep it both working and cost-effective.
Making plans a reality
In the end, an expert partner brings both technical knowledge and real-world experience to solve these complex problems. With flexible billing, special partner benefits, and services from building and moving to the cloud to keeping it secure, a partner makes the cloud journey simpler and lets your team focus on what you do best, knowing that your systems can handle problems when they happen.
Contact us today for a free consultation.