The full story of the Google Cloud account suspension that took all of Railway down for approximately eight hours



Railway, a platform that provides infrastructure for deploying and running applications, experienced a major outage lasting approximately eight hours, from around 7:20 AM to 3:14 PM JST on May 20, 2026. According to Railway, the cause was Google Cloud mistakenly suspending Railway's production account.

Incident Report: May 19, 2026 - GCP Account Suspension

https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage

Railway Service Disruption — Resolved | Railway Status
https://status.railway.com/incident/I23M92U0



Railway operates its services using a combination of Google Cloud, AWS, and its own bare-metal infrastructure called Railway Metal. However, because some of its dashboards, APIs, control plane, databases, and compute infrastructure were located on Google Cloud, the suspension of its Google Cloud account rendered Railway's core functions unavailable.

Immediately after the suspension, workloads on Google Cloud stopped, and 503 errors appeared on the Railway dashboard and API. Users were unable to log in or deploy, and at the peak of the outage, Railway workloads in all regions became unreachable.

The impact spread beyond Google Cloud because Railway's network control plane depended on Google Cloud. Railway's edge proxies use routing tables to decide which application should receive incoming traffic. A routing table is essentially a lookup table that says, 'Traffic to this URL should be sent to this execution environment.'

Railway Metal and the workloads on AWS kept running, but the control plane that distributes routing tables ran on Google Cloud and went down with it, leaving the edge proxies unable to determine the correct destination. As a result, the outage on Google Cloud spread to the entire Railway network.
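The dependency described above can be illustrated with a minimal sketch. It is not Railway's implementation: the class name, hostnames, and environment labels are hypothetical, and it simply assumes that an edge proxy without a usable routing table cannot pick a destination even when the destination itself is healthy.

```python
from typing import Optional

# Hypothetical routing table: hostname -> execution environment.
RoutingTable = dict[str, str]


class EdgeProxy:
    """Toy edge proxy that can only route with a table from the control plane."""

    def __init__(self) -> None:
        self.table: Optional[RoutingTable] = None

    def receive_table(self, table: Optional[RoutingTable]) -> None:
        # None models the control plane being unreachable (an assumption).
        self.table = table

    def route(self, host: str) -> tuple[int, str]:
        # Without a table, or without an entry for this host, there is
        # nowhere to send the request, so answer with 503.
        if self.table is None or host not in self.table:
            return 503, "no route available"
        return 200, self.table[host]


proxy = EdgeProxy()
proxy.receive_table({"app.example.com": "metal-env-1"})
healthy = proxy.route("app.example.com")

# The workload on "metal-env-1" is still running, but the table is gone.
proxy.receive_table(None)
degraded = proxy.route("app.example.com")
```

In this sketch `healthy` is a successful route and `degraded` is a 503, even though nothing about the workload itself changed between the two calls.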



Railway detected an API health check failure at around 7:10 AM and identified the cause as a temporary suspension of its Google Cloud account at around 7:19 AM. Although access to the Google Cloud account was restored around 7:29 AM, it took time to recover the stopped compute instances and persistent disks. Network traffic began to recover around 10:38 AM, and Railway processed the queue in stages to avoid a surge in deployments.
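Processing a backlog in stages can be sketched as follows. The function name, the batch size, and the deployment IDs are hypothetical, and Railway's report does not describe the actual mechanism; the sketch only shows the general idea of releasing queued work in bounded batches instead of all at once.

```python
from collections import deque
from typing import Iterator


def drain_in_stages(queue: deque[str], batch_size: int) -> Iterator[list[str]]:
    """Yield queued deployment IDs in batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    while queue:
        # Take at most batch_size items from the front of the queue.
        count = min(batch_size, len(queue))
        yield [queue.popleft() for _ in range(count)]


pending = deque(["deploy-1", "deploy-2", "deploy-3", "deploy-4", "deploy-5"])
stages = list(drain_in_stages(pending, batch_size=2))
```

A real system would wait between batches, for example until the previous batch has become healthy, before releasing the next one.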

Railway ultimately confirmed the full recovery of its services around 3:14 PM and moved into the monitoring phase, updating the status to resolved around 4:57 PM. Railway has automatically redeployed some workloads, and is instructing users to manually redeploy as needed.

As a preventative measure, Railway has said it intends to eliminate the network control plane's sole dependency on Google Cloud. Specifically, it will extend the deployment of its high-availability database to AWS and Railway Metal as well, and migrate to a configuration that can keep services running even if every instance on a particular cloud becomes unavailable at once.

Railway has also outlined plans to reduce its reliance on Google Cloud for the critical paths that carry user traffic, limiting Google Cloud to secondary or failover roles. Railway states that 'the service visible to users is Railway, not Google Cloud, so the responsibility for availability, including vendor selection, lies with Railway.'
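The goal of surviving the loss of an entire provider can be checked with a small calculation. The sketch below assumes a database that needs a majority of its replicas to stay available; Railway has not published its replication scheme, so the majority rule, the function name, and the replica counts are all illustrative assumptions.

```python
def survives_provider_loss(replicas: dict[str, int]) -> bool:
    """Return True if a majority of replicas remains after losing any one provider."""
    if not replicas:
        return False
    total = sum(replicas.values())
    quorum = total // 2 + 1  # assumed majority quorum
    # Losing a provider removes all of its replicas at once.
    return all(total - count >= quorum for count in replicas.values())


single_cloud = survives_provider_loss({"gcp": 3})
lopsided = survives_provider_loss({"gcp": 2, "aws": 1})
spread = survives_provider_loss({"gcp": 1, "aws": 1, "metal": 1})
```

Under these assumptions only the evenly spread layout keeps a majority when any single provider disappears, which is the property the planned multi-provider deployment is aiming for.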

in Web Service, Posted by log1d_ts