Major Cloud Providers – Monthly Outage Recap: July 2018

A New Monthly Scorecard

The Lowdown on Downtime

At Infinitely Virtual, 100 percent uptime is not only aspirational, it’s a real thing – a promise to our customers that we keep. One hundred percent uptime takes work. In our case, these principles guide our operations:

  • Our environment is covered by a 100 percent uptime SLA, with money back if an outage occurs.
  • Our system is configured so that no single point of failure exists.
  • Our planned maintenance of the infrastructure is limited to once per year.
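
To put an uptime figure in concrete terms, here is a minimal sketch that converts downtime minutes into a monthly uptime percentage and, in the other direction, an SLA percentage into the downtime it tolerates. The function names, the 31-day month, and the sample figures are assumptions made for illustration; they are not taken from any provider's SLA terms.

```python
def uptime_percent(downtime_minutes: float, days_in_month: int = 31) -> float:
    """Return the uptime percentage for a month with the given downtime."""
    total_minutes = days_in_month * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between 0 and the length of the month")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 31) -> float:
    """Return how many minutes of downtime a given SLA percentage tolerates."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


# Illustrative values only: a hypothetical 48-minute outage in a 31-day month.
print(round(uptime_percent(48), 3))               # 99.892
print(round(allowed_downtime_minutes(99.9), 2))   # 44.64
print(allowed_downtime_minutes(100.0))            # 0.0
```

Under these assumptions, a single 48-minute outage is already enough to miss a 99.9 percent monthly target, while a 100 percent SLA leaves no downtime budget at all.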

Every month from now on, we’ll post a snapshot of how the bigger cloud providers are handling outages and downtime.

MS Azure

7/31/18 Storage – North Central US

Between 17:52 UTC and 18:40 UTC on 31 Jul 2018, a subset of customers using Storage in North Central US may have experienced difficulties connecting to resources hosted in this region. Other services that leverage Storage in this region may also have experienced related impact.

7/28/18 App Service – East US – Mitigated

Between 08:00 and 12:30 UTC on 28 Jul 2018, a subset of customers using App Service in East US may have received HTTP 500-level response codes or experienced timeouts or high latency when accessing App Service.

7/26/18 Azure Service Management Failures

Between 22:15 UTC on 26 Jul 2018 and 06:20 UTC on 27 Jul 2018, a subset of customers may have experienced timeouts or failures when attempting to perform service management operations for Azure Compute resources.

7/24/18 Delays in ARM-generated Activity Logs

Between 23:00 UTC on 24 Jul 2018 and 07:00 UTC on 27 Jul 2018, customers may not have received activity logs for ARM resources.

7/23/18 Error notifications in Microsoft Azure Portal

Between approximately 20:00 and 22:56 UTC on 23 Jul 2018, a subset of customers may have received timeout errors or failure notifications when attempting to load multiple blades in the Microsoft Azure Portal. Customers may also have experienced slowness or difficulties logging into the Portal.

7/16/18 RCA – Networking – Service Availability

Between 14:55 and 16:15 UTC on 16 Jul 2018, a subset of Azure customers in West US, West Central US, India South, India Central, and Australia East may have experienced difficulties connecting to Azure endpoints, which in turn may have caused errors when accessing Microsoft services in the impacted regions.

7/16/18 Problems at Microsoft Azure

Microsoft Azure is having issues since 11:33 AM EDT. Most reported problems:

  • Website hosting (43%)
  • Virtual machines (37%)
  • Cloud services (18%)

7/11/18 RCA – SQL Database – Intermittent Database Login Failures – South Central US

Between 18:38 on 11 Jul 2018 and 01:13 UTC on 12 Jul 2018, a subset of customers using SQL Database in South Central US may have experienced intermittent issues accessing services.

7/1/18 RCA – IoT Hub – Connectivity Issues

Between 01:30 UTC on 01 Jul 2018 and 18:00 UTC on 02 Jul 2018, a subset of customers using Azure IoT Hub in West Europe, North Europe, East US, and West US may have experienced difficulties connecting to resources hosted in these regions.
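
The Azure entries above give each incident as a UTC start and end time. As a rough illustration of how long such windows last, the sketch below parses two of them and computes their durations. The format string and the helper name `window_duration` are assumptions chosen to match the wording of these entries; the result is the length of the reported window, not a measure of how many customers were affected.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, e.g. "01:30 UTC on 01 Jul 2018".
# Note: %b expects English month abbreviations under the default locale.
FMT = "%H:%M UTC on %d %b %Y"


def window_duration(start: str, end: str) -> timedelta:
    """Return the length of an incident window given its start and end."""
    begin = datetime.strptime(start, FMT)
    finish = datetime.strptime(end, FMT)
    if finish < begin:
        raise ValueError("incident end precedes its start")
    return finish - begin


# Windows copied from the IoT Hub (7/1/18) and Storage (7/31/18) entries.
iot = window_duration("01:30 UTC on 01 Jul 2018", "18:00 UTC on 02 Jul 2018")
storage = window_duration("17:52 UTC on 31 Jul 2018", "18:40 UTC on 31 Jul 2018")

print(iot)      # 1 day, 16:30:00
print(storage)  # 0:48:00
```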

AWS

1:51 PM PDT We are currently experiencing intermittent errors accessing the AWS Management Console. AWS services are operating normally.

2:10 PM PDT We are currently experiencing intermittent errors accessing the AWS Management Console when using root account login credentials. The underlying AWS services and console logins using IAM users are operating normally.

7/16/18 Amazon says Prime Day issues unrelated to AWS

Problems first arose at 3:04 PM Eastern, just four minutes after Amazon Prime Day officially kicked off on Monday.

DownDetector

7/16/18 Problems at AWS

Amazon Web Services is having issues since 3:12 PM EDT. Most reported problems:

  • Log-in (50%)
  • EC2 (30%)
  • S3 (19%)

DownDetector

7/11/18 Problems at AWS

Amazon Web Services is having issues since 8:24 PM EDT.

Other Platforms

7/31/18 VMware Cloud on AWS Service Intermittent Availability Issue

07:25 AM UTC: Users may not be able to perform some operations on VMware Cloud on AWS or on services (including HCX and NSX) running on VMware Cloud on AWS. This issue does not impact new SDDC deployments.

7/31/18 Google Cloud Networking Incident #18014

Traffic loss in region europe-west2.

7/30/18 IBM Cloud – Issues with staging Cloud Foundry applications

10:24 PM PDT: Users are experiencing issues when staging new Cloud Foundry applications or restaging existing ones. Running applications are not impacted.

Intermittent issues accessing platform services/endpoints

5:47 AM PDT: Customers were experiencing issues accessing running Cloud Foundry applications and using Cloud Foundry application management functions, authentication/login, service dashboards, and provisioning.

Intermittent issues accessing various platform services/endpoints

4:25 AM PDT. Impact:

  • Access to running Cloud Foundry applications
  • Cloud Foundry application management functions
  • Authentication/Login
  • Service dashboards and provisioning

7/30/18 SAP Cloud US East (Ashburn) [neo-us1] – Service Advisory

Our monitors have indicated a possible disruption on the US East (Ashburn) [neo-us1] region of the SAP Cloud Platform (us1.hana.ondemand.com) impacting the availability of applications and services.

7/29/18 IBM Cloud – Intermittent application staging errors and access to running applications

9:44 AM PDT. Impact:

  • Intermittent inability to stage/restage Cloud Foundry applications
  • Intermittent inability to access Cloud Foundry applications via HTTP

7/27/18 Google Cloud Networking Incident #18013

We are investigating issues with Internet access for VMs in the europe-west4 region.

7/26/18 SAP US East (Ashburn) [neo-us1] – Service Advisory

Our monitors have indicated a possible disruption on the US East (Ashburn) [neo-us1] region of the SAP Cloud Platform.

7/26/18 IBM Cloud – Issues with Cloud Foundry API, application management, and access to running applications

7/25/18 Problems at 1and1

1and1 is having issues since 12:40 PM EDT. Most reported problems:

  • Email (60%)
  • Hosting (40%)

7/24/18 SAP – Europe (Rot) [neo-eu1] – Service Advisory

Our monitors have indicated a possible disruption on the Europe (Rot) [neo-eu1] region of the SAP Cloud Platform.

7/24/18 IBM – Login and Cloud Foundry application management issues

Cloud Foundry application management is unavailable (push and restage actions).

7/24/18 Problems at Go Daddy

Go Daddy is having issues since 9:37 AM EDT. Most reported problems:

  • Email (50%)
  • Domains (26%)
  • Hosting (22%)

7/24/18 IBM – Intermittent issues creating Cloud Foundry assets for new users

Some users who have recently registered for a new account experience errors when creating Cloud Foundry applications or services.

7/23/18 VMware Cloud backend service Intermittent Availability issue

Impact: Users may not be able to access the service or may experience trouble logging in to the service.

7/20/18 VMware Cloud backend service Availability issue

VMware Cloud backend services are experiencing an availability issue.

7/19/18 VMware Cloud backend service Availability issue

Access to VMware Cloud Services from India may be impacted. Users in India may not be able to access the consoles for VMware Cloud Services.

7/17/18 VMware Kubernetes DNS resolution availability issue

Jul 17, 00:33 UTC. Impact: DNS resolution fails within newly created clusters; existing clusters are expected to work fine.

7/17/18 Google Cloud Support Incident #18002

Incident began at 2018-07-17 12:17 and ended at 2018-07-17 12:55.

7/17/18 Google Cloud Networking Incident #18012

Incident began at 2018-07-17 12:15 and ended at 2018-07-17 13:05. We are investigating a problem with Google Cloud Global Load balancers returning 502s.

7/17/18 Problems at 1and1

1and1 is having issues since 10:20 AM EDT.

7/16/18 SAP Cloud Platform US East (Sterling) [neo-us3] – Service Advisory

Our monitors have indicated a possible disruption on the US East (Sterling) [neo-us3] region of the SAP Cloud Platform.

7/16/18 Problems at CenturyLink

CenturyLink is having issues since 1:08 PM EDT. Most reported problems:

  • Internet (64%)
  • E-mail (22%)
  • Phone (12%)

7/11/18 Problems at CenturyLink

CenturyLink is having issues since 5:04 PM EDT. Most reported problems:

  • Internet (86%)
  • Total blackout (7%)
  • Phone (5%)

7/9/18 Problems at iCloud

iCloud is having issues since 12:29 PM EDT.