Major Cloud Providers – Monthly Outage Recap October

MS Azure

10/30/19 RCA – Storage – West Europe
Between 10:17 and 13:52 UTC, a subset of customers using Storage, or Azure services with Storage dependencies, in West Europe may have experienced difficulties connecting to resources hosted in this region.

10/25/19

Azure Databricks and Data Factory v2 – Workspace and Data Flow Errors
Between approximately 11:00 and 14:40 UTC, a subset of customers using Azure Databricks may have received ‘No Web App available’ error notifications when logging into a Databricks workspace. Related API calls may also have failed to return a response. Additionally, a very limited subset of customers using Data Factory v2 may have received failure notifications for Data Flow jobs.

-DownDetector Problems at Microsoft Azure
Microsoft Azure is having issues since 10:29 AM EDT. Most reported problems

 42% Cloud services

 40% Website hosting

 17% Virtual machines

10/22/19

RCA – Service Availability – West Europe
Between 21:15 UTC on 22 Oct 2019 and 00:23 UTC on 23 Oct 2019, a subset of customers using Storage in West Europe may have experienced service availability issues. In addition, resources with dependencies on the impacted storage unit may have experienced downstream impact in the form of availability issues or high latency.

Azure Portal Sign In Failure – West Europe
Between 06:40 UTC and 09:54 UTC, a subset of customers were identified as having experienced issues signing in to the Azure Portal in West Europe.

10/21/19

RCA – Storage – West Europe
Between 23:20 UTC on 21 Oct 2019 and 04:32 UTC on 22 Oct 2019, a subset of customers using Storage in West Europe may have experienced service availability issues. In addition, resources with dependencies on the impacted storage unit may have experienced downstream impact in the form of availability issues or high latency.

RCA – Service Management Errors – East US 2
Between 10:08 and 23:37 UTC, a subset of customers in East US 2 may have received error notifications or experienced high latency when performing service management operations – such as create, update, or delete – for resources hosted in this region. Additionally, customers using Azure Databricks and/or Data Factory v2 may have encountered service management errors in multiple regions. A very limited subset of customers using Virtual Machines with SQL Server images, or other SQL IaaS offerings, may have also encountered errors performing service management operations on resources hosted in East US 2.


10/18/19
RCA – Authentication issues with Azure MFA in North America
Between 13:30 UTC and 15:57 UTC, customers in North America experienced issues receiving multi-factor authentication (MFA) challenges. Users who had valid MFA claims during the incident were not impacted. However, users who were required to perform an MFA challenge during this incident were unable to complete the challenge. This represented 0.51% of users in North American tenants using the service during the incident.

DownDetector

10/18/19 Problems at Microsoft Azure
Microsoft Azure is having issues since 10:05 AM EDT. Most reported problems

 60% Cloud services

 35% Website hosting

 3% Virtual machines

10/16/19 RCA – Azure Portal
Between 13:45 and 14:59 UTC (approx.), a subset of customers may have experienced latency issues with the Azure Portal, Command Line, and Azure PowerShell.

AWS

10/24/19 VMware Cloud on AWS: Host Remediation Degradation
VMware Cloud on AWS (VMC) experienced an operational issue that caused us to inadvertently remediate hosts that had not actually failed. Customers may have noticed an unusual amount of host replacement activity due to this error. This incident affected: VMware Cloud on AWS

Start Time (UTC): October 24, 2019, 22:20 hours

End Time (UTC): October 25, 2019, 22:25 hours

Duration: 24hrs 5mins
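
The duration listed above follows directly from the start and end times. A minimal Python sketch of that arithmetic, assuming both timestamps are in UTC as stated; the helper name `outage_duration` is hypothetical and used only for illustration:

```python
from datetime import datetime, timezone


def outage_duration(start: datetime, end: datetime) -> str:
    """Format the elapsed time between two timestamps as 'Nhrs Mmins'."""
    if end < start:
        # An end time before the start time usually signals a data-entry error.
        raise ValueError("end time precedes start time")
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}hrs {minutes}mins"


# Start and end times taken from the incident entry above.
start = datetime(2019, 10, 24, 22, 20, tzinfo=timezone.utc)
end = datetime(2019, 10, 25, 22, 25, tzinfo=timezone.utc)
print(outage_duration(start, end))  # 24hrs 5mins
```

The same helper could be applied to any entry in this recap that lists a start and end time; it raises an error rather than returning a negative duration when the two values are out of order.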

DownDetector

10/23/19 Problems at Amazon Web Services
Amazon Web Services is having issues since 8:48 AM EDT. Most reported problems

 63% S3

 29% AWS Console

 6% Route53

DownDetector

10/22/19 Problems at Amazon Web Services
Amazon Web Services is having issues since 2:56 PM EDT. Most reported problems

 73% S3

 18% AWS Console

 8% Route53

10/15/19 VMware Cloud on AWS: SDDC Capacity issue in Asia Pacific Southeast Region
New SDDC provisioning may fail for Asia Pacific South East (Singapore). This incident affected: VMware Cloud on AWS SDDC (Asia Pacific (Singapore)).

Start Time: October 15, 2019, 08:32 UTC
End Time: October 18, 2019, 07:51 UTC

10/14/19 VMware Cloud on AWS: SDDC provisioning failures in US West (Oregon) Region
New SDDC provisioning may fail in US West (Oregon). This incident affected: VMware Cloud on AWS SDDC (US West (Oregon)).

Start Time: October 14, 2019 07:56 UTC
End Time: October 15, 2019 07:08 UTC

10/10/19 VMware Cloud on AWS: SDDC provisioning failures in US East (N. Virginia) region
New SDDC provisioning or adding hosts/clusters may fail in US West (Oregon). This incident affected: VMware Cloud on AWS SDDC (US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney)).

Start Time: October 10, 2019 18:00 UTC
End Time: October 10, 2019 15:14 UTC

10/5/19

VMware Cloud on AWS: Unable to access the Network and Security tab
Customers will not be able to view the Network and Security tab. This incident affected: VMware Cloud on AWS (VMware Cloud on AWS).

Start Time: October 05, 2019 04:20 UTC
End Time: October 05, 2019 11:20 UTC

VMware Cloud on AWS Service Performance Degradation issue
Users may not be able to access the service or may experience trouble while accessing the service. This incident affected: VMware Cloud on AWS (VMware Cloud on AWS, DRaaS, HCX).

Start Time: October 05, 2019 01:00 UTC
End Time: October 05, 2019 01:34 UTC

10/4/19

-Amazon CloudWatch (Oregon) Elevated Error Rates
Starting at 5:27 PM PDT, there were delays delivering scheduled CloudWatch events. Recovery began at 7:36 PM PDT, and by 9:22 PM PDT the backlog of all previously scheduled CloudWatch events was successfully delivered.

-Amazon CloudWatch (N. Virginia) Elevated Error Rates
Starting at 5:27 PM PDT, there were delays delivering scheduled CloudWatch events. Recovery began at 7:31 PM PDT, and by 9:28 PM PDT the backlog of all previously scheduled CloudWatch events was successfully delivered.

10/2/19 VMware Cloud on AWS: SDDC provisioning failures
New SDDC provisioning or adding hosts/clusters may fail in US West (Oregon). This incident affected: VMware Cloud on AWS SDDC (US West (Oregon)).

Start Time: Tuesday, 1 October 2019, 22:28 UTC
End Time: Wednesday, 2 October 2019, 20:56 UTC

Other Platforms

10/31/19

-Google Cloud infrastructure components Incident #19001
Our engineers have determined this issue to be linked to a single Google incident. Incident began at 2019-10-31 16:41 and ended at 2019-11-02 10:00 (US/Pacific).

-Google Cloud Composer Incident #19003
Our engineers have determined that Cloud Composer was impacted by the same underlying issue as the Google Compute Engine (GCE) incident. Incident began at 2019-10-31 16:41 and ended at 2019-11-01 12:41 (US/Pacific).

-Google Cloud Networking Incident #19021
Our engineers have determined that Google Cloud Networking was impacted by the same underlying issue as the Google Compute Engine (GCE) incident. Incident began at 2019-10-31 16:41 and ended at 2019-11-02 10:00 (US/Pacific).

-Google Cloud Machine Learning Incident #19002
Our engineers have determined that Cloud ML Engine was impacted by the same underlying issue as the Google Compute Engine (GCE) incident. Incident began at 2019-10-31 16:41 and ended at 2019-11-01 09:00 (US/Pacific).

-Google Cloud Memorystore Incident #19002
Our engineers have determined that Cloud Memorystore was impacted by the same underlying issue as the Google Compute Engine (GCE) incident. Incident began at 2019-10-31 16:41 and ended at 2019-11-01 07:41 (US/Pacific).

-Google Cloud DNS Incident #19003
Our engineers have determined that Google Cloud DNS was impacted by the same underlying issue as the Google Compute Engine (GCE) incident. Incident began at 2019-10-31 16:41 and ended at 2019-11-01 08:15 (US/Pacific).

-Google Cloud Datastore Incident #19005
We’ve received a report of an issue with Cloud Datastore / Firestore in us-central. Incident began at 15:30 and ended at 16:01 (US/Pacific).

10/27/19 VMware Cloud Services Functionality Issues
Intermittent issues with functionality of our VMware Cloud Services.

Start Time: October 27, 2019 21:48 UTC
End Time: October 28, 2019 01:50 UTC

10/24/19 VMware Cloud on AWS: Host Remediation Degradation
VMware Cloud on AWS (VMC) experienced an operational issue that caused us to inadvertently remediate hosts that had not actually failed. Customers may have noticed an unusual amount of host replacement activity due to this error.

Start Time (UTC): October 24, 2019, 22:20 hours

End Time (UTC): October 25, 2019, 22:25 hours

Duration: 24hrs 5mins

10/22/19

-SAP Cloud Platform Europe (Netherlands) [cf-eu20] – Service Advisory
From approximately 21:13 UTC on 22 Oct 2019 until 03:04 UTC on 23 Oct 2019, lifecycle management operations could not be executed and application logs in Kibana were unavailable.

-SAP Cloud Platform Europe (Netherlands) [cf-eu20] – Service Advisory
From approximately 23:20 UTC on 21 Oct 2019 until 04:42 UTC on 22 Oct 2019, data processing for IoT message producers and consumers was not possible.

-Google Cloud Bigtable Incident #19001
Our engineers have determined this issue to be linked to the Google Cloud Networking incident in us-west1-b. Incident began at 17:04 and ended at 20:04 (US/Pacific).

-Google Cloud Networking Incident #19020
Google Compute Engine experienced 100% packet loss to and from ~20% of instances in us-west1-b for a duration of 2 hours, 31 minutes. Additionally, 20% of Cloud Routers, and 6% of Cloud VPN gateways experienced equivalent packet loss in us-west1. Incident began at 16:47 and ended at 21:35 (US/Pacific).

-Cloud Memorystore Incident #19001
Our engineers have determined this issue to be linked to the Google Cloud Networking incident in us-west1-b. Incident began at 17:14 and ended at 19:05 (US/Pacific).

-Google Cloud Storage Incident #19008
Incident began at 17:11 and ended at 20:04 (US/Pacific).

10/20/19 VMware Cloud PKS Performance Degradation issue
Customers may experience delays or failures in creating or deleting clusters in all regions.

Start Time: October 20, 2019 17:05 UTC
End Time: October 20, 2019 18:33 UTC

10/19/19 SAP Cloud Platform US East (Ashburn) [neo-us1] – Service Advisory
From approximately 07:32 UTC until 08:11 UTC, Git Service and HTML5 application deployment were unavailable.

10/18/19

-SAP Cloud Platform US East (Ashburn) [neo-us1] – Service Advisory
From approximately 22:03 UTC until 23:02 UTC, Forms by Adobe was unavailable.

-SAP Cloud Platform US East (Ashburn) [neo-us1] – Service Advisory
From approximately 14:51 UTC until 18:31 UTC, a general disruption impacted the availability of applications and services.

10/15/19

VMware Cloud on AWS: SDDC Capacity issue in Asia Pacific Southeast Region
New SDDC provisioning may fail for Asia Pacific South East (Singapore).

Start Time: October 15, 2019, 08:32 UTC
End Time: October 18, 2019, 07:51 UTC

VMware Skyline Service Availability issue
Users may not be able to access the service or may experience trouble while accessing the service.

Start Time: October 15, 2019, 03:48 UTC
End Time: October 15, 2019, 06:54 UTC

10/14/19

VMware Cloud on AWS: SDDC provisioning failures in US West (Oregon) Region
New SDDC provisioning may fail in US West (Oregon).

Start Time: October 14, 2019 07:56 UTC
End Time: October 15, 2019 07:08 UTC

-SAP Cloud Platform Japan (Tokyo) [neo-jp1] – Service Advisory
From approximately 02:01 UTC until 03:12 UTC, lifecycle operations for HTML5 applications were not possible. Portal and WebIDE services were unavailable.

10/12/19 VMware Cloud Services Performance Degradation with VMware Log Intelligence
Customers can expect to see log ingestion delays ranging from 30 to 90 minutes.

Start Time: October 11, 2019 21:38 UTC
End Time: October 12, 2019 02:15 UTC

10/11/19

SAP Cloud Platform US East (Sterling) [neo-us3] – Service Advisory
From approximately 11:50 UTC until 13:38 UTC, a general disruption impacted the availability of applications and services.

SAP Cloud Platform Europe (Frankfurt) [cf-eu10] – Service Advisory
From approximately 05:18 UTC until 10:45 UTC, lifecycle management operations for Java applications could not be executed. Customers may not have been able to deploy or undeploy their applications.

10/10/19

VMware Cloud on AWS: SDDC provisioning failures in US East (N. Virginia) region
New SDDC provisioning or adding hosts/clusters may fail in US West (Oregon).

Start Time: October 10, 2019 18:00 UTC
End Time: October 10, 2019 15:14 UTC

-SAP Cloud Platform Europe (Frankfurt) [neo-eu2] – Service Advisory
From approximately 07:30 UTC until 08:32 UTC, lifecycle management operations for Java applications could not be executed. Customers may not have been able to deploy or undeploy their applications.

-SAP Cloud Platform Europe (Frankfurt) [neo-eu2] – Service Advisory
From approximately 05:38 UTC until 06:00 UTC, lifecycle management operations for Java applications could not be executed. Customers may not have been able to deploy or undeploy their applications.

-SAP Cloud Platform Singapore [cf-ap11] – Service Advisory
From approximately 00:39 UTC until 04:29 UTC, applications protected by Authorization & Trust Management (XSUAA) were not accessible.

10/8/19 SAP Cloud Platform Japan (Tokyo) [neo-jp1] – Service Advisory
From approximately 13:24 UTC until 13:47 UTC, a general disruption impacted the availability of applications and services.

10/7/19

VMware Cloud PKS Performance Degradation issue
Users might experience issues while creating clusters in the US-East Region (North Virginia).

Start Time: October 07, 2019 04:14 UTC
End Time: October 07, 2019 05:06 UTC

-SAP Cloud Platform Brazil (São Paulo) [neo-br1] – Service Advisory
From approximately 12:03 UTC until 13:11 UTC, the Audit Log retrieval API was not available.

10/5/19

VMware Skyline Service Intermittent Availability issue
Users may not be able to access the service or may experience trouble while accessing the service.

Start Time: October 05, 2019 14:53 hours UTC
End Time: October 05, 2019 16:30 hours UTC

VMware Cloud on AWS: Unable to access the Network and Security tab
Customers will not be able to view the Network and Security tab.

Start Time: October 05, 2019 04:20 UTC
End Time: October 05, 2019 11:20 UTC

VMware Cloud on AWS Service Performance Degradation issue
Users may not be able to access the service or may experience trouble while accessing the service.

Start Time: October 05, 2019 01:00 UTC
End Time: October 05, 2019 01:34 UTC

10/4/19 VMware Skyline Service Intermittent Availability issue
Some users may not be able to perform any operations on the Skyline Advisor Dashboard.

Start Time: October 04, 2019 12:36 UTC
End Time: October 04, 2019 13:17 UTC

10/2/19 VMware Cloud on AWS: SDDC provisioning failures
New SDDC provisioning or adding hosts/clusters may fail in US West (Oregon). This incident affected: VMware Cloud on AWS SDDC (US West (Oregon)).

Start Time: Tuesday, 1 October 2019, 22:28 UTC
End Time: Wednesday, 2 October 2019, 20:56 UTC