Tuesday 9th January 2018

I/O performance issue mitigation maintenance on us-3.magento.cloud region

Identified: Our infrastructure team believes it has identified a change that could mitigate the performance issues affecting the us-3.magento.cloud region. The change requires major region-wide maintenance, during which most instances in the region will need to be worked on. The window is scheduled to last two hours. During that time, projects may experience brief periods of slow responsiveness as instances are evacuated for maintenance, but no downtime is expected.

Update as of 7:20 AM PST: Our infrastructure team successfully completed the maintenance in the region, but subsequent testing showed that the I/O performance issues had not been resolved. Further investigation found that network speeds in the region are underperforming, causing slow I/O on shared storage. The team is continuing to investigate the root cause of the underperforming network speeds and will provide updates as additional information becomes available.

Update 2018-01-18 10:52 PST: Our infrastructure team has completed the root cause investigation and discovered an issue that caused CRON tasks to accumulate a backlog upon failure. This was compounded by a problem in our rebalancing algorithm. As a result, overloaded hosts had more CRON jobs failing, which created ever more load. The rebalancing algorithm has been corrected. We have also adjusted some system parameters to reduce the incidence of CRON job failures.