Downtime with Cloud Computing

Downtime in Cloud Computing
A cloud outage is the period during which a cloud infrastructure service is unavailable for use. Unavailability can also mean that the service performs below the metrics agreed in the SLA. For instance, an incident that only partially impacts a data center may still require the vendor to carry out maintenance and restoration work. Until the service is fully restored to the agreed SLA standards, the end user may experience that period as downtime.
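To make the SLA-based definition concrete, here is a minimal Python sketch that counts a minute as downtime when the service is either unreachable or slower than an agreed latency threshold. The `MinuteSample` schema, the 500 ms p95 threshold, and all function names are hypothetical, chosen only for illustration; real SLAs define their own metrics and measurement windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinuteSample:
    """One minute of monitoring data (hypothetical schema)."""
    available: bool        # did the health check succeed this minute?
    p95_latency_ms: float  # observed 95th-percentile latency in milliseconds


# Hypothetical SLA term, chosen purely for illustration.
SLA_MAX_P95_LATENCY_MS = 500.0


def is_downtime(sample: MinuteSample) -> bool:
    """A minute counts as downtime if the service is unreachable
    or if it misses the agreed performance metric."""
    return (not sample.available) or sample.p95_latency_ms > SLA_MAX_P95_LATENCY_MS


def downtime_minutes(samples: list[MinuteSample]) -> int:
    """Number of minutes that count as downtime."""
    return sum(1 for s in samples if is_downtime(s))


def availability_percent(samples: list[MinuteSample]) -> float:
    """Share of minutes that met the SLA, as a percentage."""
    if not samples:
        return 100.0
    return 100.0 * (len(samples) - downtime_minutes(samples)) / len(samples)


def allowed_downtime_minutes(target_percent: float, period_minutes: int) -> float:
    """Downtime budget implied by an availability target over a period."""
    return period_minutes * (1 - target_percent / 100.0)


if __name__ == "__main__":
    # 995 healthy minutes, 3 hard-down minutes, 2 reachable-but-too-slow minutes.
    samples = (
        [MinuteSample(True, 120.0)] * 995
        + [MinuteSample(False, 0.0)] * 3
        + [MinuteSample(True, 900.0)] * 2
    )
    print(downtime_minutes(samples))      # 5
    print(availability_percent(samples))  # 99.5
    # A 99.9% target over a 30-day month allows roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9, 30 * 24 * 60), 1))  # 43.2
```

Note that the two slow minutes count as downtime even though the service answered, which matches the idea above that performance below the agreed SLA metrics can be treated as unavailability.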

Causes of Downtime or Cloud Outage

Power Outage
Cyber Attacks and Security Breaches
Human Errors
Software and Technical Issues
Networking Issues
Maintenance

More specific causes include:

Traffic Overload and DDoS Attacks
DNS Failure
Expired Domain

Repercussions of Downtime or Cloud Outage

Loss of sales revenue
Lost employee productivity
Corruption of and gaps in mission-critical data
Damage to equipment and associated assets
Cost of remediating systems and core business processes
Damaged reputation with customers and key stakeholders
Degradation of employee morale
Regulatory, compliance, and legal penalties (including potential litigation fees)
Loss of insurance discounts
Contract penalties
Supply-chain disruption

According to reports, in 2013 Forbes famously calculated the cost of the e-tailer’s most critical outage that year. Based on Amazon’s 2012 net sales, it determined that the outage cost Amazon $66,240 per minute, or nearly $2 million in total. A previous outage in June 2008 cost close to $31,000 per minute, based on the previous quarter’s global revenue of $4.13 billion. Amazon reported revenues of $107 billion in 2015, which works out to $203,577 every minute at 2015 figures, or a $2,646,501 price tag for the 13-minute episode of downtime.
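The 2015 figures above appear to follow from spreading annual revenue evenly across the 525,600 minutes of a 365-day year. A minimal Python sketch of that arithmetic, assuming that uniform-revenue simplification (the function names are illustrative, not taken from any published cost model):

```python
# Assumption: revenue accrues evenly over every minute of a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def revenue_per_minute(annual_revenue: float) -> float:
    """Average revenue earned per minute, given annual revenue."""
    return annual_revenue / MINUTES_PER_YEAR


def outage_cost(per_minute: float, outage_minutes: float) -> float:
    """Estimated revenue lost during an outage of the given length."""
    return per_minute * outage_minutes


# $107 billion of revenue in 2015, rounded to whole dollars per minute.
per_minute_2015 = round(revenue_per_minute(107e9))
print(per_minute_2015)                   # 203577
print(outage_cost(per_minute_2015, 13))  # 2646501
```

Rounding the per-minute figure before multiplying reproduces the $2,646,501 quoted above; multiplying the unrounded rate by 13 gives about $2,646,499. The estimate covers lost sales only and ignores the other repercussions listed earlier.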

Five Major Cloud Outages of 2020

Microsoft Azure, March 2020
On March 3, 2020, Microsoft Azure services in Microsoft’s East U.S. data center region encountered more than six hours of outage. The outage affected a subset of East U.S. customers. The Redmond giant disclosed that a cooling system failure led to the outage, which impacted storage, compute, networking, and other services.
IBM Cloud, June 2020
IBM Cloud suffered a multi-zone, four-hour interruption of services on June 10, 2020, which affected IBM Cloud customers in Washington, D.C., Dallas, London, Frankfurt, and Sydney. The outage impacted general cloud services, Kubernetes services, App Connect, and Watson AI cloud services. An investigation revealed that a third-party network provider flooded the IBM Cloud network with incorrect routing information, which impacted IBM Cloud services across more than 80 data centers.
Cloudflare, July 2020
A 27-minute Cloudflare outage took down a significant chunk of internet services on July 17, 2020. The outage was caused by a configuration error in Cloudflare’s global backbone network, which resulted in a 50% traffic drop across its network. The disruption affected several big-name clients, such as Discord, Feedly, GitLab, League of Legends, Patreon, Politico, and Shopify.
AWS, November 2020
Even though 2020 turned out to be a strong financial year for AWS, the cloud giant suffered a multi-hour outage on November 25, 2020, which sparked a wave of memes on Twitter. The interruption hit the US-East-1 region and knocked out services of prominent AWS customers, including 1Password, Adobe Spark, Autodesk, Flickr, iRobot, Roku, Twilio, The Washington Post, and Glassdoor. It was triggered by a small addition of capacity to Amazon Kinesis, and it also affected other AWS services, such as Lambda, Lex, Macie, Managed Blockchain, Marketplace, MediaLive, MediaConvert, Personalize, Rekognition, SageMaker, and WorkSpaces.
Google Cloud, December 2020
On December 14, 2020, Google Cloud experienced a widespread outage that interrupted services including YouTube, Google Workspace, and Gmail. The 47-minute outage was caused by Google’s automated storage quota management system, which reduced the capacity of the authentication system and prevented users from accessing the services.
