Downtime in Cloud Computing
A cloud outage, or downtime, is the period during which a cloud infrastructure service is unavailable for use. Unavailability also covers service performance that falls short of the metrics agreed in the service-level agreement (SLA). For instance, an incident that only partially affects a data center may still require the vendor to carry out maintenance and restoration work. Until the service is fully restored to the agreed SLA standards, the end user may experience that period as downtime.
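To make the SLA framing concrete, the following minimal Python sketch converts downtime in a billing period into an availability percentage and checks it against a target. The 30-day period, the 99.9% target, and the function names are hypothetical and not taken from any provider's SLA. Minutes of degraded performance that miss the agreed metrics could be added to `downtime_minutes`, in line with the definition above.

```python
# Minimal sketch: turn downtime in a billing period into an availability
# percentage and compare it with an SLA target. The 30-day period and the
# 99.9% target used below are hypothetical, not any provider's actual terms.

MINUTES_PER_DAY = 24 * 60


def availability_pct(period_days: float, downtime_minutes: float) -> float:
    """Share of the period (in %) during which the service met its SLA."""
    total = period_days * MINUTES_PER_DAY
    if not 0 <= downtime_minutes <= total:
        raise ValueError("downtime must lie between 0 and the period length")
    return 100.0 * (total - downtime_minutes) / total


def allowed_downtime_minutes(period_days: float, target_pct: float) -> float:
    """Largest amount of downtime the target still tolerates."""
    return period_days * MINUTES_PER_DAY * (1 - target_pct / 100.0)


def breaches_sla(period_days: float, downtime_minutes: float, target_pct: float) -> bool:
    """True when measured availability falls below the SLA target."""
    return availability_pct(period_days, downtime_minutes) < target_pct


if __name__ == "__main__":
    period, target = 30, 99.9  # hypothetical billing month and target
    print(f"Downtime budget: {allowed_downtime_minutes(period, target):.1f} min")
    for down in (27, 47, 6 * 60):
        pct = availability_pct(period, down)
        print(f"{down:>4} min down -> {pct:.3f}% available, "
              f"breach: {breaches_sla(period, down, target)}")
```

Under these assumptions a 99.9% monthly target tolerates about 43 minutes of downtime, so a 27-minute outage stays within it while a 47-minute or six-hour outage breaches it.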
Causes of Downtime or Cloud Outage
- Power Outage
- Cyber Attacks and Security Breaches
- Human Errors
- Software and Technical Issues
- Networking Issues
- Maintenance
More specific causes include:
- Traffic Overload and DDoS Attacks
- DNS Failure
- Expired Domain
Repercussions of Downtime or Cloud Outage
- Lost sales revenue
- Lost employee productivity
- Corruption of and gaps in mission-critical data
- Damage to equipment and associated assets
- Cost of remediating systems and core business processes
- Damaged reputation with customers and key stakeholders
- Degradation of employee morale
- Regulatory, compliance, and legal penalties (including potential litigation fees)
- Loss of insurance discounts; contract penalties
- Disruption of the supply chain
According to reports, in 2013 Forbes famously calculated the cost of Amazon's most critical outage that year. Based on Amazon's 2012 net sales, it estimated that the outage cost Amazon $66,240 per minute, or nearly $2 million in total. An earlier outage, in June 2008, cost close to $31,000 per minute, based on the previous quarter's global revenue of $4.13 billion. Amazon reported revenue of $107 billion in 2015, which works out to $203,577 per minute at 2015 revenue levels, or a $2,646,501 price tag for a 13-minute episode of downtime.
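The per-minute figures above follow from simple division: revenue for a reporting period divided by the minutes in that period, then multiplied by the length of the outage. The Python sketch below reproduces the 2015 calculation. It assumes revenue accrues evenly over every minute of a 365-day year, which real order volume does not, so the results are rough estimates; the function names are illustrative.

```python
# Back-of-the-envelope sketch: estimate revenue lost during an outage by
# spreading period revenue evenly over every minute of the period.
# Assumes uniform revenue around the clock (a simplification).

MINUTES_PER_DAY = 24 * 60


def revenue_per_minute(revenue: float, period_days: float = 365) -> float:
    """Average revenue earned per minute over a reporting period."""
    return revenue / (period_days * MINUTES_PER_DAY)


def outage_cost(per_minute: float, outage_minutes: float) -> float:
    """Revenue assumed lost while the service is down."""
    return per_minute * outage_minutes


if __name__ == "__main__":
    # Amazon's 2015 revenue as quoted in the text, rounded to whole dollars
    # per minute before multiplying.
    per_min = round(revenue_per_minute(107e9))
    print(per_min)                   # 203577
    print(outage_cost(per_min, 13))  # 2646501
```

Rounding the per-minute rate to whole dollars before multiplying gives exactly $2,646,501; multiplying the unrounded rate instead gives about $2,646,499.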
Five Major Cloud Outages of 2020
Microsoft Azure, March 2020
On March 3, 2020, Microsoft Azure services in Microsoft's East US data center region suffered an outage lasting more than six hours, affecting a subset of East US customers. Microsoft disclosed that a cooling system failure caused the outage, which impacted storage, compute, networking, and other services.
IBM Cloud, June 2020
IBM Cloud suffered a multi-zone, four-hour interruption of services on June 10, 2020 that affected IBM Cloud customers in Washington, D.C., Dallas, London, Frankfurt, and Sydney. The outage impacted general cloud services, Kubernetes services, App Connect, and Watson AI cloud services. An investigation revealed that a third-party network provider had flooded the IBM Cloud network with incorrect routing information, disrupting IBM Cloud services across more than 80 data centers.
Cloudflare, July 2020
A 27-minute Cloudflare outage took down a significant chunk of internet services on July 17, 2020. It was caused by a configuration error in Cloudflare's global backbone network, which resulted in a 50% traffic drop across its network. The disruption affected several big-name clients, such as Discord, Feedly, GitLab, League of Legends, Patreon, Politico, and Shopify.
AWS, November 2020
Even though 2020 turned out to be a strong financial year for AWS, the cloud giant suffered a multi-hour outage on November 25, 2020 that sparked a wave of memes on Twitter. The interruption hit the US-East-1 region and knocked out services of prominent AWS customers, including 1Password, Adobe Spark, Autodesk, Flickr, iRobot, Roku, Twilio, The Washington Post, and Glassdoor. It was triggered by a small addition of capacity to Amazon Kinesis. The outage also affected other AWS services, such as Lambda, Lex, Macie, Managed Blockchain, Marketplace, MediaLive, MediaConvert, Personalize, Rekognition, SageMaker, and WorkSpaces.
Google Cloud, December 2020
On December 14, 2020, Google Cloud experienced a widespread outage that interrupted services including YouTube, Google Workspace, and Gmail. The 47-minute outage was caused by Google's automated storage quota management system, which reduced the authentication system's capacity and prevented users from accessing the services.