The system failure experienced by Alibaba Cloud, the cloud computing subsidiary of Alibaba Group Holding, between Sunday and Monday in Hong Kong – with disruptions at some user sites stretching to more than 24 hours – has raised the alarm on internet infrastructure risks in the city and in neighbouring Macau.
Such a prolonged breakdown of cloud services could cause business losses for clients, according to Audrey Jiang, chief analyst at the research centre of software development information provider InfoQ.
The incident, which happened at the internet data centre of Alibaba Cloud’s Hong Kong partner PCCW, resulted in the suspension of withdrawals at major cryptocurrency exchange OKX and disabled the website of the Monetary Authority of Macau. OKX, one of the world’s biggest cryptocurrency exchanges by trading volume, first reported the problem at 11am on Sunday and announced the resumption of operations at noon on Monday.
Other businesses that had their websites and apps affected by the system failure include Galaxy Macau Resort, Lotus TV Macau and food delivery platform mFood, according to a post by Macau’s Judiciary Police on microblogging service Weibo. These operations had all resumed by Tuesday morning.
In a statement posted on its Hong Kong website on Monday, Alibaba Cloud pledged to “make compensation” based on its product or service agreements with the relevant customers. Parent Alibaba owns the South China Morning Post.
The company also said that all of its online products “are gradually getting back to normal operation” after refrigeration equipment at PCCW’s data centre, where an anomaly had caused the system failure, was restored.
PCCW said in a statement on Tuesday that the equipment in question “is not owned or operated by PCCW” and that PCCW was not involved in the outage.
The outage affected the use of various Alibaba Cloud offerings, including its elastic compute service, database, storage and cloud network products at the company’s “availability zone C” covering Hong Kong and Macau.
“It is very rare for a cloud provider’s services to be suspended for 12 hours, or even 24 hours … The impact is huge,” said Zhang Yi, chief executive of research firm iiMedia.
Alibaba Cloud’s system failure reflects how internet infrastructure remains vulnerable to the occasional breakdown even as a growing number of companies in the industrial, manufacturing, financial services, utilities and other sectors shift their information technology workloads to the cloud.
“These events highlight the risk that no cloud service has a 100 per cent guaranteed uptime, and organisations should plan for them given the disruption caused,” said Matthew Ball, chief analyst for cloud, cybersecurity and infrastructure at research firm Canalys. “This is also true if organisations were to use their private data centres.”
Cloud computing services enable companies to buy, sell, lease or distribute a range of software and other digital resources as an on-demand service over the internet, just like electricity from a power grid. These resources are managed inside data centres.
“From a technology perspective, it’s a false proposition [for cloud services] to achieve zero accidents,” InfoQ’s Jiang said. “What matters is how the service provider initiates emergency measures and response mechanisms.”
This was not the first time that Alibaba Cloud experienced an outage. In March 2019, a breakdown at one of the company’s facilities in northern China stretched for about six hours. Another incident in June 2018 caused certain websites and apps to go offline for about an hour.
Other cloud service providers have experienced similar incidents. Amazon Web Services, the world’s largest cloud services company, was hit by a six-hour outage last September in Tokyo, where the online operations of brokerages, banks and airlines were disrupted. In 2017, Microsoft Corp customers using its Azure public cloud service experienced an eight-hour outage.
“[These incidents are] often caused by a systems (hardware) or power failure, software update issues, or a major event such as fire, or human error,” Canalys’ Ball said. “When these incidents happen, cloud service providers will investigate, learn and update their processes to build greater resilience so as not to repeat them.”
He added that if such incidents became a regular occurrence, it would have “a major impact on the cloud service provider’s competitiveness and reputation”.
Alibaba Cloud is recognised as Asia’s largest infrastructure-as-a-service provider by revenue, and the third-largest worldwide for four consecutive years through to 2021, according to tech market research firm Gartner.
As part of Alibaba, the company is considered a major growth driver for the e-commerce giant, which reported a profit of 1.1 billion yuan (US$164 million) from its cloud business in the 12 months ended March 31 – the subsidiary’s first annual profit since it was founded in 2009.
Still, the cloud unit has seen its revenue growth slow down in recent months. Its sales rose by just 4 per cent in the September quarter, down from the 12 per cent and 10 per cent growth recorded in the June and March quarters, respectively.