Burning up: Mitigation should sustain datacentre operations during extreme heat
Tuesday 19 July 2022 was the hottest day ever recorded in the UK, peaking at an unprecedented 40.3ºC in Coningsby, Lincolnshire. Reports of datacentre cooling system failures at Google and Oracle swiftly followed.
But while hot weather exacerbates operational challenges, datacentres worldwide have been designed for locations that regularly experience much more extreme conditions, as Simon Brady, Europe, Middle East and Africa (EMEA) services channel business development head at Vertiv and a datacentre optimisation specialist, points out.
“Think about Asia, Australia, the Middle East, even places in Eastern Europe where you get -20ºC in winter and 35ºC to 40ºC in summer,” says Brady. “Sites that fail were either designed or maintained poorly.”
Even outside a specified temperature window, datacentres “tend not to fail” as such, although they may break their service level agreements (SLAs) and operate outside them for that period, he explains.
Managing expectations around availability
SLAs should better account for the increased chance of outlier temperatures, yet operators have been wary of broaching this topic with their customers, often when it comes to agreeing on maintenance levels and the related but necessary downtime, he says.
“Although, with adiabatic or evaporative cooling, we can then have a conversation about the lack of water,” adds Brady. “A project I did two years ago in Saudi Arabia planned for 55ºC external temperatures. You have to put a bit more engineering around it, but it’s no problem whatsoever.”
If the UK starts hitting those temperatures, “we’ve got bigger issues than whether datacentres can cope”, he adds.
Simon Bennett, EMEA chief technology officer (CTO) at Rackspace, emphasises that failures are typically down to multiple factors. It follows that unexpected heat spikes should be manageable – unless the datacentre operation in question is already running out of headroom on capacity that would allow for outlier events.
It is likely to be more about making incremental changes on current sites than pouring new concrete, which few operators are doing today anyway, he says. Monitor the temperatures, tidy the cabling, close rack doors to maximise airflow, and so on.
“There are really simple things that people don’t do in poorly run datacentres,” says Bennett. “Ultimately, you need to make sure you know what sort of fault tolerance you’ve got with hotter days. Can you cope with the loss of a main air-conditioning unit, for example?”
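Bennett's question about losing a main air-conditioning unit can be framed as a simple headroom check. The sketch below is illustrative only: the function names, the unit capacities, the 35ºC rated ambient and the linear 2%-per-degree derating are all hypothetical assumptions, not figures from any operator or manufacturer.

```python
def derate(capacity_kw, ambient_c, rated_ambient_c=35.0, loss_per_degree=0.02):
    """Reduce a unit's cooling capacity on hot days.

    ASSUMPTION: a linear loss per degree above the rated ambient temperature.
    Real derating curves come from the equipment manufacturer.
    """
    excess_c = max(0.0, ambient_c - rated_ambient_c)
    return capacity_kw * max(0.0, 1.0 - loss_per_degree * excess_c)


def survives_unit_loss(unit_capacities_kw, heat_load_kw):
    """Return True if the remaining units cover the load after the largest one fails."""
    if not unit_capacities_kw:
        return False
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= heat_load_kw


# Hypothetical site: three 100 kW units cooling a 170 kW heat load.
units_kw = [100.0, 100.0, 100.0]
load_kw = 170.0

for ambient_c in (35.0, 40.0, 45.0):
    derated = [derate(u, ambient_c) for u in units_kw]
    print(ambient_c, survives_unit_loss(derated, load_kw))
```

Under these assumed numbers the site tolerates one failed unit at 35ºC and 40ºC, but not at 45ºC, which is the kind of shrinking margin the check is meant to expose.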
A bigger issue is managing heat generated internally by the datacentre itself, which means attention to monitoring and improving airflows and other basics, rather than implementing a new solution.
“You’re generating a lot more heat in a smaller footprint,” Bennett notes. “You have to offset that our electricity costs have gone through the flippin’ roof. People have to review their air-con capability properly and revise it up to 40ºC anyway.”
When densities are high, liquid cooling technologies can make sense if they help reduce electricity costs too. Modelling and digital twins can help at scale, he adds.
Moving workloads somewhere with lower temperatures, such as Scotland, or somewhere that might offer a “consistent cost of electricity”, such as Iceland, might also make sense, especially to procurement departments looking to nail down a few variables for longer-term plans, Bennett suggests.
Tate Cantrell, CTO at Iceland operator Verne Global, points out that cooler-climate datacentres can use less energy cooling higher-density workloads. The Icelandic summer average is just 13ºC and datacentres can use 100% hydro and geothermal power.
“More workloads could be moved to more sustainable locations,” says Cantrell. “Metropolitan datacentres, such as near London, could focus on supporting latency-sensitive applications.”
Pinpoint remaining inefficiencies for elimination
Raymond Ma, general manager for Europe, Australia and New Zealand at Alibaba Cloud Intelligence, maintains that conventional, older datacentres can be “incredibly inefficient”. Standard air-conditioning can eat up 40% of the total energy bill, resulting in a huge environmental impact.
Innovation remains important, including looking at advanced immersion or water cooling technologies, with redesigns of conventional centres requiring planning around reconstruction of the infrastructure, preparation of specially designed IT, coolant selection, and mechanisation of monitoring and maintenance systems.
“This can deliver costless cooling for 90% of a datacentre’s operating time, driving down energy consumption by more than 80% compared to mechanical cooling,” says Ma. “Adopt industry best practices, such as using intelligent algorithms to increase energy efficiency, enhancing renewable electricity use and boosting the recycling of energy such as waste heat generated by servers.”
Richard Clifford, solutions head at UK-based Keysource, warns there is no silver bullet for better management of soaring temperatures. “The key is the design and strategy for dealing with higher ambient temperatures, focusing on cooling and IT load, which means reducing the load in line with reduced cooling performance,” he says. “Cooler climates have several benefits, but climate change is global.”
He says a “vast majority” of UK datacentres already operate below 80% of design capacity. They should be able to manage these rising temperatures currently, with appropriate attention paid to proper maintenance, resilience, airflow and capacity planning around resources, including power and, increasingly, water.
James Petter, vice-president of international sales at Pure Storage, prescribes solutions that can flex up and down, based on data-driven insights. He broadly agrees that capacity planning will likely benefit from more innovation and sustainable tech purchasing, because every item uses energy, even if it produces little heat while operating.
For instance, solar may power more datacentres in future, with variable power draws backed up by an uninterruptible power supply (UPS). This could mean greater ability to cope with volatile weather patterns exacerbated by an overall hotter climate.
Coupled with data reduction and compression, racks of the future should consume less power and require less cooling. “I think it’s going to go to nanotechnology,” says Petter.
Steve Wright, chief operating officer at UK-based 4D Data Centres, reiterates that the outages and facility impacts seen this summer have been a tiny subset of the UK’s 400 or so commercial datacentres – and at least one outage appears to have involved an accidental rerouting.
“Those impacts probably shouldn’t have happened. However, the reality is that people need to learn that cloud isn’t this magical thing that is just always there 24/7,” he says. “You need a strategy for how you deal with not being able to access your internal IT asset or online service.”
For that reason, he looks forward to deeper investigation of what happened during the latest hot weather incidents. Each datacentre operator, as a separate, independent entity, should keep asking what can be learnt in pursuit of greater stability and resilience, with skillsets also relevant.
Wright underlines the sense in learning from the experiences of datacentre operators already working in much hotter or more humid environments. Improvements around maintenance are still needed, and a lot of retrofitting and mitigation is still possible.
“Maybe 10 to 15 years ago, you could fit a few chillers in between the buildings and get away with the cooling infrastructure being in the car park,” says Wright.
It can be as simple as just putting less load in the area that needs to run cooler, he adds.
“Probably the biggest thing is the elements of the ambient air design condition. I suspect we’ll start working to 43ºC to 44ºC, maybe 45ºC, in future,” he says.
“The ASHRAE [American Society of Heating, Refrigerating and Air-Conditioning Engineers] operating window for server equipment has got wider over 15 years or so, but we’re still working on pushing that even higher. Actually, we can cool down to 25ºC (rather than 16-18ºC), which still feels quite warm in the UK but you can still push that higher.”
Plan, mitigate and maintain
Daniel Bizo, research director at Uptime Institute Intelligence, confirms that datacentre operators should start by looking at overall capacity, restrictions and other limitations, and at where mitigation is possible, aligned as usual with the business case.
“Everyone’s situation is different – the vast majority of datacentre sites are unique, and you also need to be aware of localised risks,” he notes.
Those caught out may trace issues to “lurking equipment failures” rather than high temperatures per se. Theoretically, a datacentre design could be virtually bullet-proof against climate change, including atmospheric risk and extreme weather, from storms and wind to hail, rain and floods.
“Maybe you can move north, or close to a big body of water – the sea or a big river or lake,” says Bizo. “That said, there can be flow issues and restrictions, as we’ve seen in France. Some are now struggling with nuclear reactor cooling… because the rivers are getting so hot that it’s affecting aquatic life.”
With legacy systems, however, it might take 10-15 years before optimum architectures can be implemented. After all, the industry comes with at least 20 years of “baggage” to deal with.
The rule of thumb for refurbishments has been to consider peaks over the past 20 years, but with increased volatility, operators should perhaps look back further or at least build in more of a buffer to account for never-seen-before temperatures.
Either way, operators probably need to start thinking about their climate resilience strategies in a “more creative” manner.
“We cannot predict exactly,” notes Bizo. “The problem with climate change is that we know it is getting worse and weather events are becoming more extreme, but we don’t know where exactly and by how much. Even if we had perfect data and infinite compute, it wouldn’t be possible to pull that off.”
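The refurbishment rule of thumb described above, taking the peak over a look-back window and adding a buffer, can be written down in a few lines. This is a minimal sketch under stated assumptions: the function name, the 3ºC buffer and the sample peaks other than the 40.3ºC UK record are hypothetical, chosen only to illustrate the arithmetic.

```python
def design_ambient_c(annual_peaks_c, lookback_years=20, buffer_c=3.0):
    """Pick a design ambient temperature from historical annual peaks.

    annual_peaks_c: yearly maximum temperatures, oldest first.
    lookback_years: how many of the most recent years to consider.
    buffer_c: ASSUMED safety margin for never-seen-before temperatures.
    """
    recent = annual_peaks_c[-lookback_years:]
    if not recent:
        raise ValueError("at least one annual peak is required")
    return max(recent) + buffer_c


# Illustrative peaks only; 40.3 is the 2022 UK record cited in the article.
peaks_c = [34.0, 36.2, 38.7, 35.1, 40.3]
print(round(design_ambient_c(peaks_c), 1))
```

With the assumed 3ºC buffer, the result lands in the 43ºC to 44ºC range that Wright expects the industry to start working to. Widening `lookback_years` or raising `buffer_c` are the two levers the rule of thumb offers.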