Technical Response to Facebook’s Open Compute Project data center plans


Earlier this month, social media giant Facebook unveiled details of its data center facilities designs and server hardware plans in what it is calling the Open Compute Project, posting technical specifications and CAD drawings of its custom server hardware and data center MEP components.

The process is groundbreaking, and Facebook should be lauded for its openness and generosity. In the Web-scale data center space, secrecy has been the norm. Facebook’s move will have significant ramifications for the entire data center community – especially if it inspires other highly-efficient data center operators to follow suit.

Facebook’s data center operations have faced unprecedented public scrutiny, with environmental organizations protesting the company’s choice of coal-powered electricity. Because Facebook’s IT operations touch so many of our lives, it is not surprising that its data centers are of major public interest.

Jonathan Heiliger, VP Technical Operations at Facebook, asked the data center community to weigh in on the designs in a recent video. “Give us feedback, tell us where we screwed up, tell us where we made a bad decision, and help us make it better.”

In the spirit of that request, Uptime Institute Professional Services engineers offer the following feedback:

Facebook’s cooling method water-wasteful in a desert community

“As an engineer from the water-starved West, this is near and dear to me,” said Keith Klesner, consultant with Uptime Institute Professional Services. “The climate in the area is high desert with average annual precipitation of less than 10 inches. In a region where water is scarce, Facebook has designed the data center with 100% evaporative free cooling. The local municipality sources all of its water from a shallow aquifer, most likely the same one into which Facebook has sunk its wells.”
From Facebook: The direct evaporative system is supplied primarily by an on-site well and secondarily by the normal city water distribution system. Both sources feed into a storage tank. The storage tank provides 48 hours of water in the event well water and city water sources are unavailable.

“For a site considering sustainability and overall corporate social responsibility, my grade for the cooling choice is a D,” Klesner said. “The new Bend Broadband data center down the road in Bend, Oregon is a more sustainable model (using indirect air side economization) given the local climatology. This thread on Facebook’s own pages hits on my exact point. The City of Prineville is small and running out of water. Facebook is working with the City, but aquifers do not often recharge at the rate of extraction.”

“Phase 1 of the data center is 30 MW and Phase 2 is TBD. I think a starting consumption estimate could be 10,000 gallons per MW per day, putting total water consumption at 300,000 gallons per day. That’s about 10% of the total city water supply, a share that will rise significantly with Phase 2 of the project. The designer has the exact volume calculations, but the sourcing issue is the heart of the matter. The City of Prineville will run out of water from current sources sometime between 2015 and 2017. Its solution will be to drill to a deeper aquifer, which will likely be subject to overuse in the future,” Klesner said.
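Klesner’s estimate is back-of-the-envelope arithmetic, and it can be sketched as follows. The 10,000 gallons per MW per day figure is his stated assumption rather than a measured value, and the implied city supply is simply backed out of the “about 10%” remark:

```python
# Rough sketch of the water-consumption estimate quoted above.
# The per-MW figure is an assumption from the quote, not a measured value.

PHASE_1_LOAD_MW = 30                  # Phase 1 capacity cited above
GALLONS_PER_MW_PER_DAY = 10_000       # assumed evaporative-cooling draw

daily_use_gal = PHASE_1_LOAD_MW * GALLONS_PER_MW_PER_DAY
print(f"Estimated daily water use: {daily_use_gal:,} gallons")            # 300,000

# "About 10% of the total city water" implies a municipal supply on the
# order of 3 million gallons per day.
implied_city_supply_gal = daily_use_gal / 0.10
print(f"Implied city supply: {implied_city_supply_gal:,.0f} gallons/day")
```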

Facebook data centers vulnerable to downtime

“Wildfires, dust and volcano ash happen,” Klesner said. “In the case of extreme outdoor contaminants the data center will shut down.”

From Facebook: We acknowledge that this is a condition that can cause potential shutdown. We already have filtration installed and will run evaporative cooling at full capacity to reduce smoke and particulates in the event of a fire or contamination. Then, depending on intensity, we can utilize time for orderly shutdown, or else run for a prolonged period of time at minimum OA. We have a provision for a closed-loop system that uses indirect cooling.

The high desert east of the Cascade Mountains burns every summer. It is only a matter of time before Facebook has to deal with this issue.

Facebook has said that the Uptime Tier Classification System does not apply to its Prineville data center. But you would think the organization might be less cavalier about potentially disruptive vulnerabilities at the facility that supports its primary line of business.

In fact, the details of the Facebook data center design emphasize just how effective Tiers are at rating data center investment in terms of performance potential. Some of the facilities details reveal a fairly typical cost-focused, rather than performance-minded, data center design.

For example, Facebook’s backup generators are a potential vulnerability. “The document states the engine-generators are Standby rated,” Uptime Institute Professional Services consultant Christopher Brown pointed out. “This will impact the ability of the units to support the facility through long-term power outages, as the Standby rating carries yearly runtime limitations. The engine-generators are typically used as a reliable power supply when performing UPS maintenance. Regular testing of the units and maintenance of other critical equipment may impact the units’ ability to support a long-term power outage or a long-term failure of a UPS system.”

Lastly, much of the electrical distribution infrastructure does not lend itself to Concurrent Maintainability. “Large bus ducts (1,000 amps and above) are generally constructed with bolt-together sections and thus allow the bus sections to be maintained. But the smaller bus duct that delivers power to the servers does not typically use bolt-together sections; it uses press-fit connections instead. These connections are not maintainable and thus create a long-term operational problem,” Brown said.

On the facilities side, inconsistent maintenance opportunities on select equipment and constrained performance potential in the engine generators yield an overall Tier II rating. These are fundamental constraints that will impact long-term operations. It is important to go to the heart of the Tiers: the business case.

The key takeaways from this analysis:
- Working backwards from the facilities design, Facebook’s IT operations at its Prineville, OR data center may be core to its business, but the company is willing to tolerate downtime.
- While Facebook’s Prineville data center is energy efficient, it has a long way to go to call itself green.

“The term ‘green’ cannot just be about reducing electrical power consumption. It has to involve the natural resource limitations of the local area. Green must be centered on designing data centers that minimize the consumption of all natural resources, not just one,” Brown said. “Any green approach should be designed to minimize energy consumption while not increasing strain on other vital resources. Otherwise, we trade one problem for another.”

Continue the dialogue at Uptime Symposium
Facebook’s data center operations team will give the keynote address on Wednesday, May 11, at Uptime Institute Symposium with the presentation “Facebook’s Latest Innovations in Data Center Design,” featuring Facebook’s Jay Park, Director of Data Center Design Engineering and Facilities Operations; Thomas Furlong, Director of Site Operations; and Daniel Lee, Data Center Mechanical Engineer.


Posted by mstansberry on 18-04-2011
Categories: Uncategorized

Comments (8)

Hi, I’m Tom Furlong, director of site operations at Facebook — thanks for the feedback on the Open Compute Project and our new data center in Prineville. I wanted to provide some more context and information on water consumption at the site. We have worked hard to make the Prineville data center as water efficient as possible, and we are working with the city and their engineers to ensure that our potential impact on any city water use is sustainable.

In our initial tests, we’ve calculated that the data center has a water use effectiveness (WUE) ratio of 0.3 L/kWh, calculated over a year. While WUE is a relatively new measure, we believe we are significantly lower than the industry average of around 1.0 L/kWh. Our calculations show that we will be using an amount of water per day that is multiples lower than the number quoted in the article above.

We are also committed to recycling as much water as possible. For example, the mist eliminator in our evaporative cooling system captures water droplets that have not been fully absorbed into the air, representing about 18% of water usage. That water is then recycled back to the misting system. We’ll continue to work with the city of Prineville to ensure that we are good neighbors, and we will continue to be mindful of the scarcity of water in our environment.
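For rough context, the WUE figure can be converted into a daily number. The sketch below assumes, hypothetically, a fully loaded 30 MW of IT equipment, which is the Phase 1 capacity rather than anything Facebook has disclosed, so the result is only an illustration of scale:

```python
# Illustrative conversion of WUE (liters per kWh of IT energy, per The Green Grid)
# into gallons per day. The 30 MW IT load is a hypothetical assumption.

WUE_L_PER_KWH = 0.3          # value cited in the comment above
IT_LOAD_KW = 30_000          # assumption: 30 MW of IT load, fully utilized
LITERS_PER_GALLON = 3.785

it_energy_kwh_per_day = IT_LOAD_KW * 24
water_l_per_day = WUE_L_PER_KWH * it_energy_kwh_per_day
water_gal_per_day = water_l_per_day / LITERS_PER_GALLON

print(f"{water_gal_per_day:,.0f} gallons/day")   # roughly 57,000
# ...which would indeed be "multiples lower" than the 300,000 gal/day estimate above.
```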

Before finalizing judgment on the water use, you might take into account the amount of water used to generate power. I would expect that the water consumed to generate the additional power required by a less efficient cooling system would offset the water saved locally by that system. I couldn’t predict whether one is greater than the other, but I would recommend that you include it in your analysis.

The UTI comments about water usage are, of course, correct but may not be fully fleshed out. The Facebook PUE is 1.07 and it uses a lot of water locally. The delta between that 1.07 PUE and a PUE of, say, 1.5 is also reflected in the water consumption at the power station(s) feeding the grid. In fact, having checked the typical water consumption at US power stations, I find the Facebook facility uses marginally less water than a non-adiabatic PUE 1.5 facility would in the same area, and a lot less than a more typical PUE 2.0 facility. That remains pertinent at partial loads. So the point comes down to LOCAL consumption vs OVERALL consumption: Facebook uses less power and less water overall.

Here’s a solution to the water use issue: use reclaimed water in a cooling tower, then use the cooling tower water in CRAC units, rear-door heat exchangers, or direct-to-chip cooling.

You might be able to use the reclaimed water in the direct evaporative system, but there may be issues with pathogens.

Steve

The following link shows that coal power uses about 2.2 liters per kWh of power generated. (Nuclear and concentrated solar both use 3.3 liters per kWh.)

I don’t have all the numbers but I’m sure that crunching them would show that the evaporative cooling system uses considerably less water overall as compared to chillers.
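A crude version of that comparison can be sketched under stated assumptions: the 2.2 L/kWh generation figure cited above, Facebook’s published 1.07 PUE and 0.3 L/kWh site WUE, and a hypothetical chiller-based facility at PUE 1.5 that uses no water on site (an idealization, since many chiller plants still reject heat through cooling towers):

```python
# Rough comparison of total (site + power-station) water per kWh of IT energy.
# All inputs are taken from the discussion above or are labeled assumptions.

GEN_WATER_L_PER_KWH = 2.2     # coal-generation water figure cited above

def total_water_l_per_it_kwh(pue, site_wue_l_per_kwh):
    """Site water plus generation water, per kWh of IT energy."""
    return site_wue_l_per_kwh + GEN_WATER_L_PER_KWH * pue

evaporative = total_water_l_per_it_kwh(pue=1.07, site_wue_l_per_kwh=0.3)  # ~2.65
chiller     = total_water_l_per_it_kwh(pue=1.5,  site_wue_l_per_kwh=0.0)  # ~3.30 (assumes no site water)

print(f"Evaporative, PUE 1.07: {evaporative:.2f} L per IT kWh")
print(f"Chiller,     PUE 1.5 : {chiller:.2f} L per IT kWh")
```

On these assumptions the evaporative design does come out ahead overall, though the margin depends heavily on the regional generation mix.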

One has to be very careful with these kinds of calculations. For example, I once calculated that an electric car emits more carbon than a reasonably efficient gasoline-powered car when one takes into account the average carbon emissions for generation in the U.S. plus transmission and charging losses.

http://www.thegreengrid.org/en/Global/Content/white-papers/WUE

Thank you for some excellent comments on our feedback to Facebook. The Institute used a conservative estimate of local water consumption, but only the owner, Facebook, has the real numbers. To continue the spirit of the Open Compute Project, we ask Facebook to publicly disclose average and maximum daily water use, the permit for use of Prineville municipal water, and a report on aquifer impact.
As for the data center’s overall water usage effectiveness, the Green Grid’s “WUE source” metric accounts for power generation in Oregon at 3.1 L per kWh. This value is several times higher than in Washington state and California. Even using 38% less power than the average data center, the Prineville data center uses three times more water than a comparable data center in California (using WUE source).
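A minimal sketch of the WUE source arithmetic, using the Green Grid formulation (WUE source = regional energy water intensity factor × PUE + site WUE) and the figures cited above; the roughly three-fold difference follows when a much lower regional factor, as stated for Washington and California, is plugged in for the comparison site:

```python
# Sketch of the Green Grid WUE-source calculation referenced above.
# EWIF = regional energy water intensity factor (L of water per kWh generated).

def wue_source(ewif_l_per_kwh, pue, wue_site_l_per_kwh):
    """Total (generation + site) water per kWh of IT energy."""
    return ewif_l_per_kwh * pue + wue_site_l_per_kwh

# Oregon EWIF of 3.1 L/kWh cited above; PUE and site WUE are Facebook's published figures.
prineville = wue_source(ewif_l_per_kwh=3.1, pue=1.07, wue_site_l_per_kwh=0.3)
print(f"Prineville WUE source: {prineville:.2f} L/kWh")   # about 3.6 L/kWh of IT energy
```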

Lost in much of this discussion is the availability and resiliency of the data center. Most businesses cannot accept the downtime afforded by the Prineville data center design. Facebook obviously can accept outages due to local forest fires, dust storms, or long-term utility outages in which its Standby-rated engine-generators cannot support the data center. Most owners do not have that luxury.

Finally, I think the moral of the story is that creating a very large green data center is difficult. Green requires energy efficiency, resource availability, consideration of environmental and carbon impact, and local community involvement. Facebook has done a great job with some of these issues, but as environmental groups and the Uptime Institute have pointed out, a green data center, especially a resilient one, is hard to achieve.

I would add that a green data center approach also requires careful examination of the long-term management of the energy-consuming assets sitting on the floor. Designing an energy-efficient infrastructure and populating it with low-consumption servers and the like is great, but managing those assets over the long term is also very important in reducing overall energy consumption. It’s an ongoing process.

One cannot draw conclusions about what are and aren’t acceptable levels of downtime and redundancy for Facebook’s data centers without fully understanding how its applications are designed. If those applications have been designed from the ground up with failure in mind, it may be unnecessary and cost-prohibitive to build additional redundancy into a single data center rather than handle failure scenarios at the application level across data centers.

I’m confident Facebook has taken this into account in its design decisions and in determining the optimal level of investment to make in its data centers from a redundancy perspective.
