Surviving data center lightning strikes

Posted by mstansberry | Posted in Cloud Computing, Data center availability, Data center operations, Uptime Institute Professional Services | Posted on 09-08-2011

Data Center Knowledge reported this week that lightning knocked out major cloud computing data centers in the Dublin area.

Chris Brown, an Uptime Institute Professional Services consultant, offered some advice on how to protect your company's data center from lightning strikes:

A near lightning strike is very difficult to protect against. Many devices designed to protect against lightning do not handle large nearby strikes well, so keep that limitation in mind when designing a system.

Good lightning protection and a good facility grounding system are necessities. But because lightning protection and surge suppression are not foolproof, it's important to design the system itself to provide some resiliency.

The story suggested that the site had multiple engine-generators sharing a single bus, which requires the units to be synchronized. In such situations, it is typical to isolate the engine-generator paralleling switchgear from the incoming utility feeder. The paralleling switchgear synchronizes the generators and then connects that power to the individual substations. In this arrangement, the switchgear, bus, and controls are separated from the incoming utility. The controls are powered either by a UPS or by station batteries.

Either way, a lightning strike would typically destroy the double-conversion online UPS or the station battery system's rectifier rather than let the energy propagate through to the controls. In effect, that equipment protects the controls.

Typically, the only connection between the substation and the paralleling switchgear is control wiring (phase signals for closed-transition switching, plus dry contacts for utility loss and circuit breaker position). Because of this isolation, the generators can synchronize to the bus and connect to the substations via closed transitions.
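The closed transitions described above depend on a synchronism check before a breaker closes: the incoming generator and the running bus must match in voltage, frequency, and phase angle within some tolerance. The Python sketch below illustrates that permissive check. The `Source` type, the `sync_permissive` function, and all tolerance values are hypothetical placeholders for illustration, not settings from any particular switchgear vendor.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Measured electrical state of one side of a breaker (hypothetical model)."""
    voltage_v: float     # RMS line-to-line voltage, volts
    frequency_hz: float  # frequency, hertz
    phase_deg: float     # phase angle of the reference phase, degrees


# Hypothetical tolerances for illustration only; real values come from the
# switchgear manufacturer and the site's protection engineer.
MAX_VOLTAGE_DIFF_PCT = 5.0
MAX_SLIP_HZ = 0.2
MAX_PHASE_DIFF_DEG = 10.0


def phase_difference_deg(a: float, b: float) -> float:
    """Smallest absolute angle between two phase readings, in degrees (0 to 180)."""
    diff = (a - b) % 360.0
    return min(diff, 360.0 - diff)


def sync_permissive(incoming: Source, running: Source) -> bool:
    """Return True only if the incoming source is close enough to the running
    source in voltage, frequency, and phase to permit a closed transition."""
    voltage_diff_pct = abs(incoming.voltage_v - running.voltage_v) / running.voltage_v * 100.0
    slip_hz = abs(incoming.frequency_hz - running.frequency_hz)
    angle_deg = phase_difference_deg(incoming.phase_deg, running.phase_deg)
    return (
        voltage_diff_pct <= MAX_VOLTAGE_DIFF_PCT
        and slip_hz <= MAX_SLIP_HZ
        and angle_deg <= MAX_PHASE_DIFF_DEG
    )


bus = Source(voltage_v=480.0, frequency_hz=60.0, phase_deg=0.0)
generator = Source(voltage_v=478.0, frequency_hz=60.05, phase_deg=355.0)
print(sync_permissive(generator, bus))  # True: 5 degrees apart, small slip and voltage gap
```

Wrapping the phase difference modulo 360 matters: readings of 355 and 0 degrees are only 5 degrees apart, not 355.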

If the controls in the substations were destroyed, however, the system might not automatically connect the available generator power to the substations. Trained personnel would then have to manually open the utility circuit breaker and close the generator breaker. That is not nearly as difficult or hazardous as manually paralleling engine-generators on the same bus, but it still requires people who know the procedure, so mission critical facilities should have trained personnel on site 24×7.
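The manual procedure is an open transition: the utility breaker must be confirmed open before the generator breaker closes, because with the controls destroyed nothing is synchronizing the two sources. A minimal Python sketch of that interlock follows; `BreakerState`, `ManualTransferError`, and `close_generator_breaker` are hypothetical names for illustration, not part of any real switchgear control system.

```python
from enum import Enum


class BreakerState(Enum):
    OPEN = "open"
    CLOSED = "closed"


class ManualTransferError(RuntimeError):
    """Raised when a manual transfer step would violate the interlock (hypothetical)."""


def close_generator_breaker(utility: BreakerState, generator_available: bool) -> BreakerState:
    """Return the new generator-breaker state for a manual open-transition transfer.

    The generator breaker may close only after the utility breaker is open, so an
    unsynchronized generator is never paralleled with the utility.
    """
    if not generator_available:
        raise ManualTransferError("generator power is not available")
    if utility is not BreakerState.OPEN:
        raise ManualTransferError("open the utility breaker first")
    return BreakerState.CLOSED
```

Raising an error instead of silently skipping the step mirrors how a physical interlock simply refuses the unsafe operation.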

In summary, near lightning strikes are difficult to protect against. A comprehensive approach combining grounding, lightning protection systems, surge and transient protection, system topology, and trained personnel is key to helping ensure your data center survives a near lightning strike.

Wildfire shuts down Los Alamos data center in record fire year

Posted by mstansberry | Posted in Data center availability, Data center media | Posted on 30-06-2011

Computerworld reports that the Los Alamos National Lab has shut down two of its largest supercomputers as wildfires continue to burn near the sprawling New Mexico facility. According to the National Climatic Data Center, 6,625 fires burned approximately 1.1 million acres in May, the most acres burned in the month of May on record.

Terry Altom, an Uptime Institute Professional Services consultant, said smoke from wildfires could affect a data center. The filtration system may not remove all of the particulates, so smoke can reach the interior of the data center. Data center cooling units typically contain smoke detectors that shut them down, so if smoke infiltrates the building, those detectors will shut the cooling units down.

Computerworld quoted Uptime Institute's Vince Renaud: "Once the cooling systems are shut down you are kind of done." In other words, once the cooling systems turn off, the IT equipment hits its temperature thresholds rather quickly, and at that point the systems have to be shut down, he said.

In buildings with VESDA (very early smoke detection apparatus) systems, the problem could be worse, because VESDA systems are more sensitive than conventional smoke detectors.

Data center downtime causes and consequences

Posted by mstansberry | Posted in Data center availability, Data center media | Posted on 16-06-2011

An excellent article from SearchDataCenter.com's Data Center Advisory Board, "The causes and costs of data center system downtime," featured Uptime Institute VP Rick Schuknecht.

Schuknecht leads Uptime Institute’s elite data center end user network. From the article:

Schuknecht said 73% of data center downtime is caused by human error. Human error includes poor training, poor maintenance practices and poor operational governance. He said an outage can be very stressful and damaging to morale, because jobs and compensation are often based on an organization’s availability goals.

Schuknecht also said that an organization with a good investigation protocol can determine the root cause of an outage and identify the steps to take in the short and long term. But that only works if the protocol is actually effective.

Some repercussions of an outage are often overlooked. In the financial industry, for example, an outage can bring regulatory penalties. An outage can also erode a company's competitive edge through lost reputation within the industry, the customer base, or both. Where would you rather put your money: in the bank with no downtime or the one with repeated downtime? Most financial companies have processes in place to preserve or recover data; it's the loss of transactional continuity that can cause the biggest problems.

What can data center staff do to avoid and mitigate system downtime? Schuknecht recommends establishing a good facilities and computing maintenance program for each piece of equipment, creating a staff training program that describes how and when to respond to downtime events, providing adequate funding for operating expenses to make sure everything works properly, and instituting a good governance program under which site infrastructure is operated in accordance with manufacturer expectations.

Avoiding battery fires in the data center

Posted by mstansberry | Posted in Data center availability, Data center colocation | Posted on 02-05-2011

Data Center Dynamics reported last week that Aruba, Italy's largest web hosting data center, went down because of a fire in its UPS room involving the batteries.

"Battery fires happen," said Uptime Institute Professional Services consultant Chris Brown. "Once the fire starts, the battery can feed the fire until it exhausts its energy. More information on this particular incident would be needed to know whether or not this was an avoidable situation (i.e., whether it was the result of thermal runaway or some other issue). But thermal runaway would be my biggest concern as a cause of a battery fire."

According to Brown, ways to help avoid thermal runaway include, but are not limited to, keeping the batteries and charging means (UPS) in good condition and repair, using a battery monitoring system that monitors the temperature of each cell, and using temperature-compensated charging.
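Temperature-compensated charging lowers the charger's float voltage as the batteries get warmer (and raises it as they cool), which limits the charging current that can push a hot cell toward thermal runaway. The Python sketch below pairs that adjustment with a simple per-cell temperature alarm of the kind a battery monitoring system provides. The 25 °C reference, the 2.25 V/cell nominal float voltage, the -3 mV/°C/cell slope, the clamp limits, and the alarm threshold are all illustrative assumptions, not figures from the source; a real system should use the battery manufacturer's published values. Driving the compensation from the hottest monitored cell is also a conservative design assumption.

```python
# Illustrative parameters only (assumptions); use the battery manufacturer's values.
REFERENCE_TEMP_C = 25.0          # temperature at which the nominal float voltage applies
NOMINAL_FLOAT_V_PER_CELL = 2.25  # float voltage per cell at the reference temperature
COMP_V_PER_C_PER_CELL = -0.003   # compensation slope: -3 mV per degree C per cell
MIN_FLOAT_V_PER_CELL = 2.15      # hypothetical clamp limits for the compensated setpoint
MAX_FLOAT_V_PER_CELL = 2.35
CELL_ALARM_TEMP_C = 35.0         # hypothetical per-cell over-temperature alarm


def compensated_float_voltage(temp_c: float, cells_in_string: int) -> float:
    """String float voltage adjusted for temperature: lower when hot, higher when cold."""
    per_cell = NOMINAL_FLOAT_V_PER_CELL + COMP_V_PER_C_PER_CELL * (temp_c - REFERENCE_TEMP_C)
    # Keep the setpoint inside the allowed window even at extreme temperatures.
    per_cell = max(MIN_FLOAT_V_PER_CELL, min(MAX_FLOAT_V_PER_CELL, per_cell))
    return per_cell * cells_in_string


def hot_cells(cell_temps_c: list[float]) -> list[int]:
    """Indices of cells whose temperature exceeds the alarm threshold."""
    return [i for i, t in enumerate(cell_temps_c) if t > CELL_ALARM_TEMP_C]


def charger_setpoint(cell_temps_c: list[float]) -> float:
    """Compensate using the hottest monitored cell, the conservative choice."""
    return compensated_float_voltage(max(cell_temps_c), len(cell_temps_c))


temps = [24.8, 25.1, 38.2, 25.0]
print(hot_cells(temps))         # [2]: cell 2 needs attention before it fails internally
print(charger_setpoint(temps))  # reduced below the 9.0 V nominal for this 4-cell string
```

A per-cell alarm is what lets technicians find the problem cell during preventative maintenance; the compensation alone only slows the heating.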

"Basically, the best way to avoid battery issues is to stay on top of the preventative maintenance of the batteries and charging means. Regular preventative maintenance can spot problematic batteries or cells before they fail internally in a way that can lead to thermal runaway, and it allows technicians to adjust charging voltage and current to ensure the batteries are not overcharged," Brown said. "Batteries are combustible, so there is always a risk of fire from batteries. And that risk should drive where batteries are placed and the type of fire extinguishing means used for the room."

Terral Altom, an Uptime Institute Professional Services consultant, said batteries can and often do build up heat and hydrogen, and under the right circumstances, a fire can erupt. "The trouble with wet cell batteries is that they have plastic jars, and these jars are highly combustible."

Some insurance underwriters require a sprinkler system in battery rooms. In an anecdotal account told to Altom, a flooded-cell battery room caught fire and the gaseous suppression system discharged, but the smoldering battery jars reignited after the suppressant gas dissipated. The data center team then had to call the fire department, and once firefighters arrived, it took them 45 minutes to extinguish the fire. Smoke and water damage extended well beyond the battery rooms.

Does your organization make enough time for strategic data center planning?

Posted by mstansberry | Posted in Data center availability, Data center operations | Posted on 28-04-2011

SearchDataCenter.com just ran an article in which its Data Center Advisory Board, a group of data center thought leaders, offers advice from the trenches about setting aside time for strategic planning.

From Uptime Institute's contribution to the article: So many managers are locked into day-to-day firefighting that they never get out of reactive mode to plan ahead. The consequences can be dire. I wouldn't want to be the manager who has to go to the executive team to explain why a data center ran out of capacity sooner than expected. But even if you're not at risk of running out of capacity, the day-to-day waste that comes from not aligning data center and business needs can really eat into your company's bottom line.

Is your organization at risk of running out of data center capacity? Are you considering moving compute loads to the cloud, increasing virtualization investments, evaluating colocation options or planning a new data center? Are you responsible for delivering this message and formulating a strategic plan? These aren't the kinds of projects you can handle if you don't make time for formal strategic planning across all of the silos in the organization, from server and storage management to the facilities team.

Uptime Institute Professional Services recently began engaging clients to sit these different parties down in the same room and create a Digital Infrastructure Roadmap, a way to optimize your IT cap-ex and op-ex investment with input from all of the stakeholders in the data center.