Saturday, November 21, 2009

Best Practices in Data Center Cooling

The proliferation of modern, high-powered IT equipment is creating a new set of cooling challenges in the data center, challenges that reduce equipment resilience well before the room's cooling capacity is reached. This forces data center operators to take a conservative approach to cooling and to pay more than necessary to operate their cooling systems. As shown in Figure 1, the cost of power and cooling has increased 400% over the past decade, and these costs are expected to keep rising. To make matters worse, more servers must still be deployed to support new business solutions, and the demand for faster data access has resulted in a tenfold increase in the number of servers in today's data centers. At the same time, servers have shrunk, allowing more equipment to be packed into each server cabinet. The latest blade server designs fit a large number of high-power servers into a single 2U or 3U enclosure.

Figure 1: IDC Report – Cost Structure and Trends

When server racks are filled with these high density servers, the result is a higher power load and a correspondingly higher heat load, which is now causing serious problems for today's data centers. Deploying many such racks in dense configurations makes it considerably more complex to provide adequate power and cooling. The power consumption, and therefore the heat output, of these high density racks now approaches 10-15 kW per rack, far exceeding the 1-2 kW per rack of just 10 years ago.
If these trends continue, the ability of data centers to deploy new services will be severely constrained. To overcome this constraint, data center operators have three choices: expand power and cooling capacity, build new data centers, or employ an efficient solution that maximizes the use of existing cooling capacity. This takes us back to our basic economics lessons: “Maximum Utilisation of Available Resources”. The obvious byproduct of this increased power consumption is heat, and the heat output of these high density servers in turn creates “hot spots”. Dealing with hot spots is the data center manager's nightmare. As a result, designing a data centre air-conditioning system with a simple energy balance method has become increasingly challenging; the method relies heavily on personal skill and may not meet demand in the near future. The solution, however, is close at hand and achievable.
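To see why a room-level energy balance is no longer enough, it helps to run the numbers for a single rack. The Python sketch below is a minimal illustration, assuming that essentially all IT power ends up as heat and using the standard-air sensible heat relation Q (BTU/hr) = 1.08 × CFM × ΔT (°F). The function names are hypothetical, chosen for this example.

```python
from __future__ import annotations

BTU_PER_HR_PER_WATT = 3.412    # 1 W of electrical load ~ 3.412 BTU/hr of heat
BTU_PER_HR_PER_TON = 12_000    # 1 ton of refrigeration = 12,000 BTU/hr
SENSIBLE_HEAT_FACTOR = 1.08    # BTU/hr per (CFM x degF) for standard air


def rack_heat_load(kw_per_rack: float) -> tuple[float, float]:
    """Return (BTU/hr, tons of cooling), assuming all IT power becomes heat."""
    btu_per_hr = kw_per_rack * 1000 * BTU_PER_HR_PER_WATT
    return btu_per_hr, btu_per_hr / BTU_PER_HR_PER_TON


def required_cfm(kw_per_rack: float, delta_t_f: float) -> float:
    """Airflow needed to remove the rack's heat with a delta_t_f rise across it."""
    if delta_t_f <= 0:
        raise ValueError("temperature rise must be positive")
    btu_per_hr, _ = rack_heat_load(kw_per_rack)
    return btu_per_hr / (SENSIBLE_HEAT_FACTOR * delta_t_f)


if __name__ == "__main__":
    # Compare a legacy 2 kW rack with a 15 kW high density rack at a 20 degF rise.
    for kw in (2.0, 15.0):
        btu, tons = rack_heat_load(kw)
        print(f"{kw:4.1f} kW: {btu:7.0f} BTU/hr, {tons:.2f} tons, "
              f"{required_cfm(kw, 20.0):6.0f} CFM")
```

Under these assumptions a 15 kW rack needs roughly 4.3 tons of cooling and about 2,400 CFM of cold air at its inlets, against roughly 0.6 tons and 320 CFM for a 2 kW rack. A room-level balance can show that total capacity is sufficient while saying nothing about whether that much air actually reaches each rack, which is where hot spots come from.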

Hot Aisles and Cold Aisles

This practice was first implemented by Dr Robert (Doctor Bob) Sullivan. His concept was to arrange equipment front to front and rear to rear, creating alternating hot aisles and cold aisles. This configuration has become the industry standard practice for supplying the cold air that equipment needs and managing its heat output efficiently. It optimizes cooling effectiveness and efficiency by helping to ensure that only cold supply air is delivered to equipment inlets.

Source: The Uptime Institute

Avoid Gaps in Rows

Air is a fluid, and fluids tend to hug the outside surfaces of a server cabinet. Openings within equipment rows and at the ends of rows allow air from the hot aisles to wrap around the end cabinets and infiltrate the cold aisles. This mixing of hot and cold air raises device inlet temperatures, and if the recirculation of warm air persists, devices may overheat. Long, continuous rows of server cabinets allow better flow of both hot exhaust air and cold supply air.


Illustration showing how hot exhaust air mixes with cold air when gaps between cabinets are present
Cabinet Ceiling Fans

With the evolution of many different cabinet designs, cabinets are now commonly installed with ceiling fans. However, it is advisable not to place more than 3 cabinets with ceiling fans adjacent to each other. Cabinet ceiling fans serve no useful purpose when full depth devices are installed at the top of the cabinet, and they disrupt the natural flow of hot exhaust air.

Cabinet Alignment

Position computer cabinets so that the front of each cabinet is aligned with a raised floor seam. To be effective, cold aisles must be at least 4 feet wide, or in simpler terms, two floor panels wide. The illustration below shows the minimum recommended row spacing for effective cooling.
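The alignment and width rules can be checked with a few lines of Python. This is a sketch only: it assumes 2 ft × 2 ft floor tiles (the size implied by "4 feet ... two floor panels wide") and treats an aisle as acceptable only if it spans a whole number of tiles, so that both rows of cabinet fronts can sit on a seam. `check_cold_aisle` is a hypothetical helper.

```python
from __future__ import annotations

import math

TILE_FT = 2.0             # assumed standard 2 ft x 2 ft raised floor tile
MIN_COLD_AISLE_FT = 4.0   # minimum effective cold aisle width (two tiles)


def check_cold_aisle(width_ft: float) -> tuple[float, bool]:
    """Return (width in tiles, acceptable?) for a proposed cold aisle.

    An aisle is treated as acceptable when it is at least 4 ft wide and spans
    a whole number of tiles, so cabinet fronts on both sides can sit on a seam.
    """
    tiles = width_ft / TILE_FT
    whole_tiles = math.isclose(tiles, round(tiles))
    return tiles, width_ft >= MIN_COLD_AISLE_FT and whole_tiles


if __name__ == "__main__":
    for width in (3.0, 4.0, 5.0, 6.0):
        tiles, ok = check_cold_aisle(width)
        print(f"{width} ft cold aisle = {tiles} tiles -> {'OK' if ok else 'review'}")
```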

Source: ANSI/TIA-942
Cabinet Blanking Panels

The compaction of today's high density equipment has forced the use of small diameter fans with reduced velocity. Rear cabinet doors restrict airflow, so exhaust air circulates around the inside of the cabinet. When a cabinet is not fully populated, this exhaust finds the path of least resistance toward the front of the cabinet, where it is drawn into device inlets and eventually overheats the devices. Installing blanking panels in the front of the cabinet blocks this exhaust migration.
In most situations, cabinets that are not fully populated and without blanking panels will allow air from the hot aisle to pass through the cabinet and infiltrate the cold aisles. Again, the installation of blanking panels will eliminate this problem.
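Finding the openings a partially populated cabinet needs covered only takes a simple pass over its inventory. The Python sketch below assumes a 42U cabinet and a hypothetical inventory format (the lowest U position of each device mapped to its height in U); it lists each run of empty rack units that should receive blanking panels.

```python
from __future__ import annotations

RACK_UNITS = 42  # assumed cabinet height in U; adjust for your cabinets


def blanking_gaps(devices: dict[int, int], rack_units: int = RACK_UNITS) -> list[tuple[int, int]]:
    """Return (start_u, height_u) for each run of empty U positions.

    `devices` maps the lowest U position of each installed device to its height in U.
    Every run returned is an opening that should be covered with blanking panels.
    """
    used = [False] * (rack_units + 1)  # index 0 unused; positions are 1..rack_units
    for start, height in devices.items():
        if start < 1 or height < 1 or start + height - 1 > rack_units:
            raise ValueError(f"device at U{start} ({height}U) does not fit")
        for u in range(start, start + height):
            used[u] = True

    gaps = []
    u = 1
    while u <= rack_units:
        if used[u]:
            u += 1
            continue
        start = u
        while u <= rack_units and not used[u]:
            u += 1
        gaps.append((start, u - start))
    return gaps


if __name__ == "__main__":
    # A partially populated cabinet: a 2U server at U1 and a 4U chassis at U10.
    for start, height in blanking_gaps({1: 2, 10: 4}):
        print(f"Blank U{start}-U{start + height - 1} ({height}U)")
```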


Row Orientation to Precision Air-conditioning Units

Orient computer equipment rows perpendicular to the front face of the computer-grade air-conditioners. This is important because it allows hot exhaust air to return to the CRAC unit along the row, or tunnel, created by the wall of cabinets. Further, when multiple equipment rows are perpendicular to the front face of the air-conditioners, most of the exhaust air travels along the aisle and out either end, working its way back to the nearest air-conditioner. Some air will move over the tops of equipment rows; however, its impact on the cold aisles is much smaller. Ideally, hot aisles should line up directly with, and run perpendicular to, the face of an air-conditioning unit. The illustration below shows the recommended row orientation relative to Precision Air-conditioning Units.



Source: ANSI/TIA-942

Cable Management

Implement methodical cable routing and dressing at the rear of computer cabinets, keeping cables neatly dressed away from exhaust outlets to ensure unrestricted airflow. It is also common to lift an under floor tile and find waterfalls of cables directly in the path of the airflow from the CRAC. These cables should be neatly laid in cable trays, or better still, routed overhead.


Ceiling Clearance

Provide a minimum of 18 inches of clearance above all computer cabinets; the industry recommends 36 inches for optimum cooling. With computer-grade down flow air-conditioners in a standard configuration, return air is intended to travel along the data center ceiling back to the top inlet of the air-conditioner. For this return path to be effective, there must be sufficient clearance above the cabinets.


Locating Airflow Panels

Raised floor airflow panels should be installed only in “cold aisles” and must be accessible at all times. Use only as many panels as are required to maintain maximum static pressure. A general rule of thumb is 1 panel per sensible ton of cooling capacity. Damper thickness also matters: unnecessarily thick dampers block airflow and defeat the purpose for which airflow panels are designed. The illustration below shows the types of airflow panels.
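The 1-panel-per-sensible-ton rule of thumb is easy to apply to a room's installed capacity. This Python sketch rounds up to a whole panel (a choice made for this example) and converts kW to tons using 1 ton ≈ 3.517 kW; the helper names are hypothetical. Treat the result as a starting point to be tuned against measured static pressure.

```python
import math

KW_PER_TON = 3.517  # 1 ton of refrigeration is approximately 3.517 kW


def perforated_tiles(sensible_tons: float) -> int:
    """Rule-of-thumb panel count: one airflow panel per sensible ton, rounded up."""
    return math.ceil(sensible_tons)


def perforated_tiles_from_kw(sensible_kw: float) -> int:
    """Same rule, starting from a sensible capacity quoted in kW."""
    return perforated_tiles(sensible_kw / KW_PER_TON)


if __name__ == "__main__":
    print(perforated_tiles(3 * 20))         # three 20-ton CRAC units -> 60
    print(perforated_tiles_from_kw(100.0))  # 100 kW sensible -> 29
```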

Using a mix of both tile types can help serve the differing cooling requirements of today's high density equipment.


Seal Cable Cutouts

Keep cable cutouts in raised floor panels as small as possible and seal them after cables are installed. Unsealed cable cutouts are a tremendous source of static pressure loss: they deliver cold air where it is not needed, leaving insufficient airflow at the airflow panels where it truly is needed. Sealing cable cutouts can increase static pressure by nearly 25%. Various types of floor grommets are available today. For new installations, the floor panel is cut, the grommet is installed, and the cables are routed through it. Grommets are also designed for existing installations: these are two-piece assemblies that are separated, wrapped around the existing cables, joined together, and sealed to the raised floor. The illustration below shows the types of floor grommets.


Return Air Travel Distance

Keep return airflow distance to less than 1.33 feet per sensible ton of cooling capacity.
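Applied to a single unit, this rule gives a maximum return path length. The Python sketch below assumes the distance is measured from the farthest heat source to the unit's return inlet (the text does not define the measurement points, so this is an assumption); the helper names are hypothetical.

```python
FT_PER_SENSIBLE_TON = 1.33  # rule of thumb from the text


def max_return_distance_ft(sensible_tons: float) -> float:
    """Longest recommended return air path for a unit of the given sensible capacity."""
    return FT_PER_SENSIBLE_TON * sensible_tons


def return_path_ok(distance_ft: float, sensible_tons: float) -> bool:
    """True if the return path is shorter than the rule-of-thumb limit."""
    return distance_ft < max_return_distance_ft(sensible_tons)


if __name__ == "__main__":
    print(f"{max_return_distance_ft(30):.1f} ft")  # 39.9 ft for a 30-ton unit
    print(return_path_ok(35, 30), return_path_ok(45, 30))
```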



High-Density / Low-Density Areas

To deploy self-contained high density zones within an existing or new low density data center, consider an under floor barrier enclosing both the A/C units and the equipment. A design density of 150 W/sq.ft. equates to only about 3-4 kW per cabinet; today's cabinets draw much more, but they are unlikely to fill the room. Flexibility is nice, but fitting out an entire room for a limited amount of high density equipment is not practical. Because these high density zones are independent, they allow predictable and reliable operation of high density equipment without a negative impact on the performance of the existing low density power and cooling infrastructure. A side benefit is that these high density zones operate at much higher electrical efficiency than conventional designs. The illustration below shows a high density zone in a low density room.
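The link between watts per square foot and kW per cabinet depends on how much floor area each cabinet accounts for, including its share of the aisles. The Python sketch below assumes 20-25 sq.ft. of gross floor area per cabinet (an illustrative assumption, not a figure from the text), which reproduces the 3-4 kW range quoted above. The function names are hypothetical.

```python
def kw_per_cabinet(w_per_sqft: float, sqft_per_cabinet: float) -> float:
    """Cabinet load implied by a room design density."""
    return w_per_sqft * sqft_per_cabinet / 1000


def design_density(kw_per_cab: float, sqft_per_cabinet: float) -> float:
    """Room density (W/sq.ft.) implied by a given cabinet load."""
    return kw_per_cab * 1000 / sqft_per_cabinet


if __name__ == "__main__":
    # Assumed gross floor area per cabinet (cabinet plus its share of the aisles).
    for area in (20.0, 25.0):
        print(f"150 W/sq.ft. x {area} sq.ft. = {kw_per_cabinet(150, area):.2f} kW per cabinet")
    # A 15 kW cabinet on 25 sq.ft. would need a 600 W/sq.ft. room design.
    print(f"{design_density(15, 25):.0f} W/sq.ft.")
```

Designing an entire room for 600 W/sq.ft. just to accommodate a handful of such cabinets is exactly the over-provisioning that a self-contained high density zone avoids.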



Ceiling Plenum Return

If a large, fairly open ceiling plenum space is available, consider converting it into a return air plenum. However, this space must be at least twice the depth of the raised floor. The ceiling plenum can then be used as flexibly as the raised floor to remove heat directly from the hot aisles before it can migrate. Use egg crate panels above the hot aisles and install ductwork on top of the air-conditioners that extends up through the ceiling.

Locating Air-Conditioners

Rooms wider than 60 feet in either direction, with air-conditioners along the periphery only, will require additional air-conditioners located along the center of the room. A/C units have a defined area of operation: the largest computer-grade A/C units have a maximum effective range of approximately 35 feet from the face of the unit.
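Both placement rules reduce to simple comparisons, which makes them easy to run against a floor plan. A minimal Python sketch; the function names are hypothetical and the thresholds are the rules of thumb stated above.

```python
MAX_PERIMETER_ONLY_SPAN_FT = 60.0  # room dimension beyond which center units are needed
MAX_EFFECTIVE_RANGE_FT = 35.0      # approximate reach of the largest units


def needs_center_units(width_ft: float, length_ft: float) -> bool:
    """True if perimeter-only A/C placement is insufficient for this room."""
    return max(width_ft, length_ft) > MAX_PERIMETER_ONLY_SPAN_FT


def within_effective_range(distance_from_face_ft: float) -> bool:
    """True if a cabinet sits within the effective range of an A/C unit face."""
    return distance_from_face_ft <= MAX_EFFECTIVE_RANGE_FT


if __name__ == "__main__":
    print(needs_center_units(50, 80))   # True: 80 ft exceeds 60 ft
    print(within_effective_range(40))   # False: beyond ~35 ft
```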

Supplemental Cooling

When load densities exceed 150 W/sq.ft. over an area larger than 200 sq.ft., or 4 kW per cabinet, supplemental cooling should be considered. Computer-grade down flow air-conditioning units available today can effectively cool load densities of up to 150 W/sq.ft. Above that, long-distance under floor distribution of air to cabinet inlets is no longer sufficient, and localized cooling is required.
Manufacturers have responded with products designed specifically for high density loads. Some attach to the rear of cabinets to improve local ventilation, while others mount overhead to supply air to multiple cabinets.
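The supplemental cooling thresholds can be expressed as a single check per zone. This Python sketch reads the rule as "density above 150 W/sq.ft. over more than 200 sq.ft., or any cabinet above 4 kW"; that reading of the original wording, and the function name, are assumptions.

```python
DENSITY_LIMIT_W_PER_SQFT = 150.0
DENSE_AREA_LIMIT_SQFT = 200.0
CABINET_LIMIT_KW = 4.0


def needs_supplemental_cooling(zone_w_per_sqft: float, zone_area_sqft: float,
                               max_cabinet_kw: float) -> bool:
    """Apply the thresholds from the text to one zone of the room."""
    dense_zone = (zone_w_per_sqft > DENSITY_LIMIT_W_PER_SQFT
                  and zone_area_sqft > DENSE_AREA_LIMIT_SQFT)
    hot_cabinet = max_cabinet_kw > CABINET_LIMIT_KW
    return dense_zone or hot_cabinet


if __name__ == "__main__":
    print(needs_supplemental_cooling(120, 1000, 3.5))  # False
    print(needs_supplemental_cooling(180, 300, 3.5))   # True: dense zone
    print(needs_supplemental_cooling(100, 1000, 6.0))  # True: hot cabinet
```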

Under Floor Baffle System

Baffle systems are a passive, holistic solution that can easily be installed as a vertical under floor partitioning system to direct airflow within the plenum space. The baffles direct cold air from the CRAC units to where it is needed and away from where it is not. In rooms that are not yet fully populated, this is essential to avoid wasting cold air on empty areas.
Velocity is the time rate of motion, and velocity pressure is the pressure caused by air in motion. When air from a CRAC unit is forced through a partitioned airflow space, static pressure builds up. Without dedicated partitioning, air velocity decreases as the air moves further away from the CRAC unit. Baffles help maintain static pressure further away from the CRAC units, and with it the delivery of air to particular 'hot zones', making them a simple solution for cooling thermal hot spots in information technology equipment centers. The ideal objective is to create unobstructed, dedicated airflow paths to the equipment. Open floor penetrations must also be sealed to manage airflow more effectively. The illustration below shows the deployment of baffle systems.
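The relationship between air speed and velocity pressure is why the loss of velocity far from a CRAC unit matters. For standard air, velocity pressure in inches of water gauge is commonly approximated as (V / 4005)², with V in feet per minute, and total pressure is the sum of static and velocity pressure. The Python sketch below applies that textbook relation; the example velocities are illustrative, not measurements, and the function names are hypothetical.

```python
def velocity_pressure_in_wg(velocity_fpm: float) -> float:
    """Velocity pressure (in. w.g.) of standard air moving at velocity_fpm."""
    return (velocity_fpm / 4005.0) ** 2


def total_pressure_in_wg(static_in_wg: float, velocity_fpm: float) -> float:
    """Total pressure = static pressure + velocity pressure."""
    return static_in_wg + velocity_pressure_in_wg(velocity_fpm)


if __name__ == "__main__":
    # Halving the air velocity cuts velocity pressure to one quarter.
    for v in (1000.0, 500.0):
        print(f"{v:6.0f} fpm -> {velocity_pressure_in_wg(v):.4f} in. w.g.")
```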



Source: plenaform.com

Partitioning off the raised floor around command and control centers and similar areas also improves operator comfort.


Acknowledgements: Endless discussions with my dear friend David


Saturday, February 14, 2009

Hardware Failure? Look Beyond Power & Cooling

Many Data Center Managers in India and across Asia, including Malaysia, Indonesia and the Philippines, have complained to me of sudden equipment failures, motherboard crashes and, even more often, multiple hard disk failures. Worse still, they had no visible clue as to why the failures occurred.
Hard disk manufacturers, equipment suppliers and Data Center Managers tend to blame these failures on electrostatic discharge (ESD). Their reasoning is that, with increasing recording density and the shrinking size of giant magnetoresistive (GMR) read/write heads, the GMR sensor is becoming ever more sensitive to ESD events.
Some kinds of ESD damage cause soft magnetic degradation of head performance that is progressive in nature. We have reports of head degradation caused by ESD damage, as well as by other damage such as head scratches and electro-migration effects in the GMR stack. It is usually very difficult to distinguish these phenomena explicitly with QST (quasi-static tester) and spinstand measurement tests.
Believing ESD to be the cause, most data center managers end up investing thousands of dollars in power quality audits for their facilities, when the real culprit is actually “Contamination”. Didn't expect this, right? Well, yes, it is.
What kind of Contamination causes Hard Disk / Motherboard failures?
Contamination, especially gaseous contamination, often results from the bacterial breakdown of sulfates in organic matter in the absence of oxygen (anaerobic digestion), as happens in swamps and sewers. It also occurs in volcanic gases, natural gas and some well waters.
The predominant gas from such sources is hydrogen sulfide, the chemical compound with the formula H2S. This colorless, toxic and flammable gas is partially responsible for the foul odor of rotten eggs and flatulence. The odor of H2S is commonly misattributed to elemental sulfur, which is in fact odorless. H2S is predominantly found in areas with large sewers, industrialised townships with polluting industries, reclaimed land (built over swamps), etc.
I need not mention how many sewers run across some of our cities that are home to many data centers, Noida, near New Delhi (India), being one of the most prominent. Mumbai and Bangalore, which house 80% of the data centers in India, lead the pack with the highest levels of contamination.
ANSI/ISA-S71.04-1985
The American National Standard for Environmental Conditions for Process Measurement and Control Systems: Airborne Contaminants was designed to classify airborne contaminants that may affect process measurement and control instruments.

The classification system provides users and manufacturers of instruments / equipment with a means of specifying the type and concentration of airborne contaminants to which a specified instrument / equipment may be exposed.

Two methods have been used for environmental characterization. One is a direct measure of selected gaseous air pollutants. The other, which can be termed "reactivity monitoring," provides a quantitative measure of the overall corrosion potential of an environment.

Pollution analysis may provide short-term estimates for specific sites. High values will confirm that a severe environment exists. The reverse, however, is not necessarily true. Industrial environments may contain a complex mixture of contaminants that interact to greatly accelerate (or retard) the corrosive action of individual gas species.

To avoid these practical difficulties, the nature of industrial environments is defined in terms of the rate at which they react with copper. As a direct measure of overall corrosion potential, reactivity monitoring involves placing specially prepared copper coupons in the operating environment. Copper was selected as the coupon material because data exist that correlate copper film formation with reactive (corrosive) environments, and it has proven particularly useful for environmental characterization. Analyses may consist of measurements of film thickness, film chemistry, or weight loss, and the sensitivity of reported techniques is well within the range required for meaningful application data. In addition to the copper strip, a silver strip is also used. Note that silver (Ag) results are not officially part of the ISA/ANSI S71.04 standard. They are used to confirm the copper (Cu) results, to identify potential chloride contaminants, and to help identify the specific sulfides causing corrosion.
Four levels of corrosion severity are established as shown below.

There is a broad distribution of contaminant concentrations and reactivity levels existing within industries using process measurement and control equipment. Some environments are severely corrosive, while others are mild. The purpose of the contaminant classes is to define environments on the basis of corrosion rate of oxygen-free high conductivity copper.

Severity level G1
Mild — An environment sufficiently well-controlled such that corrosion is not a factor in determining equipment reliability.

Severity level G2
Moderate — An environment in which the effects of corrosion are measurable and may be a factor in determining equipment reliability.

Severity level G3
Harsh — An environment in which there is a high probability that corrosive attack will occur. These harsh levels should prompt further evaluation resulting in environmental controls or specially designed and packaged equipment.

Severity level GX
Severe — An environment in which only specially designed and packaged equipment would be expected to survive. Specifications for equipment in this class are a matter of negotiation between user and supplier.
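Classifying a coupon result is a lookup against the copper reactivity limits. The Python sketch below uses the limits commonly cited for ANSI/ISA-S71.04-1985 (copper film growth in angstroms per 30 days: below 300 for G1, below 1000 for G2, below 2000 for G3, and 2000 or more for GX); verify them against your copy of the standard. The linear normalization of a coupon's exposure to 30 days is a simplifying assumption, and the function names are hypothetical.

```python
# Copper reactivity upper limits (angstroms of film growth per 30 days),
# as commonly cited for ANSI/ISA-S71.04-1985; verify against the standard.
SEVERITY_LIMITS = [
    (300.0, "G1 (Mild)"),
    (1000.0, "G2 (Moderate)"),
    (2000.0, "G3 (Harsh)"),
]


def normalize_to_30_days(film_angstroms: float, exposure_days: float) -> float:
    """Scale a measured film thickness to a 30-day rate (assumes linear growth)."""
    if exposure_days <= 0:
        raise ValueError("exposure must be positive")
    return film_angstroms * 30.0 / exposure_days


def severity_level(cu_angstroms_per_30_days: float) -> str:
    """Map a copper reactivity rate to a G1/G2/G3/GX severity level."""
    for limit, label in SEVERITY_LIMITS:
        if cu_angstroms_per_30_days < limit:
            return label
    return "GX (Severe)"


if __name__ == "__main__":
    # Hypothetical coupon: 900 angstroms of copper film after a 45-day exposure.
    rate = normalize_to_30_days(900, 45)
    print(rate, severity_level(rate))
```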

Solution


There are very few companies in Asia capable of providing such a service tailored specifically to data center environments. This is a professional service and must be delivered very carefully to obtain accurate results. Based on the contamination identified, a proper mix of media and a suitable air purification system designed for use in data centers must be selected.
There are a number of issues which must be addressed in the selection of an air purification system.
• The first issue is to identify the types and concentrations of contaminants that will need to be removed. This information will be available from the corrosion chemical analysis report made available from coupon testing.
• The second issue is to determine the minimum airflow requirements for dealing with the contaminants. In odor control applications, source capture will usually require significantly less airflow than a general ventilation approach, but general ventilation will normally need to be used where personnel are working within the contaminated space.

• The third issue is to confirm any physical limitations for the equipment, anticipating all regular maintenance and media change out requirements.

After selecting the unit, the next steps are:

1. Determine the type of chemistry or chemistries best able to remove the contaminant(s). There are usually several options for removing most vapour phase contaminants, but one approach will usually be more efficient or cost-effective than the others.

2. Next, decide whether a cell type (light duty system) or a deep bed (heavy duty system) is required for the application. This will depend on the nature and concentration of the contaminant and the tolerance and lifespan of the media bed for retention of these contaminants.

3. The next step is to confirm the materials of construction to suit the corrosive environment.

4. The next step is to determine whether blow through or draw through configurations are preferred. Blow through is usually used for corrosion control applications, and draw through is usually used for odor control applications.

5. The next step is to determine pre and after filter requirements. Prefiltering is necessary to prevent the chemical media from being blinded by upstream particulates or mists, and normally requires a 40% roughing filter followed by a 95% after filter. Coalescing filters, mist eliminators and grease filters are used where moist conditions exist upstream of the air purification equipment. Final filtering is used only where it is important to eliminate any fine dust that may come off the media when it is first started up after a media changeout; this usually requires a 95% filter to capture the fine particulate.

6. The next step is to confirm any auxiliary requirements, including preheat, precool or humidification requirements necessary for the application.

It is a good idea to select a system designed in compliance with ASHRAE standards.
Specifications of Media to be used (Note: these are specifically for the case of H2S contamination)
General Description: Spherical or cylindrical porous pellets formed from a combination of powdered activated alumina and other binders, suitably impregnated with potassium permanganate to provide optimum adsorption, absorption, and oxidation of a wide variety of gaseous contaminants.

Removal Capacity:

• Hydrogen Sulfide: 0.13 g/cc min (16 % by weight)
• Sulfur Dioxide: 0.06 g/cc min (3.5% by weight)
• Nitric Oxide: 0.06 g/cc min (2.5% by weight)
• Nitrogen Dioxide: 0.016 g/cc min (1.0 % by weight)
• Formaldehyde: 0.023 g/cc min (1.4% by weight)
Manufacturing Quality Assurance Standards:
• Leach test (indication of porosity): 180 minutes or less
• Permanganate content: 8 % minimum
• Moisture Content: 20 % maximum
• Crush Strength: 40 to 60 %
• Abrasion Loss: 3.0 % maximum
• Nominal pellet diameter: 1/8” (approximately 3 mm), 85% after screening
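To get a feel for how long a media bed might last, the H2S removal capacity above can be combined with the airflow and contaminant concentration. The Python sketch below is a rough first-order estimate under loud assumptions: a constant inlet H2S concentration, a fixed single-pass removal efficiency, ideal-gas conversion at 25 °C, and no capacity consumed by other contaminants. The example inputs and function names are hypothetical; real media life should be confirmed by testing.

```python
H2S_MOLAR_MASS = 34.08        # g/mol
MOLAR_VOLUME_25C = 24.45      # L/mol, ideal gas at 25 degC and 1 atm
M3_PER_HR_PER_CFM = 1.699     # 1 CFM is about 1.699 m3/h
H2S_CAPACITY_G_PER_CC = 0.13  # minimum H2S removal capacity from the media spec


def ppb_to_ug_per_m3(ppb: float, molar_mass: float = H2S_MOLAR_MASS) -> float:
    """Convert a gas concentration from ppb (by volume) to micrograms per m3."""
    return ppb * molar_mass / MOLAR_VOLUME_25C


def media_life_hours(h2s_ppb: float, airflow_cfm: float, media_volume_cc: float,
                     efficiency: float = 1.0) -> float:
    """Hours until the bed's rated H2S capacity is used up (first-order estimate)."""
    if min(h2s_ppb, airflow_cfm, media_volume_cc) <= 0 or not 0 < efficiency <= 1:
        raise ValueError("inputs must be positive and efficiency in (0, 1]")
    grams_per_hr = (ppb_to_ug_per_m3(h2s_ppb) * airflow_cfm * M3_PER_HR_PER_CFM
                    * efficiency / 1e6)  # micrograms -> grams
    capacity_g = media_volume_cc * H2S_CAPACITY_G_PER_CC
    return capacity_g / grams_per_hr


if __name__ == "__main__":
    # Hypothetical case: 50 ppb H2S, 2,000 CFM through 100 litres of media.
    hours = media_life_hours(50, 2000, 100_000, efficiency=0.9)
    print(f"~{hours:,.0f} hours (~{hours / 8760:.1f} years)")
```

Because the media's capacity is shared with any other contaminants present, a real bed would likely be exhausted sooner than this estimate suggests.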

This post is also available as a published white paper written by the author at http://bdstrategy.asia/data-center/power-and-cooling/214-hardware-failure-look-beyond-power-a-cooling

References: Wikipedia, ANSI/ISA-S71.04-1985 standard, ASHRAE standards.

