Data Center CRAC and CRAH Unit Predictive Maintenance Checklist
By James Smith on April 28, 2026
When data center cooling fails, the thermal event clock starts immediately. ASHRAE TC 9.9 defines the Class A1 recommended operating envelope as 18°C to 27°C at the IT equipment inlet, and a CRAC or CRAH unit that stops delivering within that envelope is generating a tier-level incident, not a maintenance work order. The three most common causes of data center cooling failures are not mysterious: a CRAC compressor whose rising current draw went unmonitored for four weeks before failure; a CRAH chilled water control valve that drifted out of specification during an unlogged BMS software update; and a redundant standby unit, low on refrigerant after months of deferred maintenance, that failed when called upon to cover a primary unit taken offline for a scheduled repair. Each of these failures was predictable weeks in advance from data that was available and unread.
Book a 30-minute demo to see how Oxmaint's Predictive Maintenance AI monitors CRAC and CRAH unit performance parameters (compressor current draw, discharge temperature trend, chilled water delta-T, sensor calibration drift) and generates work orders when parameter trends cross defined thresholds, weeks before a thermal event that a calendar-based PM schedule would have caught too late. Or start a free trial and connect your first CRAC asset today.
Predictive Maintenance AI · Data Center Cooling · ASHRAE TC 9.9
Daily, weekly, monthly, quarterly, and annual maintenance tasks for CRAC (DX refrigerant) and CRAH (chilled water) units — with AI-monitored predictive parameters, EPA 608 compliance items, ASHRAE A1 thermal envelope checks, and the critical rule on standby unit PM.
ASHRAE TC 9.9 target — 18–27°C inlet air; humidity by dew point control
3–5%
Compressor current rise weeks before failure — the AI trigger that calendar PM misses
1–2°C
Discharge temp creep over 2 weeks = coil fouling or low refrigerant — AI-detectable
Standby = Active
Redundant units must receive identical PM frequency; under-maintained standby units are the most common cause of cooling outages
The Standby Unit Rule — Non-Negotiable
Every PM task in this checklist applies equally to active and standby/redundant units. A standby CRAC unit with low refrigerant, a drifted sensor, or a belt past its service life is not a backup — it is a failure waiting for the moment the primary unit is taken offline. The majority of cooling-related data centre tier incidents trace directly to standby units that failed when called upon due to deferred maintenance. Every work order raised from this checklist must explicitly confirm which unit was serviced: primary, secondary, or tertiary.
Unit type tags: CRAC (DX / compressor-based) · CRAH (chilled water) · BOTH (both unit types)
DAILY: Operator-level checks, no certification required · Applies to ALL active and standby units
BOTH
Verify supply air discharge temperature against the unit setpoint and the ASHRAE A1 recommended envelope, and log the actual reading. Any discharge temperature above 27°C or below 18°C indicates control failure, sensor drift, or airflow obstruction and must trigger an immediate investigation, not a re-read
Record: Daily temperature log per unit ID · Ref: ASHRAE TC 9.9 A1 class thermal envelope · Role: Data centre operator
BOTH
Review BMS alarm log for all CRAC and CRAH units — confirm all active alarms are acknowledged, assigned to a work order, and have a resolution timeline. Alarms acknowledged without a work order raised are the most common documentation gap in data centre cooling audits
Record: BMS alarm review log with acknowledgement and WO number · Role: Facilities operator
BOTH
Confirm all standby and redundant units show "ready" status on BMS — a standby unit in alarm, fault, or manual override that is not being actively investigated is a hidden single point of failure. Log any standby units not in automatic-ready status and escalate immediately if not already under active maintenance
Record: Standby unit status log · Role: Data centre operator
WEEKLY: Technician-level checks · Applies to ALL active and standby units
BOTH
Check air filter differential pressure and compare against the OEM dirty-filter threshold. Dirty filters overwork fan motors, reduce cooling capacity, and raise energy consumption. If filters are dirtier than expected for the elapsed time, investigate the contamination source: unpacking cardboard inside the data hall is the most common cause of accelerated filter loading and should be prohibited outright
Record: Filter DP reading per unit ID · Role: HVAC technician
CRAC
Verify refrigerant suction and discharge pressure readings — compare against P-T chart for current conditions and log against previous week's baseline. A suction pressure trend declining by more than 5 psi over 4 weeks indicates refrigerant undercharge requiring immediate leak investigation. Rising discharge pressure with normal suction indicates condenser fouling or condenser water temperature issue
Record: Refrigerant pressure log per unit ID · Ref: EPA 608 leak rate monitoring · Role: EPA 608 certified technician
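The suction pressure rule above (a decline of more than 5 psi over 4 weeks) can be sketched as a check over the weekly log. This is a minimal illustration, not an Oxmaint feature: the function names are hypothetical, and it assumes one reading per week taken at comparable load and ambient conditions, since suction pressure also moves with operating conditions.

```python
from typing import Sequence


def suction_pressure_drop(weekly_psi: Sequence[float], window_weeks: int = 4) -> float:
    """Return the fall in suction pressure (psi) across the last `window_weeks` weeks.

    Needs window_weeks + 1 readings: the reading taken `window_weeks` ago plus
    every weekly reading since. A positive result means pressure has fallen.
    """
    if len(weekly_psi) < window_weeks + 1:
        raise ValueError("not enough weekly readings for the requested window")
    return weekly_psi[-(window_weeks + 1)] - weekly_psi[-1]


def needs_leak_investigation(
    weekly_psi: Sequence[float], limit_psi: float = 5.0, window_weeks: int = 4
) -> bool:
    """True when the decline over the window exceeds the checklist limit."""
    return suction_pressure_drop(weekly_psi, window_weeks) > limit_psi


# Illustrative readings only (oldest first), not measured data.
log = [68.0, 67.5, 66.0, 64.0, 61.5]
print(suction_pressure_drop(log))       # 6.5
print(needs_leak_investigation(log))    # True
```

Comparing only the first and last reading in the window keeps the rule identical to the wording of the checklist; a fitted trend line would be less sensitive to a single noisy reading.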
CRAH
Record chilled water supply and return temperatures and calculate delta-T (return minus supply), then compare against baseline. A narrowing delta-T (below 8°F where the design value is typically 10–14°F) indicates coil fouling, reduced flow, or a control valve not opening fully. This is the primary AI-detectable early warning for CRAH performance degradation
Record: CHW delta-T log per unit ID · Role: Mechanical technician
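The delta-T calculation in this task is simple enough to sketch. The helper below is hypothetical; it treats the checklist's 8°F figure as an absolute alert floor, which is an assumption about how that number is meant to be applied, and the baseline is whatever the site has logged for the unit.

```python
def delta_t_narrowing(
    supply_f: float,
    return_f: float,
    baseline_delta_t_f: float,
    alert_below_f: float = 8.0,  # assumed reading of the checklist's 8°F figure
) -> dict:
    """Compute chilled water delta-T and compare it with the unit's baseline."""
    delta_t = return_f - supply_f
    return {
        "delta_t_f": delta_t,
        "shortfall_f": baseline_delta_t_f - delta_t,  # how far below baseline
        "alert": delta_t < alert_below_f,
    }


# Illustrative values: 45.0°F supply, 52.5°F return, 12°F baseline.
print(delta_t_narrowing(45.0, 52.5, 12.0))
# {'delta_t_f': 7.5, 'shortfall_f': 4.5, 'alert': True}
```

Logging the shortfall against baseline as well as the alert flag keeps the week-on-week narrowing visible before the floor is reached.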
BOTH
Visual inspection of condensate drain pan and condensate pump — test pump operation by introducing water to confirm discharge. Data centre condensate pumps can sit idle for months if humidity control is active — a failed pump that is not tested until condensate accumulates will overflow, potentially onto raised floor electrical components
Record: Condensate pump test log · Role: HVAC technician
Oxmaint AI Predictive Triggers — What Calendar PM Misses
CRAC
Compressor current draw rising 3–5%
Bearing wear or refrigerant shortage. Calendar PM schedules a check in 90 days. The failure occurs in 30.
AI generates inspection WO when trend threshold crossed → avoids unplanned outage
CRAC
Discharge temp creeping +1–2°C over 2 weeks
Evaporator coil fouling or refrigerant undercharge — no visible symptom, no alarm triggered at this stage.
AI flags trend → coil cleaning or refrigerant check WO raised before capacity loss occurs
CRAH
CHW supply/return delta-T narrowing week-on-week
Coil fouling, flow restriction, or failing control valve — cooling capacity degrading silently.
AI detects trend → control valve stroke test or coil cleaning WO generated automatically
BOTH
Sensor reading flat-lined or drifting ±2°C from cross-reference
Sensor failure producing false "normal" while actual conditions deteriorate. The most dangerous silent failure mode.
Cross-sensor agreement monitoring → calibration WO generated before sensor provides misleading data to BMS
Oxmaint AI monitors CRAC and CRAH parameters continuously — generating work orders when trends cross thresholds weeks before failure, not after the thermal event that calendar PM was too late to prevent.
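Two of the triggers listed above, the discharge temperature creep and the flat-lined sensor, can be illustrated with basic statistics. This is a sketch under stated assumptions and does not describe how Oxmaint implements its detection: the function names and the 0.05°C flat-line spread are hypothetical, and the creep limit uses the lower end of the 1–2°C range quoted in the list.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def slope_per_day(days: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against days."""
    x_bar, y_bar = mean(days), mean(values)
    den = sum((x - x_bar) ** 2 for x in days)
    if den == 0:
        raise ValueError("need readings from at least two distinct days")
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, values))
    return num / den


def discharge_temp_creep(
    days: Sequence[float],
    temps_c: Sequence[float],
    window_days: int = 14,
    limit_c: float = 1.0,
) -> Tuple[bool, float]:
    """Return (flagged, fitted rise in °C over the window)."""
    rise = slope_per_day(days, temps_c) * window_days
    return rise >= limit_c, rise


def is_flat_lined(readings: Sequence[float], min_spread_c: float = 0.05) -> bool:
    """True when readings barely vary, which suggests a stuck sensor."""
    return len(readings) >= 2 and pstdev(readings) < min_spread_c


# Illustrative series: 0.1°C per day of creep across 15 daily readings.
days = list(range(15))
temps = [20.0 + 0.1 * d for d in days]
flagged, rise = discharge_temp_creep(days, temps)
print(flagged, round(rise, 2))
print(is_flat_lined([21.3] * 10))
```

Fitting a slope rather than differencing two readings makes the creep check less sensitive to a single outlier, which matters when the signal is only 1–2°C.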
MONTHLY: HVAC technician level · Applies to ALL units including standby
CRAC
Record compressor current draw (amps) per phase and compare against OEM nameplate and the unit's own historical baseline. A 3–5% sustained upward trend across consecutive months is the primary predictive indicator of bearing wear, refrigerant shortage, or developing compressor inefficiency. Log amps at standard load conditions for valid trend comparison
Record: Compressor amps log per unit per phase · Role: EPA 608 certified technician
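The per-phase comparison against the unit's own baseline can be sketched as follows. The names and the phase labels are hypothetical, and the 3% default is simply the lower end of the 3–5% range quoted in this task; readings are assumed to be taken at the same standard load condition.

```python
from typing import Dict, Sequence


def percent_rise(baseline_amps: float, current_amps: float) -> float:
    """Percentage change of the current reading relative to baseline."""
    if baseline_amps <= 0:
        raise ValueError("baseline amps must be positive")
    return (current_amps - baseline_amps) / baseline_amps * 100.0


def phases_over_threshold(
    baseline: Dict[str, float], current: Dict[str, float], threshold_pct: float = 3.0
) -> Dict[str, float]:
    """Return the phases whose rise meets the threshold, with the rise in %."""
    flagged = {}
    for phase, base in baseline.items():
        rise = percent_rise(base, current[phase])
        if rise >= threshold_pct:
            flagged[phase] = round(rise, 2)
    return flagged


def sustained_rise(monthly_pct: Sequence[float], threshold_pct: float = 3.0, months: int = 2) -> bool:
    """True when the last `months` monthly readings all meet the threshold."""
    recent = monthly_pct[-months:]
    return len(recent) == months and all(p >= threshold_pct for p in recent)


# Illustrative readings only.
baseline = {"A": 20.0, "B": 20.2, "C": 19.8}
current = {"A": 20.9, "B": 20.4, "C": 20.0}
print(phases_over_threshold(baseline, current))  # {'A': 4.5}
print(sustained_rise([0.8, 3.2, 4.5]))           # True
```

A rise on one phase only, as in the sample data, may point toward an electrical imbalance rather than a mechanical cause, so keeping the per-phase values in the log is worth the extra columns.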
BOTH
Fan motor vibration check and amp measurement — compare against baseline. For belt-drive units: check belt tension against OEM specification (quarterly tensioning required). Over-tensioning wears both belts and bearings; under-tensioning causes slippage and reduced airflow. EC motor units: check for unusual noise and verify variable speed drive parameters have not been altered from optimised settings
Record: Fan motor amps + vibration reading + belt tension if applicable · Role: HVAC technician
BOTH
Inspect humidifier system — steam canister type: check for scale buildup and confirm canister is within rated life cycle; infrared type: clean water pan of mineral scale accumulation; ultrasonic type: verify water filter replacement is current and check nozzle for clogging. All types: use deionised or well-filtered water to reduce maintenance cycle frequency and mineral deposit damage
Record: Humidifier inspection log with canister/filter replacement date · Role: HVAC technician
CRAH
Inspect chilled water control valve operation — confirm valve modulates correctly across the full 0–100% stroke range in response to BMS control signals. A control valve stuck at partial open is the most common cause of narrowing CHW delta-T and reduced cooling capacity in CRAH units, and it frequently presents as a controls issue when it is a mechanical valve issue
Record: Control valve stroke verification log · Role: Controls technician
QUARTERLY: Specialist technician level · EPA 608 certification required for refrigerant work
CRAC
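Full refrigerant circuit leak test using an electronic leak detector on all service valves, Schrader ports, flare connections, and brazed joints. EPA Section 608 requires repair within 30 days when the annual leak rate of an appliance with a 50+ lb charge exceeds the applicable threshold: 10% for comfort cooling since 1 January 2019 (15% before that date), with higher thresholds for commercial and industrial process refrigeration. Confirm which appliance category and refrigerant rules apply to each unit. Log all refrigerant added during the quarter against the annual leak rate calculation. Confirm oil level at the sight glass; too little or too much oil reduces compressor service life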
Record: Leak test log + refrigerant addition record · Ref: EPA Section 608, 40 CFR Part 82 · Role: EPA 608 certified technician
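The annual leak rate that the refrigerant addition log feeds can be worked out with the annualizing method, sketched below as it is commonly stated for 40 CFR Part 82 Subpart F. Treat the formula as an assumption to confirm against the current regulation text, which also allows a rolling average method; the function name is hypothetical.

```python
def annualized_leak_rate_pct(
    lbs_added: float, full_charge_lbs: float, days_since_last_addition: float
) -> float:
    """Annualized leak rate in percent per year.

    rate = (lbs added / full charge) * (365 / days) * 100,
    where days is the time since refrigerant was last added, capped at 365.
    """
    if full_charge_lbs <= 0 or days_since_last_addition <= 0:
        raise ValueError("full charge and elapsed days must be positive")
    days = min(days_since_last_addition, 365)
    return (lbs_added / full_charge_lbs) * (365 / days) * 100.0


# Illustrative case: 4 lb added to an 80 lb circuit, 120 days after the last top-up.
rate = annualized_leak_rate_pct(4.0, 80.0, 120)
print(f"{rate:.1f}% per year")  # 15.2% per year
```

The result is then compared with the threshold that applies to the unit's appliance category; a small top-up shortly after the previous one annualizes to a high rate, which is why each addition needs its date logged.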
BOTH
Deep evaporator and condenser coil cleaning — remove all access panels, use manufacturer-approved coil cleaner, rinse thoroughly, and confirm coil fins are not damaged during cleaning. A 1–2°C discharge temperature creep over 2 weeks is the AI-detectable signal; deep cleaning when detected restores performance before the compressor or fans are overworked compensating for fouled coils
Record: Coil cleaning log with pre/post discharge temperature comparison · Role: HVAC technician
BOTH
Sensor calibration verification — compare all temperature and humidity sensors against a calibrated reference instrument in the same air stream. ASHRAE TC 9.9 now specifies dew point control for humidity rather than relative humidity; confirm BMS humidity control is using dew point measurement, not RH, and that the dew point sensor is within calibration. Any sensor drifted beyond ±0.5°C temperature or ±3% RH requires replacement or recalibration before next PM cycle
Record: Sensor calibration log with reference instrument reading vs BMS reading · Ref: ASHRAE TC 9.9 · Role: Controls technician
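The calibration comparison and the dew point it protects can be sketched together. The tolerance defaults are the ±0.5°C and ±3% RH figures from this task; the dew point uses the Magnus approximation with one commonly used coefficient pair (17.62 and 243.12°C), which is an assumption and not a formula taken from ASHRAE TC 9.9. Function names are hypothetical.

```python
import math


def dew_point_c(temp_c: float, rh_pct: float) -> float:
    """Dew point (°C) from dry-bulb temperature and RH, Magnus approximation."""
    if not 0 < rh_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    a, b = 17.62, 243.12  # assumed Magnus coefficients for water, b in °C
    gamma = math.log(rh_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


def calibration_check(
    bms_temp_c: float,
    ref_temp_c: float,
    bms_rh_pct: float,
    ref_rh_pct: float,
    temp_tol_c: float = 0.5,
    rh_tol_pct: float = 3.0,
) -> dict:
    """Compare BMS sensor readings with a calibrated reference instrument."""
    temp_err = bms_temp_c - ref_temp_c
    rh_err = bms_rh_pct - ref_rh_pct
    return {
        "temp_error_c": temp_err,
        "rh_error_pct": rh_err,
        "temp_ok": abs(temp_err) <= temp_tol_c,
        "rh_ok": abs(rh_err) <= rh_tol_pct,
    }


# Illustrative readings: the BMS sensor reads warm and slightly dry.
result = calibration_check(22.7, 22.0, 45.0, 47.0)
print(result["temp_ok"], result["rh_ok"])
print(round(dew_point_c(25.0, 50.0), 1))
```

Because dew point is derived from both temperature and RH, a temperature sensor that fails the check also shifts the computed dew point, so both errors are worth recording in the calibration log.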
BOTH
Redundant unit load failover test — transfer cooling load to standby unit under controlled conditions and confirm it can deliver rated capacity at the design inlet temperature. This test is the only reliable method to confirm a standby unit's readiness; a unit that passes visual inspection and daily status checks but cannot deliver rated capacity under load has deferred maintenance that is now a verified liability
Record: Failover test result with inlet temp, discharge temp, and amps at rated load · Role: Facilities engineer
ANNUAL: Specialist contractor level · Full performance testing and electrical thermography
BOTH
Full performance test at rated load capacity — measure supply temperature, return temperature, airflow (CFM), and power consumption (kW). Calculate actual kW/ton or kW/CFM and compare against unit specification. A unit running 15% above its rated power at design cooling capacity indicates compressor or coil degradation that can be costed against replacement in the capital planning cycle
Record: Annual performance data sheet — retain for lifecycle planning · Role: Specialist HVAC contractor
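The kW/ton figure in the annual performance test can be estimated from the four measurements listed. The sketch below covers sensible cooling only and uses the standard-air approximation of 1.08 BTU/h per CFM per °F, which is an assumption that drifts with altitude and air density; the names are hypothetical and the 15% comparison is the figure quoted in this task.

```python
BTU_PER_TON_HOUR = 12_000.0
SENSIBLE_FACTOR = 1.08  # BTU/h per CFM per °F, standard-air approximation


def sensible_tons(airflow_cfm: float, return_c: float, supply_c: float) -> float:
    """Sensible cooling delivered, in tons of refrigeration."""
    delta_t_f = (return_c - supply_c) * 1.8  # convert a °C difference to °F
    return SENSIBLE_FACTOR * airflow_cfm * delta_t_f / BTU_PER_TON_HOUR


def kw_per_ton(power_kw: float, airflow_cfm: float, return_c: float, supply_c: float) -> float:
    """Electrical input per ton of sensible cooling."""
    tons = sensible_tons(airflow_cfm, return_c, supply_c)
    if tons <= 0:
        raise ValueError("no positive cooling measured; check the inputs")
    return power_kw / tons


def percent_over_rated(measured: float, rated: float) -> float:
    """How far the measured kW/ton sits above the rated value, in percent."""
    return (measured - rated) / rated * 100.0


# Illustrative test point: 12,000 CFM, 29°C return, 18°C supply, 30 kW input.
measured = kw_per_ton(30.0, 12_000.0, 29.0, 18.0)
over = percent_over_rated(measured, rated=1.2)
print(round(measured, 2), round(over, 1), over > 15.0)
```

Units that also dehumidify deliver latent cooling that this estimate ignores, so the result is best used for year-on-year comparison of the same unit under the same test conditions.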
BOTH
Electrical connection thermography — infrared scan of all electrical panels, motor terminals, compressor contactor, and control board connections. Hot spots at connection points indicate high-resistance joints that will fail under load; thermography catches these before the electrical event that results in an emergency shutdown. Document every connection point with before-and-after temperature if remediation is performed
Record: Thermography report with IR images attached to unit asset record · Role: Thermography-certified technician
CRAC
Replace fan belts on belt-drive units regardless of visual condition — annual replacement is the recommended practice for data centre CRAC units where the cost of belt failure (unplanned thermal event) vastly exceeds the cost of a belt set. Self-tensioning belt systems may extend to 5-year intervals per manufacturer specification; confirm OEM recommendation before applying extended interval
Record: Belt replacement log with belt type and size · Role: HVAC technician
CRAC / CRAH Performance KPIs — Trending That Proves PM Is Working
KPI | Unit Type | Target / Normal Range | AI-Detectable Drift Signal
Supply air temperature | Both | 18–27°C (ASHRAE A1 class) | +1–2°C creep over 2 weeks = coil fouling or refrigerant issue
Compressor current draw | CRAC only | Within ±5% of baseline amps | 3–5% sustained rise = bearing wear or refrigerant shortage
Chilled water delta-T | CRAH only | ≥ design ΔT (typically 10–14°F) | Narrowing ΔT = coil fouling, flow restriction, or failing control valve
Filter differential pressure | Both | Below OEM dirty-filter threshold | Rapid rise = contamination source in data hall, investigate
Sensor calibration | Both | Within ±0.5°C of reference instrument | Drift beyond ±0.5°C = sensor replacement or recalibration required
"The data centres achieving 99.999% cooling availability are not doing more maintenance — they are doing the right maintenance at the right intervals with proof that every task was completed on time. They can show me the last refrigerant reading on any CRAC unit in 30 seconds. They can prove every redundant unit was load-tested this quarter. They can pull trend data on any compressor, any pump, any cooling tower fan going back years. The facilities that suffer thermal events almost always have paper-based or fragmented maintenance records where critical information lives in someone's memory rather than a searchable system. The compressor that failed last August showed a 4% current draw trend increase over 11 weeks before it failed. That trend was in the data. Nobody was looking at the trend — they were looking at the alarm that came after the failure. Predictive maintenance is not a product feature. It is a documentation discipline that makes the early warning visible before the event occurs."
David Osei, CDCP, CDCE, CDCTP
Certified Data Centre Professional · Certified Data Centre Energy Practitioner · Certified Data Centre Technician Professional · 19 years data centre cooling infrastructure — hyperscale and enterprise colocation · ASHRAE TC 9.9 contributor, Thermal Guidelines for Data Processing Environments
Frequently Asked Questions
What is the difference between CRAC and CRAH maintenance requirements?
The fundamental difference is the cooling mechanism. CRAC units use a DX refrigerant cycle with a compressor — maintenance focuses on refrigerant charge, compressor condition (oil level, current draw trending), refrigerant leak testing (EPA 608), and condenser coil cleaning. CRAH units use chilled water and a control valve — no compressor, so no refrigerant maintenance, but the control valve stroke, chilled water delta-T, and coil fouling are the primary maintenance focus. CRAH units generally require less maintenance but their chilled water plant (chiller, pumps, tower) adds its own maintenance obligations. Both unit types share: filter management, fan motor checks, sensor calibration, humidifier maintenance, and condensate pump testing. Book a demo to see separate Oxmaint PM templates for CRAC and CRAH assets.
How often should data centre CRAC and CRAH units be serviced?
Data centre cooling PM cannot follow commercial HVAC schedules. The minimum frequency framework for five-nines availability: daily — discharge temperature and BMS alarm review; weekly — filter DP, refrigerant pressure check (CRAC), CHW delta-T (CRAH), condensate pump test; monthly — compressor amps, fan motor vibration, humidifier inspection; quarterly — full refrigerant leak test (CRAC), deep coil cleaning, sensor calibration, redundant unit failover test; annually — full performance test, electrical thermography, belt replacement. Every interval applies to standby and redundant units at identical frequency to active units.
What predictive maintenance parameters should be monitored for CRAC compressor health?
The three most valuable CRAC compressor predictive parameters are: current draw (amps) — a 3–5% sustained upward trend over 4–8 weeks indicates bearing wear or refrigerant shortage weeks before failure; discharge temperature — a 1–2°C creep over 2 weeks signals coil fouling or refrigerant undercharge; and suction pressure trend — a declining trend indicates refrigerant undercharge developing. Calendar-based PM schedules a check every 90 days; a compressor trending toward failure crosses the intervention threshold in 30 days. AI monitoring of these parameters bridges the gap. Start a free trial to configure predictive alerts for your CRAC fleet.
Why do standby CRAC units fail when they are called upon during primary unit maintenance?
Standby units fail on demand because they receive less maintenance attention than active units, yet face a more demanding test — they must deliver full rated capacity immediately when called upon, with no warm-up period. The most common causes of standby unit failure are: refrigerant charge below operational level from deferred leak testing; sensors that have drifted without being identified because the unit is not producing active alarms; belts or bearings that degraded during extended idle periods; and capacitors that have aged and fail on first start under load. The quarterly redundant unit failover test under controlled conditions is the only reliable method to confirm standby readiness — and it must be documented, not assumed.
What ASHRAE standards govern data centre CRAC and CRAH maintenance requirements?
The primary ASHRAE reference for data centre cooling maintenance is Thermal Guidelines for Data Processing Environments (5th Edition, 2021), produced by ASHRAE TC 9.9, which defines the A1 class recommended operating envelope (18–27°C inlet, dew point-based humidity control). ASHRAE Standard 180 sets minimum inspection and maintenance requirements for HVAC systems in commercial buildings and serves as the floor for data centre cooling PM intervals. ASHRAE Guideline 36 provides high-performance sequences of operation, including those for chilled water plants serving CRAH units. For refrigerant compliance, ASHRAE Standard 147 provides guidance on refrigerant management practices that supplements the EPA Section 608 regulatory requirements.
The CRAC Trend That Predicts Failure 4 Weeks Out Is in Your Data. Oxmaint AI Reads It.
Oxmaint Predictive Maintenance AI monitors CRAC compressor current draw, discharge temperature trending, CRAH chilled water delta-T, and sensor calibration drift — generating work orders automatically when parameter trends cross defined thresholds, weeks before the thermal event that calendar-based PM was scheduled too late to prevent. Every task completed is logged with timestamp and technician attribution, building the audit-ready maintenance history that enterprise customers and Uptime Institute Tier certification require.