At $9,000 per minute of downtime, a data center that runs cooling and power infrastructure on spreadsheets and reactive repair protocols is not managing risk — it is accumulating it. Power and cooling failures cause 70% of all significant outage incidents, according to the 2024 Uptime Institute report. The assets driving those failures — UPS systems, CRAC units, generators, PDUs, transfer switches — fail in predictable ways, on predictable timelines, with warning signs that structured preventive maintenance catches weeks in advance. The gap between 99.9% uptime and five-nines is not a technology gap. It is a maintenance discipline gap. Start a free trial to see how Oxmaint structures critical infrastructure PM or book a demo configured for your facility tier.
$9,000: Per Minute Downtime Cost. Average enterprise data center outage cost in 2025, up from $5,600 in 2019.
70%: Outages From Power & Cooling. Uptime Institute 2024; these are the two systems that structured PM most directly protects.
28%: Generator Start Failure Rate. Seen in reactive programs; drops below 2% with monthly inspections and quarterly load bank testing tracked in a CMMS.
26 min: Max Annual Downtime, Tier IV. 99.995% uptime requires maintenance precision that spreadsheets cannot deliver.
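The downtime budgets behind these availability figures are simple arithmetic. As a minimal sketch (the function name is illustrative, and a 365-day year is assumed), the annual allowance for any uptime target can be computed like this:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored for simplicity


def max_annual_downtime_minutes(availability_pct: float) -> float:
    """Return the downtime budget, in minutes per year, for an uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.9, 99.995, 99.999):
    print(f"{target}% -> {max_annual_downtime_minutes(target):.1f} min/year")
```

Under that assumption, 99.9% allows roughly 525.6 minutes a year, 99.995% allows about 26.3, and five-nines allows about 5.3.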
The Four Interconnected Systems That Determine Uptime
Data center uptime is not a single-system problem. It is a four-system interdependency, and a failure in any one system can cascade into the others. Understanding how each system's maintenance program connects to the others is the starting point for building a maintenance operation that actually delivers five-nines reliability. Start a free trial and load your critical infrastructure asset registry on day one.
| System | Key Assets | Primary Failure Mode | PM Trigger Type |
|---|---|---|---|
| Power | UPS, generators, ATS, PDUs, battery strings | Battery degradation, fuel fouling, ATS mechanical fault | Runtime hours + quarterly load test |
| Cooling | CRAC/CRAH, chillers, cooling towers, economizers | Compressor wear at overload, coil fouling, refrigerant loss | Runtime hours + thermal readings |
| Physical Infra | Raised floors, cable management, containment, fire suppression | Airflow bypass, suppression agent expiry, VESDA blockage | Calendar + occupancy change triggers |
| Compliance | Tier documentation, SOC 2, ISO 27001, audit records | Documentation gap; PM history gaps discovered at audit | Continuous auto-generated audit trail |
The 6 Assets Most Likely to Cause Your Next Outage
Within power and cooling, six specific asset classes generate the majority of incidents — and all six have documented failure modes that preventive maintenance intercepts. Book a demo to walk through a PM matrix for your specific asset mix.
UPS Battery Systems
The leading cause of data center downtime. Batteries degrade silently — internal impedance rises over 3–5 years while capacity appears normal until full load is demanded during a grid event. Without quarterly impedance testing logged in a CMMS, failure is the first symptom.
PM: Quarterly impedance test + annual capacity discharge test
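One way to turn quarterly impedance readings into an early warning is to compare each battery against its commissioning baseline. The sketch below is illustrative only: the function names, the milliohm values, and the 25% rise threshold are assumptions, and the battery manufacturer's replacement criteria should take precedence.

```python
def impedance_rise_pct(baseline_mohm: float, reading_mohm: float) -> float:
    """Percentage rise of a reading over the commissioning baseline."""
    return (reading_mohm - baseline_mohm) / baseline_mohm * 100


def flag_batteries(baselines: dict, readings: dict, threshold_pct: float = 25.0) -> list:
    """Return IDs of batteries whose latest reading meets or exceeds the threshold."""
    flagged = []
    for battery_id, baseline in baselines.items():
        reading = readings.get(battery_id)
        if reading is None:
            continue  # no reading this quarter; a real program would log the gap
        if impedance_rise_pct(baseline, reading) >= threshold_pct:
            flagged.append(battery_id)
    return flagged


# Hypothetical string of three batteries, values in milliohms.
baselines = {"A1": 4.0, "A2": 4.0, "A3": 4.0}
latest = {"A1": 4.2, "A2": 5.2}  # A3 was not tested this quarter
print(flag_batteries(baselines, latest))
```

Here only A2 is flagged, because its reading sits 30% above baseline while A1 has risen just 5%.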
CRAC / CRAH Units
AI workload densification is pushing CRAC units to 130–160% of rated capacity. Compressor failure at 18,000 hours under overload costs $45,000–$120,000 in emergency replacement. Vibration trending detects bearing wear at 12,000 hours — planned bearing change costs $2,400.
PM: Monthly filter inspection + quarterly coil cleaning + runtime-based bearing check
Standby Generators
Generators in reactive maintenance programs show a 28% failure-to-start rate. Fuel degradation causes injector fouling within 6–12 months of storage. Coolant leaks and battery faults go undetected without monthly inspections. A generator that fails to start during a utility event means a site outage, not backup power.
PM: Monthly fuel sampling + quarterly full-load bank test + annual coolant analysis
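The generator PM cadence above can be expressed as a small schedule table plus an overdue check. This is a sketch under stated assumptions: the task names and day counts (30, 91, 365) are approximations of "monthly", "quarterly", and "annual", and nothing here represents how any particular CMMS stores its schedules.

```python
from datetime import date, timedelta

# Intervals taken from the PM line above; day counts are approximations.
GENERATOR_PM_INTERVALS_DAYS = {
    "fuel sampling": 30,
    "full-load bank test": 91,
    "coolant analysis": 365,
}


def overdue_tasks(last_done: dict, today: date) -> list:
    """Return tasks that were never performed or are past their interval."""
    overdue = []
    for task, interval in GENERATOR_PM_INTERVALS_DAYS.items():
        done = last_done.get(task)
        if done is None or today > done + timedelta(days=interval):
            overdue.append(task)
    return overdue


# Hypothetical history: coolant analysis has never been recorded.
history = {
    "fuel sampling": date(2025, 1, 1),
    "full-load bank test": date(2025, 1, 1),
}
print(overdue_tasks(history, date(2025, 2, 15)))
```

Treating a task with no recorded completion as overdue is a deliberate choice: a missing record is handled as a missed PM rather than silently skipped.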
Automatic Transfer Switches
The critical link between utility power, UPS, and generator. Mechanical failure in the ATS creates a scenario where the generator starts correctly but power never transfers. ATS testing under simulated utility failure is required — not just visual inspection.
PM: Semi-annual transfer test + annual full maintenance inspection
Power Distribution Units (PDUs)
Load imbalance across PDU phases creates thermal stress that accelerates insulation degradation. Three-phase imbalance above 5% indicates a load management problem. Infrared thermography during annual PDU maintenance catches hot spots before they become failures.
PM: Quarterly load balance check + annual IR thermography scan
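The 5% imbalance threshold can be checked directly from per-phase current readings. The sketch below assumes the common "maximum deviation from the mean, divided by the mean" definition of imbalance; the source does not say which definition it uses, and the function names are illustrative.

```python
def phase_imbalance_pct(phase_a: float, phase_b: float, phase_c: float) -> float:
    """Max deviation from the mean phase current, as a percentage of the mean."""
    currents = (phase_a, phase_b, phase_c)
    mean = sum(currents) / 3
    if mean == 0:
        return 0.0  # unloaded PDU: nothing to balance
    return max(abs(c - mean) for c in currents) / mean * 100


def needs_load_review(phase_a: float, phase_b: float, phase_c: float,
                      limit_pct: float = 5.0) -> bool:
    """True when imbalance is above the limit quoted in the text."""
    return phase_imbalance_pct(phase_a, phase_b, phase_c) > limit_pct


# Hypothetical readings in amps.
print(round(phase_imbalance_pct(100, 100, 115), 2), needs_load_review(100, 100, 115))
```

With these example readings the mean is 105 A and the worst phase deviates by 10 A, so the imbalance is roughly 9.5% and the PDU would be flagged.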
Fire Suppression Systems
Expired suppression agents, untested release mechanisms, and clogged VESDA sampling pipes create undetected risk that only surfaces when the system is needed — or during a compliance audit. NFPA 75 requires documented annual inspection with qualified technicians.
PM: Semi-annual VESDA cleaning + annual agent verification + quarterly alarm test
Reactive Maintenance vs. CMMS-Structured PM: Real Numbers
| Asset / Scenario | Reactive Program | Oxmaint PM Program |
|---|---|---|
| CRAC Uptime | Compressor failure at 18K hrs → $45K–$120K emergency replacement | Bearing wear caught at 12K hrs → $2,400 planned repair |
| Generator Reliability | 28% failure-to-start at rated load | Under 2% failure rate with monthly testing |
| PUE Performance | Fouled coils inflate PUE to 1.6–1.9, adding $800K–$2.4M in excess energy cost | PM compliance above 90% holds PUE at 1.2–1.35; EU EED compliant |
| Tier Audit Readiness | 2–4 week manual record assembly; gaps common; remediation $80K–$200K | Compliance export from Oxmaint in under 5 minutes |
| Annual Downtime Incidents | 4–8 incidents/year at $9K/min average → $1.8M–$4.6M impact | 68% reduction in unplanned incidents with structured PM compliance |
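The $1.8M–$4.6M downtime figure is the product of incident count, incident duration, and cost per minute. The comparison does not state the durations, so the sketch below back-solves what they would have to be; the function names are illustrative, and the resulting durations are implied values, not source data.

```python
COST_PER_MINUTE = 9_000  # per-minute downtime cost quoted in the text


def annual_downtime_cost(incidents: int, avg_minutes: float,
                         cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Annual cost = incidents x average minutes per incident x cost per minute."""
    return incidents * avg_minutes * cost_per_minute


def implied_minutes_per_incident(annual_cost: float, incidents: int,
                                 cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Back-solve the average incident duration from a quoted annual cost."""
    return annual_cost / (incidents * cost_per_minute)


print(implied_minutes_per_incident(1_800_000, 4))
print(round(implied_minutes_per_incident(4_600_000, 8), 1))
```

On these numbers the quoted range corresponds to incidents averaging roughly 50 to 64 minutes each.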
How Oxmaint Manages Critical Infrastructure Maintenance
Independent Redundancy Tracking
Path A and Path B are never confused. Each redundant component — UPS-A, UPS-B, CRAC-01, CRAC-02 — is a separate asset with its own PM schedule, test history, and compliance record. Redundancy is only reliable when both paths are independently verified to be in spec.
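The idea of never letting one path vouch for the other can be sketched as a data model in which each redundant component carries its own PM history. This is an illustration of the principle only; the class, field, and function names are hypothetical and do not describe Oxmaint's internal schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Asset:
    asset_id: str              # e.g. "UPS-A"
    path: str                  # redundancy path, "A" or "B"
    pm_interval_days: int
    completed_pm: list = field(default_factory=list)  # dates of completed PMs

    def is_compliant(self, today: date) -> bool:
        """True only if this asset's own latest PM falls within its interval."""
        if not self.completed_pm:
            return False
        return (today - max(self.completed_pm)).days <= self.pm_interval_days


def redundancy_verified(assets: list, today: date) -> bool:
    """A redundant set counts as verified only if every member is compliant."""
    return all(asset.is_compliant(today) for asset in assets)


ups_a = Asset("UPS-A", "A", pm_interval_days=90, completed_pm=[date(2025, 1, 10)])
ups_b = Asset("UPS-B", "B", pm_interval_days=90)  # no PM on record
print(redundancy_verified([ups_a, ups_b], date(2025, 2, 1)))
```

Because UPS-B has no completed PM of its own, the pair is reported as unverified even though UPS-A is fully in date.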
Runtime-Based PM Scheduling
Critical infrastructure PM is scheduled by equipment runtime hours, load cycles, and environmental thresholds, not just calendar time. A CRAC running at 150% load accumulates wear 3× faster than spec. Oxmaint adjusts PM frequency to match actual operational load.
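A minimal sketch of load-adjusted scheduling: scale accumulated runtime by a wear factor before comparing it with the PM interval. The 3× factor comes from the example above; the 2,000-hour base interval and the function names are hypothetical, and how a wear factor is derived from load is not specified in the source.

```python
def effective_interval_hours(base_interval_hours: float, wear_factor: float) -> float:
    """Runtime hours between PMs once wear is taken into account."""
    return base_interval_hours / wear_factor


def pm_due(runtime_since_last_pm: float, base_interval_hours: float,
           wear_factor: float = 1.0) -> bool:
    """True when wear-adjusted runtime has reached the base PM interval."""
    return runtime_since_last_pm * wear_factor >= base_interval_hours


# Hypothetical CRAC with a 2,000-hour bearing check, 700 hours since last PM.
print(pm_due(700, 2000, wear_factor=1.0))  # at rated load
print(pm_due(700, 2000, wear_factor=3.0))  # at the 3x wear rate from the text
```

At rated load the unit is not yet due, but at a 3× wear rate the same 700 hours already exceeds the effective interval of about 667 hours.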
SOC 2 & Tier Compliance Automation
SOC 2, ISO 27001, and Uptime Institute audit packages are generated automatically from completed PM records. Trend data predicts equipment end-of-life for capital planning. Audit preparation drops from weeks to minutes, and the documentation is accurate because it was generated in real time.
PUE Trend Monitoring
Cooling system PM completion directly affects PUE. Oxmaint tracks PUE trends against PM compliance rates, surfacing the correlation between deferred maintenance and energy cost overruns. Every 0.1 improvement in PUE translates to tens of thousands of dollars in annual energy savings.
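The savings from a PUE improvement follow from the definition of PUE as total facility energy divided by IT energy. The sketch below assumes a constant IT load all year; the 1,000 kW load and $0.10/kWh price are hypothetical inputs, not figures from the text.

```python
HOURS_PER_YEAR = 8760


def annual_energy_cost(it_load_kw: float, pue: float, price_per_kwh: float) -> float:
    """Total facility energy cost: IT load x PUE x hours per year x price."""
    return it_load_kw * pue * HOURS_PER_YEAR * price_per_kwh


def savings_from_pue_improvement(it_load_kw: float, pue_before: float,
                                 pue_after: float, price_per_kwh: float) -> float:
    """Annual cost difference between two PUE levels at the same IT load."""
    return (annual_energy_cost(it_load_kw, pue_before, price_per_kwh)
            - annual_energy_cost(it_load_kw, pue_after, price_per_kwh))


# Hypothetical 1 MW IT load at $0.10 per kWh, PUE improving from 1.5 to 1.4.
print(round(savings_from_pue_improvement(1000, 1.5, 1.4, 0.10)))
```

Under these assumptions a 0.1 improvement is worth about $87,600 a year, and the figure scales linearly with IT load and energy price.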
Your Next Grid Event Is Testing Every PM Gap You Have
OxMaint gives data center facility teams automated PM schedules for power, cooling, fire suppression, and physical infrastructure — with real-time compliance dashboards and five-nines-grade documentation. Go live in under 14 days.
Frequently Asked Questions
How does Oxmaint track N+1 redundant systems so that Path A and Path B are maintained independently?
OxMaint creates individual asset records for each redundant component. Path A UPS and Path B UPS are separate assets with separate PM schedules, test result histories, and compliance records. The system never aggregates them — and never lets one path's PM completion count as evidence for the other. When redundancy is the only thing standing between you and a site outage, that separation is not just good practice. It is the difference between reliable redundancy and the illusion of it.
Start free to see the redundancy tracking module configured for your architecture.
What PM schedule does a standby generator require to achieve under 2% failure-to-start rate?
Monthly fuel sampling checks for microbial growth and water contamination, both of which cause injector fouling. Quarterly full-load bank tests at rated capacity verify that the generator can actually carry the load, not just run at idle. Annual coolant analysis, battery inspection, and complete mechanical service cover the remaining failure modes. OxMaint schedules all of these automatically, tracks the results, and generates the compliance records that Uptime Institute and your insurance carrier will want to see. The 28% failure rate in reactive programs drops below 2% when this regimen is followed consistently.
Book a demo to configure your generator PM schedule in OxMaint.
How does cooling system PM affect PUE and energy costs?
Fouled coils and degraded cooling systems inflate PUE from a target of 1.2–1.35 to 1.6–1.9 — adding $800,000 to $2.4 million in annual energy costs at scale. Each 0.1 improvement in PUE translates to tens of thousands of dollars in savings, and the difference between best-in-class PUE and average is almost entirely a maintenance discipline story. Quarterly coil cleaning, filter replacement on schedule, refrigerant checks, and airflow balancing are what hold PUE at target — not hardware upgrades alone. OxMaint's PM schedules are tied to PUE tracking, so the operational connection between maintenance and energy performance is visible in your dashboards.
What does a SOC 2 or Uptime Institute audit require from a maintenance documentation perspective?
SOC 2 Type II and Uptime Institute certification both require documented PM histories for every critical system, with timestamps, technician identification, and evidence that work was completed as described. They require corrective action records for any failed PM or discovered deficiency, and they require that the records be readily retrievable on demand. Paper-based programs consistently fail because records are incomplete, undated, or cannot be located. Facilities that assemble records manually typically spend 2–4 weeks preparing for an audit, with gaps requiring remediation that costs $80,000–$200,000. OxMaint generates the complete audit package — filtered by date range, asset class, or regulatory standard — in under 5 minutes, because the records were created in real time when the work was done.
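The documentation requirements described above can be read as a completeness check on each PM record. The sketch below is one possible interpretation: the field names and gap messages are hypothetical, and it is not a statement of what any auditor or standard formally requires.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PMRecord:
    asset_id: str
    completed_at: Optional[datetime]
    technician_id: Optional[str]
    evidence: Optional[str]            # e.g. a test reading or photo reference
    passed: bool
    corrective_action: Optional[str] = None


def audit_gaps(record: PMRecord) -> list:
    """List the reasons a record would not stand up as audit evidence."""
    gaps = []
    if record.completed_at is None:
        gaps.append("missing timestamp")
    if not record.technician_id:
        gaps.append("missing technician identification")
    if not record.evidence:
        gaps.append("missing evidence of completion")
    if not record.passed and not record.corrective_action:
        gaps.append("failed PM without corrective action record")
    return gaps


record = PMRecord("GEN-01", datetime(2025, 3, 1, 9, 30), None, "load test log", passed=False)
print(audit_gaps(record))
```

The example record is flagged twice: no technician is identified, and the failed PM has no corrective action attached.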
Start free and have your first audit-ready PM records live this week.