BESS Thermal Daily Round Checklist

By Johnson on May 25, 2026


A grid-scale BESS does not announce a thermal problem with a warning light. It announces it with a 5 °C rack-to-rack delta that nobody walked past, an HVAC supply fan running 8 % above baseline that nobody noticed, a fire-suppression panel showing trouble status that nobody resolved, and, four weeks later, a thermal runaway event that takes a container offline and triggers an investigation by the Authority Having Jurisdiction (AHJ). NFPA 855 and UL 9540A are explicit: the thermal envelope of a lithium battery system is the controlled boundary between routine operation and catastrophic failure, and the only thing that holds that boundary is a structured daily round. This page is the working daily-round checklist your operators carry: rack temperatures, gradient checks, HVAC verification, fire-suppression status, BMS alarm review, and the CMMS sign-off that closes every round before the next shift takes over.

25 °C: Optimal cell operating temperature for LFP and NMC
5 °C: Maximum cell-to-cell ΔT before degradation accelerates
35 °C: Upper operating limit before SEI breakdown accelerates
24/7: Continuous BMS, HVAC and fire-suppression monitoring

Why the Thermal Daily Round Is Non-Negotiable

Lithium battery thermal failures are almost never sudden. They progress through a predictable curve: small cell-to-cell temperature deltas, then HVAC strain compensating for rising rack temperatures, then BMS alarms suppressed by operators who assume the system "knows what it is doing." In incident investigations, the precursor anomaly is typically visible in daily-round data for days or weeks before the event. The daily round exists to catch that anomaly while it is still a maintenance issue, not a fire-department issue.

Degradation Curve
Every 10 °C above 25 °C roughly doubles calendar aging. Catching a 2 °C drift in the daily round saves years of cycle life across the fleet.
Runaway Threshold
LFP triggers thermal runaway at around 270 °C, NMC at around 210 °C. The thermal-management envelope is what keeps cells roughly 200 °C below those thresholds.
NFPA 855 Compliance
Daily inspection, BMS monitoring, fire-suppression status and ventilation are explicit code requirements. Documented rounds are the audit evidence.
Insurance & AHJ
Loss adjusters and the Authority Having Jurisdiction both begin investigations with the daily-round log. A gap on the day of an event is the first finding.
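
The doubling rule in the Degradation Curve note can be put into numbers. The sketch below assumes the rule is applied as a simple exponential, 2^((T − 25)/10). That functional form and the name `aging_multiplier` are illustrative assumptions, not a cell-vendor aging model.

```python
def aging_multiplier(cell_temp_c: float, reference_c: float = 25.0,
                     doubling_step_c: float = 10.0) -> float:
    """Relative calendar-aging rate versus the reference temperature.

    Assumes the rule of thumb that aging doubles for every
    `doubling_step_c` degrees above `reference_c`. Values below the
    reference are a plain extrapolation of the same curve.
    """
    return 2.0 ** ((cell_temp_c - reference_c) / doubling_step_c)


if __name__ == "__main__":
    for temp in (25.0, 27.0, 30.0, 35.0):
        print(f"{temp:.0f} °C -> {aging_multiplier(temp):.2f}x reference aging rate")
```

Under this assumption, a sustained 2 °C drift to 27 °C works out to roughly 15 % faster calendar aging, and 35 °C to twice the reference rate.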

The Six Subsystems on Every Round

A complete thermal daily round walks six subsystems in sequence. Each subsystem produces a defined data set, has a defined acceptable range, and has a defined escalation if the reading is out of range. The round is the same shift after shift, container after container, site after site — repeatability is what turns a walk into a trending dataset.

01. Rack & Cell Temperatures: Min, max, average, and ΔT across racks and modules
02. HVAC & Cooling Loop: Supply/return temperatures, fan currents, coolant pressure, flow
03. Fire & Gas Detection: Suppression panel status, gas detector calibration, agent pressure
04. BMS & Alarm Log: Active alarms, suppressed alarms, cell voltage spread, isolation
05. Enclosure & Ventilation: Door seals, exhaust louvres, dust filters, internal humidity
06. CMMS Sign-Off: Round complete, anomalies logged, shift handover authorised

Subsystem 1 — Rack & Cell Temperature Thresholds

The headline numbers on every round are the temperature readings inside the racks. Both LFP and NMC chemistries perform best around 25 °C, with cell-to-cell ΔT held below 5 °C for uniform aging. Above 35 °C, SEI layer growth accelerates and capacity fade compounds; below 10 °C, lithium plating becomes a risk on charge cycles. The daily round captures the entire temperature envelope, not just the BMS summary value.

| Reading | Normal Range | Watch Zone | Alert Zone | Action |
|---|---|---|---|---|
| Average cell temperature | 20 – 30 °C | 30 – 35 °C | > 35 °C | Verify HVAC, derate output, escalate to engineer |
| Cell-to-cell ΔT within rack | < 3 °C | 3 – 5 °C | > 5 °C | Inspect cooling distribution, identify weak module |
| Rack-to-rack ΔT in container | < 4 °C | 4 – 7 °C | > 7 °C | Check HVAC zone balance, airflow obstructions |
| Highest individual cell temp | < 32 °C | 32 – 40 °C | > 40 °C | Isolate module, BMS deep diagnostic, plan inspection |
| Lowest individual cell temp | > 15 °C | 10 – 15 °C | < 10 °C | Defer charging, verify heater status, check ambient |
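
The threshold table can be encoded as a small classifier so that every reading lands in exactly one zone. This is a minimal sketch with hypothetical names (`Band`, `THRESHOLDS`, `classify`, `summarise_rack`). It assumes that a value sitting on a shared boundary (for example 30 °C) counts as watch, that alert starts strictly beyond the alert limit as the table's ">" and "<" imply, and that an average below 20 °C is left to the lowest-cell row because the table defines no low-side zone for the average.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass(frozen=True)
class Band:
    watch_at: float        # value at which the reading enters the watch zone
    alert_beyond: float    # reading is in the alert zone strictly beyond this value
    higher_is_worse: bool = True


# Limits copied from the table above; the dictionary keys are hypothetical names.
THRESHOLDS: Dict[str, Band] = {
    "avg_cell_temp_c": Band(30.0, 35.0),
    "cell_to_cell_dt_c": Band(3.0, 5.0),
    "rack_to_rack_dt_c": Band(4.0, 7.0),
    "max_cell_temp_c": Band(32.0, 40.0),
    "min_cell_temp_c": Band(15.0, 10.0, higher_is_worse=False),
}


def classify(reading: str, value: float) -> str:
    """Return 'normal', 'watch' or 'alert' for one reading."""
    band = THRESHOLDS[reading]
    watch_at, alert_beyond = band.watch_at, band.alert_beyond
    if not band.higher_is_worse:
        # Mirror the values so the same comparison serves low-temperature limits.
        value, watch_at, alert_beyond = -value, -watch_at, -alert_beyond
    if value > alert_beyond:
        return "alert"
    return "watch" if value >= watch_at else "normal"


def summarise_rack(cell_temps_c: Sequence[float]) -> Dict[str, float]:
    """Reduce raw cell temperatures to the per-rack values the round records."""
    return {
        "avg_cell_temp_c": mean(cell_temps_c),
        "max_cell_temp_c": max(cell_temps_c),
        "min_cell_temp_c": min(cell_temps_c),
        "cell_to_cell_dt_c": max(cell_temps_c) - min(cell_temps_c),
    }
```

For a rack reading 26.0, 27.5, 31.5 and 26.5 °C, the average, maximum and minimum all classify as normal, but the 5.5 °C spread classifies as alert. That is the case a BMS summary value alone would hide.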

Subsystem 2 — HVAC & Cooling Loop Verification

The HVAC system is the only thing standing between battery output heat and rack temperature rise. If it is straining, the racks will tell you tomorrow; if it has failed, the racks will tell you in hours. The round captures supply/return temperatures, fan and pump currents, and coolant flow against baseline so a drift is detectable on day one, not week three.

Supply & Return Air Temperature
Record cold-aisle supply and hot-aisle return at every container. A widening delta on stable load indicates rising rack heat output — the first sign of cell degradation showing as thermal signature.
Compressor & Fan Currents
Compare against shift-baseline trend. A 10 % current rise without ambient change is a leading indicator of refrigerant loss, fouled coils, or impending compressor failure.
Coolant Loop Pressure & Flow
For liquid-cooled systems, verify pump discharge pressure, flow rate, and reservoir level. Slow pressure decay over days indicates a developing leak that becomes critical on the day it fails.
Filter & Coil Condition
Log differential pressure across air filters daily, and visually check coil cleanliness at least weekly. Choked filters force compressors to work harder and warm the racks they should be cooling.
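
The baseline comparisons above reduce to two small calculations: how far a current sits above its baseline, and whether loop pressure is trending down over several days. The sketch below is one way to express them. The 10 % limit comes from the text; the 2 °C ambient tolerance, the one-reading-per-day spacing and all function names are assumptions for illustration.

```python
from typing import Sequence


def pct_above_baseline(reading: float, baseline: float) -> float:
    """Percentage by which a reading exceeds its baseline (negative if below)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (reading - baseline) / baseline * 100.0


def current_rise_flag(current_a: float, baseline_a: float,
                      ambient_c: float, baseline_ambient_c: float,
                      rise_limit_pct: float = 10.0,
                      ambient_tolerance_c: float = 2.0) -> bool:
    """True when current is up by the limit or more while ambient is essentially unchanged."""
    ambient_stable = abs(ambient_c - baseline_ambient_c) <= ambient_tolerance_c
    return ambient_stable and pct_above_baseline(current_a, baseline_a) >= rise_limit_pct


def slope_per_day(daily_values: Sequence[float]) -> float:
    """Least-squares slope of one-reading-per-day values, in units per day."""
    n = len(daily_values)
    if n < 2:
        raise ValueError("need at least two daily readings")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(daily_values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

A negative `slope_per_day` on pump discharge pressure across consecutive rounds is the slow decay the checklist describes. What slope justifies a work order is a site decision and is not defined here.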

Subsystem 3 — Fire, Gas & Suppression Status

NFPA 855 mandates early-warning fire detection and engineered suppression on every grid-scale BESS. The system is only protective if it is in service every shift. The daily round verifies that detection is armed, suppression panels are in normal state, agent pressures are within spec, and no detector is in trouble or bypass mode.

Fire & Gas Detection Verification
NFPA 855 · NFPA 72 · UL 9540A

Suppression panel in normal state — no fault or trouble indication
Any trouble status logged immediately with panel address and fault code. Bypass mode is never acceptable on shift handover without authorised written justification.

Clean-agent or water-mist agent pressure within specification
Pressure gauge reading recorded for each cylinder. Below the green band requires immediate maintenance escalation and may require taking the unit offline per AHJ guidance.

Smoke, heat and gas detectors all in service — none in bypass
Verify every detector address is online. H₂ and CO detectors are particularly critical: off-gassing precedes visible thermal runaway by minutes and is the earliest warning the system gives.

Emergency stop & isolation devices unblocked, accessible, and labelled
Verify E-stop covers are removable, paths are clear, and labels are legible. A blocked E-stop is a finding under NFPA 855 and a serious incident contributor.

Notification to ground and remote monitoring confirmed
Daily test signal to monitoring centre acknowledged. Round entry includes confirmation that alarm transmission path is in service.
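
The five verification items above can be captured as a single function that returns the findings to log. This is a hedged sketch: the `Detector` record and `fire_system_findings` are hypothetical names, and the structure is one possible reading of the checklist rather than any panel vendor's interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detector:
    address: str
    in_service: bool = True
    bypassed: bool = False
    bypass_justification: Optional[str] = None  # authorised written justification


def fire_system_findings(
    panel_normal: bool,
    agent_pressure_in_band: bool,
    detectors: List[Detector],
    estop_accessible: bool,
    test_signal_acknowledged: bool,
) -> List[str]:
    """Return one finding per failed check; an empty list means the system is in service."""
    findings: List[str] = []
    if not panel_normal:
        findings.append("suppression panel not in normal state: log panel address and fault code")
    if not agent_pressure_in_band:
        findings.append("agent pressure below green band: escalate to maintenance")
    for det in detectors:
        if det.bypassed:
            # Every bypass is reported; an unjustified one is called out explicitly.
            reason = det.bypass_justification or "NO WRITTEN JUSTIFICATION"
            findings.append(f"detector {det.address} in bypass: {reason}")
        elif not det.in_service:
            findings.append(f"detector {det.address} not in service")
    if not estop_accessible:
        findings.append("E-stop or isolation device blocked or unlabelled")
    if not test_signal_acknowledged:
        findings.append("alarm transmission path not confirmed")
    return findings
```

Returning text findings instead of a single pass/fail flag keeps the detector address and reason attached to each entry, which is what the round log needs.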

Subsystem 4 — BMS Alarm & Voltage Review

The Battery Management System sees what the operator cannot — individual cell voltage, internal resistance trend, state-of-health migration. The daily round does not replace BMS supervision; it audits it. Suppressed alarms, inhibited cells, and quiet drifts in cell voltage spread are the patterns that reveal a failing module before it becomes a thermal event.

Active Alarm Review
Every active alarm acknowledged with timestamp and operator initials
Repeated alarms in 24 hours flagged for engineer review
Alarm suppression or bypass logged with written justification
Cell Voltage Spread
Max-to-min cell voltage delta within rack under same SoC
Widening spread is the earliest signal of a weak cell
Spread exceeding manufacturer threshold escalates to engineer
State-of-Health Trend
SOH per rack logged against fleet baseline
Outlier racks identified for capacity test scheduling
Premature SOH drop is a leading indicator of accelerated aging
Module & String Isolation
Any isolated module flagged with cause and isolation date
String availability versus design verified
Persistent isolation generates corrective work order
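
Two of the review items above are simple to compute from exported BMS data: the max-to-min cell voltage spread within a rack, and alarms that repeat within 24 hours. The sketch below assumes alarm events arrive as `(timestamp, alarm_code)` pairs and that "repeated" means two or more occurrences. The spread limit must come from the manufacturer, so it is passed in as a parameter. All names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Sequence, Tuple


def voltage_spread_mv(cell_voltages_v: Sequence[float]) -> float:
    """Max-to-min cell voltage delta in millivolts (cells assumed at the same SoC)."""
    return (max(cell_voltages_v) - min(cell_voltages_v)) * 1000.0


def spread_exceeds(cell_voltages_v: Sequence[float], limit_mv: float) -> bool:
    """True when the spread is above the manufacturer-supplied limit."""
    return voltage_spread_mv(cell_voltages_v) > limit_mv


def repeated_alarms(events: Iterable[Tuple[datetime, str]], now: datetime,
                    window: timedelta = timedelta(hours=24),
                    min_count: int = 2) -> Dict[str, int]:
    """Alarm codes seen at least `min_count` times inside the window ending at `now`."""
    cutoff = now - window
    counts = Counter(code for ts, code in events if cutoff <= ts <= now)
    return {code: n for code, n in counts.items() if n >= min_count}
```

Logging the spread value on every round, not only the pass/fail result, is what makes a widening trend visible before the manufacturer threshold is reached.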

Run Your Daily Round on a CMMS That Knows BESS
OxMaint binds every daily round to the container, the rack, the operator, and the shift. Rack temperatures, HVAC currents, suppression status and BMS alarms record against the asset. Anomalies trigger work orders automatically, and shift handover is locked until the round is closed.

Subsystem 5 — Enclosure & Ventilation

The container is the controlled environment that the rest of the thermal system depends on. A wedged-open door, a clogged exhaust louvre, a failed dust filter, or rising internal humidity all undermine the thermal envelope before the BMS sees a single anomalous reading. These checks are quick on the round, but skipped checks here cause cooling-system overwork weeks later.

Doors & Seals
All container access doors fully closed and sealed. Seals intact, no light visible at edges. Wedged-open doors during operation flagged immediately — they defeat the thermal envelope and bypass fire detection geometry.
Exhaust & Vents
Exhaust louvres, deflagration vents and pressure-relief openings clear of debris, ice, nesting, and corrosion. Vent path to atmosphere unobstructed. These are the engineered relief paths for thermal runaway gases.
Internal Humidity
Container interior humidity within design range. Condensation on internal surfaces indicates failed dehumidification or thermal-bridge ingress and accelerates terminal corrosion.
Cleanliness
No oil, water, dust accumulation, or debris on rack tops or floor. Standing water in any quantity is an immediate escalation — a known precursor to ground faults and electrolyte ingress.
Cable & Bus Bar Condition
No visible discoloration, scorching, or insulation breakdown on power cables, bus bars, or terminal connections. Spot-check temperature with IR thermometer if any visual anomaly noted.

Subsystem 6 — CMMS Sign-Off & Shift Handover

The round is not finished when the operator finishes walking. It is finished when every reading is in the CMMS, every anomaly has been categorised against an action level, and the next shift has acknowledged the handover. A round that ends in a notebook is a round that does not survive the moment the operator leaves the site.

A. Data Captured at Source: Readings entered on mobile device at the asset, not transcribed later. Photo evidence attached for anomalies. Time-stamped at the rack, not at the desk.
B. Auto-Categorisation: Every reading auto-classified against normal / watch / alert thresholds per chemistry and OEM configuration. Alert-zone entries auto-create corrective work orders.
C. Shift Sign-Off: Outgoing operator signs round complete. System validates all subsystems captured before sign-off accepted. Incomplete rounds do not close.
D. Handover Acknowledgment: Incoming shift acknowledges round, any open anomalies, and any active work orders. Continuous chain of custody across the 24-hour cycle.
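
The sign-off rules above amount to a small state machine: a round cannot be signed until every data-producing subsystem has at least one entry, and it cannot be acknowledged until it has been signed. The sketch below illustrates that logic only. `DailyRound` and its methods are hypothetical and are not the OxMaint data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# The five data-producing subsystems; the sixth item on the round is the sign-off itself.
REQUIRED_SUBSYSTEMS = (
    "rack_cell_temperatures",
    "hvac_cooling_loop",
    "fire_gas_detection",
    "bms_alarm_log",
    "enclosure_ventilation",
)


@dataclass
class DailyRound:
    container_id: str
    entries: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    signed_off_by: Optional[str] = None
    acknowledged_by: Optional[str] = None

    def record(self, subsystem: str, reading: str, zone: str) -> None:
        """Store one (reading, zone) entry; zone is 'normal', 'watch' or 'alert'."""
        if subsystem not in REQUIRED_SUBSYSTEMS:
            raise ValueError(f"unknown subsystem: {subsystem}")
        self.entries.setdefault(subsystem, []).append((reading, zone))

    def missing_subsystems(self) -> List[str]:
        return [s for s in REQUIRED_SUBSYSTEMS if not self.entries.get(s)]

    def alert_entries(self) -> List[Tuple[str, str]]:
        """(subsystem, reading) pairs that would need a corrective work order."""
        return [(s, r) for s, rows in self.entries.items() for r, z in rows if z == "alert"]

    def sign_off(self, operator: str) -> None:
        missing = self.missing_subsystems()
        if missing:
            raise ValueError(f"round incomplete, missing: {', '.join(missing)}")
        self.signed_off_by = operator

    def acknowledge(self, incoming_operator: str) -> None:
        if self.signed_off_by is None:
            raise ValueError("round has not been signed off")
        self.acknowledged_by = incoming_operator
```

Raising an error on an incomplete round, instead of returning a warning, mirrors the rule that incomplete rounds do not close.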

Paper Round vs. CMMS-Tracked Round

BESS operators come from two operational backgrounds — substation electrical and process plant — and both have a long history of paper-round culture. Paper rounds work until they do not, and on a lithium battery system "do not" means a thermal event with insurance and regulatory consequences. A CMMS-tracked round turns the daily walk into a permanent dataset that defends every operational decision.

| Capability | Paper Round | OxMaint CMMS Round |
|---|---|---|
| Reading linked to specific container, rack, and module | Generic clipboard entry | Locked dropdown to asset record |
| Threshold breach auto-classified and escalated | Operator judgement, often missed | Automatic per chemistry profile |
| Alert-zone reading auto-creates work order | Manual ticket creation, often delayed | Auto-generated with priority |
| Multi-day trend across rack temperatures and HVAC currents | Reconstructed manually if at all | Live trend chart per rack |
| Shift handover with anomaly status | Verbal, often incomplete | Required acknowledgment before access |
| NFPA 855 audit and AHJ export | Box of binders, partial reconstruction | Single export with chain of custody |
| Survives operator turnover and contractor changes | Lost with the binder | Permanent record on asset |

Frequently Asked Questions

How often should a BESS thermal round be performed?
Daily minimum for grid-scale and commercial systems, with continuous BMS monitoring underneath. Some operators run two rounds per shift on high-cycling systems or sites in extreme ambient conditions. The daily round is the floor — NFPA 855 implementation depends on it, and AHJ inspections expect to see the continuous record. Configure round frequency per asset in OxMaint PM scheduling.
Why is cell-to-cell ΔT more important than absolute temperature?
A uniform 32 °C across a rack is degrading the whole rack equally — and predictably. A 5 °C delta with one cell at 35 °C signals a localised failure mode: uneven cooling, weak module, blocked airflow path. Uniform degradation is manageable; localised hot spots are how thermal runaway begins.
What is the difference between LFP and NMC thermal management?
LFP is intrinsically more thermally stable — its runaway threshold is around 270 °C versus around 210 °C for NMC. Both perform best at 25 °C with ΔT under 5 °C. NMC requires tighter HVAC control because its failure progression is faster once it begins; LFP gives more warning time but is not "safe" in any absolute sense.
Does the BMS replace the operator daily round?
No. The BMS monitors what it is configured to monitor and alarms what it is configured to alarm. The daily round audits the BMS itself, verifies fire-suppression status, inspects enclosure integrity, and captures the human observations that no sensor sees — water on the floor, an off smell, a wedged door, a dust accumulation. Both are required.
How does OxMaint handle a thermal threshold breach during a round?
Any reading captured in OxMaint that crosses into watch or alert zone against the configured threshold auto-generates a corrective work order at matching priority, notifies the responsible engineer, attaches the round entry to the asset record, and updates the rack status on the reliability dashboard. The breach cannot be dismissed without documented action. To see the workflow live, book a 30-minute walkthrough.

Run Your BESS Operations on a CMMS Designed for Lithium Systems
OxMaint ships with pre-built daily round templates aligned to NFPA 855, UL 9540A, and IEC 62619. Threshold zones, chemistry-specific operating ranges and alarm escalation are pre-loaded for LFP and NMC. Every reading binds to the asset, every breach generates a work order, every shift handover locks until the round is complete. Stop running grid-scale storage on paper. Start running it on data.
