NVIDIA GB300 NVL72 Specs for Trillion-Parameter Models

By Riley Quinn on May 5, 2026


One rack. 72 Blackwell Ultra GPUs. 36 Grace CPUs. 20.7 TB of unified HBM3e memory. 1.1 exaFLOPS of FP4 compute. 130 TB/s of NVLink bandwidth across the GPU domain. 120 kW of power draw, equivalent to 80 American homes, and 100% liquid-cooled. For a single trillion-parameter model at FP4, only 2 of those 72 GPUs are needed to hold the weights, leaving 70 GPUs free for KV-cache, activations, and concurrent reasoning passes. The NVIDIA GB300 NVL72 isn't a server; it's a single-rack AI supercomputer engineered around one architectural decision: keep every GPU one fabric hop from every other GPU, so trillion-parameter models train and infer as if they live in one massive memory space. Microsoft Azure, CoreWeave, and Oracle Cloud are deploying these at scale. Industrial AI factories are next. Sign up free to explore the GB300 NVL72 reference architecture for industrial AI.

MAY 12, 2026, 5:30 PM EST, Orlando
Upcoming OxMaint AI Live Webinar — GB300 NVL72 for Industrial AI: Trillion-Parameter Reference Architecture
Live session for AI infrastructure architects, plant CIOs, ML platform leads, and enterprise IT spec'ing rack-scale AI for industrial workloads. We'll walk through the GB300 NVL72 architecture — Blackwell Ultra silicon, 288 GB HBM3e per GPU, 130 TB/s NVLink fabric — and how it integrates with the OxMaint Enterprise AI tier for trillion-parameter model fine-tuning, multi-plant simulation, and physics-AI workloads.
GB300 NVL72 architecture walkthrough
Trillion-parameter model fit math
Power, cooling, and fabric requirements
Live OxMaint Enterprise AI demo

The Anatomy of a GB300 NVL72 Rack

The GB300 NVL72 is a 48U rack-scale system — physically one unit, logically one massive GPU. Inside that 48U enclosure: 18 compute trays each holding 4 Blackwell Ultra GPUs and 2 Grace CPUs (totalling 72 + 36), 9 NVLink switch trays, and the cooling/power distribution to sustain 120 kW continuously. Here's what fits inside one rack.

72 Blackwell Ultra GPUs. B300 silicon: 20,480 CUDA cores, 640 5th-gen Tensor Cores, 208B transistors per GPU. Dual-die package. 1,400 W per GPU under full load.
36 NVIDIA Grace CPUs. Arm Neoverse V2-based. ~472 GB LPDDR5X each. 17 TB total CPU memory. NVLink-C2C interconnect to GPUs (one Grace + two B300s per superchip).
288 GB HBM3e per GPU. 12-Hi HBM3e stacks (50% more capacity than GB200's 192 GB at 8-Hi). 8 TB/s memory bandwidth per GPU. ECC throughout.
20.7 TB unified rack memory. All 72 GPUs' HBM3e exposed as a single coherent address space via NVLink fabric. Up to 40 TB fast memory total when LPDDR5X is included.
130 TB/s NVLink fabric. 9 NVLink switch trays. 300 ns switch latency. 576-way GPU communication. NVLink 5.0 at 1.8 TB/s per GPU (900 GB/s in each direction).
800 Gb/s ConnectX-8 SuperNIC per GPU. 2× the GB200 ConnectX-7 at 400 Gb/s. Quantum-X800 InfiniBand or Spectrum-X Ethernet. Inter-rack scale-out without losing bandwidth balance.
120 kW power per rack. 100% liquid-cooled; air cooling is not an option. ~409,000 BTU/hr heat output. 3,000+ lbs rack weight. Requires a 480 kW-class facility loop.
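The rack-level headline figures follow from the per-component numbers quoted above. As a rough sketch (the constant and function names are illustrative, not from any NVIDIA tool, and decimal units of 1 TB = 1,000 GB are assumed), the arithmetic looks like this:

```python
# Rack totals derived from the per-component figures quoted in this section.
# All names here are hypothetical; units are decimal (1 TB = 1000 GB).
COMPUTE_TRAYS = 18
GPUS_PER_TRAY = 4
CPUS_PER_TRAY = 2
HBM_PER_GPU_GB = 288
LPDDR_PER_CPU_GB = 472      # the article gives this as an approximate figure
NVLINK_PER_GPU_TBS = 1.8    # NVLink 5.0 bandwidth per GPU, TB/s


def rack_totals():
    """Return GPU/CPU counts, memory totals (TB) and NVLink bandwidth (TB/s)."""
    gpus = COMPUTE_TRAYS * GPUS_PER_TRAY
    cpus = COMPUTE_TRAYS * CPUS_PER_TRAY
    return {
        "gpus": gpus,
        "cpus": cpus,
        "hbm_tb": gpus * HBM_PER_GPU_GB / 1000,
        "cpu_mem_tb": cpus * LPDDR_PER_CPU_GB / 1000,
        "nvlink_tbs": gpus * NVLINK_PER_GPU_TBS,
    }


if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value}")
```

The results round to the quoted 20.7 TB of HBM3e, 17 TB of CPU memory, and 130 TB/s of NVLink bandwidth.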

The Performance Multipliers — vs Hopper, vs Blackwell

Every frontier-AI cloud is racking these in 2026 because the gain is a step change, not an increment. Compared to NVIDIA Hopper systems (H100), the GB300 NVL72 delivers 50× more AI factory output performance and 30× faster real-time inference for trillion-parameter LLMs. Compared to the prior Blackwell generation (GB200), it delivers 1.5× the FP4 compute, 2× the attention-layer performance, and 50% more HBM3e per GPU. These multipliers compound at scale. Book a demo to see the GB300 performance numbers run on your specific AI workload.

vs NVIDIA Hopper (H100)
70×: faster FP4 inference vs H100 systems (single-rack throughput)
50×: AI factory output performance increase vs Hopper-based platforms
30×: real-time trillion-parameter LLM inference speedup
10×: tokens/second per user · 5-second video gen: 90 s → 3 s on Cosmos

vs Blackwell (GB200 NVL72)
2.0×: attention-layer performance (transformer inference at long context)
1.5×: more dense FP4 Tensor Core FLOPS vs first-gen Blackwell
1.5×: more HBM3e memory per GPU (288 GB vs 192 GB)
2.0×: ConnectX-8 SuperNIC bandwidth (800 Gb/s vs 400 Gb/s per GPU)

Compute Precision Tiers — Where Each Format Wins

The B300 GPU at the heart of the GB300 NVL72 supports five precision tiers (TF32, FP16/BF16, FP8, FP6, and NVFP4), each engineered for a different stage of the AI workload lifecycle. The headline number is 1.1 exaFLOPS of FP4 compute at the rack level, but FP4 is for inference and reasoning, not for training. Here's how the precision tiers map to actual workload stages.

| Precision | Rack PFLOPS | vs B200 | Best Used For |
|---|---|---|---|
| NVFP4 | 1,080 (1.1 EF) | +66.7% | Inference, AI reasoning, agentic workloads, KV-cache compression |
| FP8 | 360 | +11.1% | Production inference, mixed-precision training, fine-tuning |
| FP16/BF16 | 180 | +11.1% | Foundation model training, gradient computation, optimizer states |
| TF32 | 90 | +13.6% | Drop-in FP32 replacement for legacy code paths, scientific compute |
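To make the table easier to compare against single-GPU datasheets, the rack figures can be divided by the 72 GPUs and turned into tier-to-tier ratios. This is a minimal sketch using only the table's rack PFLOPS values; the function names are illustrative, and the results are peak figures rather than measured throughput.

```python
# Rack-level peak PFLOPS per precision, copied from the table above.
RACK_PFLOPS = {"NVFP4": 1080, "FP8": 360, "FP16/BF16": 180, "TF32": 90}
GPUS_PER_RACK = 72


def per_gpu_pflops(rack_pflops, gpus=GPUS_PER_RACK):
    """Divide each rack-level figure evenly across the GPUs in the rack."""
    return {fmt: total / gpus for fmt, total in rack_pflops.items()}


def tier_ratio(rack_pflops, fast, slow):
    """Peak-throughput ratio between two precision tiers."""
    return rack_pflops[fast] / rack_pflops[slow]


if __name__ == "__main__":
    print(per_gpu_pflops(RACK_PFLOPS))
    print("NVFP4 vs FP8:", tier_ratio(RACK_PFLOPS, "NVFP4", "FP8"))
    print("FP8 vs FP16/BF16:", tier_ratio(RACK_PFLOPS, "FP8", "FP16/BF16"))
```

On these numbers each GPU contributes 15 PFLOPS at NVFP4 and 5 PFLOPS at FP8, so dropping from FP8 to NVFP4 triples peak throughput, while each step from TF32 to FP16/BF16 to FP8 doubles it.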

Trillion-Parameter Model Fit — How the 288GB Per GPU Changes the Math

The single architectural decision that separates the GB300 NVL72 from every other AI system on the market is the 288 GB HBM3e per GPU. For a quantized trillion-parameter model at FP4 — about 500 GB of weights — the entire model fits across just 2 GPUs, leaving 70 GPUs available for KV-cache, activations, and concurrent reasoning passes. On the prior GB200 generation, that same model needed 3 GPUs just for weights. Here's how the math compounds at production model sizes.

| Model | Weights at FP4 | Weights at FP8 | Weights at FP16 |
|---|---|---|---|
| 70B (Llama-3 70B class) | 35 GB · 1 GPU | 70 GB · 1 GPU | 140 GB · 1 GPU |
| 405B (Llama-3 405B class) | 200 GB · 1 GPU | 405 GB · 2 GPUs | 810 GB · 3 GPUs |
| 1T (trillion-parameter frontier) | 500 GB · 2 GPUs | 1 TB · 4 GPUs | 2 TB · 8 GPUs · still fits |
Bar widths represent fraction of the 20.7 TB total rack HBM3e. Even a trillion-parameter model at FP16 occupies less than 10% of rack memory — the rest is available for KV-cache, activations, and concurrent inference passes.
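The fit math above is bytes per parameter times parameter count, divided by per-GPU HBM and rounded up. The sketch below reproduces it under simplifying assumptions: weights only, decimal gigabytes, and no allowance for quantization scale factors, KV-cache, or activations. The helper names are hypothetical.

```python
import math

# Storage cost per parameter for each precision, in bytes.
BYTES_PER_PARAM = {"FP4": 0.5, "FP8": 1.0, "FP16": 2.0}
HBM_PER_GPU_GB = 288  # GB300; pass hbm_gb=192 to model GB200


def weight_gb(params_billion, precision):
    """Weight footprint in GB: (billions of params) x (bytes per param)."""
    return params_billion * BYTES_PER_PARAM[precision]


def min_gpus_for_weights(params_billion, precision, hbm_gb=HBM_PER_GPU_GB):
    """Smallest GPU count whose combined HBM can hold the weights alone."""
    return math.ceil(weight_gb(params_billion, precision) / hbm_gb)


if __name__ == "__main__":
    for size in (70, 405, 1000):
        for precision in BYTES_PER_PARAM:
            print(size, precision, weight_gb(size, precision),
                  min_gpus_for_weights(size, precision))
```

For example, `min_gpus_for_weights(1000, "FP4")` gives 2 on GB300 and `min_gpus_for_weights(1000, "FP4", hbm_gb=192)` gives 3 on GB200. The result is a floor: real deployments also reserve HBM for KV-cache and activations and often round up to a convenient parallelism degree, so practical GPU counts can be higher.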

The Industrial AI Use Case — Where the GB300 NVL72 Earns Its Keep

The GB300 NVL72 is engineered for frontier-scale AI, but its industrial relevance is concrete: physics simulation, multi-plant digital twins, agentic AI for plant operations, and trillion-parameter models fine-tuned on enterprise data without leaving the firewall. For the OxMaint Enterprise AI tier, this hardware tier supports four specific workload categories that single-GPU workstations can't reach. Sign up free to see the GB300-class workloads supported in the OxMaint Enterprise AI build, or book a demo to walk through the four workload categories with an OxMaint architect.

01 Physics + Digital Twin Simulation
NVIDIA Cosmos and Omniverse-driven physics simulation for multi-plant digital twins. Five-second video generation in real time (3 seconds vs 90 on Hopper). Plant-floor simulation with thousands of asset states updated in parallel.
02 Trillion-Parameter Fine-Tuning
Fine-tune frontier LLMs (Llama-3 405B, Mixtral 8×22B, custom 1T-class) on enterprise data: work order history, technician notes, equipment manuals, OT logs. Source-access control, full data sovereignty.
03 Agentic AI for Plant Operations
Multi-step reasoning across asset health, supply chain, energy optimization, and work-order routing. Test-time scaling: the GB300 architecture is purpose-built for AI reasoning workloads that grow compute dynamically per query.
04 Multi-Plant Foundation Models
Train enterprise-specific foundation models across multiple plants' data, with federated layers per plant. Long-context inference (1M+ token windows) for full-equipment-history reasoning at single-query cost.
Pre-Configured · GB300-Ready · Ships in 6–12 Weeks
Order an OxMaint Enterprise AI Tier With GB300 Architecture Pre-Integrated
The OxMaint Enterprise AI build delivers GB300-class compute for trillion-parameter fine-tuning, multi-plant digital twin simulation, and agentic AI reasoning — pre-configured with the OxMaint AI software stack, NVIDIA Mission Control management, NVFP4 inference optimization, and the integration patterns that turn rack-scale compute into production plant AI. Pre-configured, pre-tested, ready to plug into your enterprise within days. No SaaS lock-in. No recurring fees.

Investment Summary — Per-Plant Rollout + Enterprise AI

The GB300 NVL72 itself is a multi-million-dollar rack-scale deployment (estimated $3-4M per rack at hyperscaler scale). For most industrial customers, the practical entry point is the OxMaint Enterprise AI tier — a DGX Station GB300-class build at $85,000-$100,000, deployed once and shared across multiple plants. Per-plant builds use the workstation-class RTX PRO Blackwell servers. Here's the actual cost breakdown OxMaint deploys at customer sites. Sign up free to see the GB300-tier configuration for your specific footprint.

| Component | Unit Cost | Per Plant (4 mo) | Notes |
|---|---|---|---|
| RTX PRO 6000 Blackwell 96GB Server (Omniverse) | $19,000 | $19,000 | Digital Twin rendering & simulation per plant |
| NVIDIA AGX Orin #1 (PLC Edge AI) | $4,000 | $4,000 | All Allen-Bradley PLCs → OPC-UA → real-time sync |
| NVIDIA AGX Orin #2 (CCTV Edge AI) | $4,000 | $4,000 | All CCTV RTSP streams → DLA inference |
| Industrial Ethernet Switch + Cabling | ~$2,500 | ~$2,500 | Plant-floor switch, Cat6A, SFP modules |
| Local Electrical/Instrumentation Vendor | $8,000–$12,000 | ~$10,000 est | PLC wiring, conduit, panel work, patch cabling |
| OxMaint AI Software + Integration (per plant) | $35,000–$55,000 | $45,000 avg | Digital Twin build, AI models, LLM, dashboards |
| Per-Plant Total (hardware + software) | $72,500–$96,500 | ~$84,500 avg | 4-month delivery per plant |
| Enterprise AI DGX Station (GB300 Ultra, 768GB RAM, 400GbE) | $85,000–$100,000 | One-time shared | All 4 plants: physics, simulation, LLM, analytics |
| Enterprise AI Delivery (3 months) | $45,000–$65,000 | One-time | Corporate rollout, LLM fine-tuning, integration |
| 4-Plant Full Rollout (parallel deployment) | ~$420,000–$520,000 | Total programme | Parallel delivery: all 4 plants + Enterprise AI |
$84.5K avg per plant · $92.5K Enterprise AI tier · 4 mo delivery · $0 recurring fees
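For budgeting against a different plant count, the line items in the breakdown can be rolled up directly. The sketch below sums the low and high ends of each listed unit cost; the dictionary keys and function names are illustrative, and a straight sum of list ranges is an assumption about how the totals are built, not OxMaint's quoting method.

```python
# (low, high) unit costs in USD, taken from the cost breakdown above.
PER_PLANT_ITEMS = {
    "RTX PRO 6000 Blackwell server": (19_000, 19_000),
    "AGX Orin #1 (PLC edge AI)": (4_000, 4_000),
    "AGX Orin #2 (CCTV edge AI)": (4_000, 4_000),
    "Ethernet switch + cabling": (2_500, 2_500),
    "Electrical/instrumentation vendor": (8_000, 12_000),
    "OxMaint software + integration": (35_000, 55_000),
}
ENTERPRISE_ITEMS = {
    "Enterprise AI DGX Station": (85_000, 100_000),
    "Enterprise AI delivery": (45_000, 65_000),
}


def total_range(items):
    """Sum the low ends and the high ends of a set of line items."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


def rollout_range(plants):
    """Per-plant costs times plant count, plus the shared enterprise tier once."""
    plant_lo, plant_hi = total_range(PER_PLANT_ITEMS)
    ent_lo, ent_hi = total_range(ENTERPRISE_ITEMS)
    return plants * plant_lo + ent_lo, plants * plant_hi + ent_hi


if __name__ == "__main__":
    print("per plant:", total_range(PER_PLANT_ITEMS))
    print("4-plant rollout:", rollout_range(4))
```

The per-plant line items sum to $72,500 at the low end and $96,500 at the high end. For four plants the low end reproduces the quoted $420,000; the straight sum of upper bounds comes out above the quoted ~$520,000, so treat the quoted programme range as approximate.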
Perpetual · Owned · GB300-Class · Source Access Included
Stop Renting Frontier AI — Own the GB300 Architecture for Your Enterprise
A complete on-prem Enterprise AI tier with GB300-class compute for trillion-parameter fine-tuning, agentic AI reasoning, and physics-driven digital twins. Pre-configured with NVIDIA Mission Control, NVFP4 inference, and the OxMaint software stack — owned outright, full source access, total data sovereignty. No SaaS lock-in. No per-token recurring fees. The platform every frontier-AI cloud is racking, packaged for industrial enterprise deployment.

Frequently Asked Questions

Is the OxMaint Enterprise AI tier the full GB300 NVL72 rack, or a smaller variant?
The OxMaint Enterprise AI tier at $85,000-$100,000 is a DGX Station GB300-class build: a workstation-form-factor system with Blackwell Ultra silicon, 768 GB RAM, and 400 GbE networking, designed to be shared across 4 plants in a typical enterprise rollout. It's the practical entry point to GB300 architecture for industrial customers. The full GB300 NVL72 rack (72 GPUs, 36 Grace CPUs, 120 kW, $3-4M) is a hyperscaler-scale system that Microsoft Azure, CoreWeave, and Oracle Cloud are deploying at scale, appropriate for AI service providers and frontier-model builders. Most industrial OxMaint customers don't need an entire NVL72 rack; the DGX Station tier delivers the same Blackwell Ultra architecture, NVFP4 inference, and NVIDIA Mission Control management at a fraction of the footprint, power, and cost. For customers who genuinely need rack-scale compute (5+ plants, 100+ assets per plant, real-time multi-plant simulation), OxMaint can configure full NVL72 deployments in partnership with NVIDIA, HPE, Dell, ASUS, and Supermicro; pricing scales with rack count and integration scope.
What workloads actually need GB300 compute, and which can run on RTX PRO Blackwell workstations?
The architectural divide is at trillion-parameter scale and multi-plant federated workloads. RTX PRO Blackwell workstations (PRO 6000 with 96 GB, PRO 5000 with 48 GB) handle most plant-floor AI workloads excellently: vision defect inspection, anomaly detection, predictive maintenance, asset health scoring, work-order summarization, NLP over technician notes, LLM inference up to 70B-class at FP8. These are the workloads OxMaint deploys per plant at $19K Digital Twin Server pricing. GB300-tier compute earns its keep when you need: (1) fine-tuning frontier LLMs (405B+) on enterprise data, (2) trillion-parameter inference with long context windows (1M+ tokens) for full equipment history reasoning, (3) physics simulation for multi-plant digital twins running in real-time, (4) agentic AI doing multi-step reasoning across plant operations, supply chain, and customer service simultaneously. Most industrial deployments run a per-plant RTX PRO Blackwell server tier plus a single shared GB300-class Enterprise AI tier — the per-plant servers handle real-time inference; the Enterprise AI tier handles training, fine-tuning, and cross-plant analytics.
What infrastructure does a GB300 NVL72 rack actually require — power, cooling, floor loading?
A full GB300 NVL72 rack has non-negotiable facility requirements that most existing data centers don't meet.
Power: 120 kW continuous per rack, redundant feeds, busway distribution at the row level (not PDU-per-rack), capacity for 1.4× transient spikes during gradient sync. Eight racks per row draw 960 kW continuous and about 1.3 MW at the 1.4× transient peak.
Cooling: 100% liquid; air is not an option. CDU connection to facility chilled water at 30-40°C supply, water filtered to 50 microns, conductivity monitoring, corrosion inhibitors. 409,000 BTU/hr heat per rack.
Floor loading: 3,000+ lbs concentrated in a 48U footprint. Raised floor pedestals must be reinforced and slab thickness verified.
Network: 800 Gb/s ConnectX-8 SuperNICs per GPU (= 57.6 Tb/s rack network). Pre-terminated 144-fiber trunk assemblies with APC polish, TIA-568 verified. Two-person crews complete fiber install in 2.8 hours when factory-terminated.
None of this is needed for the OxMaint Enterprise AI DGX Station tier, which uses standard 208V/30A power and forced-air cooling. That's the practical reason most industrial customers buy the workstation tier rather than full racks.
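The facility figures in this answer can be checked with a few lines of arithmetic. The sketch below assumes the standard conversion of 1 W ≈ 3.412 BTU/hr and treats the 1.4× transient factor as applying to the whole row at once, which is a worst-case simplification; the function names are hypothetical.

```python
WATTS_TO_BTU_PER_HR = 3.412142  # standard conversion: 1 W is about 3.412 BTU/hr


def row_power_kw(racks, kw_per_rack=120, transient_factor=1.4):
    """Return (continuous, transient-peak) power for a row of racks, in kW."""
    continuous = racks * kw_per_rack
    return continuous, continuous * transient_factor


def heat_btu_per_hr(kw):
    """Heat output in BTU/hr, assuming all electrical power ends up as heat."""
    return kw * 1000 * WATTS_TO_BTU_PER_HR


def rack_network_tbps(gpus=72, gbps_per_gpu=800):
    """Aggregate scale-out bandwidth of one rack, in Tb/s."""
    return gpus * gbps_per_gpu / 1000


if __name__ == "__main__":
    print("8-rack row (kW):", row_power_kw(8))
    print("heat per rack (BTU/hr):", round(heat_btu_per_hr(120)))
    print("rack network (Tb/s):", rack_network_tbps())
```

An eight-rack row comes to 960 kW continuous and about 1,344 kW at the transient peak, one rack rejects roughly 409,000 BTU/hr, and 72 NICs at 800 Gb/s total 57.6 Tb/s.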
How does NVFP4 precision change the math vs FP16/FP8 for industrial AI?
NVFP4 is NVIDIA's 4-bit floating-point format, run on Blackwell Ultra's 5th-gen Tensor Cores, and it changes the memory and compute math significantly. Memory cost per parameter: FP16 = 2 bytes, FP8 = 1 byte, NVFP4 = 0.5 bytes. So a trillion-parameter model takes ~2 TB at FP16, 1 TB at FP8, and 500 GB at NVFP4. A single GB300 NVL72 rack (20.7 TB unified HBM3e) can therefore hold more than 40 NVFP4 trillion-parameter models simultaneously, or one trillion-parameter model with about 20 TB (roughly 40× the weight footprint) left over as KV-cache headroom for long-context inference. Compute throughput: NVFP4 delivers 1.1 exaFLOPS at rack level, 3× the FP8 throughput of 360 PFLOPS. The trade-off is accuracy: FP4 quantization typically introduces a 1-3% perplexity increase on benchmarks, acceptable for inference but not always for training. The standard workflow on GB300 systems is FP16/BF16 for foundation training, FP8 for fine-tuning and mixed-precision training, and NVFP4 for production inference and reasoning. NVIDIA Dynamo and TensorRT-LLM handle the precision conversion automatically with disaggregated inference patterns that maximize NVFP4 throughput.
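The rack-capacity claims in this answer reduce to two divisions. A minimal sketch, assuming decimal gigabytes, 0.5 bytes per NVFP4 parameter, and weights only (any per-block scale-factor overhead is ignored); the names are illustrative:

```python
RACK_HBM_GB = 72 * 288  # 20,736 GB of HBM3e across the rack


def models_per_rack(params_billion, bytes_per_param=0.5, rack_gb=RACK_HBM_GB):
    """How many copies of a model's weights fit in rack HBM."""
    weights_gb = params_billion * bytes_per_param
    return int(rack_gb // weights_gb)


def headroom_gb(params_billion, bytes_per_param=0.5, rack_gb=RACK_HBM_GB):
    """HBM left for KV-cache and activations after loading one copy."""
    return rack_gb - params_billion * bytes_per_param


if __name__ == "__main__":
    print("1T NVFP4 models per rack:", models_per_rack(1000))
    left = headroom_gb(1000)
    print("headroom after one copy (GB):", left)
    print("headroom as multiple of weights:", round(left / 500, 1))
```

With one trillion parameters this gives 41 copies per rack, or 20,236 GB of headroom after a single copy, which is about 40 times the 500 GB weight footprint.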
How long from sign-up to live operation on the OxMaint Enterprise AI tier?
Six to twelve weeks from sign-up to live operation is typical for the OxMaint Enterprise AI DGX Station tier, which falls within the 3-month Enterprise AI Delivery engagement (corporate rollout, LLM fine-tuning, integration). The compressed timeline works because the system is configured, integrated, and pre-tested in the OxMaint factory before shipping: Blackwell Ultra GPU, NVIDIA Mission Control, NVFP4 inference stack, OxMaint AI software, foundation models, and the integration patterns for multi-plant federation are all installed and validated against synthetic enterprise data before the unit ships. On-site work then collapses to: rack the DGX Station in your data center (1 day), connect to the enterprise network and per-plant OxMaint servers (3-5 days), import enterprise data sources (1 week), pre-train custom foundation models against your work order history, technician notes, and equipment data (2-4 weeks), validate inference outputs in shadow mode (2-4 weeks), then production cutover. Most enterprises start with one workload, typically work-order LLM fine-tuning or multi-plant asset health analytics, see ROI in months 4-6, then scale to additional workloads. For full GB300 NVL72 rack deployments (rare for industrial customers), the timeline is 4-6 months due to facility prep and integrator scheduling.
