Overview

The NVIDIA AI Factory Observability (AIFO) integration monitors the NVIDIA GPUs in your AI infrastructure, including their utilization, power consumption, and configuration. Beyond the GPU hardware itself, it monitors NVLink configuration and throughput to help identify bottlenecks in the specialized GPU networks. It also tracks the GPU, memory, and CPU utilization of the training and inference programs that use GPU resources, which allows detailed reporting of per-program and per-GPU usage. NVIDIA AIFO can monitor GPU hosts that run on-premises or on cloud-hosted AI resources.
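
To make the per-GPU metrics above concrete, here is a minimal Python sketch that reads utilization, power draw, and memory use from the standard `nvidia-smi` command-line tool. It is only an illustration of the kind of data involved and does not describe how the AIFO integration collects metrics. The function names, the output field names, and the choice of query fields are assumptions made for this example.

```python
import csv
import io
import subprocess

# Query fields passed to nvidia-smi. This selection is an assumption chosen
# for illustration, not the set of metrics the integration gathers.
QUERY_FIELDS = ["index", "utilization.gpu", "power.draw", "memory.used"]


def _to_float(value):
    """Return a float, or None when the driver reports a non-numeric value such as [N/A]."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, util, power, mem = (field.strip() for field in record)
        rows.append({
            "gpu": int(index),
            "utilization_pct": _to_float(util),
            "power_w": _to_float(power),
            "memory_used_mib": _to_float(mem),
        })
    return rows


def query_gpus():
    """Run nvidia-smi on the local host and return one dict per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

On a host with the NVIDIA driver installed, `query_gpus()` returns one dictionary per GPU. `parse_gpu_csv` is kept separate so the parsing can be exercised on captured text without GPU hardware.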