NVIDIA AI Infrastructure NCP-AII Question # 15 Topic 2 Discussion
An administrator needs to perform a comprehensive pre-production stress test on a DGX H100 system. Which command validates GPU, CPU, memory, and storage components while following NVIDIA’s recommended procedure?
The correct command is sudo nvsm stress-test --force. NVIDIA recommends running the NVSM pre-flight stress test before putting a DGX H100 system into production and again after servicing. The documented NVSM stress test covers the supported components (GPU, CPU, memory, and storage), and sudo nvsm stress-test --force is the recommended form for running it across all of them.

The other commands fall short for these reasons:
nvidia-smi -q reports detailed GPU information, but it only queries state and does not run a stress test.
The Linux stress command can load the CPU and I/O subsystems, but it is a generic tool and does not validate the DGX platform against NVIDIA’s health model.
gpu_burn can stress the GPUs, but it does not cover CPU, system memory, storage, or DGX-specific platform checks.

During server bring-up, NVSM is preferred because it understands DGX hardware components and can identify platform health issues before the node is released to production workloads.
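For anyone who wants to script this step during bring-up, here is a minimal Python sketch that wraps the command quoted in the answer. The function names build_stress_test_command and run_preflight are hypothetical helpers made up for illustration, not part of NVSM. The sketch only assumes that nvsm is on the PATH of a DGX system; it makes no assumption about what a particular nonzero exit code means, so check the NVSM output itself rather than relying on the return value alone.

```python
import shutil
import subprocess


def build_stress_test_command(use_sudo=True):
    """Return the argv list for the command quoted in the answer.

    Hypothetical helper: only the command text itself comes from the
    discussion; everything else here is illustrative.
    """
    cmd = ["nvsm", "stress-test", "--force"]
    return ["sudo"] + cmd if use_sudo else cmd


def run_preflight(use_sudo=True):
    """Run the stress test if nvsm is installed.

    Returns the process exit code, or None when nvsm is not found
    (for example, when this is run on a non-DGX machine).
    """
    if shutil.which("nvsm") is None:
        return None
    # Output is left attached to the terminal so the operator can read
    # the NVSM report directly.
    completed = subprocess.run(build_stress_test_command(use_sudo))
    return completed.returncode
```

Calling run_preflight() on a machine without nvsm simply returns None, so the helper is safe to import elsewhere; on a DGX node it runs the same command you would type by hand.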