
NVIDIA-Certified Professional NCP-AII: Questions and Answers (CertsForce)

Page 2 of 3: questions 11-20
Question 11:

After configuring HA, the administrator runs cmha status and notices that the secondary head node reports mysql [FAIL]. What is the most likely cause?

Options:

A. The BCM license expired after HA configuration.
B. Network connectivity issues between the primary and secondary head nodes.
C. The secondary head node lacks NVIDIA GPU drivers.
D. The cluster nodes are powered on during the HA configuration.


Question 12:

What command sequence is used to identify the exact name of the server that runs as the master SM in a multi-node fabric?

Options:

A. sminfo, then smpquery ND
B. ibstat, then sminfo
C. ibnetdiscover, then ibsim
D. sminfo, then smpquery NI
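
As background for the tools named in these options: sminfo reports the LID and GUID of the subnet manager it finds, and smpquery can then query that LID for attributes such as NodeDescription (ND) or NodeInfo (NI). The sketch below only shows how the first command's output could feed the second. The sample output line, its LID and GUID, and the helper names are illustrative assumptions, not captured from a real fabric.

```python
import re

# Illustrative sminfo output line; the LID and GUID are made-up sample values.
SAMPLE_SMINFO = (
    "sminfo: sm lid 1 sm guid 0x2c90200000001, "
    "activity count 3521 priority 0 state 3 SMINFO_MASTER"
)


def parse_sm_lid(sminfo_output: str) -> int:
    """Return the subnet manager LID found in a line of sminfo output."""
    match = re.search(r"sm lid (\d+)", sminfo_output)
    if match is None:
        raise ValueError("no 'sm lid' field found in sminfo output")
    return int(match.group(1))


def node_desc_command(lid: int) -> list:
    """Build an smpquery call asking the node at `lid` for its NodeDescription."""
    return ["smpquery", "ND", str(lid)]


lid = parse_sm_lid(SAMPLE_SMINFO)
print(node_desc_command(lid))
```

On a live fabric the returned argument list could be passed to subprocess.run; the sketch stops at building it so that it runs without InfiniBand hardware.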


Question 13:

A system administrator is installing a GPU into a server and needs to avoid damaging the device. What item should be used?

Options:

A. Anti-ESD strap
B. Gloves
C. Protective film
D. Electric screwdriver


Question 14:

A network engineer is tasked with configuring the management, storage, and compute networks for a new DGX BasePOD deployment. Which statement best describes the network segmentation required for optimal operation?

Options:

A. A single VLAN for all types of network traffic.
B. Two networks: one for management and one for compute.
C. Four networks: compute, storage, out-of-band, and management.


Question 15:

You are validating the environment of an NVIDIA GPU-accelerated data center during post-deployment checks. Which one action is essential to confirm that power and cooling are sufficient for the stable operation of NVIDIA DGX H100 systems?

Options:

A. Confirm the system fans are running at 100% under all workloads to prevent overheating.
B. Review the system BIOS to ensure GPU overclocking is enabled for maximum performance.
C. Use NVSM to disable unused PCIe devices to reduce overall system heat output.
D. Verify that each DGX system is connected to redundant, properly rated PDUs and that all power supplies are reporting nominal input.


Question 16:

You are a network administrator responsible for configuring an East-West (E/W) Spectrum-X fabric using SuperNICs. The BlueField-3 devices in your network should be set to NIC mode with RoCE enabled to optimize data flow between servers. You have access to the Spectrum-X management tools and the necessary documentation, and you need to use specific configuration commands to achieve this setup. Which of the following steps and commands are necessary to configure the BlueField-3 devices in NIC mode for the E/W Spectrum-X fabric? (Pick the 2 correct responses below.)

Options:

A. Use the command sudo mlxconfig -d /dev/mst/ set LINK_TYPE_P1=2 to enable Ethernet on the BlueField-3 devices.
B. Use the command sudo mlxconfig -d /dev/mst/ set DISABLE_SPECTRUM_X=1 to reduce overhead.
C. Use the command sudo mlxconfig -d /dev/mst/ set INTERNAL_CPU_OFFLOAD_ENGINE=1 to configure the SuperNIC to operate in NIC mode.
D. Use the command sudo mlxconfig -d /dev/mst/ set DPU_MODE=1 to set up the BlueField-3 devices in DPU mode.


Question 17:

A system engineer needs to set the vGPU scheduling behavior for all GPUs to share the scheduling equally with the default time slice length. What command should be used?

Options:

A. esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"
B. esxcli graphics module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"
C. esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=FRL=0x01"
D. esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x00"
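
To compare the RmPVMRL values that appear in these options, the sketch below decodes a value into a scheduling policy. The encoding is an assumption based on a reading of the NVIDIA vGPU software documentation: the low byte selects the policy (0x00 best effort, 0x01 equal share, 0x11 fixed share) and bits 16-23 hold an optional time slice length in milliseconds, with zero meaning the default length. The function name is hypothetical; check the documentation for your vGPU release before relying on these values.

```python
# Assumed policy encoding for the low byte of RmPVMRL (see lead-in).
POLICIES = {0x00: "best effort", 0x01: "equal share", 0x11: "fixed share"}


def decode_rmpvmrl(value: int) -> str:
    """Describe the scheduling policy encoded in an RmPVMRL value."""
    policy_bits = value & 0xFF            # low byte: scheduler policy
    time_slice_ms = (value >> 16) & 0xFF  # bits 16-23: time slice, 0 = default
    if policy_bits not in POLICIES:
        raise ValueError(f"unrecognized policy bits: {policy_bits:#04x}")
    name = POLICIES[policy_bits]
    if policy_bits == 0x00:
        # Best effort has no time slice setting in this assumed encoding.
        return name
    if time_slice_ms:
        return f"{name}, {time_slice_ms} ms time slice"
    return f"{name}, default time slice"


for candidate in (0x00, 0x01, 0x11, 0x00030001):
    print(f"{candidate:#010x}: {decode_rmpvmrl(candidate)}")
```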


Question 18:

Why is it important to provide a large and high-performance local cache (using SSDs configured as RAID-0) for deep learning workloads on DGX systems?

Options:

A. Local SSD cache allows users to increase the number of NFS threads on the server without impacting storage reliability.
B. Using local SSD cache in RAID-0 enables direct GPU access to files without host CPU involvement, further boosting performance.
C. Local SSD cache in RAID-0 is necessary to provide redundancy in case one of the drives fails during long training runs.
D. A local SSD cache in RAID-0 ensures that most training data is read only once from the network, significantly reducing NFS traffic.
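
The effect of a local cache on network reads can be estimated with simple arithmetic. The sketch below is a rough model, not a measurement: it assumes every epoch reads the whole dataset once, that the cached portion stays resident between epochs, and that anything that does not fit is re-read from the network each epoch. The function name and the sizes in the example are hypothetical.

```python
TB = 10**12  # bytes in a decimal terabyte


def network_bytes_read(dataset_bytes: int, epochs: int, cache_bytes: int) -> int:
    """Estimate total bytes fetched over the network across all epochs."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    cached = min(dataset_bytes, cache_bytes)  # portion that fits in the cache
    uncached = dataset_bytes - cached         # portion re-read every epoch
    # The first epoch pulls the full dataset; later epochs pull only what
    # the cache could not hold.
    return dataset_bytes + (epochs - 1) * uncached


no_cache = network_bytes_read(10 * TB, epochs=50, cache_bytes=0)
with_cache = network_bytes_read(10 * TB, epochs=50, cache_bytes=30 * TB)
print(no_cache // TB, "TB without a cache")
print(with_cache // TB, "TB with a cache that holds the dataset")
```

Under these assumptions the network traffic scales with the number of epochs only for the part of the dataset that does not fit in the cache.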


Question 19:

During HPL execution on a DGX cluster, the benchmark fails with "not enough memory" errors despite sufficient physical RAM. Which HPL.dat parameter adjustment is most effective?

Options:

A. Reduce the problem size while maintaining the same block size.
B. Set PMAP to 1 to enable process mapping.
C. Increase block size to 6144 to maximize GPU utilization.
D. Disable double-buffering via BCAST parameter.
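
For context on how problem size relates to memory: HPL factors a dense N x N matrix of double-precision values, so the matrix alone needs about 8 * N^2 bytes across the whole job. The sketch below estimates that footprint and picks the largest N that is a multiple of the block size NB and fits a chosen share of memory. The function names, the memory total, and the fraction are illustrative assumptions; which memory pool counts (host RAM or GPU memory) depends on the HPL build being run.

```python
import math


def hpl_matrix_bytes(n: int) -> int:
    """Bytes needed for an N x N matrix of 8-byte doubles."""
    return 8 * n * n


def largest_problem_size(total_mem_bytes: int, fraction: float, nb: int) -> int:
    """Largest N, rounded down to a multiple of NB, whose matrix fits the budget."""
    budget = int(total_mem_bytes * fraction)  # bytes allowed for the matrix
    n = math.isqrt(budget // 8)               # largest N with 8 * N^2 <= budget
    return (n // nb) * nb                     # align N to the block size


total = 800_000_000  # hypothetical aggregate memory in bytes
n = largest_problem_size(total, fraction=1.0, nb=128)
print(n, hpl_matrix_bytes(n))
```

Lowering the fraction leaves headroom for buffers and the operating system, which is why a run sized too close to the limit can fail even when the raw arithmetic appears to fit.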


Question 20:

A leaf switch shows "FW Version Mismatch" alerts for transceivers after cluster expansion. Which tool validates transceiver firmware against expected versions?

Options:

A. flint
B. iblinkinfo
C. mlxconfig
D. ethtool

