Pass the NVIDIA NVIDIA-Certified Professional NCP-AII Questions and answers with CertsForce

Viewing page 2 out of 4 pages
Viewing questions 11-20 out of questions
Questions # 11:

During a 48-hour NeMo question-answering model burn-in test, GPU memory errors occur when processing large datasets. Which configuration strategy prevents Out-of-Memory (OOM) errors while maintaining processing efficiency?

Options:

A.

Set blocksize="1GB" for data loading and enable RMM asynchronous allocation.


B.

Switch from FP16 to FP32 precision for numerical stability.


C.

Disable add_filename for Parquet files to reduce metadata.


D.

Increase files_per_partition to 1000 for larger batch processing.


Questions # 12:

You are a network administrator responsible for configuring an East-West (E/W) Spectrum-X fabric using SuperNIC. The BlueField-3 devices in your network must be set to NIC mode with RoCE enabled to optimize data flow between servers. You have access to the Spectrum-X management tools and the relevant documentation. Which of the following steps and commands are necessary to configure the BlueField-3 devices in NIC mode for the E/W Spectrum-X fabric using SuperNIC? (Pick the 2 correct responses below)

Options:

A.

Use the command sudo mlxconfig -d /dev/mst/<device> set LINK_TYPE_P1=2 to enable Ethernet on the BlueField-3 devices.


B.

Use the command sudo mlxconfig -d /dev/mst/<device> set DISABLE_SPECTRUM_X=1 to reduce overhead.


C.

Use the command sudo mlxconfig -d /dev/mst/<device> set INTERNAL_CPU_OFFLOAD_ENGINE=1 to configure the SuperNIC to operate in NIC mode.


D.

Use the command sudo mlxconfig -d /dev/mst/<device> set DPU_MODE=1 to set up the BlueField-3 devices in DPU mode.
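Settings changed with mlxconfig typically take effect only after a firmware reset or reboot, so it can help to confirm the resulting state afterwards. The minimal Python sketch below checks the text of an mlxconfig query dump for LINK_TYPE_P1 reported as Ethernet (2) and INTERNAL_CPU_OFFLOAD_ENGINE reported as 1, the value used to run a BlueField-3 in NIC mode. The output format it parses (a parameter name followed by a value such as ETH(2)) is an assumption, and the helper names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed query-output format: "<PARAM>   <LABEL>(<number>)", e.g. "LINK_TYPE_P1   ETH(2)".
_LINE = re.compile(r"^\s*([A-Z0-9_]+)\s+\S*\((\d+)\)\s*$")

# Desired numeric values: Ethernet on port 1, embedded-CPU offload engine disabled (NIC mode).
EXPECTED = {"LINK_TYPE_P1": 2, "INTERNAL_CPU_OFFLOAD_ENGINE": 1}


def parse_query(text: str) -> Dict[str, int]:
    """Map each parameter name to the numeric value shown in parentheses."""
    values = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def nic_mode_mismatches(text: str) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {param: (expected, found)} for every setting that is not as desired."""
    found = parse_query(text)
    return {
        name: (want, found.get(name))
        for name, want in EXPECTED.items()
        if found.get(name) != want
    }


# Hypothetical excerpt of query output after the change.
sample = """
         LINK_TYPE_P1                                ETH(2)
         INTERNAL_CPU_OFFLOAD_ENGINE                 DISABLED(1)
"""
print(nic_mode_mismatches(sample))  # {}
```

An empty result means both settings match; any entry shows the expected and observed values, with None for a parameter missing from the dump.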


Questions # 13:

During HPL execution on a DGX cluster, the benchmark fails with "not enough memory" errors despite sufficient physical RAM. Which HPL.dat parameter adjustment is most effective?

Options:

A.

Reduce the problem size while maintaining the same block size.


B.

Set PMAP to 1 to enable process mapping.


C.

Increase block size to 6144 to maximize GPU utilization.


D.

Disable double-buffering via BCAST parameter.
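The memory arithmetic behind this question can be sketched in Python. HPL factors a dense N x N matrix of 8-byte doubles, so the matrix alone needs about 8N^2 bytes spread across all processes (in GPU-accelerated runs this is typically GPU memory, not host RAM). A common rule of thumb, assumed here rather than taken from this page, is to size N so the matrix fills roughly 80% of the available memory and to round N down to a multiple of the block size NB. The function names, the 80% fraction, and the 8 x 80 GB example are illustrative.

```python
import math


def max_problem_size(total_mem_bytes: int, nb: int, fraction: float = 0.8) -> int:
    """Largest N that is a multiple of nb and whose N x N double matrix
    uses at most `fraction` of total_mem_bytes (8 bytes per element).
    The 0.8 default is an assumed rule of thumb, not an HPL requirement."""
    budget = total_mem_bytes * fraction
    n = int(math.sqrt(budget / 8))
    return (n // nb) * nb  # round down so N is a whole number of NB blocks


def matrix_bytes(n: int) -> int:
    """Footprint of the N x N coefficient matrix in bytes."""
    return 8 * n * n


# Illustrative example: 8 GPUs with 80 GB each, NB = 1024 kept fixed.
total = 8 * 80 * 10**9
n = max_problem_size(total, nb=1024)
print(n, matrix_bytes(n) <= 0.8 * total)  # 252928 True
```

Because the matrix grows with N squared, a modest cut in N frees a large share of memory: halving N reduces the matrix footprint to a quarter, while NB only sets the rounding granularity here.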


Questions # 14:

An engineer must ensure that a BlueField-3 NIC firmware download matches the PSID of the cluster's BlueField-3 devices. Which step is critical before installation?

Options:

A.

Check that the DPU’s BMC IP is reachable by ping.


B.

Confirm that the firmware file size matches the DPU’s flash capacity.


C.

Use mstflint -d <PCI_ID> query to validate the device PSID before selecting the firmware image.


D.

Verify that the SHA256 hash of the firmware matches NVIDIA’s public ledger.
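A PSID check reduces to comparing two identifiers. The minimal Python sketch below assumes the mstflint query output contains a line of the form "PSID: <value>" and that the PSID targeted by the candidate firmware image is already known (for example, from the download page). The sample output, the PSID strings, and the function names are hypothetical.

```python
import re
from typing import Optional

# Assumed output format: a line beginning with "PSID:" followed by the identifier.
_PSID = re.compile(r"^\s*PSID:\s*(\S+)", re.MULTILINE)


def extract_psid(query_output: str) -> Optional[str]:
    """Return the PSID reported in the query output, or None if absent."""
    match = _PSID.search(query_output)
    return match.group(1) if match else None


def firmware_matches_device(device_query: str, image_psid: str) -> bool:
    """True only when the device reports a PSID identical to the image's PSID."""
    device_psid = extract_psid(device_query)
    return device_psid is not None and device_psid == image_psid


# Hypothetical excerpt of `mstflint -d <PCI_ID> query` output.
device_output = """FW Version:            32.00.0000
PSID:                  MT_0000000000
"""
print(firmware_matches_device(device_output, "MT_0000000000"))  # True
print(firmware_matches_device(device_output, "MT_1111111111"))  # False
```

Treating a missing PSID as a mismatch keeps the check conservative: if the query cannot be parsed, no image is selected.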


Questions # 15:

An administrator needs to perform a comprehensive pre-production stress test on a DGX H100 system. Which command validates GPU, CPU, memory, and storage components while following NVIDIA’s recommended procedure?

Options:

A.

nvidia-smi -q | grep "GPU Stress Test"


B.

sudo nvsm stress-test --force


C.

stress --cpu $(nproc) --io $(nproc) --timeout 600


D.

./gpu_burn 60


Questions # 16:

A leaf switch shows "FW Version Mismatch" alerts for transceivers after cluster expansion. Which tool validates transceiver firmware against expected versions?

Options:

A.

flint


B.

iblinkinfo


C.

mlxconfig


D.

ethtool


Questions # 17:

An InfiniBand administrator needs to run performance benchmarks on new devices added to the fabric. What tool should be used to check the latency?

Options:

A.

tcpdump


B.

ib_write_lat


C.

ibdiagnet


D.

perfmon


Questions # 18:

You are installing the operating system as part of the initial setup for a new NVIDIA Base Command Manager cluster. Which two of the following actions are essential for a successful OS installation on the cluster’s head node?

Pick the 2 correct responses below.

Options:

A.

Download the latest BCM ISO and verify its integrity using the provided checksum, then start the installation.


B.

Configure network switches for PXE boot to all compute nodes before installing the OS on the head node.


C.

Set the desired time zone and configure NTP synchronization during the OS installation wizard.


D.

Start the head node OS installation process with the system BIOS set to legacy boot mode instead of UEFI.
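Verifying a downloaded ISO against its published checksum can be done with Python's standard hashlib module. This sketch assumes the published checksum is a SHA-256 hex digest; if the download page uses a different algorithm, pass its hashlib name instead. A small temporary file stands in for the ISO, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly multi-GB) file in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_iso(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the computed digest with the published one, ignoring case and whitespace.
    SHA-256 is an assumed default; use whatever algorithm the checksum was published with."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()


# Demo on a small temporary file standing in for the ISO.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example iso contents")
demo_path = tmp.name
published = hashlib.sha256(b"example iso contents").hexdigest()
ok = verify_iso(demo_path, published)
bad = verify_iso(demo_path, "0" * 64)
os.remove(demo_path)
print(ok, bad)  # True False
```

Reading in chunks keeps memory use flat regardless of image size, which matters for installation ISOs that are several gigabytes.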


Questions # 19:

A team is validating a DGX BasePOD deployment. Using cmsh, they run a command to check GPU health across all nodes. What indicates that the system is ready for AI workloads?

Options:

A.

The command output is ignored if the system powers on without errors.


B.

At least half of the GPUs report Status_Health = OK.


C.

All GPUs report Status_Health = OK and Health = OK for each device.


D.

Only the head node's GPUs need to be healthy.
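The strictest readiness criterion, every GPU on every node reporting OK for both health fields, can be expressed as a small check. The sketch assumes the per-GPU results have already been collected from cmsh into Python dictionaries with Status_Health and Health keys; that data layout, the node names, the "FAIL" value, and the function names are hypothetical and do not reproduce cmsh's actual output format.

```python
from typing import Dict, List

# Hypothetical shape: node name -> list of per-GPU health records.
HealthReport = Dict[str, List[Dict[str, str]]]


def unhealthy_gpus(report: HealthReport) -> List[str]:
    """List 'node:gpu_index' for every GPU whose Status_Health or Health is not OK."""
    bad = []
    for node, gpus in report.items():
        for index, gpu in enumerate(gpus):
            if gpu.get("Status_Health") != "OK" or gpu.get("Health") != "OK":
                bad.append(f"{node}:{index}")
    return bad


def ready_for_workloads(report: HealthReport) -> bool:
    """Ready only if at least one GPU was reported and none are unhealthy."""
    has_gpus = any(gpus for gpus in report.values())
    return has_gpus and not unhealthy_gpus(report)


report = {
    "node001": [{"Status_Health": "OK", "Health": "OK"}] * 8,
    "node002": [{"Status_Health": "OK", "Health": "OK"}] * 7
    + [{"Status_Health": "OK", "Health": "FAIL"}],
}
print(unhealthy_gpus(report))       # ['node002:7']
print(ready_for_workloads(report))  # False
```

Returning the list of failing GPUs, rather than only a yes/no answer, points directly at the node and device to investigate before declaring the system ready.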


Questions # 20:

When verifying network cable signal integrity during cluster deployment, which measurement result most strongly indicates a cable signal problem?

Options:

A.

Repeated CRC errors and intermittent port flapping reported by switch counters.


B.

Output of ifconfig showing link speed at the expected rate on both ends of the cable.


C.

Network pings between all cluster nodes return responses with delays under 2 ms on a 100Gb network.
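A cable signal problem shows up as error and link-down counters that keep increasing between readings, not as a one-time value. The sketch below compares two snapshots of per-port counters and flags ports whose counts grew beyond a threshold. The counter names (crc_errors, link_downs), the port labels, and the zero-growth default thresholds are hypothetical; map them onto whatever the switch actually reports.

```python
from typing import Dict, List

# Hypothetical snapshot shape: port -> {"crc_errors": int, "link_downs": int}.
Snapshot = Dict[str, Dict[str, int]]


def suspect_ports(before: Snapshot, after: Snapshot,
                  max_new_crc: int = 0, max_new_link_downs: int = 0) -> List[str]:
    """Ports whose CRC errors or link-down events grew by more than the allowed amount."""
    suspects = []
    for port, now in after.items():
        prev = before.get(port, {"crc_errors": 0, "link_downs": 0})
        new_crc = now["crc_errors"] - prev["crc_errors"]
        new_downs = now["link_downs"] - prev["link_downs"]
        if new_crc > max_new_crc or new_downs > max_new_link_downs:
            suspects.append(port)
    return sorted(suspects)


before = {"swp1": {"crc_errors": 0, "link_downs": 0},
          "swp2": {"crc_errors": 12, "link_downs": 1}}
after = {"swp1": {"crc_errors": 0, "link_downs": 0},
         "swp2": {"crc_errors": 480, "link_downs": 4}}
print(suspect_ports(before, after))  # ['swp2']
```

Working on deltas rather than absolute values avoids flagging ports for old, already-resolved errors; if counters are cleared between snapshots, the deltas become negative and the port is simply not flagged.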

