Pass the NVIDIA NVIDIA-Certified Professional NCP-AII Questions and answers with CertsForce

Viewing page 3 out of 4 pages
Viewing questions 21-30 out of questions
Questions # 21:

If two ports must be connected, but one is SFP and one is QSFP, for example, to connect a 25 GbE Host Channel Adapter to a QSFP port capable of both 100 GbE and 25 GbE, which of the following solutions would best meet this requirement?

Options:

A.

SFP Connectors


B.

SFP to 1G BASE-T (RJ45) adapter


C.

QSA Adapter


Questions # 22:

A system administrator needs to install a container toolkit and successfully run the following commands:

sudo apt-get update

sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime docker

What step should be taken next to finish the installation?

Options:

A.

dpkg -i doca-host-repo-ubuntu<version>_amd64.deb


B.

apt-get install cuda-drivers


C.

systemctl restart docker


D.

apt-get remove nvidia-container-toolkit
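
For context, the commands quoted in this question follow the Docker setup flow described in NVIDIA's Container Toolkit documentation, where `nvidia-ctk runtime configure` edits the Docker daemon configuration and the daemon must then be restarted to pick it up. The sketch below is a dry run that only prints the steps. The helper name `plan_toolkit_install` is hypothetical, and the final smoke test is an assumption about how one might confirm GPU access from a container.

```shell
#!/bin/sh
# Dry-run sketch: prints the steps instead of executing them.
# plan_toolkit_install is a hypothetical helper name, not part of any NVIDIA tool.
plan_toolkit_install() {
    # Steps quoted from the question.
    echo "sudo apt-get update"
    echo "sudo apt-get install -y nvidia-container-toolkit"
    echo "sudo nvidia-ctk runtime configure --runtime docker"
    # nvidia-ctk edits /etc/docker/daemon.json; the daemon rereads it on restart.
    echo "sudo systemctl restart docker"
    # Assumed smoke test: run nvidia-smi inside a throwaway container.
    echo "sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi"
}

plan_toolkit_install
```

Printing the plan first makes it easy to review the sequence before running anything with `sudo` on a production host.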


Questions # 23:

What is the primary purpose of running an NCCL burn-in test on a new GPU cluster?

Options:

A.

To test whether GPUs are properly detected by the operating system and have the correct drivers installed.


B.

To maximize GPU utilization for machine learning workloads and automatically tune deep learning frameworks.


C.

To detect and resolve hardware or interconnect issues before production by stressing GPU communication links.


D.

To benchmark application-specific runtime performance of AI models using real user data and production training scripts.
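
As one illustration of what an NCCL burn-in can look like, the open-source nccl-tests suite provides an `all_reduce_perf` binary that exercises collective communication across GPUs. The sketch below only prints repeated invocations rather than running them. The helper name `burn_in_plan`, the round count, and the `./build/` path (the suite's default build location) are assumptions for illustration.

```shell
#!/bin/sh
# burn_in_plan ROUNDS NGPUS: print (not run) repeated all_reduce_perf invocations.
# burn_in_plan is a hypothetical helper; the flags are from the nccl-tests README.
burn_in_plan() {
    round=1
    while [ "$round" -le "$1" ]; do
        # -b/-e: min/max message size, -f: size multiplication factor,
        # -g: number of GPUs per thread
        echo "round $round: ./build/all_reduce_perf -b 8 -e 128M -f 2 -g $2"
        round=$((round + 1))
    done
}

burn_in_plan 3 8
```

Repeating the sweep over many rounds is what turns a one-off bandwidth check into a burn-in: intermittent link faults tend to show up as errors or bandwidth drops in only some rounds.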


Questions # 24:

You are tasked with updating both NVIDIA GPU drivers and DOCA drivers on a set of servers used for AI workloads. The environment previously had an older driver stack and custom kernel modules. What is the most important step to successfully upgrade the drivers without causing conflicts?

Options:

A.

Update the GPU driver leaving the DOCA and OFED drivers unchanged as long as they are detecting the hardware properly.


B.

Validate the driver version post-install since the fresh install will overwrite the legacy drivers.


C.

Keep the older driver running alongside the new version in case you need to roll back the upgrade.


D.

Uninstall all existing GPU and DOCA-related drivers and associated kernel modules before the new install.


Questions # 25:

A systems engineer is updating firmware across a large DGX cluster using automation. What is the best practice for minimizing risk and ensuring cluster health during and after the process?

Options:

A.

Drain nodes from the scheduler, run pre-update diagnostics, update firmware in batches, and verify health post-update before scaling to the next batch.


B.

To save time, simultaneously update all nodes in the cluster without draining or diagnostics.


C.

Update nodes that have reported faults, leaving others on older firmware.


D.

Drain nodes from the scheduler, update firmware in batches, skip diagnostics and verify health post-update before scaling to the next batch.
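
The batched workflow described in the options (drain, diagnose, update, verify, then move on) can be sketched as a dry run. Everything here is hypothetical: the `rollout` helper, the node names, and the echoed step labels stand in for site-specific scheduler and firmware tooling, which the question does not name.

```shell
#!/bin/sh
# Dry-run sketch of a batched rollout. All helper names are hypothetical;
# the echo lines stand in for site-specific scheduler and firmware tooling.
# Usage: rollout BATCH_SIZE NODE...
rollout() {
    batch_size=$1; shift
    count=0; batch=1
    for node in "$@"; do
        if [ "$count" -eq 0 ]; then
            echo "== batch $batch =="
        fi
        echo "drain $node"
        echo "pre-update diagnostics $node"
        echo "update firmware $node"
        echo "post-update health check $node"
        count=$((count + 1))
        if [ "$count" -eq "$batch_size" ]; then
            echo "gate: proceed only if every node in batch $batch is healthy"
            count=0; batch=$((batch + 1))
        fi
    done
    # Close out a final, partially filled batch.
    if [ "$count" -ne 0 ]; then
        echo "gate: proceed only if every node in batch $batch is healthy"
    fi
}

rollout 2 node01 node02 node03 node04 node05
```

The gate line after each batch is the point of the design: a bad firmware image is caught after a handful of nodes rather than after the whole cluster has been flashed.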


Questions # 26:

A system administrator needs to install a GPU/DPU in a server. The server has a free PCI-e slot, there are enough free PCI-e lanes, and there is enough room for the card. Which procedure should be followed?

Options:

A.

Ensure the server has enough power. Verify compatibility of cables with server's platform. Make sure the server is down to remove cables safely. Do not wear an ESD bracelet.


B.

Ensure the server has enough power. Make sure the server is down to remove cables safely. Wear an ESD bracelet.


C.

Ensure the server has enough power. Make sure the server is up and running with attached cables. Wear an ESD bracelet.


D.

Ensure the server has enough power. Verify compatibility of cables with server's platform. Make sure the server is down to remove cables safely. Wear an ESD bracelet.


Questions # 27:

After ClusterKit reports "GPU-Host latency exceeds threshold," which NVIDIA diagnostic tool should be used to isolate hardware faults?

Options:

A.

Re-run ClusterKit with --stress=gpu -Y 60 to extend test duration


B.

nvidia-smi topo -m to inspect GPU topology connections


C.

DCGM diagnostics: dcgmi diag -r 2


D.

ib_write_bw to measure InfiniBand bandwidth between nodes


Questions # 28:

For an NVIDIA Enterprise AI Factory with 256 GPUs, which storage solution characteristic is most critical to validate during scaling tests?

Options:

A.

Consistent per-node throughput > 8 GiB/s.


B.

Single-node write performance during idle clusters.


C.

RAID rebuild times under disk failure.


D.

Maximum 4K random read IOPS exceeding 1 million.
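
To put a per-node throughput target in perspective at this scale, a back-of-the-envelope calculation helps. The sketch below assumes 8 GPUs per node, which the question does not state, and uses the 8 GiB/s figure from option A purely as an example value. The helper name `aggregate_gibs` is hypothetical.

```shell
#!/bin/sh
# aggregate_gibs GPUS GPUS_PER_NODE PER_NODE_GIBS
# Prints the aggregate throughput (GiB/s) the storage must sustain when every
# node pulls its per-node target at the same time.
# Assumption: GPUS divides evenly by GPUS_PER_NODE.
aggregate_gibs() {
    nodes=$(( $1 / $2 ))
    echo $(( nodes * $3 ))
}

# 256 GPUs / 8 GPUs per node = 32 nodes; 32 nodes * 8 GiB/s = 256 GiB/s
aggregate_gibs 256 8 8
```

Under these assumptions the storage system has to hold 256 GiB/s in aggregate with all 32 nodes active, which is a different test from measuring one node against an otherwise idle cluster.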


Questions # 29:

If two ports must be connected, but one is SFP and one is QSFP, for example, to connect a 25 GbE Host Channel Adapter to a QSFP port capable of both 100 GbE and 25 GbE, which solution would best meet this requirement?

Options:

A.

QSA adapter.


B.

SFP connectors.


C.

SFP-to-1G BASE-T RJ45 adapter.


D.

Standard QSFP-to-QSFP DAC cable.


Questions # 30:

An engineer needs to completely remove NVIDIA GPU drivers from an Ubuntu 22.04 system to troubleshoot conflicts. Which command sequence ensures all driver components are purged?

Options:

A.

sudo ubuntu-drivers uninstall


B.

sudo rm -rf /usr/lib/nvidia


C.

sudo apt-get remove nvidia-driver-550


D.

sudo apt-get purge nvidia-* && sudo apt-get autoremove
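
A practical note on wildcard package patterns such as the one in option D: when the pattern is unquoted, the shell may expand it against files in the current directory before `apt-get` ever sees it. The sketch below demonstrates that expansion in a throwaway directory without touching any packages. The file name `nvidia-notes.txt` is a hypothetical stray file created only for the demonstration.

```shell
#!/bin/sh
# Demonstrates shell glob expansion: an unquoted pattern is matched against
# the current directory first; a quoted pattern is passed through untouched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch nvidia-notes.txt            # hypothetical stray file in the working directory

unquoted=$(echo nvidia-*)         # shell expands this to the matching file name
quoted=$(echo 'nvidia-*')         # pattern reaches the command unchanged

echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"

# Clean up the throwaway directory.
cd / && rm -rf "$demo_dir"
```

For that reason the pattern is often written in quotes, as in `sudo apt-get purge 'nvidia-*'`, so that the package manager receives it regardless of what happens to be in the working directory.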

