NVIDIA AI Infrastructure NCP-AII Question # 7 Topic 1 Discussion
During East-West fabric validation on a 64-GPU cluster, an engineer runs all_reduce_perf and observes an algorithm bandwidth of 350 GB/s and bus bandwidth of 656 GB/s. What does this indicate about the fabric performance?
A. Inconclusive; rerun with point-to-point tests.
B. Optimal performance; bus bandwidth near theoretical peak for NDR InfiniBand.
C. Critical failure; bus bandwidth exceeds hardware capabilities.
D. Suboptimal performance; algorithm bandwidth should match bus bandwidth.
When evaluating NVIDIA Collective Communications Library (NCCL) performance, it is vital to distinguish between algorithm bandwidth and bus bandwidth. Algorithm bandwidth is the message size divided by the elapsed time, that is, the "useful" data rate from the application's perspective. For an all_reduce operation, nccl-tests derives bus bandwidth from algorithm bandwidth by multiplying by 2(n-1)/n, where n is the number of ranks, so that the figure reflects the effective data rate on the hardware links regardless of rank count. Algorithm bandwidth is therefore naturally lower than bus bandwidth, and the two are not expected to match.

In an NDR (400G) InfiniBand fabric, the theoretical peak per link is 50 GB/s (unidirectional). In a 64-GPU cluster (8 nodes of 8 GPUs), a bus bandwidth of 656 GB/s indicates that the fabric is efficiently utilizing the multiple 400G rails available on the DGX H100. This result is considered optimal (option B), as it reflects near-line-rate performance once network headers and synchronization overhead are accounted for. If the bus bandwidth were significantly lower, it would suggest congestion, cable faults, or sub-optimal routing.
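The relationship between the two figures can be checked with a few lines of arithmetic. The sketch below assumes the all-reduce correction factor 2(n-1)/n documented for nccl-tests; the function names are hypothetical helpers, not part of any NVIDIA tool. It also inverts the factor to see which rank count a reported pair of numbers would imply. The source does not say which rank count or algorithm produced its 350 / 656 GB/s figures, so treat the last function as a sanity check only.

```python
def allreduce_bus_factor(n_ranks: int) -> float:
    """All-reduce correction factor, assumed to be 2*(n-1)/n as in nccl-tests."""
    if n_ranks < 2:
        raise ValueError("all-reduce bus bandwidth needs at least 2 ranks")
    return 2 * (n_ranks - 1) / n_ranks


def bus_bandwidth(alg_bw_gbs: float, n_ranks: int) -> float:
    """Bus bandwidth (GB/s) derived from algorithm bandwidth (GB/s)."""
    return alg_bw_gbs * allreduce_bus_factor(n_ranks)


def implied_ranks(alg_bw_gbs: float, bus_bw_gbs: float) -> float:
    """Rank count implied by a reported (algbw, busbw) pair.

    Solves factor = 2*(n-1)/n for n, giving n = 2 / (2 - factor).
    """
    factor = bus_bw_gbs / alg_bw_gbs
    if not 1.0 <= factor < 2.0:
        raise ValueError("busbw/algbw must lie in [1, 2) for all-reduce")
    return 2 / (2 - factor)


if __name__ == "__main__":
    # Factor and expected bus bandwidth for the 64-GPU case in the question.
    print(allreduce_bus_factor(64))
    print(bus_bandwidth(350.0, 64))
    # Rank count that the quoted 350 / 656 GB/s pair would correspond to.
    print(implied_ranks(350.0, 656.0))
```

Under this assumption the factor approaches 2 as the rank count grows, which is why bus bandwidth is always reported higher than algorithm bandwidth for all_reduce and why option D does not hold.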