In an InfiniBand fabric, the Subnet Manager (SM) is the "brain" of the network: it discovers the topology, assigns Local Identifiers (LIDs), and calculates routing tables. A multi-node fabric typically runs one Master SM and one or more Standby SMs for high availability. To identify the master, run the sminfo command first; it queries the fabric and returns the LID of the current Master SM, along with its GUID, priority, and state. A LID is only a number, so the next step is to map it to a physical server name. Running smpquery ND (Node Description) against that LID returns the human-readable Node Description string of the node hosting the Master SM. This sequence is vital when troubleshooting fabric-wide issues, because the logs on the Master SM server are the definitive record of sweeps, traps, and topology changes. Running smpquery NI (Node Info) instead would return hardware-level details such as the GUID and device ID, but not the human-readable string (server name) held in the Node Description, which is what allows rapid identification in a crowded data center.
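The two-step lookup described above can be sketched in Python as a thin wrapper around the two commands. This is a minimal sketch, not a definitive tool: it assumes sminfo prints a line containing "sm lid <number>" (the sample line below is illustrative, not captured from a real fabric), that smpquery addresses its target by LID by default, and that both tools are on the PATH of a host attached to the fabric. The function names are hypothetical.

```python
import re
import subprocess

# Assumption: sminfo output contains "sm lid <decimal number>".
SM_LID_RE = re.compile(r"sm lid (\d+)")

# Illustrative sample only; the GUID and counters are made up.
SAMPLE_SMINFO_OUTPUT = (
    "sminfo: sm lid 1 sm guid 0x2c9030000a1b2, "
    "activity count 4711 priority 14 state 3 SMINFO_MASTER"
)


def parse_master_lid(sminfo_output):
    """Extract the Master SM LID from sminfo's text output."""
    match = SM_LID_RE.search(sminfo_output)
    if match is None:
        raise ValueError("no 'sm lid' field found in sminfo output")
    return int(match.group(1))


def node_description_command(lid):
    """Build the smpquery call that asks the given LID for its Node Description."""
    return ["smpquery", "ND", str(lid)]


def find_master_sm_name():
    """Run sminfo, then smpquery ND on the reported LID; return smpquery's raw output."""
    sminfo_out = subprocess.run(
        ["sminfo"], capture_output=True, text=True, check=True
    ).stdout
    lid = parse_master_lid(sminfo_out)
    nd_out = subprocess.run(
        node_description_command(lid), capture_output=True, text=True, check=True
    ).stdout
    return nd_out.strip()
```

Only find_master_sm_name() touches the fabric; the parsing and command-building helpers can be exercised offline, for example with parse_master_lid(SAMPLE_SMINFO_OUTPUT).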