Your document collection contains a subset of invoices that are being identified as textual near duplicates, even though they are all from different days and for different amounts. What setting can you toggle to see if the results can be improved?
The correct answer is A. Ignore Numbers. Relativity’s near duplicate documentation explains that when Ignore Numbers is set to true, the similarity calculation considers only tokens that begin with a letter, so numbers are excluded from the similarity percentage. That matters for invoice populations, because many invoices share nearly identical wording and differ mainly in dates, invoice numbers, quantities, or amounts. With numbers excluded, such invoices can score as near duplicates; with numbers included, those differences lower the similarity score. Toggling Ignore Numbers is therefore the setting most likely to change how these documents are grouped and to show whether the near duplicate results can be improved.
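To see why the setting matters for invoices, here is a minimal Python sketch of the idea. It is not Relativity's algorithm: the tokenizer, the Jaccard similarity measure, the `ignore_numbers` parameter, and the sample invoice text are all assumptions made purely for illustration. It only shows how dropping tokens that do not begin with a letter can make two different invoices look identical.

```python
import re


def tokens(text, ignore_numbers=False):
    """Split text into a set of lowercase tokens (hypothetical tokenizer)."""
    words = re.findall(r"[a-z0-9][a-z0-9./-]*", text.lower())
    if ignore_numbers:
        # Keep only tokens that begin with a letter, mirroring the
        # documented behavior described above.
        words = [w for w in words if w[0].isalpha()]
    return set(words)


def similarity(x, y, ignore_numbers=False):
    """Jaccard similarity of the two token sets (an assumed stand-in metric)."""
    a = tokens(x, ignore_numbers)
    b = tokens(y, ignore_numbers)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up sample invoices: same wording, different numbers, dates, amounts.
invoice_a = "Invoice 10431 dated 2023-01-15 amount due 1250.00 USD payable within 30 days"
invoice_b = "Invoice 10987 dated 2023-03-02 amount due 980.50 USD payable within 30 days"

print(f"numbers counted: {similarity(invoice_a, invoice_b):.2f}")
print(f"numbers ignored: {similarity(invoice_a, invoice_b, ignore_numbers=True):.2f}")
```

In this toy example the two invoices are indistinguishable once numeric tokens are dropped, while counting the numbers pulls the score well below a typical near duplicate threshold. The actual percentages Relativity reports will differ, since its similarity calculation is not reproduced here.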
The other options do not fit this use case. MD5Hash Verifier is related to hash-based duplicate validation rather than textual near duplicate tuning. Relativity Compare is a review tool for examining differences after grouping, not a setting that changes how the grouping is calculated. Auto-recognize Dates is not the structured analytics tuning option documented for this scenario. Therefore, for invoice-heavy data where numerical differences are driving questionable textual near duplicate results, the correct setting to toggle is Ignore Numbers.