Labels in UiPath Communications Mining are user-defined categories that can be applied to communications data, such as emails, chats, and calls, to identify the topics, intents, and sentiments within them [1]. Labels are trained using supervised learning: users provide examples of data that belong to each label, and the system learns from those examples to make predictions for new data [2]. Not all labels are equally easy to train, however, and some need more examples than others to perform well. Labels receive bias warnings when they have relatively low average precision, too few training examples, or were labelled in a biased manner [3].

Precision measures how accurate the predictions for a given label are. It is calculated as the ratio of true positives (correct predictions) to the total number of predictions made for that label. A label with 100% precision means that every prediction made for that label is correct, but it does not necessarily mean the label is well trained. The label may have very few predictions, or its predictions may cover only a subset of data that closely resembles the training examples. This can lead to overfitting, where the label is too specific to the training data and does not generalize to new or different data.

Therefore, labels with 100% precision can still carry bias warnings if they lack training examples, because the labelled examples are not representative of the underlying data distribution and may miss important variations or nuances that affect predictions. To improve performance and reduce bias for these labels, users should provide more, and more diverse, examples that cover the range of scenarios and expressions the label should capture.
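To make the precision arithmetic concrete, here is a minimal Python sketch. The label names, counts, and the `precision` helper are hypothetical and not part of any Communications Mining API; the sketch simply mirrors the definition above and illustrates why 100% precision measured over only a couple of predictions is far less informative than a slightly lower figure measured over many.

```python
# Illustrative sketch only: label names and counts are hypothetical and not
# drawn from a real Communications Mining model or API. The arithmetic follows
# the definition above: precision = true positives / total predictions.

def precision(true_positives: int, total_predictions: int) -> float:
    """Precision for a label: correct predictions / all predictions made."""
    if total_predictions == 0:
        return 0.0
    return true_positives / total_predictions

# A well-trained label: many predictions, most of them correct.
well_trained = {"label": "Order Status", "true_positives": 180, "total_predictions": 200}

# A label with 100% precision but very few examples: every prediction is
# correct, yet the sample is too small to trust the estimate, so the label
# can still trigger a bias / insufficient-training-examples warning.
sparse = {"label": "Complaint", "true_positives": 2, "total_predictions": 2}

for label in (well_trained, sparse):
    p = precision(label["true_positives"], label["total_predictions"])
    print(f'{label["label"]}: precision = {p:.0%} '
          f'({label["true_positives"]}/{label["total_predictions"]} correct)')

# Output:
# Order Status: precision = 90% (180/200 correct)
# Complaint: precision = 100% (2/2 correct)
```

The second label scores perfectly, but with only two predictions the estimate tells you almost nothing about how it will behave on the full, varied dataset, which is exactly the situation the bias warning flags.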
References:
[1] Communications Mining Overview
[2] Creating and Training Labels
[3] Understanding and Improving Model Performance; Precision and Recall; Overfitting and Underfitting; Fixing Labelling Bias With Communications Mining