Evaluation metrics | Description |
---|---|
Accuracy (ACC) | Accuracy, defined as the ratio of correctly predicted observations to the total observations, is calculated as \(Accuracy = \frac{{True \,Positives \,\left( {TP} \right) + True \,Negatives\, \left( {TN} \right)}}{Total \,Observations}\), providing a fundamental measure of the model's overall predictive correctness |
Precision (PR) | Precision, a key metric in model evaluation, is quantified as \(\frac{{True \,Positives \,\left( {TP} \right)}}{{True \,Positives\, \left( {TP} \right) + False \,Positives\, \left( {FP} \right)}}\), reflecting the proportion of true positive predictions among all positive predictions made by the model |
Recall (RE) | Recall, an essential metric in classification models, is determined by the formula \(Recall = \frac{{True\, Positives \,\left( {TP} \right)}}{{True\, Positives\, \left( {TP} \right) + False \,Negatives \left( {FN} \right)}}\), capturing the model’s ability to correctly identify all relevant instances within a dataset |
F1-score (FS) | F1-score, a crucial metric that balances precision and recall, is calculated using the formula \(F1\text{-}score = 2 \times \frac{Precision \times Recall}{{Precision + Recall}}\), thereby providing a harmonic mean that encapsulates the model's accuracy in classifying data points correctly |
False positive rate (FPR) | FPR, a critical metric in assessing classification errors, is computed as \(FPR = \frac{{False \,Positives \,\left( {FP} \right)}}{{False\, Positives\, \left( {FP} \right) + True\, Negatives \,\left( {TN} \right)}}\), quantifying the proportion of negative instances incorrectly classified as positive by the model |
Error rate (ER) | Error rate, a metric that quantifies the overall prediction inaccuracies of a model, is calculated as \(Error \,Rate = \frac{{False \,Positives\, \left( {FP} \right) + False \,Negatives \,\left( {FN} \right)}}{Total \,Observations}\), effectively measuring the proportion of all predictions that the model got wrong |
AUC score (AUC) | The Area Under the Curve (AUC) Score, a measure of the model's ability to distinguish between classes, is calculated by plotting the true positive rate (TPR) against the FPR at various threshold settings, with the AUC value ranging from 0 to 1, where a higher value indicates better classification performance |
Cohen’s Kappa (KP) | Cohen’s Kappa, a statistical measure of inter-rater agreement for categorical items, is calculated as \(Kappa = \frac{{P_{o} - P_{e} }}{{1 - P_{e} }}\), where \(P_{o}\) represents the relative observed agreement among raters, and \(P_{e}\) is the hypothetical probability of chance agreement, providing a robust assessment of the model’s predictive accuracy beyond random chance |
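As an illustration of how the formulas in the table combine, the following is a minimal Python sketch that computes the count-based metrics (ACC, PR, RE, FS, FPR, ER, KP) from a single set of confusion-matrix counts. The counts (TP, FP, TN, FN) are hypothetical values chosen for the example, not results from the source; AUC is omitted because it is derived from ranked prediction scores across thresholds rather than from a single confusion matrix.

```python
# Hypothetical confusion-matrix counts for illustration only.
TP, FP, TN, FN = 40, 10, 45, 5
total = TP + FP + TN + FN

accuracy = (TP + TN) / total                          # ACC
precision = TP / (TP + FP)                            # PR
recall = TP / (TP + FN)                               # RE
f1 = 2 * precision * recall / (precision + recall)    # FS (harmonic mean)
fpr = FP / (FP + TN)                                  # FPR
error_rate = (FP + FN) / total                        # ER

# Cohen's Kappa: P_o is the observed agreement (same as accuracy here);
# P_e is the chance agreement, from the marginal predicted/actual totals.
p_o = (TP + TN) / total
p_e = ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / total**2
kappa = (p_o - p_e) / (1 - p_e)                       # KP

print(f"ACC={accuracy:.3f} PR={precision:.3f} RE={recall:.3f} "
      f"F1={f1:.3f} FPR={fpr:.3f} ER={error_rate:.3f} KP={kappa:.3f}")
```

Note that ACC and ER are complements (\(ER = 1 - ACC\)), and Kappa discounts the agreement that would be expected by chance given the class marginals, which is why it is typically lower than raw accuracy.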